density_clustering
Apply density-based clustering to the centroids of the geometries in a GeoDataFrame.
Note
This is a wrapper around density-based clustering algorithms from scikit-learn (and esda), adapted to GeoDataFrames.
Note
Allowed clustering methods are:
- dbscan (sklearn.cluster.DBSCAN)
- hdbscan (sklearn.cluster.HDBSCAN)
- optics (sklearn.cluster.OPTICS)
- adbscan (esda.adbscan.ADBSCAN)
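The dispatch over the methods listed above might look roughly like the sketch below. It is a minimal illustration, not the histolytics implementation. It assumes the centroid coordinates have already been extracted into an (n, 2) array (with geopandas, for example from `gdf.centroid.x` and `gdf.centroid.y`), it omits `adbscan` to avoid the esda dependency, and the name `cluster_centroids` is hypothetical. How `eps` is forwarded to HDBSCAN and OPTICS is also an assumption.

```python
import numpy as np
from sklearn import cluster

# Methods handled by this sketch (adbscan omitted: it needs esda)
ALLOWED = ("dbscan", "hdbscan", "optics")


def cluster_centroids(xy, eps=350.0, min_samples=30, method="dbscan",
                      num_processes=1, **kwargs):
    """Hypothetical helper: cluster an (n, 2) array of centroid coordinates."""
    if method not in ALLOWED:
        raise ValueError(f"Illegal method {method!r}. Allowed: {ALLOWED}")

    xy = np.asarray(xy, dtype=float)
    if method == "dbscan":
        est = cluster.DBSCAN(eps=eps, min_samples=min_samples,
                             n_jobs=num_processes, **kwargs)
    elif method == "hdbscan":
        # Assumption: HDBSCAN takes no eps, so it is ignored here.
        # cluster.HDBSCAN requires scikit-learn >= 1.3.
        est = cluster.HDBSCAN(min_samples=min_samples,
                              n_jobs=num_processes, **kwargs)
    else:
        # Assumption: eps is forwarded as OPTICS' max_eps.
        est = cluster.OPTICS(max_eps=eps, min_samples=min_samples,
                             n_jobs=num_processes, **kwargs)

    # Every estimator exposes labels_ after fitting; noise is labeled -1
    return est.fit(xy).labels_


if __name__ == "__main__":
    # Two tight blobs and one far-away outlier
    pts = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
           (10, 10), (10, 10.5), (10.5, 10), (10.5, 10.5),
           (50, 50)]
    # First blob gets label 0, second blob 1, the outlier -1
    print(cluster_centroids(pts, eps=1.0, min_samples=3))
```

Accessing `cluster.HDBSCAN` only inside its branch keeps the sketch importable on older scikit-learn releases that lack that class.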
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | Input GeoDataFrame with a properly set geometry column. | required |
| eps | float | The maximum distance between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. | 350.0 |
| min_samples | int | The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself. | 30 |
| method | str | The clustering method to use. Allowed: ("dbscan", "hdbscan", "optics", "adbscan"). | 'dbscan' |
| num_processes | int | The number of parallel processes. None means 1; -1 means all processors are used. | 1 |
| **kwargs | Dict[str, Any] | Arbitrary keyword arguments passed to the selected clustering method. | {} |
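To make the `eps` and `min_samples` semantics concrete, the sketch below marks DBSCAN-style core points with plain NumPy. It is illustrative only (the function name `core_point_mask` is hypothetical and it uses O(n²) memory); it shows that a point's neighborhood count includes the point itself.

```python
import numpy as np


def core_point_mask(xy, eps, min_samples):
    """Mark DBSCAN-style core points in an (n, 2) coordinate array."""
    xy = np.asarray(xy, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # A point's neighborhood includes the point itself (distance 0 <= eps)
    counts = (dists <= eps).sum(axis=1)
    return counts >= min_samples


pts = [(0, 0), (0.8, 0), (1.6, 0), (10, 0)]
print(core_point_mask(pts, eps=1.0, min_samples=3))
```

Only the middle point (x = 0.8) is a core point here. Under DBSCAN with the same settings, the points at x = 0 and x = 1.6 would still join its cluster as border points although they are 1.6 apart, which is why `eps` does not bound distances within a cluster.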
Raises:
| Type | Description |
|---|---|
| ValueError | If an illegal method is given or input |
Returns:
| Name | Type | Description |
|---|---|---|
| labels | ndarray | An array of cluster labels, one for each centroid in the gdf. Noise points are labeled as -1. |
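A label array of this form can be summarized with NumPy alone. The helper below is a hypothetical convenience, not part of histolytics; it treats -1 as noise, as described in the Returns table.

```python
import numpy as np


def summarize_labels(labels):
    """Count clusters, noise points and cluster sizes in a label array."""
    labels = np.asarray(labels)
    noise = labels == -1  # noise points carry the label -1
    ids, sizes = np.unique(labels[~noise], return_counts=True)
    return {
        "n_clusters": int(ids.size),
        "n_noise": int(noise.sum()),
        "sizes": dict(zip(ids.tolist(), sizes.tolist())),
    }


print(summarize_labels([-1, 0, 0, 1, -1, 1, 1]))
# {'n_clusters': 2, 'n_noise': 2, 'sizes': {0: 2, 1: 3}}
```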
Examples:
>>> from histolytics.spatial_clust.density_clustering import density_clustering
>>> from histolytics.data import hgsc_cancer_nuclei
>>>
>>> nuc = hgsc_cancer_nuclei()
>>> nuc_neo = nuc[nuc["class_name"] == "neoplastic"]
>>> labels = density_clustering(nuc_neo, eps=250, min_samples=100, method="dbscan")
>>> print(labels)
[-1 -1 -1 -1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 0 0 0 0 0 0 ...
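A common next step is to attach the labels to the input table and drop the noise points. The sketch below uses a small pandas DataFrame as a stand-in for the filtered nuclei GeoDataFrame (a GeoDataFrame supports the same `assign` and boolean-indexing calls). It assumes the labels come back in the same order as the input rows, and the column name `cluster` is hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for a filtered nuclei table
nuclei = pd.DataFrame({"class_name": ["neoplastic"] * 5})
labels = np.array([-1, 0, 0, 1, 1])  # e.g. output of density_clustering

# Attach one label per row (assumes labels follow the row order)
nuclei = nuclei.assign(cluster=labels)
# Keep only nuclei that belong to a cluster (drop noise, label -1)
clustered = nuclei[nuclei["cluster"] != -1]
print(clustered["cluster"].value_counts().sort_index())
```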