local_diversity
Compute the diversity of neighboring feature values for every object in a GeoDataFrame.
Note
Neighborhoods are defined by the spatial_weights object, which can be created with the fit_graph function. Call fit_graph on the input GeoDataFrame before calling local_diversity.
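To make the neighborhood idea concrete, here is a minimal sketch of how values can be gathered per neighborhood. It uses a plain dict as a stand-in for the neighbors mapping of a libpysal W object; the ids, the values, the helper neighborhood_values, and the include_self flag are all hypothetical, and whether local_diversity counts the object itself as part of its neighborhood is an assumption here.

```python
from collections import Counter

# Hypothetical stand-in for the `neighbors` mapping of a libpysal W object:
# each object id maps to the ids of its neighbors.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Hypothetical categorical value per object (e.g. a cell type column).
values = {
    0: "connective",
    1: "connective",
    2: "squamous_epithel",
    3: "inflammatory",
}


def neighborhood_values(node, neighbors, values, include_self=True):
    """Collect the values found in the neighborhood of `node`.

    `include_self` is an assumption of this sketch, not a documented
    option of local_diversity.
    """
    ids = list(neighbors[node])
    if include_self:
        ids = [node] + ids
    return [values[i] for i in ids]


# Category counts per neighborhood; a diversity metric is then computed
# from each of these count tables.
nhood_counts = {
    node: Counter(neighborhood_values(node, neighbors, values))
    for node in neighbors
}
```

Each entry of nhood_counts is the input a diversity metric would consume for one object.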
Note
Allowed diversity metrics:
- simpson_index: for both categorical and real-valued neighborhoods
- shannon_index: for both categorical and real-valued neighborhoods
- gini_index: for real-valued neighborhoods only
- theil_index: for real-valued neighborhoods only
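For orientation, the four metrics can be sketched with textbook definitions in plain Python. These are assumptions, not the library's implementation: the Simpson index is written in its Gini-Simpson form, the Shannon index uses the natural logarithm, and the Gini and Theil formulas are common variants. The exact definitions used by local_diversity may differ.

```python
import math


def simpson_index(counts):
    """Gini-Simpson form: 1 - sum(p_i^2) over category proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def shannon_index(counts):
    """Shannon entropy with natural log: -sum(p_i * ln(p_i))."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def gini_index(x):
    """Gini coefficient of non-negative values (mean absolute difference form)."""
    n = len(x)
    mean = sum(x) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in x for b in x)
    return diff_sum / (2 * n * n * mean)


def theil_index(x):
    """Theil T index of strictly positive values."""
    n = len(x)
    mean = sum(x) / n
    return sum((v / mean) * math.log(v / mean) for v in x) / n


# A neighborhood with two cells of one type and one cell of another type.
counts = [2, 1]
simpson = simpson_index(counts)
shannon = shannon_index(counts)
```

The first two functions take category counts, which is why they also work on binned real values; the last two take the raw real values.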
Note
If val_cols is not categorical, the values are binned using mapclassify. The bins are then used to compute the diversity metrics. If val_cols is categorical, the values are used directly.
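The bin-then-count step can be sketched as follows. To stay dependency-free, this sketch swaps in simple equal-width binning in place of a mapclassify scheme such as the default fisherjenks, so the bin edges will generally differ from what the library produces. The helper equal_interval_bins and the areas values are hypothetical, and whether the library bins the whole column or each neighborhood separately is not stated here.

```python
from collections import Counter


def equal_interval_bins(values, k=5):
    """Assign each value to one of k equal-width bins (labels 0..k-1).

    Stand-in for a mapclassify scheme; not the library's binning.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it to the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical real-valued feature, e.g. nuclei areas in one neighborhood.
areas = [10.0, 12.0, 11.0, 48.0, 50.0, 30.0]

# Step 1: turn real values into bin labels.
labels = equal_interval_bins(areas, k=5)

# Step 2: treat the bin labels like categories and count them.
counts = Counter(labels)

# Step 3: compute a diversity metric from the counts
# (Gini-Simpson form, 1 - sum(p_i^2), as an assumed definition).
n = len(labels)
simpson = 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Once the values are reduced to bin labels, the computation is identical to the categorical case.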
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gdf | GeoDataFrame | The input GeoDataFrame. | required |
spatial_weights | W | Libpysal spatial weights object. | required |
val_cols | Tuple[str, ...] | The name(s) of the column(s) in the gdf for which the diversity is computed. If several columns are passed, the diversity is computed for each column. | required |
id_col | str | The unique id column in the gdf. If None, this uses | None |
metrics | Tuple[str, ...] | A Tuple/List of diversity metrics. Allowed metrics: "shannon_index", "simpson_index", "gini_index", "theil_index". | ('simpson_index',) |
scheme | str | The mapclassify classification scheme used to bin non-categorical values. | 'fisherjenks' |
k | int | Number of classes for the classification scheme. Defaults to 5. | 5 |
parallel | bool | Flag whether to use parallel apply operations when computing the diversities. Defaults to False. | False |
num_processes | int | The number of processes to use when parallel=True. If -1, this will use all available cores. | 1 |
rm_nhood_cols | bool | Flag whether to remove the extra neighborhood columns from the result gdf. Defaults to True. | True |
col_prefix | str | Prefix for the new column names. Defaults to None. | None |
create_copy | bool | Flag whether to create a copy of the input gdf or not. Defaults to True. | True |
Raises:
Type | Description |
---|---|
ValueError | If an illegal metric is given. |
Returns:
Type | Description |
---|---|
GeoDataFrame | The input GeoDataFrame with the computed diversity metric columns added. |
Examples:
Compute the Shannon diversity of cell types in the neighborhood of nuclei
>>> from histolytics.spatial_graph.graph import fit_graph
>>> from histolytics.spatial_agg.local_diversity import local_diversity
>>> from histolytics.data import cervix_nuclei, cervix_tissue
>>> from histolytics.utils.gdf import set_uid
>>>
>>> nuc = cervix_nuclei()
>>> nuc = set_uid(nuc)  # ensure unique IDs for nuclei
>>>
>>> # Fit delaunay graph
>>> w, _ = fit_graph(nuc, "delaunay", id_col="uid", threshold=100, use_polars=True)
>>>
>>> # Compute local cell type diversity with the Shannon index
>>> nuc = local_diversity(
...     nuc,
...     w,
...     id_col="uid",
...     val_cols=["class_name"],
...     metrics=["shannon_index"],
...     num_processes=6,
... )
>>> print(nuc.head(3))
                                              geometry        class_name  uid  \
uid
0    POLYGON ((940.01 5570.02, 939.01 5573, 939 559...        connective    0
1    POLYGON ((906.01 5350.02, 906.01 5361, 908.01 ...        connective    1
2    POLYGON ((866 5137.02, 862.77 5137.94, 860 513...  squamous_epithel    2
     class_name_shannon_index
uid
0                    0.636514
1                    0.636514
2                    1.332179
Source code in src/histolytics/spatial_agg/local_diversity.py