Neighborhood Features

Introduction¶

What are Graph-based Neighborhoods? 🕸️ As you've learned from the last spatial graphs tutorial, a graph is the ideal data structure for defining and analyzing neighborhoods. In this context, a spatial graph is created where each nucleus is a node. The links between these nodes (the "edges") represent a spatial relationship, such as physical contact or proximity. This graph structure is what allows us to precisely define a nucleus's neighborhood as the collection of its directly linked neighbors.

What Can We Measure in a Neighborhood? 🔬 For every single-cell neighborhood, Histolytics can compute a variety of metrics that fall into three main categories:

Summary characteristics: These are aggregate statistics that describe the overall properties of the neighborhood. For a specific nucleus, you can compute:

The number of specific neighbors
mean/std/median/min/max/etc of nuclear features of all the neighboring nuclei.

Distances: These can be used to quantify spatial arrangements of nuclei (dense packing vs. loose packing)

The mean/std/median/min/max/etc distance to all neighboring nuclei.

Heterogeneity of neighboring features or cell types: These are measures of diversity within the neighborhood. Available metrics include:

Shannon entropy
Simpson's diversity index
Gini coefficient
Theil index

In this tutorial, we will explore how to compute these neighborhood metrics. We will use a small HGSC dataset as our example.

In [1]:

Copied!





from histolytics.spatial_graph.graph import fit_graph
from histolytics.utils.gdf import set_uid
from histolytics.data import hgsc_cancer_nuclei

nuc = hgsc_cancer_nuclei()
nuc = set_uid(nuc)  # Ensure the GeoDataFrame has a unique identifier

# fit spatial weights using Delaunay triangulation
w, w_gdf = fit_graph(nuc, "delaunay", id_col="uid", threshold=100)

ax = nuc.plot(figsize=(10, 10), column="class_name", aspect=1)
w_gdf.plot(ax=ax, linewidth=1, column="class_name", legend=True, aspect=1)
ax.set_axis_off()
w_gdf.head(5)
from histolytics.spatial_graph.graph import fit_graph
from histolytics.utils.gdf import set_uid
from histolytics.data import hgsc_cancer_nuclei

nuc = hgsc_cancer_nuclei()
nuc = set_uid(nuc)  # Ensure the GeoDataFrame has a unique identifier

# fit spatial weights using Delaunay triangulation
w, w_gdf = fit_graph(nuc, "delaunay", id_col="uid", threshold=100)

ax = nuc.plot(figsize=(10, 10), column="class_name", aspect=1)
w_gdf.plot(ax=ax, linewidth=1, column="class_name", legend=True, aspect=1)
ax.set_axis_off()
w_gdf.head(5)

Out[1]:

	index	focal	neighbor	weight	focal_centroid_x	focal_centroid_y	neighbor_centroid_x	neighbor_centroid_y	focal_class_name	neighbor_class_name	class_name	geometry
0	0	0	1	1.0	1400.037980	1.692484	1386.458579	9.580762	connective	connective	connective-connective	LINESTRING (1400.038 1.692, 1386.459 9.581)
1	1	0	4	1.0	1400.037980	1.692484	1306.059654	2.527988	connective	connective	connective-connective	LINESTRING (1400.038 1.692, 1306.06 2.528)
2	6	1	4	1.0	1386.458579	9.580762	1306.059654	2.527988	connective	connective	connective-connective	LINESTRING (1386.459 9.581, 1306.06 2.528)
3	9	2	3	1.0	1378.296689	170.695478	1318.355274	178.923534	connective	connective	connective-connective	LINESTRING (1378.297 170.695, 1318.355 178.924)
4	13	2	23	1.0	1378.296689	170.695478	1367.454971	219.690997	connective	connective	connective-connective	LINESTRING (1378.297 170.695, 1367.455 219.691)

No description has been provided for this image

Extracting Neighborhood Values¶

At first, before computing any specific metrics, we'll showcase how to extract the neighborhood values and what they look like. This might be useful if you have some custom logic that could be applied to the neighborhood values. You can extract these values using the local_vals-function. We use it here to extract the cell type classes of the neighboring nuclei for each neoplastic nucleus but, in general, you can use it to extract any neighboring features. The extracted neighborhood values will be stored neatly as a new column in the dataframe.

In [2]:

Copied!





from histolytics.spatial_agg.local_values import local_vals
from libpysal.weights import w_subset

# First we need to wrangle the graph data a bit so that we focus only on the nuclei
# that are neoplastic or have neoplastic nuclei in their neighborhood.

# get the indices of nuclei that have neoplastic-to-other link
neo_link_inds = w_gdf[w_gdf["class_name"].str.contains("neoplastic", na=False)][
    "focal"
].unique()

# Subset the spatial weights object with the indices that contain neoplastic-to-other links
w_neo = w_subset(w, neo_link_inds)

# Now we extract the neighborhood cell type classes!
nuc = local_vals(
    nuc, w_neo, val_col="class_name", new_col_name="nhood_classes", id_col="uid"
)

# select all the neoplastic nuclei + the nuclei that are in their neighborhood
nuc_of_interest = nuc[nuc["nhood_classes"].notna()]
nuc_of_interest.head(5)
from histolytics.spatial_agg.local_values import local_vals
from libpysal.weights import w_subset

# First we need to wrangle the graph data a bit so that we focus only on the nuclei
# that are neoplastic or have neoplastic nuclei in their neighborhood.

# get the indices of nuclei that have neoplastic-to-other link
neo_link_inds = w_gdf[w_gdf["class_name"].str.contains("neoplastic", na=False)][
    "focal"
].unique()

# Subset the spatial weights object with the indices that contain neoplastic-to-other links
w_neo = w_subset(w, neo_link_inds)

# Now we extract the neighborhood cell type classes!
nuc = local_vals(
    nuc, w_neo, val_col="class_name", new_col_name="nhood_classes", id_col="uid"
)

# select all the neoplastic nuclei + the nuclei that are in their neighborhood
nuc_of_interest = nuc[nuc["nhood_classes"].notna()]
nuc_of_interest.head(5)

Out[2]:

	geometry	class_name	uid	nhood_classes
uid
92	POLYGON ((1184.75 1008, 1180 1008.01, 1178.01 ...	connective	92	[connective, inflammatory, neoplastic, connect...
95	POLYGON ((1171 1067.02, 1170.01 1068, 1169.01 ...	connective	95	[connective, connective, connective, neoplasti...
108	POLYGON ((1178 1113.01, 1175.01 1115.01, 1172....	connective	108	[connective, connective, neoplastic, neoplasti...
109	POLYGON ((1226 1174.02, 1224.01 1176.01, 1222....	connective	109	[connective, neoplastic, connective, connective]
111	POLYGON ((1208 1211.01, 1203.83 1211.17, 1201....	connective	111	[connective, connective, connective, neoplasti...

Local Neighborhood Summary Characteristics¶

Next, we'll showcase how to compute simple summary statistics for the local neighborhoods. To do this, we will first compute the areas of the nuclei of interest and then the neighborhood mean and standard deviation of these areas. These neighborhood characteristics will be stored as distinct columns in the input dataframe.

This is a simple demonstration on quantifying a neighborhood feature at single-cell resolution. The features could be anything from gene expression levels to morphological characteristics if such single-cell-level data would exist.

In [3]:

Copied!





from histolytics.spatial_agg.local_character import local_character

# Let's compute the areas of the nuclei of interest
nuc_of_interest = nuc_of_interest.assign(area=nuc_of_interest.area)

nuc_of_interest = local_character(
    nuc_of_interest, w_neo, val_cols=["area"], id_col="uid", reductions=("mean", "std")
)
ax = nuc_of_interest.plot(
    column="area_nhood_mean", figsize=(10, 10), aspect=1, legend=True
)
ax.set_axis_off()
nuc_of_interest.head(5)
from histolytics.spatial_agg.local_character import local_character

# Let's compute the areas of the nuclei of interest
nuc_of_interest = nuc_of_interest.assign(area=nuc_of_interest.area)

nuc_of_interest = local_character(
    nuc_of_interest, w_neo, val_cols=["area"], id_col="uid", reductions=("mean", "std")
)
ax = nuc_of_interest.plot(
    column="area_nhood_mean", figsize=(10, 10), aspect=1, legend=True
)
ax.set_axis_off()
nuc_of_interest.head(5)

Out[3]:

	geometry	class_name	uid	nhood_classes	area	area_nhood_mean	area_nhood_std
uid
92	POLYGON ((1184.75 1008, 1180 1008.01, 1178.01 ...	connective	92	[connective, inflammatory, neoplastic, connect...	196.21525	309.868025	195.876801
95	POLYGON ((1171 1067.02, 1170.01 1068, 1169.01 ...	connective	95	[connective, connective, connective, neoplasti...	90.93045	345.139360	183.091297
108	POLYGON ((1178 1113.01, 1175.01 1115.01, 1172....	connective	108	[connective, connective, neoplastic, neoplasti...	415.43035	478.860357	304.110999
109	POLYGON ((1226 1174.02, 1224.01 1176.01, 1222....	connective	109	[connective, neoplastic, connective, connective]	344.17040	488.513488	124.202886
111	POLYGON ((1208 1211.01, 1203.83 1211.17, 1201....	connective	111	[connective, connective, connective, neoplasti...	674.33780	522.122500	244.703536

$No description has been provided for this image$

Local Neighborhood Distances¶

Another aspect of neighborhood analysis is the spatial arrangement of the nuclei. For example, The degree of packing density of nuclei which can be associated to various phenomena such as different tumor growth patterns. The packing density can be assessed by examining the distances in the nuclei neighborhoods.

To find out how densely packed the neighboring nuclei are, we will compute the mean distances between each neoplastic nucleus and its neighbors. The shorter the mean distance, the denser the neighborhoods are. This shows in the visualization below where denser regions are highlighted in red.

In [4]:

Copied!





from histolytics.spatial_agg.local_distances import local_distances


# Compute local neighborhood distances
nuc_of_interest = local_distances(
    nuc_of_interest,
    w_neo,
    id_col="uid",
    reductions=["mean"],
    num_processes=6,
    normalize=True,  # we will quantile normalize for the sake of the visualization
)


ax = nuc_of_interest.plot(
    column="nhood_dists_mean", figsize=(10, 10), aspect=1, legend=True, cmap="Reds_r"
)
ax.set_axis_off()
nuc_of_interest.head(5)
from histolytics.spatial_agg.local_distances import local_distances


# Compute local neighborhood distances
nuc_of_interest = local_distances(
    nuc_of_interest,
    w_neo,
    id_col="uid",
    reductions=["mean"],
    num_processes=6,
    normalize=True,  # we will quantile normalize for the sake of the visualization
)


ax = nuc_of_interest.plot(
    column="nhood_dists_mean", figsize=(10, 10), aspect=1, legend=True, cmap="Reds_r"
)
ax.set_axis_off()
nuc_of_interest.head(5)

Out[4]:

	geometry	class_name	uid	nhood_classes	area	area_nhood_mean	area_nhood_std	nhood_dists_mean
uid
92	POLYGON ((1184.75 1008, 1180 1008.01, 1178.01 ...	connective	92	[connective, inflammatory, neoplastic, connect...	196.21525	309.868025	195.876801	0.750656
95	POLYGON ((1171 1067.02, 1170.01 1068, 1169.01 ...	connective	95	[connective, connective, connective, neoplasti...	90.93045	345.139360	183.091297	0.296588
108	POLYGON ((1178 1113.01, 1175.01 1115.01, 1172....	connective	108	[connective, connective, neoplastic, neoplasti...	415.43035	478.860357	304.110999	0.853018
109	POLYGON ((1226 1174.02, 1224.01 1176.01, 1222....	connective	109	[connective, neoplastic, connective, connective]	344.17040	488.513488	124.202886	0.860892
111	POLYGON ((1208 1211.01, 1203.83 1211.17, 1201....	connective	111	[connective, connective, connective, neoplasti...	674.33780	522.122500	244.703536	0.897638

Local Heterogeneities¶

Next, we will demonstrate how to compute diversity indices for the local neighborhoods of neoplastic nuclei. These indices will help us quantify the compositional variability of the neighborhoods. In principle, the heterogeneity can be computed for any type of numerical or categorical feature. In this demo, we will compute the Shannon diversity index of the cell type categories for the each local neighborhood. The other diversity indices were "simpson_index", "gini_index" and "theil_index"

In [5]:

Copied!





from histolytics.spatial_agg.local_diversity import local_diversity

# Compute local cell type diversity with shannon entropy
nuc_of_interest = local_diversity(
    nuc_of_interest,
    w_neo,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["shannon_index"],
    num_processes=6,
)

# visualize nhood mean areas
ax = nuc_of_interest[nuc_of_interest["class_name"] == "neoplastic"].plot(
    column="class_name_shannon_index",
    aspect=1,
    cmap="Reds",
    legend=True,
    figsize=(10, 10),
)
ax.set_axis_off()
from histolytics.spatial_agg.local_diversity import local_diversity

# Compute local cell type diversity with shannon entropy
nuc_of_interest = local_diversity(
    nuc_of_interest,
    w_neo,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["shannon_index"],
    num_processes=6,
)

# visualize nhood mean areas
ax = nuc_of_interest[nuc_of_interest["class_name"] == "neoplastic"].plot(
    column="class_name_shannon_index",
    aspect=1,
    cmap="Reds",
    legend=True,
    figsize=(10, 10),
)
ax.set_axis_off()

As seen in the visualization, the cell-type heterogeneity is non-zero only at the borders of the tumor nest, where the neoplastic nuclei are linked with the surrounding nuclei in the stroma (e.g. inflammatory and connective nuclei).

Benchmark¶

In this section, we will benchmark the performance of different of the local neighborhood metric functions with a cervix dataset containing ~ 19K nuclei. The local neighborhood functions have varying run-times, depending on what and how many metrics are being computed. These functions are parallelizable, so they can be scaled according to the computational resources available. However, the memory usage can also increase significantly when larger datasets are used with many cores. In general, these functions scale to datasets with +1million segmented nuclei with reasonable run-times.

In [6]:

Copied!





from histolytics.data import cervix_nuclei
from histolytics.utils.gdf import set_uid

nuc = cervix_nuclei()
nuc = set_uid(nuc)
nuc = nuc.assign(area=nuc.area)

w, _ = fit_graph(nuc, method="delaunay", id_col="uid", threshold=50)
nuc
from histolytics.data import cervix_nuclei
from histolytics.utils.gdf import set_uid

nuc = cervix_nuclei()
nuc = set_uid(nuc)
nuc = nuc.assign(area=nuc.area)

w, _ = fit_graph(nuc, method="delaunay", id_col="uid", threshold=50)
nuc

Out[6]:

	geometry	class_name	uid	area
uid
0	POLYGON ((940.01 5570.02, 939.01 5573, 939 559...	connective	0	429.58790
1	POLYGON ((906.01 5350.02, 906.01 5361, 908.01 ...	connective	1	408.46570
2	POLYGON ((866 5137.02, 862.77 5137.94, 860 513...	squamous_epithel	2	369.49285
3	POLYGON ((932 4777.02, 928 4778.02, 922.81 478...	glandular_epithel	3	339.88560
4	POLYGON ((904 5177.02, 902.01 5178.02, 901.01 ...	connective	4	330.89425
...	...	...	...	...
19192	POLYGON ((3015 2614.02, 3013.01 2616, 3013.01 ...	inflammatory	19192	297.37560
19193	POLYGON ((3387 2618.02, 3384.01 2620.02, 3379....	connective	19193	293.85250
19194	POLYGON ((3655 2622.02, 3650.86 2623.75, 3647 ...	inflammatory	19194	320.30200
19195	POLYGON ((3703 2627.02, 3699.9 2629.16, 3696.0...	connective	19195	300.45640
19196	POLYGON ((3199 2623.02, 3194.64 2626.72, 3191....	inflammatory	19196	268.75830

19197 rows × 4 columns

Neighborhood Summary Characteristics¶

In [7]:

Copied!





%%timeit
local_character(
    nuc, w, val_cols=["area"], id_col="uid", reductions=("mean",), num_processes=3
)
%%timeit
local_character(
    nuc, w, val_cols=["area"], id_col="uid", reductions=("mean",), num_processes=3
)

1.89 s ± 26.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]:

Copied!





%%timeit
local_character(
    nuc, w, val_cols=["area"], id_col="uid", reductions=("mean", "std"), num_processes=6
)
%%timeit
local_character(
    nuc, w, val_cols=["area"], id_col="uid", reductions=("mean", "std"), num_processes=6
)

2.14 s ± 33.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Neighborhood Distances¶

In [9]:

Copied!





%%timeit
local_distances(
    nuc,
    w,
    id_col="uid",
    reductions=["mean"],
    num_processes=3,
)
%%timeit
local_distances(
    nuc,
    w,
    id_col="uid",
    reductions=["mean"],
    num_processes=3,
)

3.53 s ± 92.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]:

Copied!





%%timeit
local_distances(
    nuc,
    w,
    id_col="uid",
    reductions=["mean", "std"],
    num_processes=6,
)
%%timeit
local_distances(
    nuc,
    w,
    id_col="uid",
    reductions=["mean", "std"],
    num_processes=6,
)

3.88 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Neighborhood Diversities¶

In [11]:

Copied!





%%timeit
local_diversity(
    nuc,
    w,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["simpson_index"],
    num_processes=3,
)
%%timeit
local_diversity(
    nuc,
    w,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["simpson_index"],
    num_processes=3,
)

4.09 s ± 71.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [12]:

Copied!





%%timeit
local_diversity(
    nuc,
    w,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["simpson_index", "shannon_index"],
    num_processes=6,
)
%%timeit
local_diversity(
    nuc,
    w,
    id_col="uid",
    val_cols=["class_name"],
    metrics=["simpson_index", "shannon_index"],
    num_processes=6,
)

4.16 s ± 41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [ ]: