Graphs & Graph Features

Introduction¶

What are Spatial Graphs? 🕸️ A spatial graph is a representation of spatial data as a network, where individual objects are the nodes and the relationships between them are the links or edges. This transformation allows you to apply powerful graph algorithms to analyze the relationships and structures within your data that aren't apparent from just looking at the segmentation maps.

In Histolytics, after you've segmented your WSI, you can create a spatial graph where:

Nodes: Each segmented object (e.g., a cell or a nucleus) is a node in the graph. These nodes have spatial attributes like their coordinates and extracted features (e.g., area, shape, or marker intensity).

Histolytics provides a fit_graph function which can be used to fit a spatial graph or a spatial weights to your input data. The actual fitting is done with the libpysal package and the fit_graph-function is basically a wrapper around different graph fitting methods with added functionality for example to threshold links that are too long. The allowed spatial weights are:

knn: k-nearest neighbors
delaunay - Delaunay triangulation
distband - Distance band i.e. a distance thresholded knn graph
rel_nhood - Relative neighborhood graph
gabriel - Gabriel graph
voronoi - Voronoi graph

We will be using the delaunay method in this example. Here, we will set a distance threshold for the neighbors to be within 50 microns of the nuclei centroid. The distance unit in the example data is in pixels so 50 microns in pixels of 20x magnified segmentation mask is around 50*2 = 100 pixels.

In [1]:

Copied!





from histolytics.spatial_graph.graph import fit_graph
from histolytics.utils.gdf import set_uid
from histolytics.data import hgsc_cancer_nuclei

nuc = hgsc_cancer_nuclei()
nuc = set_uid(nuc)  # Ensure the GeoDataFrame has a unique identifier

# fit spatial weights using Delaunay triangulation
w, w_gdf = fit_graph(nuc, "delaunay", id_col="uid", threshold=100)
ax = nuc.plot(figsize=(10, 10), column="class_name", aspect=1)
w_gdf.plot(ax=ax, linewidth=1, column="class_name", legend=True, aspect=1)
ax.set_axis_off()
w_gdf.head(5)
from histolytics.spatial_graph.graph import fit_graph
from histolytics.utils.gdf import set_uid
from histolytics.data import hgsc_cancer_nuclei

nuc = hgsc_cancer_nuclei()
nuc = set_uid(nuc)  # Ensure the GeoDataFrame has a unique identifier

# fit spatial weights using Delaunay triangulation
w, w_gdf = fit_graph(nuc, "delaunay", id_col="uid", threshold=100)
ax = nuc.plot(figsize=(10, 10), column="class_name", aspect=1)
w_gdf.plot(ax=ax, linewidth=1, column="class_name", legend=True, aspect=1)
ax.set_axis_off()
w_gdf.head(5)

Out[1]:

	index	focal	neighbor	weight	focal_centroid_x	focal_centroid_y	neighbor_centroid_x	neighbor_centroid_y	focal_class_name	neighbor_class_name	class_name	geometry
0	0	0	1	1.0	1400.037980	1.692484	1386.458579	9.580762	connective	connective	connective-connective	LINESTRING (1400.038 1.692, 1386.459 9.581)
1	1	0	4	1.0	1400.037980	1.692484	1306.059654	2.527988	connective	connective	connective-connective	LINESTRING (1400.038 1.692, 1306.06 2.528)
2	6	1	4	1.0	1386.458579	9.580762	1306.059654	2.527988	connective	connective	connective-connective	LINESTRING (1386.459 9.581, 1306.06 2.528)
3	9	2	3	1.0	1378.296689	170.695478	1318.355274	178.923534	connective	connective	connective-connective	LINESTRING (1378.297 170.695, 1318.355 178.924)
4	13	2	23	1.0	1378.296689	170.695478	1367.454971	219.690997	connective	connective	connective-connective	LINESTRING (1378.297 170.695, 1367.455 219.691)

No description has been provided for this image

Link Counts¶

One of the most straightforward yet powerful features you can extract from a spatial graph is the number of links of a certain type. This simple metric can provide valuable insights of the different cell types and their interactions. The cell-cell link counts are easily extracted from the w_gdf by using pandas:

In [2]:

Copied!

# Link counts
w_gdf.value_counts("class_name")
# Link counts
w_gdf.value_counts("class_name")

Out[2]:

class_name
inflammatory-inflammatory    1347
connective-inflammatory       748
neoplastic-neoplastic         692
connective-connective         582
inflammatory-neoplastic       228
connective-neoplastic          81
Name: count, dtype: int64

Graph Features with NetworkX¶

We can compute a variety of graph features using the networkX library. Some common features include:

Clustering Coefficient: Indicates the degree to which nodes tend to cluster together.
Closeness Centrality: Measures the average length of the shortest path from a node to all other nodes in the graph.
Degree Centrality: Measures the number of connections a node has to other nodes in the graph.
Betweenness Centrality: Measures the extent to which a node lies on paths between other nodes.
Eigenvector Centrality: Measures a node's influence based on the importance of its neighbors.

In biological context, centrality scores can identify nuclei that are highly connected to their neighbors. A nucleus with a high degree centrality is a hub of some local cellular network which could indicate a site of intense biological activity. Some centrality measures, like betweenness centrality, identify nuclei that act as bridges or gatekeepers between different cellular regions. A nucleus with a high betweenness centrality lies on the shortest path between many other nuclei. Scores like eigenvector centrality not only count connections but also consider the "importance" of a nucleus's neighbors. A nucleus with high eigenvector centrality is connected to other highly central nuclei. This can identify cells that are part of a particularly influential network.

In [ ]:

Copied!





import networkx as nx
import warnings

warnings.filterwarnings("ignore")

# Let's fit a graph only to the neoplastic neolei
neo = nuc.loc[nuc["class_name"] == "neoplastic"]
w, w_gdf = fit_graph(neo, "delaunay", id_col="uid", threshold=100)
# Convert libpysal weights to NetworkX graph
G = w.to_networkx()

# Compute graph features
closeness_centrality = nx.closeness_centrality(G)
average_clustering = nx.average_clustering(G)
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
eigenvector_centrality = nx.eigenvector_centrality(G, max_iter=1000)

print(f"Average clustering coefficient: {average_clustering:.4f}")
print(
    f"Mean closeness centrality: {sum(closeness_centrality.values()) / len(closeness_centrality):.4f}"
)
print(
    f"Mean degree centrality: {sum(degree_centrality.values()) / len(degree_centrality):.4f}"
)
print(
    f"Mean betweenness centrality: {sum(betweenness_centrality.values()) / len(betweenness_centrality):.4f}"
)
print(
    f"Mean eigenvector centrality: {sum(eigenvector_centrality.values()) / len(eigenvector_centrality):.4f}"
)

# Add degree centrality back to the GeoDataFrame for visualiation purposes
neo["degree_centrality"] = degree_centrality.values()

# Visualize degree centrality
ax = nuc.plot(figsize=(10, 10), aspect=1, legend=False)
ax = neo.plot(
    ax=ax,
    column="degree_centrality",
    cmap="viridis",
    figsize=(10, 10),
    aspect=1,
    legend=True,
    categorical=False,
)
ax.set_title("Degree Centrality")
ax.set_axis_off()
import networkx as nx
import warnings

warnings.filterwarnings("ignore")

# Let's fit a graph only to the neoplastic neolei
neo = nuc.loc[nuc["class_name"] == "neoplastic"]
w, w_gdf = fit_graph(neo, "delaunay", id_col="uid", threshold=100)
# Convert libpysal weights to NetworkX graph
G = w.to_networkx()

# Compute graph features
closeness_centrality = nx.closeness_centrality(G)
average_clustering = nx.average_clustering(G)
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
eigenvector_centrality = nx.eigenvector_centrality(G, max_iter=1000)

print(f"Average clustering coefficient: {average_clustering:.4f}")
print(
    f"Mean closeness centrality: {sum(closeness_centrality.values()) / len(closeness_centrality):.4f}"
)
print(
    f"Mean degree centrality: {sum(degree_centrality.values()) / len(degree_centrality):.4f}"
)
print(
    f"Mean betweenness centrality: {sum(betweenness_centrality.values()) / len(betweenness_centrality):.4f}"
)
print(
    f"Mean eigenvector centrality: {sum(eigenvector_centrality.values()) / len(eigenvector_centrality):.4f}"
)

# Add degree centrality back to the GeoDataFrame for visualiation purposes
neo["degree_centrality"] = degree_centrality.values()

# Visualize degree centrality
ax = nuc.plot(figsize=(10, 10), aspect=1, legend=False)
ax = neo.plot(
    ax=ax,
    column="degree_centrality",
    cmap="viridis",
    figsize=(10, 10),
    aspect=1,
    legend=True,
    categorical=False,
)
ax.set_title("Degree Centrality")
ax.set_axis_off()

Average clustering coefficient: 0.4372
Mean closeness centrality: 0.1487
Mean degree centrality: 0.0205
Mean betweenness centrality: 0.0201
Mean eigenvector centrality: 0.0562

Benchmark¶

Different graph algorithms have differences in run-times and memory usage. It is important to consider these factors when choosing an algorithm for a specific task. In this section, we will benchmark the performance of different graph algorithms with a cervix dataset containing ~ 19K nuclei. In general all the different graph algorithms scale up to millions of cells but the run-times can go up to some minutes instead of seconds.

In [4]:

Copied!





from histolytics.data import cervix_nuclei
from histolytics.utils.gdf import set_uid

nuc = cervix_nuclei()
nuc = set_uid(nuc)
nuc
from histolytics.data import cervix_nuclei
from histolytics.utils.gdf import set_uid

nuc = cervix_nuclei()
nuc = set_uid(nuc)
nuc

Out[4]:

	geometry	class_name	uid
uid
0	POLYGON ((940.01 5570.02, 939.01 5573, 939 559...	connective	0
1	POLYGON ((906.01 5350.02, 906.01 5361, 908.01 ...	connective	1
2	POLYGON ((866 5137.02, 862.77 5137.94, 860 513...	squamous_epithel	2
3	POLYGON ((932 4777.02, 928 4778.02, 922.81 478...	glandular_epithel	3
4	POLYGON ((904 5177.02, 902.01 5178.02, 901.01 ...	connective	4
...	...	...	...
19192	POLYGON ((3015 2614.02, 3013.01 2616, 3013.01 ...	inflammatory	19192
19193	POLYGON ((3387 2618.02, 3384.01 2620.02, 3379....	connective	19193
19194	POLYGON ((3655 2622.02, 3650.86 2623.75, 3647 ...	inflammatory	19194
19195	POLYGON ((3703 2627.02, 3699.9 2629.16, 3696.0...	connective	19195
19196	POLYGON ((3199 2623.02, 3194.64 2626.72, 3191....	inflammatory	19196

19197 rows × 3 columns

In [5]:

Copied!

%%timeit
# Delaunay graph
fit_graph(nuc, method="delaunay", id_col="uid", threshold=50)
%%timeit
# Delaunay graph
fit_graph(nuc, method="delaunay", id_col="uid", threshold=50)

2.6 s ± 40.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]:

Copied!

%%timeit
fit_graph(nuc, method="knn", k=5, id_col="uid", threshold=50)
%%timeit
fit_graph(nuc, method="knn", k=5, id_col="uid", threshold=50)

2.24 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]:

Copied!

%%timeit
# distband (run-time also depends on the distance threshold)
fit_graph(nuc, method="distband", id_col="uid", threshold=50)
%%timeit
# distband (run-time also depends on the distance threshold)
fit_graph(nuc, method="distband", id_col="uid", threshold=50)

2.53 s ± 51.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]:

Copied!

%%timeit
fit_graph(nuc, method="gabriel", id_col="uid", threshold=50)
%%timeit
fit_graph(nuc, method="gabriel", id_col="uid", threshold=50)

2.48 s ± 75.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]:

Copied!

%%timeit
fit_graph(nuc, method="voronoi", id_col="uid", threshold=50)
%%timeit
fit_graph(nuc, method="voronoi", id_col="uid", threshold=50)

22.2 s ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [ ]: