Graphs & Graph Features
Introduction¶
What are Spatial Graphs? 🕸️ A spatial graph is a representation of spatial data as a network, where individual objects are the nodes and the relationships between them are the links or edges. This transformation allows you to apply powerful graph algorithms to analyze the relationships and structures within your data that aren't apparent from just looking at the segmentation maps.
In Histolytics, after you've segmented your WSI, you can create a spatial graph where:
Nodes: Each segmented object (e.g., a cell or a nucleus) is a node in the graph. These nodes have spatial attributes like their coordinates and extracted features (e.g., area, shape, or marker intensity).
Histolytics provides a fit_graph
function which can be used to fit a spatial graph or a spatial weights to your input data. The actual fitting is done with the libpysal
package and the fit_graph
-function is basically a wrapper around different graph fitting methods with added functionality for example to threshold links that are too long. The allowed spatial weights are:
knn
: k-nearest neighborsdelaunay
- Delaunay triangulationdistband
- Distance band i.e. a distance thresholded knn graphrel_nhood
- Relative neighborhood graphgabriel
- Gabriel graphvoronoi
- Voronoi graph
We will be using the delaunay
method in this example. Here, we will set a distance threshold for the neighbors to be within 50 microns of the nuclei centroid. The distance unit in the example data is in pixels so 50 microns in pixels of 20x magnified segmentation mask is around 50*2 = 100 pixels.
from histolytics.spatial_graph.graph import fit_graph
from histolytics.utils.gdf import set_uid
from histolytics.data import hgsc_cancer_nuclei
nuc = hgsc_cancer_nuclei()
nuc = set_uid(nuc) # Ensure the GeoDataFrame has a unique identifier
# fit spatial weights using Delaunay triangulation
w, w_gdf = fit_graph(nuc, "delaunay", id_col="uid", threshold=100)
ax = nuc.plot(figsize=(10, 10), column="class_name", aspect=1)
w_gdf.plot(ax=ax, linewidth=1, column="class_name", legend=True, aspect=1)
ax.set_axis_off()
w_gdf.head(5)
index | focal | neighbor | weight | focal_centroid_x | focal_centroid_y | neighbor_centroid_x | neighbor_centroid_y | focal_class_name | neighbor_class_name | class_name | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 1.0 | 1400.037980 | 1.692484 | 1386.458579 | 9.580762 | connective | connective | connective-connective | LINESTRING (1400.038 1.692, 1386.459 9.581) |
1 | 1 | 0 | 4 | 1.0 | 1400.037980 | 1.692484 | 1306.059654 | 2.527988 | connective | connective | connective-connective | LINESTRING (1400.038 1.692, 1306.06 2.528) |
2 | 6 | 1 | 4 | 1.0 | 1386.458579 | 9.580762 | 1306.059654 | 2.527988 | connective | connective | connective-connective | LINESTRING (1386.459 9.581, 1306.06 2.528) |
3 | 9 | 2 | 3 | 1.0 | 1378.296689 | 170.695478 | 1318.355274 | 178.923534 | connective | connective | connective-connective | LINESTRING (1378.297 170.695, 1318.355 178.924) |
4 | 13 | 2 | 23 | 1.0 | 1378.296689 | 170.695478 | 1367.454971 | 219.690997 | connective | connective | connective-connective | LINESTRING (1378.297 170.695, 1367.455 219.691) |
Link Counts¶
One of the most straightforward yet powerful features you can extract from a spatial graph is the number of links of a certain type. This simple metric can provide valuable insights of the different cell types and their interactions. The cell-cell link counts are easily extracted from the w_gdf
by using pandas:
# Link counts
w_gdf.value_counts("class_name")
class_name inflammatory-inflammatory 1347 connective-inflammatory 748 neoplastic-neoplastic 692 connective-connective 582 inflammatory-neoplastic 228 connective-neoplastic 81 Name: count, dtype: int64
Graph Features with NetworkX¶
We can compute a variety of graph features using the networkX
library. Some common features include:
- Clustering Coefficient: Indicates the degree to which nodes tend to cluster together.
- Closeness Centrality: Measures the average length of the shortest path from a node to all other nodes in the graph.
- Degree Centrality: Measures the number of connections a node has to other nodes in the graph.
- Betweenness Centrality: Measures the extent to which a node lies on paths between other nodes.
- Eigenvector Centrality: Measures a node's influence based on the importance of its neighbors.
In biological context, centrality scores can identify nuclei that are highly connected to their neighbors. A nucleus with a high degree centrality is a hub of some local cellular network which could indicate a site of intense biological activity. Some centrality measures, like betweenness centrality, identify nuclei that act as bridges or gatekeepers between different cellular regions. A nucleus with a high betweenness centrality lies on the shortest path between many other nuclei. Scores like eigenvector centrality not only count connections but also consider the "importance" of a nucleus's neighbors. A nucleus with high eigenvector centrality is connected to other highly central nuclei. This can identify cells that are part of a particularly influential network.
import networkx as nx
import warnings
warnings.filterwarnings("ignore")
# Let's fit a graph only to the neoplastic neolei
neo = nuc.loc[nuc["class_name"] == "neoplastic"]
w, w_gdf = fit_graph(neo, "delaunay", id_col="uid", threshold=100)
# Convert libpysal weights to NetworkX graph
G = w.to_networkx()
# Compute graph features
closeness_centrality = nx.closeness_centrality(G)
average_clustering = nx.average_clustering(G)
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
eigenvector_centrality = nx.eigenvector_centrality(G, max_iter=1000)
print(f"Average clustering coefficient: {average_clustering:.4f}")
print(
f"Mean closeness centrality: {sum(closeness_centrality.values()) / len(closeness_centrality):.4f}"
)
print(
f"Mean degree centrality: {sum(degree_centrality.values()) / len(degree_centrality):.4f}"
)
print(
f"Mean betweenness centrality: {sum(betweenness_centrality.values()) / len(betweenness_centrality):.4f}"
)
print(
f"Mean eigenvector centrality: {sum(eigenvector_centrality.values()) / len(eigenvector_centrality):.4f}"
)
# Add degree centrality back to the GeoDataFrame for visualiation purposes
neo["degree_centrality"] = degree_centrality.values()
# Visualize degree centrality
ax = nuc.plot(figsize=(10, 10), aspect=1, legend=False)
ax = neo.plot(
ax=ax,
column="degree_centrality",
cmap="viridis",
figsize=(10, 10),
aspect=1,
legend=True,
categorical=False,
)
ax.set_title("Degree Centrality")
ax.set_axis_off()
Average clustering coefficient: 0.4372 Mean closeness centrality: 0.1487 Mean degree centrality: 0.0205 Mean betweenness centrality: 0.0201 Mean eigenvector centrality: 0.0562
Benchmark¶
Different graph algorithms have differences in run-times and memory usage. It is important to consider these factors when choosing an algorithm for a specific task. In this section, we will benchmark the performance of different graph algorithms with a cervix dataset containing ~ 19K nuclei. In general all the different graph algorithms scale up to millions of cells but the run-times can go up to some minutes instead of seconds.
from histolytics.data import cervix_nuclei
from histolytics.utils.gdf import set_uid
nuc = cervix_nuclei()
nuc = set_uid(nuc)
nuc
geometry | class_name | uid | |
---|---|---|---|
uid | |||
0 | POLYGON ((940.01 5570.02, 939.01 5573, 939 559... | connective | 0 |
1 | POLYGON ((906.01 5350.02, 906.01 5361, 908.01 ... | connective | 1 |
2 | POLYGON ((866 5137.02, 862.77 5137.94, 860 513... | squamous_epithel | 2 |
3 | POLYGON ((932 4777.02, 928 4778.02, 922.81 478... | glandular_epithel | 3 |
4 | POLYGON ((904 5177.02, 902.01 5178.02, 901.01 ... | connective | 4 |
... | ... | ... | ... |
19192 | POLYGON ((3015 2614.02, 3013.01 2616, 3013.01 ... | inflammatory | 19192 |
19193 | POLYGON ((3387 2618.02, 3384.01 2620.02, 3379.... | connective | 19193 |
19194 | POLYGON ((3655 2622.02, 3650.86 2623.75, 3647 ... | inflammatory | 19194 |
19195 | POLYGON ((3703 2627.02, 3699.9 2629.16, 3696.0... | connective | 19195 |
19196 | POLYGON ((3199 2623.02, 3194.64 2626.72, 3191.... | inflammatory | 19196 |
19197 rows × 3 columns
%%timeit
# Delaunay graph
fit_graph(nuc, method="delaunay", id_col="uid", threshold=50)
2.6 s ± 40.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
fit_graph(nuc, method="knn", k=5, id_col="uid", threshold=50)
2.24 s ± 35.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
# distband (run-time also depends on the distance threshold)
fit_graph(nuc, method="distband", id_col="uid", threshold=50)
2.53 s ± 51.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
fit_graph(nuc, method="gabriel", id_col="uid", threshold=50)
2.48 s ± 75.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
fit_graph(nuc, method="voronoi", id_col="uid", threshold=50)
22.2 s ± 349 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)