sem2gdf

Convert a semantic segmentation raster mask to a GeoDataFrame.

Note

This function should be applied to semantic tissue segmentation masks.

Parameters:

Name (Type, Default): Description

sem_map (ndarray, required): A semantic segmentation mask of shape (H, W). Pixels with value 0 are treated as background and are not vectorized.

xoff (int, default None): The x offset used to translate the geometries in the GeoDataFrame. If None, no translation is applied along x.

yoff (int, default None): The y offset used to translate the geometries in the GeoDataFrame. If None, no translation is applied along y.

class_dict (Dict[int, str], default None): A dictionary mapping class indices to class names, e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices are used as the class names.

min_size (int, default 15): The minimum polygon area in pixels. Only polygons whose area is strictly greater than min_size are included in the GeoDataFrame.

smooth_func (Callable, default gaussian_smooth): A function used to smooth the polygons. It should take a shapely Polygon as input and return a shapely Polygon. If None, no smoothing is applied.
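
As an illustration of the smooth_func contract (a shapely Polygon in, a shapely Polygon out), here is a minimal sketch of a custom smoother. The name simplify_smooth and its tolerance argument are hypothetical and not part of histolytics; the sketch swaps in shapely's simplify in place of the default gaussian_smooth.

```python
from shapely.geometry import Polygon


def simplify_smooth(polygon: Polygon, tolerance: float = 0.5) -> Polygon:
    """Hypothetical smoother: drop small wiggles with shapely's simplify."""
    # preserve_topology=True keeps the result a valid geometry of the same type.
    return polygon.simplify(tolerance, preserve_topology=True)


# A square with a tiny bump on its top edge, standing in for a pixelated outline.
jagged = Polygon(
    [(0, 0), (10, 0), (10, 10), (5, 10), (5, 10.2), (4.8, 10.2), (4.8, 10), (0, 10)]
)
smoothed = simplify_smooth(jagged)
print(len(jagged.exterior.coords), len(smoothed.exterior.coords))
```

A function like this could then be passed as smooth_func=simplify_smooth. According to the source listing, passing smooth_func=None skips the smoothing step entirely.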

Returns:

GeoDataFrame: A GeoDataFrame of the polygons vectorized from the semantic mask. Contains columns:

- 'id' - the numeric pixel value of the semantic mask,
- 'class_name' - the name of the class (same as id if class_dict is None),
- 'geometry' - the geometry of the polygon.
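
The 'class_name' column is filled by mapping 'id' through class_dict. The sketch below reproduces only that mapping step with pandas, on made-up ids, to show what happens when class_dict has no entry for a class present in the mask: the name comes out as NaN.

```python
import pandas as pd

# Made-up ids for illustration; in sem2gdf they come from the mask's pixel values.
ids = pd.Series([1, 2, 3], name="id")
class_dict = {1: "neoplastic", 2: "immune"}  # deliberately no entry for class 3

# Series.map looks each id up in the dict; ids without an entry become NaN.
class_names = ids.map(class_dict)
print(class_names)
```

If a custom class_dict is supplied, it is therefore worth covering every non-zero value that occurs in the mask.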

Examples:

>>> from histolytics.utils.raster import sem2gdf
>>> from histolytics.data import hgsc_cancer_type_mask
>>> # load semantic mask
>>> type_mask = hgsc_cancer_type_mask()
>>> # convert to GeoDataFrame
>>> gdf = sem2gdf(type_mask)
>>> print(gdf.head(3))
        id  class_name                                           geometry
    0   2           2  POLYGON ((850.019 0.45, 850.431 1.58, 851.657 ...
    1   2           2  POLYGON ((1194.01 0.225, 1194.215 0.795, 1194....
    2   1           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
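
The xoff and yoff parameters shift the output polygons, which can be useful when the mask is a tile cut from a larger image. The sketch below shows the effect with GeoSeries.translate on a toy polygon; the offset values are hypothetical tile origins, not values taken from histolytics.

```python
import geopandas as gpd
from shapely.geometry import box

# A 10 x 10 polygon in tile-local pixel coordinates.
tile_polys = gpd.GeoSeries([box(0, 0, 10, 10)])

# Hypothetical top-left corner of the tile within the full image.
xoff, yoff = 2048, 1024

# Move the polygon from tile coordinates to full-image coordinates.
shifted = tile_polys.translate(xoff, yoff)
print(shifted.total_bounds)
```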
Source code in src/histolytics/utils/raster.py
# Imports this excerpt relies on (not shown in the original listing).
# gaussian_smooth is the histolytics default smoother; its import is omitted here.
from typing import Callable, Dict

import geopandas as gpd
import numpy as np
from rasterio.features import shapes
from shapely.geometry import shape

def sem2gdf(
    sem_map: np.ndarray,
    xoff: int = None,
    yoff: int = None,
    class_dict: Dict[int, str] = None,
    min_size: int = 15,
    smooth_func: Callable = gaussian_smooth,
) -> gpd.GeoDataFrame:
    """Convert a semantic segmentation raster mask to a GeoDataFrame.

    Note:
        This function should be applied to semantic tissue segmentation masks.

    Parameters:
        sem_map (np.ndarray):
            A semantic segmentation mask. Shape (H, W).
        xoff (int):
            The x offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        yoff (int):
            The y offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        class_dict (Dict[int, str], default=None):
            A dictionary mapping class indices to class names.
            e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices will be used.
        min_size (int):
            The minimum size (in pixels) of the polygons to include in the GeoDataFrame.
        smooth_func (Callable):
            A function to smooth the polygons. The function should take a shapely Polygon
            as input and return a shapely Polygon.

    Returns:
        gpd.GeoDataFrame:
            A GeoDataFrame of the raster semantic mask. Contains columns:

                - 'id' - the numeric pixel value of the semantic mask,
                - 'class_name' - the name of the class (same as id if class_dict is None),
                - 'geometry' - the geometry of the polygon.

    Examples:
        >>> from histolytics.utils.raster import sem2gdf
        >>> from histolytics.data import hgsc_cancer_type_mask
        >>> # load semantic mask
        >>> type_mask = hgsc_cancer_type_mask()
        >>> # convert to GeoDataFrame
        >>> gdf = sem2gdf(type_mask)
        >>> print(gdf.head(3))
                id  class_name                                           geometry
            0   2           2  POLYGON ((850.019 0.45, 850.431 1.58, 851.657 ...
            1   2           2  POLYGON ((1194.01 0.225, 1194.215 0.795, 1194....
            2   1           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
    """
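    if class_dict is None:
        # Map every non-background label to itself. Filtering on > 0 (instead of
        # slicing off the first unique value) stays correct when the mask has no 0s.
        class_dict = {int(i): int(i) for i in np.unique(sem_map) if i > 0}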

    # Lazily vectorize each connected non-background region into (value, polygon) pairs.
    vectorized_data = (
        (value, shape(polygon))
        for polygon, value in shapes(
            sem_map,
            mask=sem_map > 0,
        )
    )

    res = gpd.GeoDataFrame(
        vectorized_data,
        columns=["id", "geometry"],
    )
    res["id"] = res["id"].astype(int)
    # Keep only polygons whose area is strictly greater than min_size.
    res = res.loc[res.area > min_size].reset_index(drop=True)
    res["class_name"] = res["id"].map(class_dict)
    res = res[["id", "class_name", "geometry"]]  # reorder columns

    # Shift the geometries by the optional x and y offsets.
    if xoff is not None:
        res["geometry"] = res["geometry"].translate(xoff, 0)

    if yoff is not None:
        res["geometry"] = res["geometry"].translate(0, yoff)

    # Smooth the polygon outlines unless smoothing is disabled with None.
    if smooth_func is not None:
        res["geometry"] = res["geometry"].apply(smooth_func)

    return res