
sem2gdf

Convert a semantic segmentation raster mask to a GeoDataFrame.

Note

This function should be applied to semantic tissue segmentation masks.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sem_map | ndarray | A semantic segmentation mask. Shape (H, W). | required |
| xoff | int | The x offset. Optional. The offset is used to translate the geometries in the GeoDataFrame. If None, no translation is applied. | None |
| yoff | int | The y offset. Optional. The offset is used to translate the geometries in the GeoDataFrame. If None, no translation is applied. | None |
| class_dict | Dict[int, str] | A dictionary mapping class indices to class names. e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices will be used. | None |
| min_size | int | The minimum size (in pixels) of the polygons to include in the GeoDataFrame. | 15 |
| smooth_func | Callable | A function to smooth the polygons. The function should take a shapely Polygon as input and return a shapely Polygon. Defaults to uniform_smooth, which applies a uniform filter. histolytics.utils._filters also provides gaussian_smooth and median_smooth for smoothing. | uniform_smooth |
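
Any callable that takes a shapely Polygon and returns a shapely Polygon can be passed as smooth_func. As a minimal sketch of that contract, the hypothetical buffer_smooth below (not part of histolytics) rounds corners with a negative buffer followed by a positive one, and returns the input unchanged if the result is empty or is no longer a single Polygon. The distance of 2.0 pixels is an arbitrary illustrative value.

```python
from shapely.geometry import Polygon


def buffer_smooth(poly: Polygon, distance: float = 2.0) -> Polygon:
    """Hypothetical smoother: shrink, then grow, which rounds convex corners."""
    smoothed = poly.buffer(-distance).buffer(distance)
    # Thin shapes can vanish or split when shrunk; keep the original in that case
    # so the function always returns a Polygon.
    if smoothed.is_empty or smoothed.geom_type != "Polygon":
        return poly
    return smoothed


square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
rounded = buffer_smooth(square)

# A strip narrower than 2 * distance disappears when shrunk, so it is returned as-is.
thin = Polygon([(0, 0), (10, 0), (10, 1), (0, 1)])
unchanged = buffer_smooth(thin)
```

Under these assumptions it could be passed as sem2gdf(type_mask, smooth_func=buffer_smooth).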

Returns: gpd.GeoDataFrame: A GeoDataFrame of the raster semantic mask. Contains columns:

        - 'uid' - the numeric pixel value of the semantic mask,
        - 'class_name' - the name of the class (same as uid if class_dict is None),
        - 'geometry' - the geometry of the polygon.
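
The class_name column is produced by mapping the pixel values through class_dict with pandas. The sketch below reproduces only that mapping step on made-up values, to show that a pixel value missing from class_dict (here the hypothetical class 3) ends up with a missing class_name rather than raising an error.

```python
import pandas as pd

# Made-up pixel values standing in for the 'uid' column.
uids = pd.Series([1, 2, 3], name="uid")

# Class 3 is deliberately left out of the mapping.
class_dict = {1: "neoplastic", 2: "immune"}

# Series.map with a dict yields NaN for keys that are not in the dict.
class_names = uids.map(class_dict)
print(class_names)
```

If every class should carry a name, class_dict needs an entry for each non-zero value in the mask.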

Examples:

>>> from histolytics.utils.raster import sem2gdf
>>> from histolytics.data import hgsc_cancer_type_mask
>>> # load semantic mask
>>> type_mask = hgsc_cancer_type_mask()
>>> # convert to GeoDataFrame
>>> gdf = sem2gdf(type_mask)
>>> print(gdf.head(3))
        uid  class_name                                           geometry
    0   2           2  POLYGON ((850.019 0.45, 850.431 1.58, 851.657 ...
    1   2           2  POLYGON ((1194.01 0.225, 1194.215 0.795, 1194....
    2   1           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
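
The xoff and yoff parameters are useful when the mask is a tile cut from a larger image and the polygons should end up in full-image coordinates. The sketch below shows the underlying translation on a hand-made GeoDataFrame rather than on real sem2gdf output; the tile origin (2048, 4096) and the single square polygon are assumptions made for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# One polygon in tile-local pixel coordinates, shaped like a sem2gdf result row.
tile_gdf = gpd.GeoDataFrame(
    {"uid": [1], "class_name": ["neoplastic"]},
    geometry=[box(0, 0, 10, 10)],
)

# Hypothetical top-left corner of the tile within the full image.
x0, y0 = 2048, 4096

# Shift every geometry by the tile origin, as sem2gdf does when offsets are given.
shifted = tile_gdf.copy()
shifted["geometry"] = shifted["geometry"].translate(xoff=x0, yoff=y0)

print(shifted.geometry.iloc[0].bounds)
```

With sem2gdf itself, the same shift would be requested as sem2gdf(tile_mask, xoff=x0, yoff=y0), where tile_mask is a placeholder name for the tile's mask array.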
Source code in src/histolytics/utils/raster.py
# Imports the function body relies on (module paths assumed from the names used below)
from typing import Callable, Dict

import geopandas as gpd
import numpy as np
from rasterio.features import shapes
from shapely.geometry import shape

from histolytics.utils._filters import uniform_smooth


def sem2gdf(
    sem_map: np.ndarray,
    xoff: int = None,
    yoff: int = None,
    class_dict: Dict[int, str] = None,
    min_size: int = 15,
    smooth_func: Callable = uniform_smooth,
) -> gpd.GeoDataFrame:
    """Convert a semantic segmentation raster mask to a GeoDataFrame.

    Note:
        This function should be applied to semantic tissue segmentation masks.

    Parameters:
        sem_map (np.ndarray):
            A semantic segmentation mask. Shape (H, W).
        xoff (int):
            The x offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        yoff (int):
            The y offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        class_dict (Dict[int, str]):
            A dictionary mapping class indices to class names.
            e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices will be used.
        min_size (int):
            The minimum size (in pixels) of the polygons to include in the GeoDataFrame.
        smooth_func (Callable):
            A function to smooth the polygons. The function should take a shapely Polygon
            as input and return a shapely Polygon. Defaults to `uniform_smooth`, which
            applies a uniform filter. `histolytics.utils._filters` also provides
            `gaussian_smooth` and `median_smooth` for smoothing.
    Returns:
        gpd.GeoDataFrame:
            A GeoDataFrame of the raster semantic mask. Contains columns:

                - 'uid' - the numeric pixel value of the semantic mask,
                - 'class_name' - the name of the class (same as uid if class_dict is None),
                - 'geometry' - the geometry of the polygon.

    Examples:
        >>> from histolytics.utils.raster import sem2gdf
        >>> from histolytics.data import hgsc_cancer_type_mask
        >>> # load semantic mask
        >>> type_mask = hgsc_cancer_type_mask()
        >>> # convert to GeoDataFrame
        >>> gdf = sem2gdf(type_mask)
        >>> print(gdf.head(3))
                uid  class_name                                           geometry
            0   2           2  POLYGON ((850.019 0.45, 850.431 1.58, 851.657 ...
            1   2           2  POLYGON ((1194.01 0.225, 1194.215 0.795, 1194....
            2   1           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
    """
    # Handle empty semantic mask
    if sem_map.size == 0 or np.max(sem_map) == 0:
        return gpd.GeoDataFrame(columns=["uid", "class_name", "geometry"])

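    if class_dict is None:
        # Map every non-background value to itself. Filtering on != 0 (rather than
        # slicing [1:]) keeps the smallest class when the mask has no background pixels.
        class_dict = {int(i): int(i) for i in np.unique(sem_map) if i != 0}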

    vectorized_data = (
        (value, shape(polygon))
        for polygon, value in shapes(
            sem_map,
            mask=sem_map > 0,
        )
    )

    res = gpd.GeoDataFrame(
        vectorized_data,
        columns=["uid", "geometry"],
    )
    res["uid"] = res["uid"].astype(int)
    res = res.loc[res.area > min_size].reset_index(drop=True)
    res["class_name"] = res["uid"].map(class_dict)
    res = res[["uid", "class_name", "geometry"]]  # reorder columns

    if xoff is not None or yoff is not None:
        res["geometry"] = res["geometry"].translate(
            xoff if xoff is not None else 0, yoff if yoff is not None else 0
        )

    if smooth_func is not None:
        res["geometry"] = res["geometry"].apply(smooth_func)

    return res