inst2gdf

Convert an instance segmentation raster mask to a GeoDataFrame.

Note

This function should be applied to nuclei instance segmentation masks. Nuclei types can be provided with the `type_map` and `class_dict` arguments if needed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `inst_map` | `ndarray` | An instance segmentation mask. Shape (H, W). | required |
| `type_map` | `ndarray` | A type segmentation mask. Shape (H, W). If provided, the types are included in the resulting GeoDataFrame in the 'class_name' column. | `None` |
| `xoff` | `int` | The x offset used to translate the geometries in the GeoDataFrame. If None, no translation is applied along x. | `None` |
| `yoff` | `int` | The y offset used to translate the geometries in the GeoDataFrame. If None, no translation is applied along y. | `None` |
| `class_dict` | `Dict[int, str]` | A dictionary mapping class indices to class names, e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices are used. | `None` |
| `min_size` | `int` | The minimum area (in pixels) of the polygons to include in the GeoDataFrame. Polygons whose area is not greater than this value are dropped. | `15` |
| `smooth_func` | `Callable` | A function to smooth the polygons. It should take a shapely Polygon as input and return a shapely Polygon. If None, no smoothing is applied. | `gaussian_smooth` |
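
As an illustration of the `smooth_func` contract (a shapely Polygon in, a shapely Polygon out), here is a minimal sketch of a custom smoother. It assumes shapely is installed; `simplify_smooth` and its `tolerance` argument are hypothetical names chosen for this example and are not part of histolytics.

```python
from shapely.geometry import Polygon


def simplify_smooth(poly: Polygon, tolerance: float = 0.5) -> Polygon:
    """Hypothetical smoother: simplify the outline of a polygon."""
    # The simplified outline stays within `tolerance` of the original one;
    # preserve_topology=True avoids producing an invalid polygon.
    return poly.simplify(tolerance, preserve_topology=True)


# A jagged, pixel-like outline (staircase edge) to try the function on.
jagged = Polygon([(0, 0), (4, 0), (4, 1), (5, 1), (5, 2), (6, 2), (6, 6), (0, 6)])
smoothed = simplify_smooth(jagged, tolerance=1.0)
```

Given the signature shown on this page, such a function could be passed as `inst2gdf(inst_mask, type_mask, smooth_func=simplify_smooth)`.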

Returns:

Type Description
GeoDataFrame

A GeoDataFrame of the raster instance mask. Contains columns:

- 'id' - the numeric pixel value of the instance mask,
- 'class_name' - the class name of the instance if `class_dict` is given, otherwise the class index taken from `type_map`,
- 'geometry' - the geometry of the polygon.

Examples:

>>> from histolytics.utils.raster import inst2gdf
>>> from histolytics.data import hgsc_cancer_inst_mask, hgsc_cancer_type_mask
>>> # load raster masks
>>> inst_mask = hgsc_cancer_inst_mask()
>>> type_mask = hgsc_cancer_type_mask()
>>> # convert to GeoDataFrame
>>> gdf = inst2gdf(inst_mask, type_mask)
>>> print(gdf.head(3))
        id  class_name                                           geometry
    0  135           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
    1  200           1  POLYGON ((817.01 0.225, 817.215 0.804, 817.795...
    2    0           1  POLYGON ((1394.01 0.45, 1394.215 1.58, 1394.79...
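
The effect of the `xoff` and `yoff` parameters can be pictured with a small sketch in plain Python. It is not the library's implementation, only the same arithmetic applied to a list of vertices. The scenario (a mask cropped from a larger image, with the tile corner at a known position) is an assumption made for illustration, and `translate_ring` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def translate_ring(
    ring: List[Point],
    xoff: Optional[float] = None,
    yoff: Optional[float] = None,
) -> List[Point]:
    """Shift every vertex by the offsets; None means no shift on that axis."""
    dx = 0.0 if xoff is None else xoff
    dy = 0.0 if yoff is None else yoff
    return [(x + dx, y + dy) for x, y in ring]


# Outline of one nucleus in tile-local pixel coordinates.
local_ring = [(2.0, 3.0), (6.0, 3.0), (6.0, 7.0), (2.0, 7.0)]

# Hypothetical tile whose top-left corner sits at (1000, 2000) in the full image.
global_ring = translate_ring(local_ring, xoff=1000, yoff=2000)
print(global_ring)
```

With both offsets left as None the coordinates are returned unchanged, which mirrors the "no translation is applied" behaviour described for the two parameters.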
Source code in src/histolytics/utils/raster.py
def inst2gdf(
    inst_map: np.ndarray,
    type_map: np.ndarray = None,
    xoff: int = None,
    yoff: int = None,
    class_dict: Dict[int, str] = None,
    min_size: int = 15,
    smooth_func: Callable = gaussian_smooth,
) -> gpd.GeoDataFrame:
    """Convert an instance segmentation raster mask to a GeoDataFrame.

    Note:
        This function should be applied to nuclei instance segmentation masks. Nuclei
        types can be provided with the `type_map` and `class_dict` arguments if needed.

    Parameters:
        inst_map (np.ndarray):
            An instance segmentation mask. Shape (H, W).
        type_map (np.ndarray):
            A type segmentation mask. Shape (H, W). If provided, the types will be
            included in the resulting GeoDataFrame in column 'class_name'.
        xoff (int):
            The x offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        yoff (int):
            The y offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        class_dict (Dict[int, str]):
            A dictionary mapping class indices to class names,
            e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices are used.
        min_size (int):
            The minimum area (in pixels) of the polygons to include in the GeoDataFrame.
            Polygons whose area is not greater than this value are dropped.
        smooth_func (Callable):
            A function to smooth the polygons. The function should take a shapely Polygon
            as input and return a shapely Polygon.

    Returns:
        gpd.GeoDataFrame:
            A GeoDataFrame of the raster instance mask. Contains columns:

                - 'id' - the numeric pixel value of the instance mask,
                - 'class_name' - the class name of the instance if `class_dict` is given,
                  otherwise the class index taken from `type_map`,
                - 'geometry' - the geometry of the polygon.

    Examples:
        >>> from histolytics.utils.raster import inst2gdf
        >>> from histolytics.data import hgsc_cancer_inst_mask, hgsc_cancer_type_mask
        >>> # load raster masks
        >>> inst_mask = hgsc_cancer_inst_mask()
        >>> type_mask = hgsc_cancer_type_mask()
        >>> # convert to GeoDataFrame
        >>> gdf = inst2gdf(inst_mask, type_mask)
        >>> print(gdf.head(3))
                id  class_name                                           geometry
            0  135           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
            1  200           1  POLYGON ((817.01 0.225, 817.215 0.804, 817.795...
            2    0           1  POLYGON ((1394.01 0.45, 1394.215 1.58, 1394.79...
    """

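    # Without a type map, treat every foreground pixel as a single class.
    if type_map is None:
        type_map = inst_map > 0

    # Sorted unique type labels; the first (smallest) one is assumed to be
    # the background and is skipped.
    types = np.unique(type_map)[1:]

    # Fall back to using the class indices themselves as class names.
    if class_dict is None:
        class_dict = {int(i): int(i) for i in types}

    # Vectorize the instances of each type separately.
    inst_maps_per_type = []
    for t in types:
        mask = type_map == t
        vectorized_data = (
            (value, class_dict[int(t)], shape(polygon))
            for polygon, value in shapes(
                inst_map,
                mask=mask,
            )
        )

        res = gpd.GeoDataFrame(
            vectorized_data,
            columns=["id", "class_name", "geometry"],
        )
        res["id"] = res["id"].astype(int)
        inst_maps_per_type.append(res)

    # Merge the per-type frames and keep only polygons with area above min_size.
    res = pd.concat(inst_maps_per_type)
    res = res.loc[res.area > min_size].reset_index(drop=True)

    # Shift the geometries by the requested offsets.
    if xoff is not None:
        res["geometry"] = res["geometry"].translate(xoff, 0)

    if yoff is not None:
        res["geometry"] = res["geometry"].translate(0, yoff)

    # Smooth the polygon outlines last, after filtering and translation.
    if smooth_func is not None:
        res["geometry"] = res["geometry"].apply(smooth_func)

    return res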
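
The per-type loop in the source above decides which class each instance receives. The following sketch reproduces only that bookkeeping with NumPy on toy masks. It does not vectorize any polygons, and the arrays and the `id_to_class` dictionary are made up for illustration.

```python
import numpy as np

# Toy masks: three nuclei (ids 1-3) on a 4x6 grid, 0 is background.
inst_map = np.array([
    [1, 1, 0, 2, 2, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 3, 3, 0, 0, 0],
])
# Matching type mask: nuclei 1 and 3 are class 1, nucleus 2 is class 2.
type_map = np.array([
    [1, 1, 0, 2, 2, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
])
class_dict = {1: "neoplastic", 2: "immune"}

# Sorted unique labels; skipping the first entry assumes background 0 is present.
types = np.unique(type_map)[1:]

id_to_class = {}
for t in types:
    mask = type_map == t
    # Instance ids that fall inside this class mask.
    for inst_id in np.unique(inst_map[mask]):
        id_to_class[int(inst_id)] = class_dict[int(t)]

print(id_to_class)
```

Note that the `[1:]` slice drops whichever label sorts first, so a type mask without any background pixels would lose its smallest class index in this scheme.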