inst2gdf

Convert an instance segmentation raster mask to a GeoDataFrame.

Note

This function should be applied to nuclei instance segmentation masks. Nuclei types can be provided with the type_map and class_dict arguments if needed.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inst_map | ndarray | An instance segmentation mask. Shape (H, W). | required |
| type_map | ndarray | A type segmentation mask. Shape (H, W). If provided, the types are included in the resulting GeoDataFrame in column 'class_name'. If None, every instance is assigned the single class index 1. | None |
| xoff | int | The x offset. Optional. The offset is used to translate the geometries in the GeoDataFrame. If both `xoff` and `yoff` are None, no translation is applied; if only `yoff` is given, `xoff` is treated as 0. | None |
| yoff | int | The y offset. Optional. The offset is used to translate the geometries in the GeoDataFrame. If both `xoff` and `yoff` are None, no translation is applied; if only `xoff` is given, `yoff` is treated as 0. | None |
| class_dict | Dict[int, str] | A dictionary mapping class indices to class names, e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices are used. | None |
| min_size | int | The minimum area (in pixels) of the polygons to include in the GeoDataFrame. Only polygons whose area is strictly greater than this value are kept. | 15 |
| smooth_func | Callable | A function to smooth the polygons. The function should take a shapely Polygon as input and return a shapely Polygon. Defaults to uniform_smooth, which applies a uniform filter. histolytics.utils._filters also provides gaussian_smooth and median_smooth for smoothing. Pass None to skip smoothing. | uniform_smooth |
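
Since `smooth_func` only needs to map a shapely Polygon to a shapely Polygon, a custom callable can be supplied instead of the built-in filters. The following is a minimal sketch, not part of histolytics: the name `simplify_smooth` and its `tolerance` argument are hypothetical, and it uses shapely's `simplify` (Douglas-Peucker simplification) rather than a filter-based smoother.

```python
from shapely.geometry import Polygon


def simplify_smooth(poly: Polygon, tolerance: float = 0.5) -> Polygon:
    """Hypothetical smoother: simplify the polygon outline with shapely."""
    simplified = poly.simplify(tolerance, preserve_topology=True)
    # Fall back to the input if the result is not a usable Polygon.
    if (
        isinstance(simplified, Polygon)
        and not simplified.is_empty
        and simplified.is_valid
    ):
        return simplified
    return poly


# A square outline with a redundant vertex on its bottom edge.
outline = Polygon([(0, 0), (2, 0), (4, 0), (4, 4), (0, 4)])
smoothed = simplify_smooth(outline, tolerance=0.1)
print(smoothed.is_valid, smoothed.area)
```

Under these assumptions the callable could be passed as `inst2gdf(inst_mask, type_mask, smooth_func=simplify_smooth)`.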

Returns:

| Type | Description |
|---|---|
| GeoDataFrame | A GeoDataFrame of the raster instance mask. |

The GeoDataFrame contains the columns:

- 'uid' - the numeric pixel value of the instance mask,
- 'class_name' - the name of the instance class from `class_dict`, or the class index if `class_dict` is None,
- 'geometry' - the geometry of the polygon.
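
How the 'class_name' values come about can be illustrated with NumPy alone. This sketch mirrors the label handling in the source listed on this page, using a toy `type_map`; the array values and the variable names `fallback`, `labels_default`, and `labels_named` are illustrative assumptions.

```python
import numpy as np

# Toy type mask: 0 is background, 1 and 2 are class indices.
type_map = np.array(
    [
        [0, 1, 1],
        [0, 2, 2],
        [0, 0, 2],
    ]
)

# Unique labels with the first (smallest) value dropped as background.
types = np.unique(type_map)[1:]

# Without a class_dict, each class index maps to itself.
fallback = {int(i): int(i) for i in types}
# With a class_dict, indices map to readable names.
class_dict = {1: "neoplastic", 2: "immune"}

labels_default = [fallback[int(t)] for t in types]
labels_named = [class_dict[int(t)] for t in types]
print(labels_default)  # [1, 2]
print(labels_named)  # ['neoplastic', 'immune']
```

Note that `np.unique(type_map)[1:]` drops the smallest label whatever it is, so this logic assumes the type mask contains background pixels with value 0.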

Examples:

>>> from histolytics.utils.raster import inst2gdf
>>> from histolytics.data import hgsc_cancer_inst_mask, hgsc_cancer_type_mask
>>> # load raster masks
>>> inst_mask = hgsc_cancer_inst_mask()
>>> type_mask = hgsc_cancer_type_mask()
>>> # convert to GeoDataFrame
>>> gdf = inst2gdf(inst_mask, type_mask)
>>> print(gdf.head(3))
        uid  class_name                                           geometry
    0  135           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
    1  200           1  POLYGON ((817.01 0.225, 817.215 0.804, 817.795...
    2    0           1  POLYGON ((1394.01 0.45, 1394.215 1.58, 1394.79...
Source code in src/histolytics/utils/raster.py
def inst2gdf(
    inst_map: np.ndarray,
    type_map: np.ndarray = None,
    xoff: int = None,
    yoff: int = None,
    class_dict: Dict[int, str] = None,
    min_size: int = 15,
    smooth_func: Callable = uniform_smooth,
) -> gpd.GeoDataFrame:
    """Convert an instance segmentation raster mask to a GeoDataFrame.

    Note:
        This function should be applied to nuclei instance segmentation masks. Nuclei
        types can be provided with the `type_map` and `class_dict` arguments if needed.

    Parameters:
        inst_map (np.ndarray):
            An instance segmentation mask. Shape (H, W).
        type_map (np.ndarray):
            A type segmentation mask. Shape (H, W). If provided, the types will be
            included in the resulting GeoDataFrame in column 'class_name'.
        xoff (int):
            The x offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        yoff (int):
            The y offset. Optional. The offset is used to translate the geometries
            in the GeoDataFrame. If None, no translation is applied.
        class_dict (Dict[int, str]):
            A dictionary mapping class indices to class names.
            e.g. {1: 'neoplastic', 2: 'immune'}. If None, the class indices will be used.
        min_size (int):
            The minimum size (in pixels) of the polygons to include in the GeoDataFrame.
        smooth_func (Callable):
            A function to smooth the polygons. The function should take a shapely Polygon
            as input and return a shapely Polygon. Defaults to `uniform_smooth`, which
            applies a uniform filter. `histolytics.utils._filters` also provides
            `gaussian_smooth` and `median_smooth` for smoothing.

    Returns:
        gpd.GeoDataFrame:
            A GeoDataFrame of the raster instance mask. Contains columns:

                - 'uid' - the numeric pixel value of the instance mask,
                - 'class_name' - the name of the instance class from `class_dict`, or the
                  class index if `class_dict` is None,
                - 'geometry' - the geometry of the polygon.

    Examples:
        >>> from histolytics.utils.raster import inst2gdf
        >>> from histolytics.data import hgsc_cancer_inst_mask, hgsc_cancer_type_mask
        >>> # load raster masks
        >>> inst_mask = hgsc_cancer_inst_mask()
        >>> type_mask = hgsc_cancer_type_mask()
        >>> # convert to GeoDataFrame
        >>> gdf = inst2gdf(inst_mask, type_mask)
        >>> print(gdf.head(3))
                uid  class_name                                           geometry
            0  135           1  POLYGON ((405.019 0.45, 405.43 1.58, 406.589 2...
            1  200           1  POLYGON ((817.01 0.225, 817.215 0.804, 817.795...
            2    0           1  POLYGON ((1394.01 0.45, 1394.215 1.58, 1394.79...
    """
    # handle empty masks
    if inst_map.size == 0 or np.max(inst_map) == 0:
        return gpd.GeoDataFrame(columns=["uid", "class_name", "geometry"])

    if type_map is None:
        type_map = inst_map > 0

    types = np.unique(type_map)[1:]

    if class_dict is None:
        class_dict = {int(i): int(i) for i in types}

    inst_maps_per_type = []
    for t in types:
        mask = type_map == t
        vectorized_data = (
            (value, class_dict[int(t)], shape(polygon))
            for polygon, value in shapes(inst_map, mask=mask)
        )

        res = gpd.GeoDataFrame(
            vectorized_data,
            columns=["uid", "class_name", "geometry"],
        )
        res["uid"] = res["uid"].astype(int)
        inst_maps_per_type.append(res)

    res = pd.concat(inst_maps_per_type)

    # filter out small geometries
    res = res.loc[res.area > min_size].reset_index(drop=True)

    # translate geometries if offsets are provided
    if xoff is not None or yoff is not None:
        res["geometry"] = res["geometry"].translate(
            xoff if xoff is not None else 0, yoff if yoff is not None else 0
        )

    # smooth geometries if a smoothing function is provided
    if smooth_func is not None:
        res["geometry"] = res["geometry"].apply(smooth_func)

    return res