set_uid

Set a unique identifier column on a GeoDataFrame and use it as the index.

Note

By default, a running integer index starting at 0 is added to gdf as the "uid" column and set as the index.

Parameters:

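| Name | Type | Description | Default |
|---|---|---|---|
| gdf | GeoDataFrame | Input GeoDataFrame. | required |
| start_ix | int | The first value of the running id. Ids run from start_ix to start_ix + len(gdf) - 1. | 0 |
| id_col | str | The name of the id column. It is created if missing and overwritten if it already exists. | 'uid' |
| drop | bool | If True, remove the id column from the columns once it has been set as the index. If False, keep it as both the index and a column. | False |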
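
The effect of start_ix, id_col and drop can be sketched without the histolytics dataset. The snippet below is a minimal illustration, not the library code itself: set_uid_sketch is a hypothetical stand-in that repeats the two calls from the function body listed under "Source code", and it runs on a plain pandas DataFrame with toy values, on the assumption that GeoDataFrame inherits assign and set_index from pandas.

```python
import pandas as pd


def set_uid_sketch(df, start_ix=0, id_col="uid", drop=False):
    """Hypothetical stand-in that mirrors the two steps of set_uid."""
    # Step 1: write a running integer id to id_col.
    df = df.assign(**{id_col: range(start_ix, len(df) + start_ix)})
    # Step 2: use that column as the index.
    return df.set_index(id_col, drop=drop)


# Toy data (assumption): three rows with a single attribute column.
df = pd.DataFrame({"class_name": ["connective", "connective", "connective"]})

kept = set_uid_sketch(df, start_ix=1, id_col="cell_id", drop=False)
dropped = set_uid_sketch(df, start_ix=1, id_col="cell_id", drop=True)

print(list(kept.index))       # ids start at start_ix
print(list(kept.columns))     # id column kept next to the index
print(list(dropped.columns))  # id column only lives in the index
```

With drop=False the id is available both as the index and as a regular column. With drop=True it is only available through the index.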

Returns:

Type Description
GeoDataFrame

A copy of the input gdf indexed by the id column ("uid" by default). The column is also kept as a regular column unless drop=True.

Examples:

>>> from histolytics.utils.gdf import set_uid
>>> from histolytics.data import hgsc_cancer_nuclei
>>> gdf = hgsc_cancer_nuclei()
>>> gdf = set_uid(gdf, drop=False)
>>> print(gdf.head(3))
                                                geometry  class_name  uid
    uid
    0    POLYGON ((1394.01 0, 1395.01 1.99, 1398 3.99, ...  connective    0
    1    POLYGON ((1391 2.01, 1387 2.01, 1384.01 3.01, ...  connective    1
    2    POLYGON ((1382.99 156.01, 1380 156.01, 1376.01...  connective    2
Source code in src/histolytics/utils/gdf.py
import geopandas as gpd


def set_uid(
    gdf: gpd.GeoDataFrame, start_ix: int = 0, id_col: str = "uid", drop: bool = False
) -> gpd.GeoDataFrame:
    """Set a unique identifier column to gdf.

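    Note:
        By default, a running integer index starting at 0 is added to gdf as
        the "uid" column and set as the index.

    Parameters:
        gdf (gpd.GeoDataFrame):
            Input GeoDataFrame.
        start_ix (int):
            The first value of the running id.
        id_col (str):
            The name of the id column. It is created if missing and
            overwritten if it already exists.
        drop (bool):
            If True, remove the id column from the columns once it has been
            set as the index. If False, keep it as both index and column.

    Returns:
        gpd.GeoDataFrame:
            A copy of gdf indexed by the id column. The column is also kept
            as a regular column unless drop=True.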

    Examples:
        >>> from histolytics.utils.gdf import set_uid
        >>> from histolytics.data import hgsc_cancer_nuclei
        >>> gdf = hgsc_cancer_nuclei()
        >>> gdf = set_uid(gdf, drop=False)
        >>> print(gdf.head(3))
                                                        geometry  class_name  uid
            uid
            0    POLYGON ((1394.01 0, 1395.01 1.99, 1398 3.99, ...  connective    0
            1    POLYGON ((1391 2.01, 1387 2.01, 1384.01 3.01, ...  connective    1
            2    POLYGON ((1382.99 156.01, 1380 156.01, 1376.01...  connective    2
    """
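    # Write a running integer id (start_ix, start_ix + 1, ...) to id_col.
    # The column is always (re)generated, even if id_col already exists.
    # assign returns a new object, so the caller's gdf is not modified.
    gdf = gdf.assign(**{id_col: range(start_ix, len(gdf) + start_ix)})
    # Use the id column as the index; it stays as a column unless drop=True.
    gdf = gdf.set_index(id_col, drop=drop)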

    return gdf
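
As listed, the function calls assign unconditionally, so an existing id column is replaced rather than reused, and the caller's frame is left untouched. The sketch below illustrates both points under stated assumptions: set_uid_sketch is a hypothetical stand-in that repeats the two calls above, and a plain pandas DataFrame with toy values stands in for a GeoDataFrame.

```python
import pandas as pd


def set_uid_sketch(df, start_ix=0, id_col="uid", drop=False):
    """Hypothetical stand-in that mirrors the two steps of set_uid."""
    df = df.assign(**{id_col: range(start_ix, len(df) + start_ix)})
    return df.set_index(id_col, drop=drop)


# Toy data (assumption): the frame already has a "uid" column.
df = pd.DataFrame(
    {"uid": [10, 20, 30], "class_name": ["connective", "connective", "connective"]}
)

out = set_uid_sketch(df)

print(list(out["uid"]))  # existing ids replaced by the running index
print(list(df["uid"]))   # original frame keeps its own values
```

If existing ids need to be preserved, one option is to pass a different id_col so the original column is not overwritten.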