drop_duplicate_rows
- cocohelper.utils.dataframe.drop_duplicate_rows(df, ignore_columns=None)
Drop duplicate rows of a DataFrame and return a map of merged elements.
Duplicates are defined as rows with the same values in every column, regardless of their index. Some columns can be ignored when identifying duplicates.
- Parameters:
df (DataFrame) – input DataFrame.
ignore_columns (Optional[List[str]]) – the columns to ignore when identifying duplicates.
- Returns:
The DataFrame without duplicates.
A dict that maps indices of the dropped (merged) elements to the indices of the corresponding kept elements.
- Return type:
Tuple[DataFrame, dict]
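The behaviour described above can be illustrated with a small pandas sketch. This is not the cocohelper implementation: `drop_duplicate_rows_sketch` is a hypothetical stand-in written only from the description on this page, and it assumes that the first occurrence of each duplicated row is the one that is kept and that cell values are plain hashable values without NaN.

```python
from typing import List, Optional, Tuple

import pandas as pd


def drop_duplicate_rows_sketch(
    df: pd.DataFrame, ignore_columns: Optional[List[str]] = None
) -> Tuple[pd.DataFrame, dict]:
    """Hypothetical re-creation of the documented behaviour (not library code)."""
    # Columns that take part in the comparison.
    ignored = set(ignore_columns or [])
    subset = [c for c in df.columns if c not in ignored]

    first_seen = {}      # row values -> index label of the kept row
    merged = {}          # dropped index label -> kept index label
    keep_positions = []  # positional indices of the rows to keep

    rows = df[subset].itertuples(index=False, name=None)
    for pos, (label, values) in enumerate(zip(df.index, rows)):
        if values in first_seen:
            # Assumption: a later duplicate is merged into the first occurrence.
            merged[label] = first_seen[values]
        else:
            first_seen[values] = label
            keep_positions.append(pos)

    return df.iloc[keep_positions], merged


df = pd.DataFrame(
    {"name": ["cat", "dog", "cat", "dog"], "score": [0.9, 0.8, 0.7, 0.8]},
    index=[10, 11, 12, 13],
)

# All columns compared: only row 13 repeats an earlier row (row 11).
deduped, merged = drop_duplicate_rows_sketch(df)
print(merged)  # {13: 11}

# Ignoring "score": rows are compared on "name" only.
deduped_by_name, merged_by_name = drop_duplicate_rows_sketch(
    df, ignore_columns=["score"]
)
print(merged_by_name)  # {12: 10, 13: 11}
```

The returned dict is useful when other tables refer to the dropped rows by index: each reference can be remapped to the index of the row that was kept.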