COCOHelper

class cocohelper.helper.COCOHelper[source]

Bases: object

Represent a dataset in the COCO format.

To create an instance of COCOHelper, it is advisable to use one of the load methods (load, load_json, or load_data).

Parameters:
  • img_df – DataFrame of images.

  • ann_df – DataFrame of annotations.

  • cat_df – DataFrame of categories, optional.

  • lic_df – DataFrame of licenses, optional.

  • info – Info dict, optional.

  • coco_dir – Root directory of the dataset, optional.

  • paths – COCOHelperPaths, used to customize directory structure.

  • validate – If True, validate the COCO dataset and raise an error if invalid.

Raises:
  • COCOValidationError – if the input COCO dataset is not valid. This check is performed only if validate=True.

Method List

_copy_images(target_img_dir)

_read_annotations_file(annotation_file)

Read a COCO json file as a dict.

_remove_unlinked_anns()

Remove annotations that reference non-existent image or category ids.

_validate()

Validate the COCO dataset and raise an error if invalid.

copy([cat_df, img_df, ann_df, lic_df, info, ...])

Copy the dataset and optionally change some dataframes.

drop_duplicate_anns()

Drop duplicate annotations (same values with different index).

drop_duplicate_cats()

Drop duplicate categories (same values with different index).

drop_duplicate_imgs()

Drop duplicate images (same values with different index).

drop_duplicate_licenses()

Drop duplicate licenses (same values with different index).

drop_labelled()

Get a new COCOHelper dataset that contains only unlabelled images.

drop_unlabelled()

Get a new COCOHelper dataset that does not contain unlabelled images.

filter(cfilter, *[, ann_ids, img_ids, ...])

Get a copy of the dataset with the applied filters.

filter_anns([cfilter, ann_ids, img_ids, ...])

Get a copy of the dataset with filtered annotations.

filter_cats([cfilter, cat_ids, cat_nms, ...])

Get a copy of the dataset with filtered categories.

filter_imgs([cfilter, img_ids, img_nms, ...])

Get a copy of the dataset with filtered images.

filtered_anns([cfilter, ann_ids, img_ids, ...])

Get the dataset's annotations after joining them with categories and images, optionally after filtering.

filtered_cats([cfilter, cat_ids, cat_nms, ...])

Get dataset's categories, potentially filtered by the provided filters.

filtered_imgs([cfilter, img_ids, img_nms, ...])

Get the dataset's images after joining them with annotations and categories, optionally after filtering.

get_ann_sample([ann_id, idx, transform])

Load a single annotation with the corresponding image.

get_img(img_id)

Load the image with img_id as a numpy array.

get_img_sample([img_id, idx, transform])

Load an image with its info and annotations.

load(coco_dir[, ann_fname, ann_dir, ...])

Create a COCOHelper from a COCO dataset stored in a directory.

load_data(annotations, coco_dir[, ...])

load_json(json_annotations_file[, img_dir, ...])

Create a COCOHelper from the JSON annotation file of a COCO dataset stored in a directory.

merge(*coco_helper[, drop_duplicates])

Merge different COCO datasets with all categories, images, annotations and licenses merged.

new_info_dict()

Get a generic info dict for COCO format.

save(coco_dir[, fix_img_path, copy_images])

Save the current COCOHelper to a directory.

to_coco()

Convert COCOHelper to pycocotools.COCO

to_json_dataset()

Convert the current COCOHelper to a dict with the same structure as the COCO JSON file.

write_annotations_file(annotation_file_path)

Save the current COCOHelper as a COCO json file.

Attributes List

anns

Dataframe containing the annotations data of the COCO dataset.

cats

Dataframe containing the categories data of the COCO dataset.

imgs

Dataframe containing the images metadata of the COCO dataset.

info

Dataframe containing extra information of the COCO dataset.

joins

Get a COCOJoins object that enables easy access to different joins of the dataset tables.

labelled_imgs

Get only the labelled images as a DataFrame.

licenses

Dataframe containing the licenses of the COCO dataset.

paths

Information about folder and file organization for a COCO dataset.

root_path

Path to the root directory containing the COCO dataset.

unlabelled_imgs

Get only the unlabelled images as a DataFrame.

validator

Get a COCOValidator object that enables easy access to different validation methods.

Methods Details

_copy_images(target_img_dir)[source]
Parameters:

target_img_dir (Union[str, Path]) –

classmethod _read_annotations_file(annotation_file)[source]

Read a COCO json file as a dict.

Parameters:

annotation_file (str) –

Return type:

dict

_remove_unlinked_anns()[source]

Remove annotations that reference non-existent image or category ids.
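The idea of an unlinked annotation can be illustrated without the library. The following is a minimal sketch on plain lists of dicts, not COCOHelper's actual implementation; the function name remove_unlinked_anns and the sample records are hypothetical, while the image_id and category_id keys follow the standard COCO annotation layout.

```python
def remove_unlinked_anns(images, categories, annotations):
    """Keep only annotations whose image_id and category_id both exist."""
    image_ids = {img["id"] for img in images}
    category_ids = {cat["id"] for cat in categories}
    return [
        ann
        for ann in annotations
        if ann["image_id"] in image_ids and ann["category_id"] in category_ids
    ]


images = [{"id": 1, "file_name": "a.jpg"}]
categories = [{"id": 10, "name": "cat"}]
annotations = [
    {"id": 100, "image_id": 1, "category_id": 10},  # both links are valid
    {"id": 101, "image_id": 2, "category_id": 10},  # image 2 does not exist
    {"id": 102, "image_id": 1, "category_id": 99},  # category 99 does not exist
]

linked = remove_unlinked_anns(images, categories, annotations)
kept_ids = [ann["id"] for ann in linked]
print(kept_ids)
```

Only the annotation whose image and category both exist survives in this sketch.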

_validate()[source]

Validate the COCO dataset and raise an error if invalid.

Return type:

None

copy(cat_df=None, img_df=None, ann_df=None, lic_df=None, info=None, validate=False)[source]

Copy the dataset and optionally change some dataframes.

When changing categories or images, annotations that result as invalid will be removed.

Parameters:
  • cat_df (Optional[DataFrame]) – New category dataframe, optional

  • img_df (Optional[DataFrame]) – New image dataframe, optional

  • ann_df (Optional[DataFrame]) – New annotation dataframe, optional

  • lic_df (Optional[DataFrame]) – New license dataframe, optional

  • info (Optional[dict]) – New info dict, optional

  • validate (bool) – If True, validate the COCO dataset and raise an error if invalid

Returns:

A new COCOHelper object.

Return type:

COCOHelper

drop_duplicate_anns()[source]

Drop duplicate annotations (same values with different index).

drop_duplicate_cats()[source]

Drop duplicate categories (same values with different index).

drop_duplicate_imgs()[source]

Drop duplicate images (same values with different index).

drop_duplicate_licenses()[source]

Drop duplicate licenses (same values with different index).
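"Same values with different index" means that two rows carry identical field values but are stored under different indices. The sketch below shows that notion on a plain dict keyed by index. It is an illustration only: the helper name drop_duplicates is hypothetical, and keeping the first occurrence is an assumption, since this page does not say which duplicate the library keeps or how it updates references to the dropped one.

```python
def drop_duplicates(rows_by_index):
    """Keep the first index seen for each distinct set of row values."""
    seen = set()
    unique = {}
    for index, row in rows_by_index.items():
        # Assumption: row values are hashable (strings, numbers).
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique[index] = row
    return unique


categories = {
    1: {"name": "cat", "supercategory": "animal"},
    2: {"name": "dog", "supercategory": "animal"},
    3: {"name": "cat", "supercategory": "animal"},  # same values as index 1
}

deduplicated = drop_duplicates(categories)
print(list(deduplicated))
```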

drop_labelled()[source]

Get a new COCOHelper dataset that contains only unlabelled images.

Returns:

A new COCOHelper object containing only unlabelled images.

Return type:

COCOHelper

drop_unlabelled()[source]

Get a new COCOHelper dataset that does not contain unlabelled images.

Returns:

A new COCOHelper object containing only labelled images.

Return type:

COCOHelper
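The split between labelled and unlabelled images can be sketched on plain records. This example assumes that an image counts as labelled when at least one annotation references its id; that definition is an assumption, not something this page states. The function name split_by_label and the sample data are hypothetical.

```python
def split_by_label(images, annotations):
    """Split images into (labelled, unlabelled) by annotation references."""
    annotated_ids = {ann["image_id"] for ann in annotations}
    labelled = [img for img in images if img["id"] in annotated_ids]
    unlabelled = [img for img in images if img["id"] not in annotated_ids]
    return labelled, unlabelled


images = [
    {"id": 1, "file_name": "a.jpg"},
    {"id": 2, "file_name": "b.jpg"},
    {"id": 3, "file_name": "c.jpg"},
]
annotations = [
    {"id": 100, "image_id": 1, "category_id": 10},
    {"id": 101, "image_id": 1, "category_id": 11},
    {"id": 102, "image_id": 3, "category_id": 10},
]

labelled, unlabelled = split_by_label(images, annotations)
print([img["id"] for img in labelled], [img["id"] for img in unlabelled])
```

Under that assumption, drop_unlabelled() would correspond to keeping the first list and drop_labelled() to keeping the second.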

filter(cfilter, *, ann_ids=None, img_ids=None, img_nms=None, cat_ids=None, cat_nms=None, supercat_nms=None, area_rng=None, is_crowd=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False, drop_orphans=True)[source]

Get a copy of the dataset with the applied filters.

Parameters:
  • cfilter (Filter) – a custom Filter for the COCOHelper.

  • ann_ids (Optional[Union[Sequence[int], int]]) – a filter for the annotation ids.

  • img_ids (Optional[Union[Sequence[int], int]]) – a filter for the image ids.

  • img_nms (Optional[Union[Sequence[str], str]]) – a filter for the image file names.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • area_rng (Optional[Tuple[float, float]]) – a filter for the annotation area.

  • is_crowd (Optional[bool]) – a filter for the annotation being a crowd or not (“is_crowd” in the annotation of the COCO JSON file).

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

  • drop_orphans (bool) – if True, drop orphans when applying the filter.

Returns:

A COCOHelper with data filtered according to the given filters.

Return type:

COCOHelper
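The interaction between several filters, the default "and" composition, and invert can be sketched with plain predicates. This is not the library's Filter or AndFilter implementation: the helpers by_category, by_area and apply_filters are hypothetical, the area range is assumed to be inclusive on both ends, and invert is assumed to select the complement of the combined filter.

```python
def by_category(cat_ids):
    """Predicate: the annotation belongs to one of the given category ids."""
    return lambda ann: ann["category_id"] in cat_ids


def by_area(area_rng):
    """Predicate: the annotation area lies within [low, high] (assumed inclusive)."""
    low, high = area_rng
    return lambda ann: low <= ann["area"] <= high


def apply_filters(annotations, predicates, invert=False):
    """Keep annotations matching all predicates; invert keeps the others."""
    return [
        ann
        for ann in annotations
        if all(pred(ann) for pred in predicates) != invert
    ]


annotations = [
    {"id": 1, "category_id": 10, "area": 50.0},
    {"id": 2, "category_id": 10, "area": 500.0},
    {"id": 3, "category_id": 20, "area": 500.0},
]
predicates = [by_category({10}), by_area((100.0, 1000.0))]

kept = apply_filters(annotations, predicates)
inverted = apply_filters(annotations, predicates, invert=True)
print([ann["id"] for ann in kept], [ann["id"] for ann in inverted])
```

With "and" composition only annotation 2 satisfies both conditions; inverting the combined filter returns the remaining two.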

filter_anns(cfilter=None, *, ann_ids=None, img_ids=None, img_nms=None, cat_ids=None, cat_nms=None, supercat_nms=None, area_rng=None, is_crowd=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get a copy of the dataset with filtered annotations.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • ann_ids (Optional[Union[Sequence[int], int]]) – a filter for the annotation ids.

  • img_ids (Optional[Union[Sequence[int], int]]) – a filter for the image ids.

  • img_nms (Optional[Union[Sequence[str], str]]) – a filter for the image file names.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • area_rng (Optional[Tuple[float, float]]) – a filter for the annotation area.

  • is_crowd (Optional[bool]) – a filter for the annotation being a crowd or not (“is_crowd” in the annotation of the COCO JSON file).

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A COCOHelper with data filtered according to the given filters.

Return type:

COCOHelper

filter_cats(cfilter=None, *, cat_ids=None, cat_nms=None, supercat_nms=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get a copy of the dataset with filtered categories.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A COCOHelper with data filtered according to the given filters.

Return type:

COCOHelper

filter_imgs(cfilter=None, *, img_ids=None, img_nms=None, cat_ids=None, cat_nms=None, supercat_nms=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get a copy of the dataset with filtered images.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • img_ids (Optional[Union[Sequence[int], int]]) – a filter for the image ids.

  • img_nms (Optional[Union[Sequence[str], str]]) – a filter for the image file names.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A COCOHelper with data filtered according to the given filters.

Return type:

COCOHelper

filtered_anns(cfilter=None, *, ann_ids=None, img_ids=None, img_nms=None, cat_ids=None, cat_nms=None, supercat_nms=None, area_rng=None, is_crowd=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get the dataset’s annotations after joining them with categories and images, optionally after filtering.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • ann_ids (Optional[Union[Sequence[int], int]]) – a filter for the annotation ids.

  • img_ids (Optional[Union[Sequence[int], int]]) – a filter for the image ids.

  • img_nms (Optional[Union[Sequence[str], str]]) – a filter for the image file names.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • area_rng (Optional[Tuple[float, float]]) – a filter for the annotation area.

  • is_crowd (Optional[bool]) – a filter for the annotation being a crowd or not (“is_crowd” in the annotation of the COCO JSON file).

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A pandas.DataFrame containing the filtered annotations.

Return type:

DataFrame

filtered_cats(cfilter=None, *, cat_ids=None, cat_nms=None, supercat_nms=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get dataset’s categories, potentially filtered by the provided filters.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A pandas.DataFrame containing the filtered categories.

Return type:

DataFrame

filtered_imgs(cfilter=None, *, img_ids=None, img_nms=None, cat_ids=None, cat_nms=None, supercat_nms=None, composition=<class 'cocohelper.filters.filter.AndFilter'>, invert=False)[source]

Get the dataset’s images after joining them with annotations and categories, optionally after filtering.

Parameters:
  • cfilter (Optional[Filter]) – a custom Filter for the COCOHelper.

  • img_ids (Optional[Union[Sequence[int], int]]) – a filter for the image ids.

  • img_nms (Optional[Union[Sequence[str], str]]) – a filter for the image file names.

  • cat_ids (Optional[Union[Sequence[int], int]]) – a filter for the category ids.

  • cat_nms (Optional[Union[Sequence[str], str]]) – a filter for the category names.

  • supercat_nms (Optional[Union[Sequence[str], str]]) – a filter for the super-category names.

  • composition (Type[ComposeFilter]) – a composition type for the filters (defaults to “and” behavior between each filter).

  • invert (bool) – if True, invert the way the filter works.

Returns:

A pandas.DataFrame containing the filtered images.

Return type:

DataFrame

get_ann_sample(ann_id=None, idx=None, transform=None)[source]

Load a single annotation with the corresponding image.

Parameters:
  • ann_id (Optional[int]) – The id of the annotation to load. Optional only if idx is provided.

  • idx (Optional[int]) – The index of the annotation to load. Optional only if ann_id is provided.

  • transform (Optional[Transform]) – An optional Transform to modify the image and annotation.

Returns:

The image as a numpy array and the annotation info as a dict.

Return type:

Tuple[np.ndarray, dict]

get_img(img_id)[source]

Load the image with img_id as a numpy array.

Parameters:

img_id (int) – The id of the image to load.

Returns:

A numpy array with shape (H, W, C).

Return type:

ndarray

get_img_sample(img_id=None, idx=None, transform=None)[source]

Load an image with its info and annotations.

Parameters:
  • img_id (Optional[int]) – The id of the image to load. Optional only if idx is provided.

  • idx (Optional[int]) – The index of the image to load. Optional only if img_id is provided.

  • transform (Optional[Transform]) – An optional Transform to modify the image and annotations.

Returns:

A dictionary with the image info and data, and a list of annotations.

Return type:

Tuple[dict, list]

classmethod load(coco_dir, ann_fname='coco.json', ann_dir='annotations/', img_dir='images/', validate=False)[source]

Create a COCOHelper from a COCO dataset stored in a directory.

Parameters:
  • coco_dir (str) – path to the directory containing the dataset.

  • ann_fname (str) – name of the annotation file to load.

  • ann_dir (str) – name/relative-path to the directory where annotations are stored.

  • img_dir (str) – name/relative-path to the directory where images are stored.

  • validate (bool) – If True, validate the dataset.

Returns:

A COCOHelper object.

Return type:

COCOHelper
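The default arguments of load imply a directory layout with an annotations/ folder holding coco.json and an images/ folder next to it. The sketch below builds such a layout with the standard library only, using the top-level keys of the standard COCO JSON format. The function name write_minimal_dataset and all record values are hypothetical, and no image file is actually written.

```python
import json
import tempfile
from pathlib import Path


def write_minimal_dataset(coco_dir):
    """Create annotations/coco.json and an empty images/ folder under coco_dir."""
    coco_dir = Path(coco_dir)
    ann_dir = coco_dir / "annotations"
    img_dir = coco_dir / "images"
    ann_dir.mkdir(parents=True, exist_ok=True)
    img_dir.mkdir(parents=True, exist_ok=True)

    # Toy records; a real dataset would also need the image files themselves.
    dataset = {
        "info": {"description": "toy dataset"},
        "licenses": [],
        "images": [
            {"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                "bbox": [10, 20, 100, 50],  # x, y, width, height
                "area": 5000,
                "iscrowd": 0,
            }
        ],
        "categories": [{"id": 1, "name": "cat", "supercategory": "animal"}],
    }

    ann_file = ann_dir / "coco.json"
    ann_file.write_text(json.dumps(dataset, indent=2))
    return ann_file


with tempfile.TemporaryDirectory() as tmp:
    ann_file = write_minimal_dataset(tmp)
    relative_path = ann_file.relative_to(tmp).as_posix()
    reloaded = json.loads(ann_file.read_text())

print(relative_path, sorted(reloaded))
```

Given a directory laid out this way, a call such as COCOHelper.load(coco_dir) with its default ann_fname, ann_dir and img_dir should look for exactly these paths.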

classmethod load_data(annotations, coco_dir, ann_fname='coco.json', ann_dir='annotations/', img_dir='images/', validate=False)[source]
Parameters:
  • annotations (Dict[str, DataFrame]) –

  • coco_dir (str) –

  • ann_fname (str) –

  • ann_dir (str) –

  • img_dir (str) –

  • validate (bool) –

Return type:

COCOHelper

classmethod load_json(json_annotations_file, img_dir='images/', validate=False)[source]

Create a COCOHelper from the JSON annotation file of a COCO dataset stored in a directory.

Parameters:
  • json_annotations_file (str) – path to the json file containing the dataset annotations.

  • img_dir (str) – name/relative-path to the directory where images are stored, relative to the COCO dataset root.

  • validate (bool) – If True, validate the dataset.

Returns:

A COCOHelper object.

Return type:

COCOHelper

merge(*coco_helper, drop_duplicates=True)[source]

Merge different COCO datasets with all categories, images, annotations and licenses merged.

Parameters:
  • *coco_helper (COCOHelper) – coco dataset(s) to merge with this coco dataset.

  • drop_duplicates (bool) – if True, drop redundant duplicate rows when merging.

Returns:

A COCOHelper resulting from merging multiple datasets.

Return type:

COCOHelper

static new_info_dict()[source]

Get a generic info dict for COCO format.

Return type:

dict

save(coco_dir, fix_img_path=False, copy_images=False)[source]

Save the current COCOHelper to a directory.

Parameters:
  • coco_dir (Union[str, Path]) – Output root directory.

  • fix_img_path (bool) – Not implemented.

  • copy_images (bool) – Not implemented.

Returns:

None.

Return type:

None

to_coco()[source]

Convert COCOHelper to pycocotools.COCO

Return type:

COCO

to_json_dataset()[source]

Convert the current COCOHelper to a dict with the same structure as the COCO JSON file.

Return type:

dict

write_annotations_file(annotation_file_path)[source]

Save the current COCOHelper as a COCO json file.

Parameters:

annotation_file_path (Union[str, Path]) –

Attribute Details

anns

Dataframe containing the annotations data of the COCO dataset.

cats

Dataframe containing the categories data of the COCO dataset.

imgs

Dataframe containing the images metadata of the COCO dataset.

info

Dataframe containing extra information of the COCO dataset.

joins

Get a COCOJoins object that enables easy access to different joins of the dataset tables.

labelled_imgs

Get only the labelled images as a DataFrame.

Returns:

A pandas.DataFrame containing the labelled images.

licenses

Dataframe containing the licenses of the COCO dataset.

paths

Information about folder and file organization for a COCO dataset.

root_path

Path to the root directory containing the COCO dataset.

unlabelled_imgs

Get only the unlabelled images as a DataFrame.

Returns:

A pandas.DataFrame containing the unlabelled images.

validator

Get a COCOValidator object that enables easy access to different validation methods.