COCOStats

class cocohelper.stats.COCOStats[source]

Bases: object

This class contains methods to calculate stats on a dataset.

Parameters:

coco_helper – Coco dataset to calculate stats on.
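
Example: a minimal usage sketch. The loading call below (COCOHelper.load_json) and its arguments are assumptions about how a COCOHelper is built; adapt them to your own setup.

    from cocohelper import COCOHelper
    from cocohelper.stats import COCOStats

    # Build a COCOHelper from a COCO annotation file (the loading call and
    # its arguments are assumptions; adapt them to your own setup).
    ch = COCOHelper.load_json("annotations/instances.json")

    # Compute stats over the loaded dataset.
    stats = COCOStats(coco_helper=ch)
    print(stats.nb_imgs, stats.nb_anns, stats.nb_cats, stats.nb_imgs_wo_anns)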

Method List

get_annotation_size_stats([mode])

Obtain dataset label size statistics in the form of a dictionary.

get_image_size_stats([eps])

Obtain statistics about dataset image sizes for each individual axis.

get_optimal_image_size([mode, n_pixels])

Estimate optimal image size based on the dataset's image and annotation statistics.

Attributes List

cat_ids_ratios

For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.

cat_nms_ratios

For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.

coco_helper

nb_anns

Number of annotations in the dataset

nb_cats

Number of categories in the dataset

nb_imgs

Number of images in the dataset

nb_imgs_wo_anns

Number of images in the dataset without annotations

Methods Details

__get_annotations_ratios(col, na_value=<NA>)

Get the ratios of annotations for each value in column.

The function operates on the joined imgs/anns/cats table and uses value_counts(normalize=True) to return a dict with the ratio for each value, normalized to [0, 1]. Missing values in the column are replaced by na_value.

Returns:

Dict associating each value in col with the fraction of annotations (in [0, 1]).

Parameters:
  • col (str) –

  • na_value (Any) –

Return type:

dict
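
This method is private (name-mangled) and not part of the public API; the snippet below is only a standalone illustration of the value_counts(normalize=True) pattern described above, using plain pandas.

    import pandas as pd

    # Stand-alone illustration of the ratio computation described above:
    # replace missing values, then count each value and normalize to [0, 1].
    col = pd.Series(["cat", "dog", "cat", None])
    ratios = col.fillna("unknown").value_counts(normalize=True).to_dict()
    # -> {'cat': 0.5, 'dog': 0.25, 'unknown': 0.25} (ordering may vary)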

__get_category_ratios(return_nms=True)

Get the ratios of each category.

Parameters:

return_nms (bool) – if True, key the returned dictionary by category name; otherwise, key it by category id.

Returns:

Dict associating each category name or id with the fraction of images in the dataset containing at least one annotation of that category (in [0, 1]).

Return type:

dict

get_annotation_size_stats(mode='bbox')[source]

Obtain dataset label size statistics in the form of a dictionary.

The dictionary pairs each image size in the dataset with the list of the smallest bounding box sizes found inside the corresponding images. This can be useful to define an optimal rescaling of the images and avoid losing small boxes when resizing data to a smaller dimension.

Parameters:

mode (str) – annotation type to be used to extract size statistics. Defaults to bbox.

Returns:

A dictionary with the statistics of the label size in the dataset.

Return type:

Dict
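
Usage sketch, continuing the stats object from the class-level example; the layout of the returned dictionary (image sizes mapped to smallest-box sizes) follows the description above and is otherwise an assumption.

    # Inspect, for each image size, the smallest bounding-box sizes it contains.
    ann_stats = stats.get_annotation_size_stats(mode="bbox")
    for img_size, smallest_boxes in ann_stats.items():
        print(img_size, smallest_boxes)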

get_image_size_stats(eps=1e-16)[source]

Obtain statistics about dataset image sizes for each individual axis.

Parameters:

eps (float) – epsilon used to compute average height/width ratio.

Returns:

Size information about dataset images in the form of a dictionary.

Return type:

Dict
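
Usage sketch (the exact keys of the returned dictionary are not documented here, so they are simply printed):

    # Per-axis statistics about the image sizes in the dataset.
    size_stats = stats.get_image_size_stats()
    for key, value in size_stats.items():
        print(key, value)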

get_optimal_image_size(mode='median', n_pixels=4)[source]

Estimate optimal image size based on the dataset’s image and annotation statistics.

This function:
  1. Computes the label stats of the dataset paired to each image size.

  2. Computes the minimum image size that guarantees that images resampled to that resolution do not lose labels.

  3. Computes the statistic defined by the argument mode (mode can be one of “mean”, “median”, “mode”).

  4. Returns the maximum between the minimum image size that guarantees an annotation size of at least n_pixels on both the width and the height image axis, and the values computed in (3).

Parameters:
  • mode (str) – statistic to be used to define the optimal image size.

  • n_pixels (int) – minimum number of annotation pixels remaining along the width and height axes after resizing an image to the returned optimal image size.

Returns:

A tuple containing the optimal height and width for resizing all the images of the dataset to the same dimension while preserving label information.

Return type:

Tuple
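
Usage sketch, assuming the (height, width) ordering stated above:

    # Find a common target resolution that keeps every annotation at least
    # 4 pixels wide and 4 pixels high after resizing.
    height, width = stats.get_optimal_image_size(mode="median", n_pixels=4)
    print(f"Resize all images to {height}x{width}")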

Attribute Details

cat_ids_ratios

For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.

Returns:

A dictionary associating to each category id the fraction of images in the dataset containing at least an annotation of that category (in [0, 1]).

cat_nms_ratios

For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.

Returns:

A dictionary associating to each category name the fraction of images in the dataset containing at least an annotation of that category (in [0, 1]).
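
Usage sketch (continuing the stats object from the class-level example):

    # Fraction of images containing at least one annotation of each category,
    # keyed by category name.
    for cat_name, ratio in stats.cat_nms_ratios.items():
        print(f"{cat_name}: {ratio:.1%} of images")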

coco_helper
nb_anns

Number of annotations in the dataset

nb_cats

Number of categories in the dataset

nb_imgs

Number of images in the dataset

nb_imgs_wo_anns

Number of images in the dataset without annotations