COCOStats
- class cocohelper.stats.COCOStats
Bases: object
This class contains methods to calculate stats on a dataset.
- Parameters:
coco_helper – COCO dataset to calculate statistics on.
Method List
get_annotation_size_stats([mode]) – Obtain label size statistics of the dataset in the form of a dictionary.
get_image_size_stats([eps]) – Obtain statistics about dataset image sizes for each individual axis.
get_optimal_image_size([mode, n_pixels]) – Estimate the optimal image size based on the dataset's image and annotation statistics.
Attributes List
cat_ids_ratios – For each category id, the fraction of images in the dataset that contain at least one annotation of that category.
cat_nms_ratios – For each category name, the fraction of images in the dataset that contain at least one annotation of that category.
nb_anns – Number of annotations in the dataset
nb_cats – Number of categories in the dataset
nb_imgs – Number of images in the dataset
nb_imgs_wo_anns – Number of images in the dataset without annotations
Method Details
- __get_annotations_ratios(col, na_value=<NA>)
Get the ratios of annotations for each value in column.
The method takes the join of the images, annotations and categories tables and calls pandas value_counts on col to return a dict with the ratio for each value. Passing normalize=True scales the ratios to [0, 1]. Missing values in the column are replaced by na_value.
- Returns:
Dict associating each value in col with the fraction of annotations (in [0, 1]).
- Parameters:
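col (str) – Name of the column of the joined table whose values are counted.
na_value (Any) – Value used to replace missing values in col.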
- Return type:
dict
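The following is a minimal pure-Python sketch of what value_counts(normalize=True) computes after missing values are filled. It assumes that one column of the joined table is passed as a plain list, with one entry per annotation and None marking a missing value. The function name annotation_ratios and the "NA" placeholder default are hypothetical. They are not the cocohelper implementation, which works on pandas objects.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Optional


def annotation_ratios(values: List[Optional[Hashable]], na_value: Any = "NA") -> Dict[Any, float]:
    """Hypothetical sketch: fraction of annotations for each value of one column.

    `values` stands in for a column of the joined images/annotations/categories
    table (one entry per annotation); None marks a missing value.
    """
    # Replace missing entries first, so they are counted under na_value
    filled = [na_value if v is None else v for v in values]
    counts = Counter(filled)
    total = len(filled)
    # Normalize the counts into [0, 1]; an empty column yields an empty dict
    return {value: count / total for value, count in counts.items()} if total else {}
```

For example, annotation_ratios(["car", "person", "car", None], na_value="unknown") gives {"car": 0.5, "person": 0.25, "unknown": 0.25}. Filling before counting matters here: pandas value_counts drops missing values by default.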
- __get_category_ratios(return_nms=True)
Get the ratios of each category.
- Parameters:
return_nms (bool) – if True, return the category name, otherwise return the category id.
- Returns:
Dict associating each category name or id with the fraction of images in the dataset containing at least one annotation of that category (in [0, 1]).
- Return type:
dict
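The per-category ratio can be sketched directly on a COCO-format dictionary, using the standard "images", "annotations" and "categories" lists. The function name category_ratios is hypothetical and this is not the library's code. The key point is that each image is counted once per category, however many annotations of that category it holds.

```python
from typing import Dict, Union


def category_ratios(coco: dict, return_nms: bool = True) -> Dict[Union[str, int], float]:
    """Hypothetical sketch: fraction of images with at least one annotation per category."""
    n_imgs = len(coco["images"])
    # For each category id, collect the set of image ids it appears in
    imgs_per_cat: Dict[int, set] = {cat["id"]: set() for cat in coco["categories"]}
    for ann in coco["annotations"]:
        imgs_per_cat.setdefault(ann["category_id"], set()).add(ann["image_id"])
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    ratios: Dict[Union[str, int], float] = {}
    for cat_id, img_ids in imgs_per_cat.items():
        # Use the category name when requested (falling back to the id if unknown)
        key = id_to_name.get(cat_id, cat_id) if return_nms else cat_id
        ratios[key] = len(img_ids) / n_imgs if n_imgs else 0.0
    return ratios
```

Consider four images where "car" appears in images 1 and 2 (twice in image 1) and "person" appears only in image 2. The result is {"car": 0.5, "person": 0.25}. With return_nms=False the same values are keyed by category id.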
- get_annotation_size_stats(mode='bbox')
Obtain dataset labels size statistics in the form of a dictionary.
The dictionary maps each image size in the dataset to a list holding, for every image of that size, the size of the smallest bounding box inside it. This can be useful to define the optimal rescaling of the images and to avoid losing small boxes when resizing data to a smaller resolution.
- Parameters:
mode (str) – Annotation type used to extract size statistics. Defaults to 'bbox'.
- Returns:
A dictionary with the statistics of the label size in the dataset.
- Return type:
Dict
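Here is a rough sketch of the mapping described above, built from a COCO-format dictionary. Image entries carry "height" and "width", and each bbox is [x, y, width, height]. Several choices are assumptions, not documented behaviour: the function name annotation_size_stats, the (height, width) key layout, taking the minimum separately per axis, and skipping images with no annotations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Size = Tuple[float, float]  # (height, width)


def annotation_size_stats(coco: dict) -> Dict[Size, List[Size]]:
    """Hypothetical sketch: image size -> smallest bbox size of each image with that size.

    "Smallest" is taken per axis here: the minimum bbox height and the minimum
    bbox width in the image (they may come from different boxes).
    """
    anns_per_img = defaultdict(list)
    for ann in coco["annotations"]:
        anns_per_img[ann["image_id"]].append(ann["bbox"])  # COCO bbox: [x, y, width, height]
    stats: Dict[Size, List[Size]] = defaultdict(list)
    for img in coco["images"]:
        bboxes = anns_per_img.get(img["id"])
        if not bboxes:
            continue  # images without annotations carry no label-size information
        min_h = min(b[3] for b in bboxes)
        min_w = min(b[2] for b in bboxes)
        stats[(img["height"], img["width"])].append((min_h, min_w))
    return dict(stats)
```

Grouping by image size keeps the smallest boxes next to the resolution they were measured at. A later resize can then be checked against both values together.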
- get_image_size_stats(eps=1e-16)
Obtain statistics about dataset images sizes for each individual axis.
- Parameters:
eps (float) – epsilon used to compute average height/width ratio.
- Returns:
Size information about dataset images in the form of a dictionary.
- Return type:
Dict
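The next sketch shows per-axis image size statistics on a COCO-format dictionary. It uses the standard library statistics module and adds eps to the denominator of the height/width ratio. The function name and the keys of the returned dictionary ("height", "width", "height_width_ratio", "min", "max", "mean", "median") are hypothetical. The real method's output layout is not documented here.

```python
import statistics
from typing import Dict


def image_size_stats(coco: dict, eps: float = 1e-16) -> Dict[str, Dict[str, float]]:
    """Hypothetical sketch: per-axis size statistics of the dataset images."""
    heights = [img["height"] for img in coco["images"]]
    widths = [img["width"] for img in coco["images"]]
    stats: Dict[str, Dict[str, float]] = {}
    for axis, values in (("height", heights), ("width", widths)):
        stats[axis] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
    # eps keeps the ratio finite if an image reports a width of 0
    ratios = [h / (w + eps) for h, w in zip(heights, widths)]
    stats["height_width_ratio"] = {"mean": statistics.mean(ratios)}
    return stats
```

Take three images of 480x640, 480x640 and 1080x1920 (height x width). The median height is 480, the median width is 640, and the mean height/width ratio is 0.6875.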
- get_optimal_image_size(mode='median', n_pixels=4)
Estimate optimal image size based on the dataset’s image and annotation statistics.
- This function:
1. Computes the label stats of the dataset paired to each image size.
2. Computes the minimum image size that guarantees images resampled to that resolution do not lose labels.
3. Computes the statistic defined by the argument mode (one of [“mean”, “median”, “mode”]).
4. Returns the maximum between the minimum image size that guarantees an annotation size of at least n_pixels along both the width and the height axes AND the values computed in (3).
- Parameters:
mode (str) – statistics to be used to define the optimal image size.
n_pixels (int) – Minimum number of annotation pixels that must remain along the width and height axes after resizing an image to the returned optimal image size.
- Returns:
A tuple containing the optimal height and width for resizing all the images of the dataset to the same dimension while preserving label information.
- Return type:
Tuple
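The four steps listed for get_optimal_image_size can be sketched per axis. Resizing an axis of length L to T scales a box side b to b * T / L. Keeping at least n_pixels therefore requires T >= n_pixels * L / b. The function name optimal_image_size, its list-based inputs, and the rounding up with math.ceil are assumptions. The sketch also assumes step 1 has already produced the smallest box of each image. The "do not lose labels" bound of step 2 is the n_pixels = 1 case, so only the n_pixels bound is computed.

```python
import math
import statistics
from typing import List, Tuple


def optimal_image_size(
    img_sizes: List[Tuple[int, int]],
    min_bbox_sizes: List[Tuple[float, float]],
    mode: str = "median",
    n_pixels: int = 4,
) -> Tuple[int, int]:
    """Hypothetical sketch: optimal (height, width) from image and smallest-box sizes.

    img_sizes[i] is the (height, width) of image i and min_bbox_sizes[i] the
    (height, width) of its smallest annotation (step 1, assumed precomputed).
    """
    stat = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}[mode]
    result = []
    for axis in (0, 1):  # 0 = height, 1 = width
        # Steps 2 and 4: smallest target length keeping every box >= n_pixels
        min_size = max(
            math.ceil(n_pixels * img[axis] / box[axis])
            for img, box in zip(img_sizes, min_bbox_sizes)
        )
        # Step 3: summary statistic of the image sizes along this axis
        typical = stat([img[axis] for img in img_sizes])
        # Step 4: keep the larger value so small labels survive the resize
        result.append(max(min_size, math.ceil(typical)))
    return result[0], result[1]
```

Suppose the images are 480x640, 480x640 and 1080x1920, their smallest boxes are 20x8, 30x30 and 60x100, and n_pixels=16. The median size is 480x640. However, the 8-pixel-wide box forces the width up to 16 * 640 / 8 = 1280, so the sketch returns (480, 1280).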
Attribute Details
- cat_ids_ratios
For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.
- Returns:
A dictionary associating each category id with the fraction of images in the dataset containing at least one annotation of that category (in [0, 1]).
- cat_nms_ratios
For each category, compute the fraction of images in the dataset that contain at least one annotation of that category.
- Returns:
A dictionary associating each category name with the fraction of images in the dataset containing at least one annotation of that category (in [0, 1]).
- coco_helper
COCO dataset the statistics are calculated on.
- nb_anns
Number of annotations in the dataset
- nb_cats
Number of categories in the dataset
- nb_imgs
Number of images in the dataset
- nb_imgs_wo_anns
Number of images in the dataset without annotations
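The four count attributes correspond to simple quantities of a COCO-format dictionary. The sketch below reproduces them with a hypothetical helper, dataset_counts, that is not part of cocohelper. The only non-trivial count is nb_imgs_wo_anns: the images whose id never appears as an annotation's image_id.

```python
from typing import Dict


def dataset_counts(coco: dict) -> Dict[str, int]:
    """Hypothetical sketch: counts mirroring nb_imgs, nb_anns, nb_cats and nb_imgs_wo_anns."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    return {
        "nb_imgs": len(coco["images"]),
        "nb_anns": len(coco["annotations"]),
        "nb_cats": len(coco["categories"]),
        # Images whose id is never referenced by an annotation
        "nb_imgs_wo_anns": sum(1 for img in coco["images"] if img["id"] not in annotated),
    }
```

For three images with two annotations, both on image 1, nb_anns is 2 and nb_imgs_wo_anns is 2.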