Reading the COCO Object Detection Dataset: Method and Python Utility Script

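COCO (Common Objects in COntext) is a large image dataset that provides labels for several tasks, including object detection, segmentation, and image captioning. COCO annotation files are written in JSON, and on first contact it takes ten minutes or so to get familiar with the annotation format.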

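This article briefly explains how to read the COCO object detection dataset and provides a Python script you can call. The same ideas carry over when reading labels for the other tasks.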

Full code: https://github.com/SingleZombie/DL-Demos/tree/master/dldemos/MyYOLO/load_coco.py

Format Overview

The official COCO website describes the annotation format. For object detection, the format looks like this:

{
    "info": info,
    "images": [image],
    "annotations": [annotation],
    "licenses": [license],
}

info{
    "year": int,
    "version": str,
    "description": str,
    "contributor": str,
    "url": str,
    "date_created": datetime,
}

image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
    "license": int,
    "flickr_url": str,
    "coco_url": str,
    "date_captured": datetime,
}

license{
    "id": int,
    "name": str,
    "url": str,
}

annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]

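Much of this information is redundant for our purposes. We mainly care about the image information and the bounding box information. Suppose the JSON file has been loaded into a variable called root. Then root['images'] gives the list of image records. The main attributes of each image record are: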

image{
    "id": int,
    "width": int,
    "height": int,
    "file_name": str,
}

id uniquely identifies an image. Later we will use this id to bind each image to its labels. The other three attributes are ordinary image properties.
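Because everything else refers to images by id, it can help to index the image records by id once. The following is a minimal sketch rather than part of the original script. The images list stands in for root['images'], and its second entry (id 7, toy.jpg) is made up for illustration.

```python
# Stand-in for root['images']; real records carry more fields.
# The second record is a hypothetical example.
images = [
    {"id": 391895, "width": 640, "height": 360,
     "file_name": "COCO_val2014_000000391895.jpg"},
    {"id": 7, "width": 100, "height": 50, "file_name": "toy.jpg"},
]

# Map each image id to its record so that later lookups by id
# do not have to scan the whole list.
image_by_id = {img["id"]: img for img in images}

print(image_by_id[391895]["file_name"])
```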

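We can get the list of annotation records with root['annotations']. Each record describes one detected object, and a single image may have several records. For object detection we mainly care about the position, size, and category of each box, so the attributes we end up using are: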

annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "bbox": [x,y,width,height],
}

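image_id corresponds to the id of an image record described above, and we can use this field to bind images to annotations. category_id can be bound to the category information described below. bbox gives the position and size of each bounding box.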
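The two operations described here, grouping annotations by image_id and interpreting bbox, can be sketched as follows. The annotation records are hypothetical toy values, and xywh_to_xyxy is a helper name introduced only for this illustration. The conversion assumes x and y are the top-left corner of the box, which matches how the visualization script later in this article uses them.

```python
from collections import defaultdict


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


# Hypothetical stand-in for root['annotations'].
annotations = [
    {"id": 1, "image_id": 10, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},
    {"id": 2, "image_id": 10, "category_id": 3, "bbox": [0.0, 0.0, 5.0, 5.0]},
    {"id": 3, "image_id": 11, "category_id": 1, "bbox": [1.0, 2.0, 3.0, 4.0]},
]

# Group annotation records by the image they belong to.
anns_by_image = defaultdict(list)
for ann in annotations:
    anns_by_image[ann["image_id"]].append(ann)

print(len(anns_by_image[10]))
print(xywh_to_xyxy(annotations[0]["bbox"]))
```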

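Finally, we need the category name for each category id, which makes it easier to check whether detection results are correct. The list of categories is available as root['categories'], and each record has the format: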

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]

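Here, id corresponds to the category_id mentioned earlier. name is the specific category, and supercategory is the broader group it belongs to. In general we only need name.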
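As a small illustration, a lookup table from category id to name can be built with a dictionary comprehension. The categories list below is a two-record sample standing in for root['categories'], not the full list.

```python
# Sample stand-in for root['categories'].
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
]

# category id -> human-readable name
category_name = {c["id"]: c["name"] for c in categories}

print(category_name[1])
```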

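Reading Script

Once the data format is understood, we can write a script to read it.

First, a brief introduction to Python's json library. Import it with import json, then call json.load(fp) to parse an already opened JSON file object fp. Here is an example. Change the path to match your own setup.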

import json


def print_json():
    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)

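With the file loaded, we can print some of its contents to get familiar with the json API and to take a concrete look at the format of a COCO annotation file.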

import json


def print_json():
    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)
    print('info:')
    print(root['info'])
    print('categories:')
    print(root['categories'])
    print('Length of images:', len(root['images']))
    print(root['images'][0])
    print('Length of annotations:', len(root['annotations']))
    print(root['annotations'][0])


def main():
    print_json()


if __name__ == '__main__':
    main()

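The json library exposes JSON objects as Python dictionaries and JSON arrays as Python lists. In this code, root is the root node of the file, and root['info'] is the object stored in the root node's info attribute. root['categories'], root['images'], and root['annotations'] are the lists of categories, images, and annotations respectively.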
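The object-to-dict and array-to-list mapping can be checked on a tiny in-memory document. This sketch uses json.loads, the string counterpart of json.load, on a made-up JSON string so that it runs without the dataset.

```python
import json

# A made-up miniature document with the same overall shape as a COCO file.
text = '{"info": {"year": 2014}, "images": [{"id": 1}, {"id": 2}]}'
root = json.loads(text)

print(type(root).__name__)            # dict: a JSON object
print(type(root["images"]).__name__)  # list: a JSON array
print(root["info"]["year"])           # 2014
```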

Running this script, my output looks roughly like this:

info:
{'description': 'COCO 2014 Dataset', 'url': 'http://cocodataset.org', 'version': '1.0', 'year': 2014, 'contributor': 'COCO Consortium', 'date_created': '2017/09/01'}
categories:
[{'supercategory': 'person', 'id': 1, 'name': 'person'} ...
Length of images: 40504
{'license': 3, 'file_name': 'COCO_val2014_000000391895.jpg', 'coco_url': 'http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg', 'height': 360, 'width': 640, 'date_captured': '2013-11-14 11:18:45', 'flickr_url': 'http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg', 'id': 391895}
Length of annotations: 291875
{'segmentation': [[239.97, 260.24, 222.04, 270.49, 199.84, 253.41, 213.5, 227.79, 259.62, 200.46, 274.13, 202.17, 277.55, 210.71, 249.37, 253.41, 237.41, 264.51, 242.54, 261.95, 228.87, 271.34]], 'area': 2765.1486500000005, 'iscrowd': 0, 'image_id': 558840, 'bbox': [199.84, 200.46, 77.71, 70.88], 'category_id': 58, 'id': 156}

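Next, we can write a function that reads each image file name together with its bounding boxes. With this function we can build the dataset for an object detection project.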

import json


def load_img_ann(ann_path='data/coco/annotations/instances_val2014.json'):
    """Return {img_id: {'name': file_name, 'anns': [[x, y, w, h, label], ...]}}."""
    with open(ann_path) as fp:
        root = json.load(fp)
    img_dict = {}
    # First pass: create one entry per image, keyed by image id.
    for img_info in root['images']:
        img_dict[img_info['id']] = {
            'name': img_info['file_name'],
            'anns': []
        }
    # Second pass: attach each bbox plus its category id to its image.
    for ann_info in root['annotations']:
        img_dict[ann_info['image_id']]['anns'].append(
            ann_info['bbox'] + [ann_info['category_id']])
    return img_dict

In this function we build a dictionary img_dict. Its keys are image ids and its values are attribute dictionaries of the form:

{
    'name': ...,
    'anns': [[x, y, w, h, label], ...]
}

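The image file name name comes from root['images'], and the annotations come from root['annotations']. We run two loops, pull out the relevant information, and combine it into the dictionary.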

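Note! The category ids in COCO 2014 are not contiguous. The ids appear to run up to 90, but there are actually only 80 categories. A project should therefore add its own mapping between the indices 0-79 and the category ids.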
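One possible way to build that mapping is to sort the category ids and enumerate them. This is a sketch, not the original project's code: build_label_maps is a hypothetical helper name, and the three-record categories list is a toy example with gaps in its ids. With the real root['categories'] the indices would run from 0 to 79.

```python
def build_label_maps(categories):
    """Return (id_to_index, index_to_id) for a list of category records."""
    ids = sorted(c["id"] for c in categories)
    # Contiguous training index for each (possibly sparse) category id.
    id_to_index = {cid: i for i, cid in enumerate(ids)}
    # Inverse map, for turning predictions back into category ids.
    index_to_id = {i: cid for cid, i in id_to_index.items()}
    return id_to_index, index_to_id


# Toy categories with non-contiguous ids (names are placeholders).
categories = [
    {"id": 90, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 3, "name": "b"},
]
id_to_index, index_to_id = build_label_maps(categories)

print(id_to_index)
print(index_to_id)
```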

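In your own object detection project, simply call load_img_ann(ann_path). Then preprocess the bounding boxes further according to the specific detection algorithm you use.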
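What that preprocessing looks like depends on the detector. As one example, some detectors expect boxes as a normalized center point plus size. The sketch below assumes that target format. to_normalized_cxcywh is a hypothetical helper, and the image width and height must be supplied separately because load_img_ann does not store them.

```python
def to_normalized_cxcywh(ann, img_w, img_h):
    """Convert [x, y, w, h, label] in pixels to [cx, cy, w, h, label] in 0-1."""
    x, y, w, h, label = ann
    cx = (x + w / 2) / img_w  # box center, as a fraction of image width
    cy = (y + h / 2) / img_h  # box center, as a fraction of image height
    return [cx, cy, w / img_w, h / img_h, label]


# Toy box on a hypothetical 400x200 image.
print(to_normalized_cxcywh([100, 50, 200, 100, 1], 400, 200))
```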

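Visual Verification

My own project has a function, draw_bbox, for visualizing bboxes. To verify that the dataset-reading function is correct, I also wrote a function that visualizes COCO labels. The full script is as follows: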


import json
import os


def load_img_ann(ann_path='data/coco/annotations/instances_val2014.json'):
    """Return {img_id: {'name': file_name, 'anns': [[x, y, w, h, label], ...]}}."""
    with open(ann_path) as fp:
        root = json.load(fp)
    img_dict = {}
    for img_info in root['images']:
        img_dict[img_info['id']] = {
            'name': img_info['file_name'],
            'anns': []
        }
    for ann_info in root['annotations']:
        img_dict[ann_info['image_id']]['anns'].append(
            ann_info['bbox'] + [ann_info['category_id']])
    return img_dict


def show_img_ann(img_info):
    from PIL import Image

    from dldemos.nms.show_bbox import draw_bbox
    print(img_info)

    # Reload the annotation file to build the category id -> name lookup.
    with open('data/coco/annotations/instances_val2014.json') as fp:
        root = json.load(fp)
    categories = root['categories']
    category_dict = {int(c['id']): c['name'] for c in categories}

    img_path = os.path.join('data/coco/val2014', img_info['name'])
    img = Image.open(img_path)
    for ann in img_info['anns']:
        # COCO stores boxes as [x, y, w, h]; draw_bbox takes two corners.
        x, y, w, h = ann[0:4]
        x1, y1, x2, y2 = x, y, x + w, y + h
        draw_bbox(img, (x1, y1, x2, y2), 1.0, text=category_dict[ann[4]])

    # Make sure the output directory exists before saving.
    os.makedirs('work_dirs', exist_ok=True)
    img.save('work_dirs/tmp.jpg')


def main():
    img_dict = load_img_ann()
    keys = list(img_dict.keys())
    show_img_ann(img_dict[keys[1]])


if __name__ == '__main__':
    main()

All paths in the script need to be adjusted to match your own setup.

The visualization produced by this script, work_dirs/tmp.jpg, is shown below:

As the image shows, the labels read for this image are all correct, so I am comfortable calling this API in my projects.