VOC Dataset
Download link:
Link: https://pan.baidu.com/s/1L_tCyT3zr4vWcSW6Eyoxeg
Extraction code: 8ful
Contents
1. Annotation file contents
2. Python parsing code
3. Mosaic augmentation (optional)
1. Annotation file contents
The main contents of Annotations/000005.xml are shown below.

<annotation>
    <folder>VOC2007</folder>
    <filename>000005.jpg</filename>      <!-- corresponding image file name -->
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>325991873</flickrid>
    </source>
    <owner>
        <flickrid>archintent louisville</flickrid>
        <name>?</name>
    </owner>
    <size>                               <!-- original image size -->
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>             <!-- whether used for segmentation -->
    <object>
        <name>chair</name>               <!-- object class -->
        <pose>Rear</pose>                <!-- viewpoint: front, rear, left, right, unspecified -->
        <truncated>0</truncated>         <!-- whether the object is truncated or occluded (by more than 15%) -->
        <difficult>0</difficult>         <!-- detection difficulty, judged from object size, lighting, and image quality -->
        <bndbox>                         <!-- object location -->
            <xmin>263</xmin>
            <ymin>211</ymin>
            <xmax>324</xmax>
            <ymax>339</ymax>
        </bndbox>
    </object>
</annotation>
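The `<bndbox>` values are 1-based pixel coordinates, which is why the parser in the next section subtracts 1 before normalizing by the image width/height. A minimal standalone sketch of that normalization, using a trimmed, inlined copy of 000005.xml for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of Annotations/000005.xml, inlined for illustration.
XML = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>chair</name>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(XML)
size = root.find('size')
w, h = int(size.find('width').text), int(size.find('height').text)
boxes = []
for obj in root.iter('object'):
    bb = obj.find('bndbox')
    # VOC pixel coordinates are 1-based, so subtract 1 before normalizing
    boxes.append([(int(bb.find(k).text) - 1) / (w if k.startswith('x') else h)
                  for k in ('xmin', 'ymin', 'xmax', 'ymax')])
print(boxes)  # [[0.524, 0.56, 0.646, 0.90133...]]
```

The printed values match the example target used later in the article: [0.524, 0.56, 0.646, 0.90133333, 8].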
2. Python parsing code

"""VOC Dataset Classes

Original author: Francisco Massa
https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py
Updated by: Ellis Brown, Max deGroot
"""
import os.path as osp
import sys
import random

import torch
import torch.utils.data as data
import cv2
import numpy as np

if sys.version_info[0] == 2:
    import xml.etree.cElementTree as ET
else:
    import xml.etree.ElementTree as ET

VOC_CLASSES = (  # always index 0
    'aeroplane', 'bicycle', 'bird', 'boat',
    'bottle', 'bus', 'car', 'cat', 'chair',
    'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant',
    'sheep', 'sofa', 'train', 'tvmonitor')

# note: if you used our download scripts, this should be right
path_to_dir = osp.dirname(osp.abspath(__file__))
VOC_ROOT = path_to_dir + "/VOCdevkit/"
# VOC_ROOT = "/home/k303/object-detection/dataset/VOCdevkit/"


class VOCAnnotationTransform(object):
    """Transforms a VOC annotation into a Tensor of bbox coords and label index
    Initilized with a dictionary lookup of classnames to indexes

    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of VOC's 20 classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        # class_to_ind maps class name to index, e.g. {"aeroplane": 0, "bicycle": 1, ...}
        self.class_to_ind = class_to_ind or dict(
            zip(VOC_CLASSES, range(len(VOC_CLASSES))))
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation): the target annotation to be made usable;
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes
            [xmin/w, ymin/h, xmax/w, ymax/h, label_ind]
        """
        res = []
        for obj in target.iter('object'):  # iterate over <object> children of the root
            # the <difficult> child equal to 1 marks a hard instance
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                # skip difficult instances unless we keep them
                continue
            name = obj.find('name').text.lower().strip()  # object class
            bbox = obj.find('bndbox')  # object location
            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1  # VOC coordinates are 1-based, hence -1
                # scale by width (x) or height (y)
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]  # class index for this object
            bndbox.append(label_idx)  # append the class index after the normalized coords
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]
        return res  # [[xmin, ymin, xmax, ymax, label_ind], ...]


class VOCDetection(data.Dataset):
    """VOC Detection Dataset Object

    input is image, target is annotation

    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the input image
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        dataset_name (string, optional): which dataset to load (default: 'VOC2007')
    """

    def __init__(self, root, img_size,
                 image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
                 transform=None, target_transform=VOCAnnotationTransform(),
                 dataset_name='VOC0712', mosaic=False):
        self.root = root              # str. dataset path: path_to_dir + "/VOCdevkit/"
        self.img_size = img_size      # int. e.g. 640
        self.image_set = image_sets   # list. [('2007', 'trainval'), ('2012', 'trainval')]
        self.transform = transform    # image transform: resize, -mean
        self.target_transform = target_transform
        self.name = dataset_name      # str. VOC0712
        self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self.ids = list()             # ids of the images used for training
        self.mosaic = mosaic
        for (year, name) in image_sets:  # [('2007', 'trainval'), ('2012', 'trainval')]
            rootpath = osp.join(self.root, 'VOC' + year)  # e.g. .../VOC2007
            # read image ids from e.g. ImageSets/Main/trainval.txt
            for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')):
                self.ids.append((rootpath, line.strip()))  # strip() removes whitespace/newlines

    def __getitem__(self, index):
        """
        Returns:
            im: tensor. rgb. (c,h,w). resized image.
            gt: numpy. (num_bbox,5). coords normalized to the original image
                plus class index: [xmin/w, ymin/h, xmax/w, ymax/h, label_ind]
                e.g. [[0.524, 0.56, 0.646, 0.90133333, 8], ...]
        """
        im, gt, h, w = self.pull_item(index)
        return im, gt

    def __len__(self):
        return len(self.ids)

    def pull_item(self, index):
        """
        Returns:
            img: tensor. (c,h,w). resized image.
            target: numpy. (num_bbox,5). coords normalized to the original
                image plus class index, e.g. [[0.524, 0.56, 0.646, 0.90133333, 8], ...]
            height, width: original image size
        """
        img_id = self.ids[index]
        # parse the XML document into an element tree and take its root element
        target = ET.parse(self._annopath % img_id).getroot()
        img = cv2.imread(self._imgpath % img_id)
        height, width, channels = img.shape

        if self.target_transform is not None:
            # [[xmin, ymin, xmax, ymax, label_ind], ...] normalized coords plus class index
            target = self.target_transform(target, width, height)

        # mosaic augmentation: resize several images and stitch them into one
        if self.mosaic and np.random.randint(2):
            return self.mosaic_augmentation(img=img, target=target, index=index)

        # basic augmentation (SSDAugmentation or BaseTransform)
        if self.transform is not None:
            # check labels
            if len(target) == 0:
                # image without objects: all-zero annotation. Class index 0 is
                # harmless here because the loss only counts rows with objects.
                target = np.zeros([1, 5])
            else:
                target = np.array(target)  # (num_bbox, 5). list to numpy, one row per object
            # resize img and subtract the mean; boxes are relative coords, so
            # they (and the labels) are unchanged
            img, boxes, labels = self.transform(img, boxes=target[:, :4], labels=target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))  # (n,4)+(n,1) -> (n,5)
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image object at index in PIL form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            PIL img
        '''
        img_id = self.ids[index]
        return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR), img_id

    def pull_anno(self, index):
        '''Returns the original annotation of image at index

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to get annotation of
        Return:
            list: [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        '''
        img_id = self.ids[index]
        anno = ET.parse(self._annopath % img_id).getroot()
        gt = self.target_transform(anno, 1, 1)
        return img_id[1], gt


class BaseTransform:
    def __init__(self, size, mean):
        self.size = size
        self.mean = np.array(mean, dtype=np.float32)

    def __call__(self, image, boxes=None, labels=None):
        x = cv2.resize(image, (self.size[0], self.size[1])).astype(np.float32)
        x -= self.mean
        return x, boxes, labels


if __name__ == "__main__":
    img_size = 640
    # dataset
    dataset = VOCDetection(VOC_ROOT, img_size, [('2007', 'trainval')],
                           transform=BaseTransform(size=[img_size, img_size], mean=(0, 0, 0)),  # resize, -mean
                           target_transform=VOCAnnotationTransform(),
                           mosaic=True)
    for i in range(1000):
        # im: resized rgb image (c,h,w); gt: annotations (num_bbox,5); h, w: original size
        im, gt, h, w = dataset.pull_item(i)
        img = im.permute(1, 2, 0).numpy()[:, :, (2, 1, 0)].astype(np.uint8)  # rgb to bgr
        cv2.imwrite('-1.jpg', img)
        img = cv2.imread('-1.jpg')
        for box in gt:  # all annotated boxes in the image
            # coords are normalized to the original image, so multiplying by
            # the resized size gives coords in the resized image
            xmin, ymin, xmax, ymax, cls_idx = box
            xmin *= img_size
            ymin *= img_size
            xmax *= img_size
            ymax *= img_size
            cv2.rectangle(img, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (0, 0, 255), 2)
            cv2.putText(img, VOC_CLASSES[int(cls_idx)], (int(xmin), int(ymin)),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                        fontScale=0.8, color=(255, 0, 0), thickness=2)
        cv2.imshow('gt', img)
        cv2.waitKey(0)
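Note that `gt` has a different number of rows per image, so the default DataLoader collation cannot stack a batch. A minimal sketch of a custom `collate_fn` for this dataset (the name `detection_collate` is an assumption, not defined in the code above):

```python
import torch


def detection_collate(batch):
    """Stack images into one tensor; keep variable-length targets as a list."""
    imgs, targets = [], []
    for img, gt in batch:
        imgs.append(img)
        targets.append(torch.as_tensor(gt, dtype=torch.float32))
    return torch.stack(imgs, dim=0), targets
```

It would be passed as `DataLoader(dataset, batch_size=..., collate_fn=detection_collate)`.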
3. Mosaic augmentation (optional)
This part is not essential, and the implementation below is hard to follow, so feel free to skip it. A simplified version may be implemented later if time allows.
def mosaic_augmentation(self, img, index, target):
    # candidate ids: every image except the current one
    ids_list_ = self.ids[:index] + self.ids[index + 1:]
    # randomly sample 3 extra images
    id2, id3, id4 = random.sample(ids_list_, 3)
    ids = [id2, id3, id4]
    img_lists = [img]
    tg_lists = [target]
    for id_ in ids:
        img_ = cv2.imread(self._imgpath % id_)
        height_, width_, channels_ = img_.shape
        target_ = ET.parse(self._annopath % id_).getroot()
        target_ = self.target_transform(target_, width_, height_)
        img_lists.append(img_)
        tg_lists.append(target_)

    mosaic_img = np.zeros([self.img_size * 2, self.img_size * 2, img.shape[2]], dtype=np.uint8)
    # mosaic center
    yc, xc = [int(random.uniform(-x, 2 * self.img_size + x))
              for x in [-self.img_size // 2, -self.img_size // 2]]

    mosaic_tg = []
    for i in range(4):
        img_i, target_i = img_lists[i], tg_lists[i]
        h0, w0, _ = img_i.shape
        # resize image so its longer side equals img_size
        r = self.img_size / max(h0, w0)
        if r != 1:  # always resize down, only resize up if training with augmentation
            img_i = cv2.resize(img_i, (int(w0 * r), int(h0 * r)))
        h, w, _ = img_i.shape

        # place img in img4
        if i == 0:  # top left
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, self.img_size * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(self.img_size * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, self.img_size * 2), min(self.img_size * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

        mosaic_img[y1a:y2a, x1a:x2a] = img_i[y1b:y2b, x1b:x2b]
        padw = x1a - x1b
        padh = y1a - y1b

        # labels
        target_i = np.array(target_i)
        target_i_ = target_i.copy()
        if len(target_i) > 0:
            # a valid target: map normalized coords into mosaic-canvas pixel coords
            target_i_[:, 0] = (w * (target_i[:, 0]) + padw)
            target_i_[:, 1] = (h * (target_i[:, 1]) + padh)
            target_i_[:, 2] = (w * (target_i[:, 2]) + padw)
            target_i_[:, 3] = (h * (target_i[:, 3]) + padh)
            mosaic_tg.append(target_i_)

    if len(mosaic_tg) == 0:
        mosaic_tg = np.zeros([1, 5])
    else:
        mosaic_tg = np.concatenate(mosaic_tg, axis=0)
        # Cutout/Clip targets
        np.clip(mosaic_tg[:, :4], 0, 2 * self.img_size, out=mosaic_tg[:, :4])
        # normalize
        mosaic_tg[:, :4] /= (self.img_size * 2)

    # augment
    mosaic_img, boxes, labels = self.transform(mosaic_img, mosaic_tg[:, :4], mosaic_tg[:, 4])
    # to rgb
    mosaic_img = mosaic_img[:, :, (2, 1, 0)]
    # img = img.transpose(2, 0, 1)
    mosaic_tg = np.hstack((boxes, np.expand_dims(labels, axis=1)))
    scale = np.array([[1., 1., 1., 1.]])  # (unused)
    offset = np.zeros([1, 4])             # (unused)
    return torch.from_numpy(mosaic_img).permute(2, 0, 1).float(), mosaic_tg, self.img_size, self.img_size
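The core idea is easier to see with a fixed center and four equal quadrants instead of a random mosaic center. A minimal numpy-only sketch (the helpers `simple_mosaic` and `nn_resize` are hypothetical, not part of the code above; targets are assumed to be normalized [xmin, ymin, xmax, ymax, label] rows):

```python
import numpy as np


def nn_resize(img, size):
    """Naive nearest-neighbor resize to (size, size), to avoid a cv2 dependency."""
    h, w = img.shape[:2]
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return img[ys][:, xs]


def simple_mosaic(imgs, targets, out_size=640):
    """Stitch 4 images into a 2x2 grid of equal quadrants.

    imgs: list of 4 HxWx3 uint8 arrays; targets: list of (n,5) arrays with
    normalized [xmin, ymin, xmax, ymax, label]. Returns the mosaic image and
    targets re-normalized to the mosaic canvas.
    """
    s = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (s, 0), (0, s), (s, s)]  # (x, y) of each quadrant
    out_tg = []
    for img, tg, (ox, oy) in zip(imgs, targets, offsets):
        canvas[oy:oy + s, ox:ox + s] = nn_resize(img, s)
        tg = np.asarray(tg, dtype=np.float32).reshape(-1, 5).copy()
        if len(tg):
            # each quadrant covers half the canvas: scale by 0.5, then shift
            tg[:, [0, 2]] = tg[:, [0, 2]] * 0.5 + ox / out_size
            tg[:, [1, 3]] = tg[:, [1, 3]] * 0.5 + oy / out_size
            out_tg.append(tg)
    out_tg = np.concatenate(out_tg, 0) if out_tg else np.zeros((1, 5), np.float32)
    return canvas, out_tg
```

Because every quadrant has the same size, the label remapping reduces to one scale and one shift per axis, which is the part the full implementation obscures with per-quadrant clipping arithmetic.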