COCO Dataset Format

COCO (Common Objects in Context) is a dataset widely used for object detection and semantic segmentation. It contains 330K images and 2.5 million object instances.

1. Downloading the dataset

!wget http://images.cocodataset.org/zips/train2017.zip -O coco_train2017.zip
!wget http://images.cocodataset.org/zips/val2017.zip -O coco_val2017.zip
!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O coco_ann2017.zip

2. Extracting the dataset

from zipfile import ZipFile, BadZipFile
import os

def extract_zip_file(extract_path):
    """Extract <extract_path>.zip into <extract_path>, then delete the archive."""
    zip_path = extract_path + ".zip"
    try:
        with ZipFile(zip_path) as zfile:
            zfile.extractall(extract_path)
    except FileNotFoundError:
        print("Error: %s file not found" % zip_path)
        return
    except BadZipFile as e:
        # Keep the archive so it can be inspected or downloaded again.
        print("Error:", e)
        return

    # Remove the archive only after a successful extraction.
    os.remove(zip_path)


extract_train_path = "./coco_train2017"
extract_val_path = "./coco_val2017"
extract_ann_path = "./coco_ann2017"

extract_zip_file(extract_train_path)
extract_zip_file(extract_val_path)
extract_zip_file(extract_ann_path)

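After extraction, the coco_train2017 and coco_val2017 folders contain the subfolders train2017 and val2017 respectively, each holding the image files.

The coco_ann2017 folder contains an annotations subfolder with six JSON annotation files.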
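
To confirm that the extraction produced the expected layout, a minimal sketch like the one below lists the JSON files in the annotations folder. The helper name list_annotation_files is hypothetical, and the path assumes the folder names used in the extraction step.

```python
from pathlib import Path


def list_annotation_files(ann_dir):
    """Return the sorted names of the JSON files directly inside ann_dir."""
    return sorted(p.name for p in Path(ann_dir).glob("*.json"))


if __name__ == "__main__":
    # Assumed path: matches the extraction step; adjust to your own layout.
    for name in list_annotation_files("./coco_ann2017/annotations"):
        print(name)
```

If the list comes back empty, the archive was probably extracted to a different folder than the one assumed here.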


3. Dataset format

Taking any one of the annotation files as an example, the format is as follows:

{   
    "info": 
    { "description": "COCO 2017 Dataset",
        "url": "http://cocodataset.org",
        "version": "1.0",
        "year": 2017,
        "contributor": "COCO Consortium",
        "date_created": "2017/09/01"
    },
    "licenses": 
    [
        {"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/","id": 1,"name": "Attribution-NonCommercial-ShareAlike License"},
        {"url": "http://creativecommons.org/licenses/by-nc/2.0/","id": 2,"name": "Attribution-NonCommercial License"},
        ...,
        ...
    ],
        
    "images":     
    [
        {"license": 4,"file_name": "000000397133.jpg","coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg","height": 427,"width": 640,"date_captured": "2013-11-14 17:02:52","flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg","id": 397133},
        {"license": 1,"file_name": "000000037777.jpg","coco_url": "http://images.cocodataset.org/val2017/000000037777.jpg","height": 230,"width": 352,"date_captured": "2013-11-14 20:55:31","flickr_url": "http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg","id": 37777},
        ...,
        ...
    ],
    "annotations": 
    [
        {"segmentation": [[510.66,423.01,511.72,...,...]],"area": 702.1057499999998,"iscrowd": 0,"image_id": 289343,"bbox": [473.07,395.93,38.65,28.67],"category_id": 18,"id": 1768},
        {"segmentation": [[289.74,443.39,302.29,...,...]],"area": 27718.476299999995,"iscrowd": 0,"image_id": 61471,"bbox": [272.1,200.23,151.97,279.77],"category_id": 18,"id": 1773},
        ...,
        ...
    ],
    "categories": 
    [
        {"supercategory": "person","id": 1,"name": "person"},
        {"supercategory": "vehicle","id": 2,"name": "bicycle"},
        ...,
        ...
    ]
}
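
A quick way to get a feel for this structure is to load a file and count the entries in each top-level section. The following is a minimal sketch using only the standard library. The function names summarize_coco and load_coco are hypothetical, and the toy dictionary is a hand-made stand-in for a real annotation file.

```python
import json


def load_coco(path):
    """Load a COCO annotation file into a plain dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco):
    """Return the number of entries in each list-valued top-level section."""
    return {key: len(value) for key, value in coco.items() if isinstance(value, list)}


if __name__ == "__main__":
    # Tiny hand-made stand-in for a real annotation file (illustrative data only).
    toy = {
        "info": {"year": 2017},
        "licenses": [{"id": 1}],
        "images": [{"id": 10}, {"id": 11}],
        "annotations": [{"id": 100, "image_id": 10}],
        "categories": [{"id": 1, "name": "person"}],
    }
    print(summarize_coco(toy))
```

On a real file, pass the result of load_coco(...) to summarize_coco instead of the toy dictionary. The "info" section is skipped because it is a dictionary rather than a list.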

info

The "info" component provides metadata about the COCO dataset: a description, the project URL, the version number, the year, the contributor, and the creation date.

"info": 
{ "description": "COCO 2017 Dataset",
    "url": "http://cocodataset.org",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "date_created": "2017/09/01"
}

licenses

  • "url": a string containing a URL to the text of the license.
  • "id": an integer identifier for the license. The "license" field of each entry in the "images" component refers to this value.
  • "name": a string containing the name of the license.
"licenses": 
[
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    },
    ...,
    ...,
    ...
]

images

  • "license": an integer value indicating the license type of the image. This value corresponds to the license "id" in the "licenses" component.
  • "file_name": a string containing the name of the image file.
  • "coco_url": a string containing a URL to the image on the COCO website.
  • "height": an integer value representing the height of the image in pixels.
  • "width": an integer value representing the width of the image in pixels.
  • "date_captured": a string representing the date and time that the image was captured.
  • "flickr_url": a string containing a URL to the image on Flickr.
  • "id": a unique identifier for the image, as an integer value.
"images":     
[
    {
        "license": 4,
        "file_name": "000000397133.jpg",
        "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg",
        "height": 427,
        "width": 640,
        "date_captured": "2013-11-14 17:02:52",
        "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg",
        "id": 397133
    },
    ...,
    ...,
    ...       
]
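
Because every entry in "images" carries its own "file_name", the local path of an image can be built directly from the entry rather than reconstructed from the id. A minimal sketch, assuming the images sit in a single folder. The helper name local_image_path is hypothetical.

```python
import os


def local_image_path(images_dir, image_entry):
    """Join the image folder with the "file_name" stored in an "images" entry."""
    return os.path.join(images_dir, image_entry["file_name"])


if __name__ == "__main__":
    # Entry taken from the sample shown in this article; the folder name is an assumption.
    entry = {"file_name": "000000397133.jpg", "height": 427, "width": 640, "id": 397133}
    print(local_image_path("./coco_val2017/val2017", entry))
```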

annotations

  • "segmentation": for a single object ("iscrowd": 0), a list of polygons, each a flat list of floating point numbers [x1, y1, x2, y2, ...] giving the vertices of the object's outline in pixel coordinates. To plot the mask, take the numbers in pairs (the first and second value, then the third and fourth, etc.) as the x and y coordinates of successive vertices. Crowd annotations ("iscrowd": 1) store the mask in run-length encoding (RLE) instead.
  • "area": a floating point value giving the area of the object's segmentation mask, in square pixels.
  • "iscrowd": a binary integer value indicating whether the annotation covers a crowd of objects (1) or a single object (0).
  • "image_id": an integer identifying the image in which the object appears. It corresponds to the "id" of an entry in the "images" component.
  • "bbox": a list of four floating point values giving the bounding box of the object as [x, y, width, height], where (x, y) is the top-left corner of the box.
  • "category_id": an integer value indicating the category or class of the object.
  • "id": a unique identifier for the annotation across the entire COCO dataset, as an integer value.
"annotations": 
[
    {
        "segmentation": [[510.66,423.01,511.72,...,...,...]],
        "area": 702.1057499999998,
        "iscrowd": 0,
        "image_id": 289343,
        "bbox": [473.07,395.93,38.65,28.67],
        "category_id": 18,
        "id": 1768
    },
    ...,
    ...,
    ...
]
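
The "bbox" and "segmentation" fields usually need a small conversion before they can be used with drawing or training code. The sketch below shows three such helpers: a box conversion from [x, y, width, height] to corner coordinates, the pairing of a flat polygon into (x, y) points, and a polygon area computed with the shoelace formula. All three function names are hypothetical. The shoelace area is a geometric illustration only, not a statement about how the stored "area" values were produced.

```python
def bbox_xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def polygon_to_points(flat_polygon):
    """Group a flat [x1, y1, x2, y2, ...] polygon into (x, y) pairs."""
    if len(flat_polygon) % 2 != 0:
        raise ValueError("polygon must hold an even number of coordinates")
    return list(zip(flat_polygon[0::2], flat_polygon[1::2]))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


if __name__ == "__main__":
    # Toy 4 x 3 rectangle (illustrative data, not a real annotation).
    points = polygon_to_points([0, 0, 4, 0, 4, 3, 0, 3])
    print(points)                               # [(0, 0), (4, 0), (4, 3), (0, 3)]
    print(polygon_area(points))                 # 12.0
    print(bbox_xywh_to_xyxy([10, 20, 30, 40]))  # [10, 20, 40, 60]
```

These helpers apply to polygon segmentations only. RLE-encoded crowd masks need a dedicated decoder.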

categories

  • "supercategory": a string giving the supercategory (parent class) of the category. For example, in the second dictionary, "vehicle" is the supercategory of "bicycle".
  • "id": an integer that uniquely identifies the category. This is the value stored in the "category_id" field of an annotation.
  • "name": a string containing the name of the category.
"categories": 
[
    {
        "supercategory": "person",
        "id": 1,
        "name": "person"
    },
    {
        "supercategory": "vehicle",
        "id": 2,
        "name": "bicycle"
    },
    ...,
    ...,
    ...
]

COCO defines 91 object categories in total, but only 80 of them are actually used in the annotations.
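
Since only 80 of the 91 category ids are used, the ids found in "categories" have gaps, while most training code expects class indices that run from 0 without holes. A minimal sketch of one common remedy is to map each id to its position in sorted order. The function name build_category_index is hypothetical and the toy category list is illustrative.

```python
def build_category_index(categories):
    """Map each category id to a contiguous index (0, 1, 2, ...) in ascending id order."""
    sorted_ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(sorted_ids)}


if __name__ == "__main__":
    # Toy category list with a gap in the ids (illustrative subset).
    toy_categories = [
        {"supercategory": "person", "id": 1, "name": "person"},
        {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
        {"supercategory": "outdoor", "id": 13, "name": "stop sign"},
    ]
    print(build_category_index(toy_categories))  # {1: 0, 2: 1, 13: 2}
```

Keep the inverse mapping as well if predictions have to be written back with the original category ids.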

4. Parsing the dataset

The COCO dataset stores its annotations as JSON files, which can be parsed with the class below.

from collections import defaultdict
import json

class COCOParser:
    def __init__(self, anns_file, imgs_dir):
        with open(anns_file, 'r') as f:
            coco = json.load(f)

        self.imgs_dir = imgs_dir              # folder that holds the image files
        self.annIm_dict = defaultdict(list)   # image id -> list of its annotations
        self.cat_dict = {}                    # category id -> category
        self.annId_dict = {}                  # annotation id -> annotation
        self.im_dict = {}                     # image id -> image entry
        self.licenses_dict = {}               # license id -> license
        for ann in coco['annotations']:
            self.annIm_dict[ann['image_id']].append(ann)
            self.annId_dict[ann['id']] = ann
        for img in coco['images']:
            self.im_dict[img['id']] = img
        for cat in coco['categories']:
            self.cat_dict[cat['id']] = cat
        for license in coco['licenses']:
            self.licenses_dict[license['id']] = license

    def get_imgIds(self):
        return list(self.im_dict.keys())

    def get_annIds(self, im_ids):
        # accept a single image id or a list of ids
        im_ids = im_ids if isinstance(im_ids, list) else [im_ids]
        return [ann['id'] for im_id in im_ids for ann in self.annIm_dict[im_id]]

    def load_anns(self, ann_ids):
        # accept a single annotation id or a list of ids
        ann_ids = ann_ids if isinstance(ann_ids, list) else [ann_ids]
        return [self.annId_dict[ann_id] for ann_id in ann_ids]

    def load_cats(self, class_ids):
        class_ids = class_ids if isinstance(class_ids, list) else [class_ids]
        return [self.cat_dict[class_id] for class_id in class_ids]

    def get_imgLicenses(self, im_ids):
        im_ids = im_ids if isinstance(im_ids, list) else [im_ids]
        lic_ids = [self.im_dict[im_id]["license"] for im_id in im_ids]
        return [self.licenses_dict[lic_id] for lic_id in lic_ids]

if __name__ == "__main__":
    coco_annotations_file = "/content/coco_ann2017/annotations/instances_val2017.json"
    coco_images_dir = "/content/coco_val2017/val2017"
    coco = COCOParser(coco_annotations_file, coco_images_dir)
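
Once an annotation file is loaded, simple statistics can be computed from the same id lookups. The following is a minimal standalone sketch that counts annotations per category name. It works on the raw dictionary returned by json.load rather than on the parser class. The function name count_instances_per_category is hypothetical and the toy data is illustrative, not real COCO statistics.

```python
from collections import Counter


def count_instances_per_category(coco):
    """Count annotations per category name in a loaded COCO dictionary."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


if __name__ == "__main__":
    # Illustrative toy data, not real COCO statistics.
    toy = {
        "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}],
        "annotations": [
            {"id": 1, "image_id": 10, "category_id": 1},
            {"id": 2, "image_id": 10, "category_id": 1},
            {"id": 3, "image_id": 11, "category_id": 2},
        ],
    }
    for name, count in count_instances_per_category(toy).most_common():
        print(name, count)
```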

5. Visualizing the data

import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

# define a list of colors for drawing bounding boxes
color_list = ["pink", "red", "teal", "blue", "orange", "yellow", "black", "magenta", "green", "aqua"]*10
num_imgs_to_disp = 4
img_ids = coco.get_imgIds()
total_images = len(img_ids) # total number of images
# pick a few images at random
sel_im_idxs = np.random.permutation(total_images)[:num_imgs_to_disp]
selected_img_ids = [img_ids[i] for i in sel_im_idxs]

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(15,10))
ax = ax.ravel()
for i, im in enumerate(selected_img_ids):
    # COCO 2017 file names are the image id zero-padded to 12 digits
    image = Image.open(f"{coco_images_dir}/{str(im).zfill(12)}.jpg")
    ann_ids = coco.get_annIds(im)
    annotations = coco.load_anns(ann_ids)
    # look up the license once per image, so it is defined even for images without annotations
    license = coco.get_imgLicenses(im)[0]["name"]
    for ann in annotations:
        bbox = ann['bbox']
        x, y, w, h = [int(b) for b in bbox]
        class_id = ann["category_id"]
        class_name = coco.load_cats(class_id)[0]["name"]
        # the modulo keeps the lookup valid for any category id
        color_ = color_list[class_id % len(color_list)]
        rect = plt.Rectangle((x, y), w, h, linewidth=2, edgecolor=color_, facecolor='none')
        t_box = ax[i].text(x, y, class_name, color='red', fontsize=10)
        t_box.set_bbox(dict(boxstyle='square, pad=0', facecolor='white', alpha=0.6, edgecolor='blue'))
        ax[i].add_patch(rect)

    ax[i].axis('off')
    ax[i].imshow(image)
    ax[i].set_title(f"License: {license}")
plt.tight_layout()
plt.show()
