👨💻 About the author: works on deep learning for images
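🎉 Summary link:
The linked page collects write-ups from my own work. Each entry is a commonly used demo whose code can be copied and run directly. It covers:
📌 1. Deep learning scripts commonly used at work
📌 2. Detailed explanations of common torch, numpy, and related functions
📌 3. opencv operations on images, video, and so on
📌 4. Summaries of projects from my own work (practical content only)
Link: https://blog.csdn.net/qq_28949847/article/details/128552785
🎉 Video walkthroughs: the notes above are also explained in videos on Bilibili and other platforms; search for ‘Python图像识别’ to find them
Bilibili: Python图像识别
Douyin: Python图像识别
Xigua Video: Python图像识别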
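1. Downloading the data
Official COCO2017 downloads:
http://images.cocodataset.org/annotations/annotations_trainval2017.zip
http://images.cocodataset.org/zips/val2017.zip
http://images.cocodataset.org/zips/train2017.zip
COCO-WholeBody (whole-body human keypoints) dataset:
github: https://github.com/jin-s13/COCO-WholeBody (the repository contains links to download the labels, hosted on Google Drive)
Baidu Netdisk download: https://pan.baidu.com/s/1OFE_6kq_0ogTXlxc7rpZdg?pwd=xhiu
This share contains both the COCO2017 dataset and the COCO whole-body keypoint labels.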
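A minimal sketch of fetching and unpacking the official archives with the Python standard library. The helper names `archive_name` and `download_and_extract` and the destination directory are hypothetical, chosen only for illustration. The image archives are large, so the actual download call is left commented out.

```python
import os
import urllib.request
import zipfile

# URLs copied from the list above
COCO_URLS = [
    "http://images.cocodataset.org/annotations/annotations_trainval2017.zip",
    "http://images.cocodataset.org/zips/val2017.zip",
    "http://images.cocodataset.org/zips/train2017.zip",
]


def archive_name(url):
    """Return the file name at the end of a download URL."""
    return url.rsplit("/", 1)[-1]


def download_and_extract(url, dest_dir):
    """Download url into dest_dir (skipped if the zip is already there) and unzip it."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, archive_name(url))
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return zip_path


# Example with a hypothetical destination directory:
# for url in COCO_URLS:
#     download_and_extract(url, "coco2017")
```

Skipping the download when the zip already exists makes the helper safe to re-run after an interrupted session, although a partially downloaded file would need to be deleted by hand first.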
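2. The annotation files
captions_train2017.json: image captions, training set
captions_val2017.json: image captions, validation set
coco_wholebody_train_v1.0.json: whole-body human keypoints, training set
coco_wholebody_val_v1.0.json: whole-body human keypoints, validation set
instances_train2017.json: object detection and segmentation, training set
instances_val2017.json: object detection and segmentation, validation set
person_keypoints_train2017.json: keypoints, training set
person_keypoints_val2017.json: keypoints, validation set
Note that the annotations in these json files overlap to some extent. For example, the keypoint file person_keypoints_train2017.json also carries segmentation and bbox annotations.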
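One way to see this overlap for yourself is to collect the keys that actually appear in a file's annotation entries. The sketch below uses only the standard library; the function name `annotation_keys` is made up for this example, and the path in the usage comment is a placeholder.

```python
import json


def annotation_keys(ann_path):
    """Return the sorted set of keys used by the entries under "annotations"."""
    with open(ann_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    keys = set()
    for ann in data.get("annotations", []):
        keys.update(ann.keys())
    return sorted(keys)


# Usage (placeholder path):
# print(annotation_keys("annotations/person_keypoints_val2017.json"))
```

The val files are much smaller than the train files, so they are the quicker ones to inspect this way.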
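3. The json file format
These json files all share broadly the same layout, with the following 5 top-level fields (the captions files are the exception: they have no categories field):
{
    "info": info,
    "licenses": [license],
    "images": [image],
    "annotations": [annotation],
    "categories": [category]
}
info:
A dict holding the dataset's year, version, author, description, and similar metadata.
licenses:
A list holding the licenses the dataset is released under. Because there are several licenses, they are stored as a list, and every entry has the same structure.
images:
A list holding the image information. Because there are many images, they are stored as a list, and every entry has the same structure.
These three fields have the same format in every json file, whereas annotations and categories differ from file to file depending on what is annotated.
The most important field is images: each entry gives an image's file_name, height, and width, which are needed all the time. The list covers every image in that split of the dataset.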
Example of the three fields above:
"info": {
    "description": "COCO-WholeBody",
    "url": "https://github.com/jin-s13/COCO-WholeBody",
    "version": "1.0",
    "year": "2020",
    "date_created": "2020/07/01"
},
"licenses": [
    {
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License"
    }],
"images": [
    {
        "license": 3,
        "file_name": "000000391895.jpg",
        "coco_url": "http://images.cocodataset.org/train2017/000000391895.jpg",
        "height": 360,
        "width": 640,
        "date_captured": "2013-11-14 11:18:45",
        "flickr_url": "http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg",
        "id": 391895
    },
    {
        "license": 4,
        "file_name": "000000522418.jpg",
        "coco_url": "http://images.cocodataset.org/train2017/000000522418.jpg",
        "height": 480,
        "width": 640,
        "date_captured": "2013-11-14 11:38:44",
        "flickr_url": "http://farm1.staticflickr.com/1/127244861_ab0c0381e7_z.jpg",
        "id": 522418
    }]
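Since file_name, height, and width are looked up so often, it can help to index the images list by id once. The sketch below does that on a small in-memory sample that mirrors the two entries above; the function name `build_image_index` is an assumption for illustration, and a real file would be loaded with `json.load` first.

```python
def build_image_index(dataset):
    """Map image id -> (file_name, height, width) from a loaded COCO-style dict."""
    return {
        img["id"]: (img["file_name"], img["height"], img["width"])
        for img in dataset["images"]
    }


# In-memory sample with only the fields used here
sample = {
    "images": [
        {"file_name": "000000391895.jpg", "height": 360, "width": 640, "id": 391895},
        {"file_name": "000000522418.jpg", "height": 480, "width": 640, "id": 522418},
    ]
}

index = build_image_index(sample)
print(index[391895])  # ('000000391895.jpg', 360, 640)
```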
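3.1 Keypoint Detection (whole-body human keypoints): annotations and categories
3.2 Keypoint Detection (17 keypoints): annotations and categories
An image contains several objects. Each object corresponds to one dict, and that dict holds all of the object's annotation data (id, bbox, and so on).
The value of the "keypoints" field is an array of length 3 * n, where n is the number of keypoints defined for the category (n is 17 for human pose). Each keypoint consists of 3 values (x, y, v): v=0 means not labeled, in which case x=y=0; v=1 means labeled but not visible, because the point is occluded; v=2 means labeled and visible.
annotations is a list of dicts, one per annotated object. Note that each dict describes a single object in an image, not all the objects in that image.
The format is as follows: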
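annotation[{
    "segmentation" : [],                      # segmentation information
    "keypoints" : [x1,y1,v1,x2,y2,v2,...],    # 3 * 17 values in total
    "num_keypoints" : int,                    # number of keypoints with v=1 or v=2, i.e. the labeled keypoints
    "area" : float,                           # area of the annotated object
    "iscrowd" : int,                          # 0 for a single object, 1 for a crowd region
    "image_id" : int,                         # id of the image this annotation belongs to
    "bbox" : [x1, y1, w, h],                  # box of the annotated object: top-left corner, width, height
},...]
categories[{
    "keypoints" : [str],    # 17 keypoint names; their order is the order used in the annotations
    "skeleton" : [edge],    # keypoint connectivity, given as a list of keypoint-pair edges; used for visualization
}]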
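The flat keypoints layout described above is easier to work with once it is regrouped into triplets. The following sketch shows one way to do that and to recompute the labeled-keypoint count; the function names and the 3-keypoint sample are invented for illustration (a real person annotation has 17 keypoints).

```python
def split_keypoints(flat):
    """Regroup [x1, y1, v1, x2, y2, v2, ...] into a list of (x, y, v) tuples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0 (labeled, whether visible or not)."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


def named_keypoints(flat, names):
    """Pair each (x, y, v) triplet with its name from the category's "keypoints" list."""
    return dict(zip(names, split_keypoints(flat)))


# Made-up sample: one unlabeled, one occluded, one visible keypoint
flat = [0, 0, 0, 120, 80, 1, 130, 82, 2]
names = ["nose", "left_eye", "right_eye"]

print(count_labeled(flat))                       # 2
print(named_keypoints(flat, names)["left_eye"])  # (120, 80, 1)
```

Comparing `count_labeled(ann["keypoints"])` with `ann["num_keypoints"]` is a quick sanity check when reading a file.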
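4. Scripts
4.1 coco_wholebody_train_v1.0.json
The script below reads the whole-body dataset coco_wholebody_train_v1.0.json and extracts, for each person, the box plus five face points: right eye center, left eye center, nose, right mouth corner, and left mouth corner. The results are saved as txt files so that person boxes and face keypoints can be trained together. The values are not normalized, and left/right are relative to the annotated image. The script: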
from pycocotools.coco import COCO
import cv2
import os


def show_img(yolo_res):
    # Visualize one record to verify that the keypoint indices are correct.
    # Relies on the global cocoRoot set in the main block below.
    img_name = yolo_res[0]
    print(img_name)
    # The record holds strings and the box values may be non-integer (e.g. '339.88'),
    # so convert through float() first; int('339.88') would raise ValueError
    vals = [float(v) for v in yolo_res[1:]]
    cls, width, height, x1, y1, width_box, height_box = vals[:7]
    print(width, height)
    # Five (x, y, value) triplets: right eye, left eye, nose, right mouth corner, left mouth corner
    face_pts = [vals[i:i + 3] for i in range(7, 22, 3)]
    im = cv2.imread(os.path.join(cocoRoot, img_name))
    for x, y, value in face_pts:
        cv2.circle(im, (int(x), int(y)), 1, (0, 255, 0), 2)
    print([value for _, _, value in face_pts])
    # Draw the person box
    cv2.rectangle(im, (int(x1), int(y1)), (int(x1) + int(width_box), int(y1) + int(height_box)), (0, 255, 0), 2)
    cv2.imshow('111', im)
    cv2.waitKey(0)


'''
Each line of the generated txt file has the following format (eye means the eye center):
[img_name, cls, width, height, x1, y1, width_box, height_box,
 right_eye_x, right_eye_y, right_eye_value,
 left_eye_x, left_eye_y, left_eye_value,
 nose_x, nose_y, nose_value,
 right_mouse_x, right_mouse_y, right_mouse_value,
 left_mouse_x, left_mouse_y, left_mouse_value]
The values are not normalized
'''

if __name__ == '__main__':
    ann_file = r'E:\lg\coco2017\annotations\coco_wholebody_train_v1.0.json'
    # Image directory
    cocoRoot = r'E:\lg\coco2017\train2017'
    # Directory where the generated txt files are saved
    save_path = r'E:\lg\coco2017\annotations\box_facekpts'
    os.makedirs(save_path, exist_ok=True)
    coco = COCO(ann_file)

    # Iterate over all images
    for imgID in coco.getImgIds():
        imgInfo = coco.loadImgs(imgID)[0]
        print(f'Info for image {imgID}:\n{imgInfo}')
        img_name = imgInfo['file_name']
        height = imgInfo['height']
        width = imgInfo['width']

        annIds = coco.getAnnIds(imgIds=imgID)
        anns = coco.loadAnns(annIds)
        img_shapes = []
        for ann in anns:
            bbox = ann['bbox']
            face_kpts = ann['face_kpts']
            # Regroup the flat [x, y, v, x, y, v, ...] list into [x, y, v] triplets
            all_lst_kpts = [face_kpts[i:i + 3] for i in range(0, len(face_kpts), 3)]
            img_shapes.append({'bbox': bbox, 'face_kpts': all_lst_kpts})
        print('img_shapes:', img_shapes)

        if len(img_shapes) != 0:
            txt_path = os.path.join(save_path, os.path.splitext(img_name)[0] + '.txt')
            with open(txt_path, 'w') as f_txt:
                # Face keypoint indices used: 36 and 39 right eye corners, 42 and 45 left eye corners,
                # 30 nose, 48 right mouth corner, 54 left mouth corner
                for box_face in img_shapes:
                    cls = 0
                    x1, y1, width_box, height_box = box_face['bbox']
                    all_lst_kpts = box_face['face_kpts']
                    # Each eye center is the midpoint of that eye's two corners
                    right_eye_x = int(all_lst_kpts[36][0] + ((all_lst_kpts[39][0] - all_lst_kpts[36][0]) / 2))
                    right_eye_y = int(all_lst_kpts[36][1] + ((all_lst_kpts[39][1] - all_lst_kpts[36][1]) / 2))
                    right_eye_value = all_lst_kpts[36][2]
                    left_eye_x = int(all_lst_kpts[42][0] + ((all_lst_kpts[45][0] - all_lst_kpts[42][0]) / 2))
                    left_eye_y = int(all_lst_kpts[42][1] + ((all_lst_kpts[45][1] - all_lst_kpts[42][1]) / 2))
                    left_eye_value = all_lst_kpts[42][2]
                    nose_x = int(all_lst_kpts[30][0])
                    nose_y = int(all_lst_kpts[30][1])
                    nose_value = all_lst_kpts[30][2]
                    right_mouse_x = int(all_lst_kpts[48][0])
                    right_mouse_y = int(all_lst_kpts[48][1])
                    right_mouse_value = all_lst_kpts[48][2]
                    left_mouse_x = int(all_lst_kpts[54][0])
                    left_mouse_y = int(all_lst_kpts[54][1])
                    left_mouse_value = all_lst_kpts[54][2]
                    yolo_res = [img_name, cls, width, height, x1, y1, width_box, height_box,
                                right_eye_x, right_eye_y, right_eye_value,
                                left_eye_x, left_eye_y, left_eye_value,
                                nose_x, nose_y, nose_value,
                                right_mouse_x, right_mouse_y, right_mouse_value,
                                left_mouse_x, left_mouse_y, left_mouse_value]
                    yolo_res = [str(item) for item in yolo_res]
                    # Uncomment to visualize the record and check the index choices above
                    # show_img(yolo_res)
                    f_txt.write(' '.join(yolo_res) + '\n')
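Output layout: one txt file per image, named after the image file.
Each line of a txt file describes one person, as space-separated values in the field order listed in the script above.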
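Because the saved values are not normalized, a training pipeline will usually need to read them back and scale them. The sketch below parses one line into a dict and converts the box to a center-based (cx, cy, w, h) form scaled by the image size, which is the convention used by YOLO-style labels. Whether your training code expects that form is an assumption; the names `FIELDS`, `parse_line`, and `normalize_box` and the sample line are made up for illustration.

```python
# Field order after img_name, as listed in the script's format description
FIELDS = [
    "cls", "width", "height", "x1", "y1", "width_box", "height_box",
    "right_eye_x", "right_eye_y", "right_eye_value",
    "left_eye_x", "left_eye_y", "left_eye_value",
    "nose_x", "nose_y", "nose_value",
    "right_mouse_x", "right_mouse_y", "right_mouse_value",
    "left_mouse_x", "left_mouse_y", "left_mouse_value",
]


def parse_line(line):
    """Turn one space-separated txt line into a dict of named values."""
    parts = line.split()
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f"expected {len(FIELDS) + 1} values, got {len(parts)}")
    record = {"img_name": parts[0]}
    for name, value in zip(FIELDS, parts[1:]):
        record[name] = float(value)
    return record


def normalize_box(record):
    """Return the box as (cx, cy, w, h), each divided by the image width or height."""
    w_img, h_img = record["width"], record["height"]
    cx = (record["x1"] + record["width_box"] / 2) / w_img
    cy = (record["y1"] + record["height_box"] / 2) / h_img
    return cx, cy, record["width_box"] / w_img, record["height_box"] / h_img


# Made-up sample line in the saved format
line = ("000000000001.jpg 0 640 480 100.0 50.0 200.0 300.0 "
        "150 120 1.0 190 118 1.0 170 150 1.0 155 180 1.0 185 182 1.0")
record = parse_line(line)
print(normalize_box(record))
```

The face points can be scaled the same way, dividing each x by width and each y by height.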
4.2 instances_train2017.json: converting COCO bboxes to VOC-format bboxes
from pycocotools.coco import COCO
from lxml import etree
import os
from tqdm import tqdm


class labelimg_Annotations_xml:
    # Builds a labelImg-style (Pascal VOC) annotation xml
    def __init__(self, folder_name, filename, path, database="Unknown"):
        self.root = etree.Element("annotation")
        child1 = etree.SubElement(self.root, "folder")
        child1.text = folder_name
        child2 = etree.SubElement(self.root, "filename")
        child2.text = filename
        child3 = etree.SubElement(self.root, "path")
        child3.text = path
        child4 = etree.SubElement(self.root, "source")
        child5 = etree.SubElement(child4, "database")
        child5.text = database

    def set_size(self, width, height, channel):
        size = etree.SubElement(self.root, "size")
        widthn = etree.SubElement(size, "width")
        widthn.text = str(width)
        heightn = etree.SubElement(size, "height")
        heightn.text = str(height)
        # The VOC format names the channel count "depth"
        channeln = etree.SubElement(size, "depth")
        channeln.text = str(channel)

    def set_segmented(self, seg_data=0):
        segmented = etree.SubElement(self.root, "segmented")
        segmented.text = str(seg_data)

    def set_object(self, label, x_min, y_min, x_max, y_max,
                   pose='Unspecified', truncated=0, difficult=0):
        # Named obj rather than object to avoid shadowing the builtin
        obj = etree.SubElement(self.root, "object")
        namen = etree.SubElement(obj, "name")
        namen.text = label
        posen = etree.SubElement(obj, "pose")
        posen.text = pose
        truncatedn = etree.SubElement(obj, "truncated")
        truncatedn.text = str(truncated)
        difficultn = etree.SubElement(obj, "difficult")
        difficultn.text = str(difficult)
        bndbox = etree.SubElement(obj, "bndbox")
        xminn = etree.SubElement(bndbox, "xmin")
        xminn.text = str(x_min)
        yminn = etree.SubElement(bndbox, "ymin")
        yminn.text = str(y_min)
        xmaxn = etree.SubElement(bndbox, "xmax")
        xmaxn.text = str(x_max)
        ymaxn = etree.SubElement(bndbox, "ymax")
        ymaxn.text = str(y_max)

    def savefile(self, filename):
        tree = etree.ElementTree(self.root)
        tree.write(filename, pretty_print=True, xml_declaration=False, encoding='utf-8')


if __name__ == '__main__':
    ann_file = r'E:\lg\coco2017\annotations\instances_train2017.json'
    # Image directory
    cocoRoot = r'E:\lg\coco2017\train2017'
    # Directory where the generated xml files are saved
    save_path = r'E:\lg\coco2017\annotations\box\train'
    os.makedirs(save_path, exist_ok=True)
    channel = 3
    coco = COCO(ann_file)

    # Collect all categories, format: {1: 'person', 2: 'bicycle', 3: 'car', 4: 'motorcycle',...}
    classes = dict()
    for cat in coco.dataset['categories']:
        classes[cat['id']] = cat['name']

    # Iterate over all images
    for imgID in tqdm(coco.getImgIds()):
        imgInfo = coco.loadImgs(imgID)[0]
        img_name = imgInfo['file_name']
        height = imgInfo['height']
        width = imgInfo['width']

        annIds = coco.getAnnIds(imgIds=imgID)
        anns = coco.loadAnns(annIds)
        lst_coordis = []
        for ann in anns:
            label = classes[ann['category_id']]
            if 'bbox' in ann:
                # COCO stores [x, y, w, h]; VOC stores [xmin, ymin, xmax, ymax]
                bbox = ann['bbox']
                xmin = int(bbox[0])
                ymin = int(bbox[1])
                xmax = int(bbox[2] + bbox[0])
                ymax = int(bbox[3] + bbox[1])
                lst_coordis.append([xmin, ymin, xmax, ymax, label])

        anno = labelimg_Annotations_xml('JPEGImages', img_name, 'JPEGImages')
        anno.set_size(width, height, channel)
        anno.set_segmented()
        for data in lst_coordis:
            x_min, y_min, x_max, y_max, label = data
            anno.set_object(label, int(x_min), int(y_min), int(x_max), int(y_max))
        xml_path = os.path.join(save_path, os.path.splitext(img_name)[0] + '.xml')
        anno.savefile(xml_path)
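To check the converted files, the boxes can be read back from a generated xml and turned into COCO's [x, y, w, h] form again for comparison. This is a minimal sketch using the standard library's xml.etree.ElementTree; the function names `read_voc_boxes` and `voc_to_coco_bbox` are hypothetical, and it assumes the object/name/bndbox layout written by the script above.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_path):
    """Return a list of (label, xmin, ymin, xmax, ymax) from a VOC-style xml file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        bnd = obj.find("bndbox")
        coords = [int(bnd.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return boxes


def voc_to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert corner coordinates back to COCO's [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]
```

Because the script truncates coordinates with int(), a round trip can differ from the original COCO values by up to about one pixel per coordinate.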