The last version I worked with last year was YOLOv10. With the new year I am back on video-recognition work, and I found that after Tsinghua's v10, v11 was released in the second half of last year. Accessing GitHub from within China can be troublesome, but the YOLO site itself is easy to reach:
Train - Ultralytics YOLO Docs
The documentation there is excellent and worth browsing whenever you have time. After annotating some of my work data over the weekend, today I started setting up the training environment. For training, YOLO still recommends PyTorch, and PyTorch has a quick installation guide:
Start Locally | PyTorch
Pick your operating system, package manager, language, and GPU type, and the bottom row generates the exact install command to run. Very convenient.
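As an illustration only (the selector generates the right command for your own setup; the CUDA version below is my assumption at the time of writing, not something from the selector itself), the generated commands look like:

```shell
# pip + CUDA 12.1 variant of the PyTorch install command
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# CPU-only variant, which is all that the CPU training in this post needs
pip3 install torch torchvision torchaudio
```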
1. Example dataset and training results
1.1 A minimal mooncake dataset: fewer than 500 images
Just to demonstrate the YOLO training workflow, I picked a very small dataset, MoonPie:
[Free] Mooncake dataset (273 training / 31 validation images) - CSDN resource for training your own YOLOv10 model
It has fewer than 500 images, labeled in the standard YOLO annotation format, which is enough to quickly run through the whole training-and-detection loop. The blogger who shared it is worth following.
1.2 Training the model: yolo11n, 30 epochs in 1 hour total
For training I chose the fastest variant, yolo11n. Since the goal was only to verify the whole training pipeline, I ran just 30 epochs:
1.2.1 Fast convergence: detection quality was already acceptable by epoch 10.
Note that I deliberately let the model forget its ability to recognize the existing 80 object classes and focused on the specific target: class 81 itself. For non-project builds this approach seems preferable, and the resulting .pt file also came out smaller, since detection of the extra objects is no longer needed.
1.2.2 30 epochs in 1 hour: time consumed per epoch (in minutes)
The 30 epochs took about one hour in total, training on CPU only (an i7-13700H).
1.3 Training results
Not bad, right? Note the second image in the last row: cake117.jpg contains one misdetection (an apple recognized as a mooncake). Also, some of the model's previous abilities have been forgotten, for example the spoons in the rightmost image of the second row and the first image of the fourth row. For production training, always remember to first run your training images through the existing model, then merge those existing detections with the new incremental annotations before training.
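The merging advice above can be sketched in code. This is only an illustration under my own assumptions (YOLO-format .txt label files with one `class cx cy w h` line per box; `merge_label_files` is a hypothetical helper, not part of ultralytics):

```python
def merge_label_files(existing: str, new: str) -> str:
    """Merge two YOLO-format label texts, keeping order and dropping
    exact duplicate lines, so that the old model's detections and the
    new incremental annotations end up in one label file."""
    seen, merged = set(), []
    for line in (existing + "\n" + new).splitlines():
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            merged.append(line)
    return "\n".join(merged)
```

In practice you would first run the existing model over the training images, dump its detections in the same label format, and then merge them with the new annotations before starting training.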
1.4 Running detection on your own images with the trained weights
See Appendix A for the code: a home-made detection script.
2. A minimal set of training Python scripts
YOLO can be trained from the command line; if you frequently run similar training jobs, it helps to script some of the parameters:
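For reference, an equivalent run can be launched with the `yolo` CLI that ships with the ultralytics package (the parameter values here mirror the script in this post; treat them as this post's assumptions, not recommended defaults):

```shell
# Command-line near-equivalent of the training script: 30 epochs on CPU
yolo detect train data=moonpie.yaml model=yolo11n.pt epochs=30 batch=16 imgsz=640 device=cpu
```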
2.1 Main training script: train.py
from ultralytics import YOLO

# Build the model from the yolo11 architecture and load the pretrained yolo11n weights
model = YOLO('yolo11.yaml').load('yolo11n.pt')

# Train on the mooncake dataset: 30 epochs, batch size 16, 640 px images, CPU only
result = model.train(data='moonpie.yaml', epochs=30, batch=16, imgsz=640, device='cpu')
print(f'{result}')
2.2 The yolo11 model definition used for training: yolo11.yaml
# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
# Ultralytics YOLO11 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo11
# Task docs: https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
# [depth, width, max_channels]
n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 2, C3k2, [256, False, 0.25]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 2, C3k2, [512, False, 0.25]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 2, C3k2, [512, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 2, C3k2, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
- [-1, 2, C2PSA, [1024]] # 10
# YOLO11n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 2, C3k2, [512, False]] # 13
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P4
- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 10], 1, Concat, [1]] # cat head P5
- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
2.3 The moonpie dataset description file used for training
Note that here we add one new class on top of the existing 80:
train: 'D:\git\yolo\MoonCake_datasets\images\train'
val: 'D:\git\yolo\MoonCake_datasets\images\val'
nc: 81
names: [
"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
"fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
"sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
"potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard",
"cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush", "moonpie"
]
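When appending classes like this, `nc` can easily drift out of sync with the `names` list. A small sanity check, assuming PyYAML is installed (ultralytics itself depends on it); `check_dataset_yaml` is my own helper name, not an ultralytics API:

```python
import yaml

def check_dataset_yaml(path: str) -> int:
    """Validate that nc equals len(names) in a YOLO dataset yaml and
    return the class id of the last (newly appended) class."""
    with open(path, encoding='utf-8') as f:
        cfg = yaml.safe_load(f)
    names = cfg['names']
    assert cfg['nc'] == len(names), f"nc={cfg['nc']} but {len(names)} names listed"
    return len(names) - 1
```

For the file above it should report class id 80 for "moonpie".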
Appendix A: Home-made detection script
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys

import cv2
from ultralytics import YOLO

# Make the script's directory and its parent importable when run from elsewhere
current_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(current_dir, '..'))
sys.path.append(current_dir)

def detect(image_path=None, class_ids=None):
    # Load the trained weights
    model = YOLO("best.pt")
    if image_path is None:
        image_path = r'D:\git\yolo\MoonCake_datasets\images\val\Cake280.jpg'
    # Predict on the image; `classes` restricts predictions to the given class ids
    results = model(image_path, classes=class_ids if class_ids else None)
    # Render the annotated image
    annotated_image = results[0].plot()
    # Save it to the current directory, named after the input image
    image_name_without_ext = os.path.splitext(os.path.basename(image_path))[0]
    save_path = f"{image_name_without_ext}_annotated.jpg"
    cv2.imwrite(save_path, annotated_image)
    # Display the annotated image
    cv2.imshow("Annotated Image", annotated_image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    print("Script name:", sys.argv[0])
    print("Arguments:", sys.argv[1:])
    if len(sys.argv) > 1:
        image_path = sys.argv[1]
        class_ids = []
        if len(sys.argv) > 2:
            # Parse a comma-separated list of class ids, e.g. "80" or "47,80"
            class_ids = [int(x.strip()) for x in sys.argv[2].split(",")]
            print(class_ids)
        detect(image_path, class_ids)
    else:
        print("No arguments given, using the default test image")
        detect()
Appendix B: Comparing the original yolo11n.pt with the trained best.pt
No significant change in file size.
Appendix C: Script for plotting the training-time chart
This is the source code for the chart in section 1.2.2; it reads the training run's results.csv directly. With a small modification it could also return the time consumed by the last training epoch.
import pandas as pd
import matplotlib.pyplot as plt

# Read the results CSV produced by the training run
file_path = r'D:\git\yolo\ultralytics\runs\detect\train4\results.csv'  # replace with your own path
data = pd.read_csv(file_path)

# Extract the epoch and time columns, converting seconds to minutes
epochs = data['epoch']
times = data['time'] / 60.0

# Draw the line chart
plt.figure(figsize=(10, 6))
plt.plot(epochs, times, marker='o', linestyle='-', color='b')
plt.title('Training Time per Epoch')
plt.xlabel('Epoch')
plt.ylabel('Time (mins)')
plt.grid(True)
plt.tight_layout()

# Show the chart
plt.show()
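The adaptation mentioned above, returning the last recorded time instead of plotting, can be sketched as follows. It assumes results.csv keeps a `time` column in seconds, as used by the chart; `last_time_minutes` is my own name, not an ultralytics API:

```python
import pandas as pd

def last_time_minutes(csv_path: str) -> float:
    """Return the last value of the 'time' column in results.csv, in minutes."""
    data = pd.read_csv(csv_path)
    return float(data['time'].iloc[-1]) / 60.0
```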
Appendix D: Directory layout
The git repository was cloned from a domestic (Chinese) mirror:
https://gitcode.com/gh_mirrors/ul/ultralytics
D:\git\yolo\ultralytics is the project's top-level directory. No special code is needed for training; the setup as shown is already usable. For production work, the annotation tool Label Studio can actually do much more for you.