This article doesn't invent any techniques; it just ports them!
Preface
I recently needed to run a classification task with yolov8-cls, but deploying it through the ultralytics framework is inconvenient, so I decided to deploy via ONNX or TensorRT instead. After going through many posts online, I couldn't find a complete, ready-made "wheel" that reproduces the yolov8-cls preprocessing (no postprocessing is needed). By debugging the framework myself, I located and reproduced the preprocessing code, and I'm recording it here.
Environment Setup
Training environment
ultralytics-8.2.0: https://github.com/ultralytics/ultralytics/tree/v8.2.0
Inference environment
tensorrt: see "TensorRT installation and classification-model TensorRT inference, with a confusion matrix": https://blog.csdn.net/qq_44908396/article/details/143628108
onnxruntime-gpu: 1.18.1
python: 3.9
torch: 1.13.1+cu117
Inference Code Framework
ONNX inference: "Classification-model ONNX inference, with a confusion matrix": https://blog.csdn.net/qq_44908396/article/details/143507869
TensorRT inference: "TensorRT installation and classification-model TensorRT inference, with a confusion matrix": https://blog.csdn.net/qq_44908396/article/details/143628108
These two earlier posts already set up the inference framework; all we need to do here is swap in the preprocessing code.
Preprocessing Analysis
As the saying goes, it is better to teach someone to fish than to hand them a fish, so this section walks through how I reproduced the classification preprocessing. If you only want the working code, feel free to skip ahead to the next section and grab it there.
Writing an inference demo
First we write a small inference demo to drive the debugging that follows:
from ultralytics import YOLO
model = YOLO("./ultralytics-8.2.0/runs/classify/train/weights/last.pt")  # load our trained classification weights
results = model("/home/workspace/temp/1111/14.jpg") # predict on an image
Debugging
First, locate the file ultralytics-8.2.0/ultralytics/models/yolo/classify/predict.py. (Likewise, if you need the detection or segmentation preprocessing, just replace classify with detect or segment.) The preprocessing code is as follows:
def preprocess(self, img):
    """Converts input image to model-compatible data type."""
    if not isinstance(img, torch.Tensor):
        is_legacy_transform = any(
            self._legacy_transform_name in str(transform) for transform in self.transforms.transforms
        )
        if is_legacy_transform:  # to handle legacy transforms
            img = torch.stack([self.transforms(im) for im in img], dim=0)
        else:
            img = torch.stack(
                [self.transforms(Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))) for im in img], dim=0
            )
    img = (img if isinstance(img, torch.Tensor) else torch.from_numpy(img)).to(self.model.device)
    return img.half() if self.model.fp16 else img.float()  # uint8 to fp16/32
Set a breakpoint on this function and step through it line by line: it actually takes the else branch, i.e. is_legacy_transform is False, so we only need to reproduce the statements inside the else. The logic there is simple; the only thing left to figure out is what self.transforms actually is. We can inspect it in the debugger's watch window, or simply add a print statement, which is what I prefer.
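For example, a throwaway debug print can be dropped in at the top of preprocess (a temporary edit for inspection only; remove it afterwards):

print(type(self.transforms))  # what class is the transform pipeline?
print(self.transforms)        # and what steps does it contain?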
Run the demo above and look at the printed output.
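Given the transform pipeline recovered below, the printout should look roughly like this (the exact formatting varies across torchvision versions, so treat this as illustrative):

<class 'torchvision.transforms.transforms.Compose'>
Compose(
    Resize(size=224, interpolation=bilinear, max_size=None, antialias=True)
    CenterCrop(size=(224, 224))
    ToTensor()
    Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])
)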
The print shows the full contents and type of self.transforms. At this point we know everything needed to reproduce the yolov8-cls preprocessing, which comes out as follows:
def read_image(image_path):
    src = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_COLOR)
    img = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
    # Use InterpolationMode.BILINEAR for bilinear interpolation
    transform = transforms.Compose([
        transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BILINEAR, max_size=None, antialias=True),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0., 0., 0.], std=[1., 1., 1.])
    ])
    # Convert to a PIL image and apply the transform
    pil_image = Image.fromarray(img)
    normalized_image = transform(pil_image)
    return np.expand_dims(normalized_image.numpy(), axis=0), src
With this, the yolov8-cls preprocessing is reproduced exactly.
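As a quick sanity check (a minimal sketch; the image path is a placeholder), the returned array should have shape (1, 3, 224, 224) and dtype float32:

tensor, src = read_image("test.jpg")  # placeholder path
print(tensor.shape, tensor.dtype)     # expected: (1, 3, 224, 224) float32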
Complete Code
ONNX model export
yolo classify export model=runs/classify/train/weights/last.pt format="onnx"
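Equivalently, the export can be done from Python with the ultralytics API:

from ultralytics import YOLO
model = YOLO("runs/classify/train/weights/last.pt")
model.export(format="onnx")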
ONNX inference
import onnxruntime
import numpy as np
import os
import cv2
import argparse
import time
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import torch
from torchvision import transforms
from PIL import Image

labels = ["0", "1", "2", "3", "4", "5", "6", "7"]

def sigmoid(x):
    """Sigmoid function for a scalar or NumPy array."""
    return 1 / (1 + np.exp(-x))

def getFileList(dir, Filelist, ext=None):
    """
    Collect all files under a directory and its subdirectories.
    dir: root directory
    ext: extension filter (optional)
    returns: list of file paths
    """
    newDir = dir
    if os.path.isfile(dir):
        if ext is None:
            Filelist.append(dir)
        else:
            if ext in dir:
                Filelist.append(dir)
    elif os.path.isdir(dir):
        for s in os.listdir(dir):
            newDir = os.path.join(dir, s)
            getFileList(newDir, Filelist, ext)
    return Filelist

def read_image(image_path):
    src = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_COLOR)
    img = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
    # Use InterpolationMode.BILINEAR for bilinear interpolation
    transform = transforms.Compose([
        transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BILINEAR, max_size=None, antialias=True),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0., 0., 0.], std=[1., 1., 1.])
    ])
    # Convert to a PIL image and apply the transform
    pil_image = Image.fromarray(img)
    normalized_image = transform(pil_image)
    return np.expand_dims(normalized_image.numpy(), axis=0), src

def load_onnx_model(model_path):
    providers = ['CUDAExecutionProvider']  # run on the GPU
    # providers = ['CPUExecutionProvider']
    # providers = ['TensorrtExecutionProvider']
    session = onnxruntime.InferenceSession(model_path, providers=providers)
    print("ONNX model loaded successfully.")
    return session

def main(image_path, session):
    image, _ = read_image(image_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    pred = session.run([output_name], {input_name: image})[0]
    pred = np.squeeze(pred)
    pred = pred.tolist()
    return pred.index(max(pred)), max(pred), labels[pred.index(max(pred))]

def plot_confusion_matrix(y_true, y_pred, labels):
    """
    Plot a confusion matrix.
    y_true: ground-truth labels
    y_pred: predicted labels
    labels: label names
    """
    cm = confusion_matrix(y_true, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
    disp.plot(cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.show()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--images_path', type=str, default="/home/workspace/temp/test", help='images_path')
    parser.add_argument('--model_path', type=str, default="/home/workspace/temp/last.onnx", help='model_path')
    args = parser.parse_args()
    img_list = []
    img_list = getFileList(args.images_path, img_list)
    count = 0
    session = load_onnx_model(args.model_path)
    start = time.time()
    y_true = []
    y_pred = []
    count_time = 0
    for img in img_list:
        # true_label = int(img.split('/')[-2].split('-')[0])
        true_label = img.split('/')[-2]
        start_1 = time.time()
        predicted_index, score, label = main(img, session)
        print(img, label, score)
        count_time += time.time() - start_1
        y_true.append(true_label)
        # y_pred.append(predicted_index)
        y_pred.append(label)
        if label == true_label:
            count += 1
        # else:
        #     dst_path = img.replace('test', 'test_out')
        #     dst_dir = os.path.dirname(dst_path)
        #     if not os.path.exists(dst_dir):
        #         os.makedirs(dst_dir)
        #     shutil.copy(img, dst_path.replace('.jpg', "-" + label + '.jpg'))
    accuracy = count / len(img_list) * 100
    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Correct predictions: {count}, Total images: {len(img_list)}")
    print(f"Time taken: {time.time() - start:.6f} seconds")
    print("Inference on", len(img_list), "images took", count_time, "seconds")
    # Plot the confusion matrix
    plot_confusion_matrix(y_true, y_pred, labels)
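To run the script, point it at the test-image directory and the exported ONNX model (the script name here is just an example):

python onnx_infer.py --images_path /home/workspace/temp/test --model_path /home/workspace/temp/last.onnx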
TensorRT model conversion
/home/tensorrt8.6/TensorRT-8.6.1.6/bin/trtexec --onnx=last.onnx --saveEngine=last-fp16.engine --workspace=3000 --verbose --fp16
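Note that trtexec in TensorRT 8.6 reports --workspace as deprecated; to avoid the warning, the equivalent flag should be --memPoolSize (an assumption based on trtexec's help output, so verify against your version):

/home/tensorrt8.6/TensorRT-8.6.1.6/bin/trtexec --onnx=last.onnx --saveEngine=last-fp16.engine --memPoolSize=workspace:3000M --verbose --fp16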
TensorRT inference code
import os
import cv2
import tensorrt as trt
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit
import time
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from torchvision import transforms
from PIL import Image
import warnings

warnings.filterwarnings("ignore")

def plot_confusion_matrix(y_true, y_pred, labels):
    """
    Plot a confusion matrix.
    y_true: ground-truth labels
    y_pred: predicted labels
    labels: label names
    """
    cm = confusion_matrix(y_true, y_pred)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
    disp.plot(cmap=plt.cm.Blues)
    plt.title('Confusion Matrix')
    plt.show()

def sigmoid(x):
    """Sigmoid function for a scalar or NumPy array."""
    return 1 / (1 + np.exp(-x))

def getFileList(dir, Filelist, ext=None):
    """
    Collect all files under a directory and its subdirectories.
    dir: root directory
    ext: extension filter (optional)
    returns: list of file paths
    """
    newDir = dir
    if os.path.isfile(dir):
        if ext is None:
            Filelist.append(dir)
        else:
            if ext in dir:
                Filelist.append(dir)
    elif os.path.isdir(dir):
        for s in os.listdir(dir):
            newDir = os.path.join(dir, s)
            getFileList(newDir, Filelist, ext)
    return Filelist

def read_image(image_path):
    src = cv2.imdecode(np.fromfile(image_path, dtype=np.uint8), cv2.IMREAD_COLOR)
    img = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
    # Use InterpolationMode.BILINEAR for bilinear interpolation
    transform = transforms.Compose([
        transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BILINEAR, max_size=None, antialias=True),
        transforms.CenterCrop(size=(224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0., 0., 0.], std=[1., 1., 1.])
    ])
    # Convert to a PIL image and apply the transform
    pil_image = Image.fromarray(img)
    normalized_image = transform(pil_image)
    return np.expand_dims(normalized_image.numpy(), axis=0), src

def load_engine(engine_file_path):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    with open(engine_file_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def create_context(engine):
    return engine.create_execution_context()

def allocate_buffers(engine):
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append({'host': host_mem, 'device': device_mem})
        else:
            outputs.append({'host': host_mem, 'device': device_mem})
    return inputs, outputs, bindings, stream

def infer(context, inputs, outputs, bindings, stream, input_data):
    # Transfer input data to the GPU
    [np.copyto(i['host'], input_data.ravel().astype(np.float32)) for i in inputs]
    [cuda.memcpy_htod_async(i['device'], i['host'], stream) for i in inputs]
    # Execute the model
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU
    [cuda.memcpy_dtoh_async(o['host'], o['device'], stream) for o in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return the host output
    return [o['host'] for o in outputs]

def main(image_path, context, inputs, outputs, bindings, stream):
    input_data, src = read_image(image_path)
    pred = infer(context, inputs, outputs, bindings, stream, input_data)
    pred = np.squeeze(pred)
    pred = pred.tolist()
    return pred.index(max(pred)), max(pred), labels[pred.index(max(pred))]

if __name__ == '__main__':
    image_dir = r"/home/workspace/temp/test"
    engine_file_path = '/home/workspace/temp/last-fp16.engine'
    labels = ["0", "1", "2", "3", "4", "5", "6", "7"]
    engine = load_engine(engine_file_path)
    context = create_context(engine)
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    img_list = []
    img_list = getFileList(image_dir, img_list)
    count = 0
    start = time.time()
    y_true = []
    y_pred = []
    count_time = 0
    for img in img_list:
        # true_label = int(img.split('/')[-2].split('-')[0])
        true_label = img.split('/')[-2]
        start_1 = time.time()
        predicted_index, score, label = main(img, context, inputs, outputs, bindings, stream)
        count_time += time.time() - start_1
        y_true.append(true_label)
        # y_pred.append(predicted_index)
        y_pred.append(label)
        if label == true_label:
            count += 1
        # else:
        #     dst_path = img.replace('test', 'test_out')
        #     dst_dir = os.path.dirname(dst_path)
        #     if not os.path.exists(dst_dir):
        #         os.makedirs(dst_dir)
        #     shutil.copy(img, dst_path.replace('.jpg', "-" + label + '.jpg'))
    accuracy = count / len(img_list) * 100
    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Correct predictions: {count}, Total images: {len(img_list)}")
    print(f"Time taken: {time.time() - start:.6f} seconds")
    print("Inference on", len(img_list), "images took", count_time, "seconds")
    # Plot the confusion matrix
    plot_confusion_matrix(y_true, y_pred, labels)
Notes
Although the code defines a sigmoid function, it is never used: inspecting the ONNX model in netron shows that its output already includes a softmax layer, so no extra activation is needed on the classification output.
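If your exported model did not end in a softmax layer, a minimal NumPy sketch for applying it to the raw logits would be:

def softmax(x):
    # Numerically stable softmax over the class scores
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.squeeze(pred))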
If you run the TensorRT inference from PyCharm, you may hit a missing-shared-library error; see my post "PyCharm: fixing ImportError: libnvinfer.so.8: cannot open shared object file: No such file or directory": https://blog.csdn.net/qq_44908396/article/details/143628859
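One more caveat: allocate_buffers relies on the binding-based TensorRT APIs (get_binding_shape, get_binding_dtype, binding_is_input, max_batch_size). These exist in TensorRT 8.x, matching the TensorRT-8.6.1.6 used above, but were removed in TensorRT 10, so on newer versions the buffer allocation would need to be rewritten with the tensor-name-based I/O API.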