Building a Handheld-Object Detection and Recognition System in Python with DETR (DEtection TRansformer)

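The official repository provides PyTorch training code and pretrained models for DETR (DEtection TRansformer). Its authors replace the full, complex, hand-crafted object detection pipeline with a Transformer and match Faster R-CNN with a ResNet-50 backbone, reaching 42 AP on COCO with half the computation (FLOPs) and the same number of parameters.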

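The official project is available here:

It has already collected more than 12,000 stars, which is quite impressive.

The overall data flow of DETR is illustrated below:

The authors also provide the corresponding pretrained models, which you can use directly:

First, install and configure the environment following the basic steps in the README:

The related preprocessing steps are covered in my earlier posts, so I will not go over them again here.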

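DETR (DEtection TRansformer) is an end-to-end object detection model built on the Transformer architecture. Unlike traditional region-proposal-based detectors such as Faster R-CNN, DETR treats object detection as a direct set prediction problem and solves it with a sequence-to-sequence style Transformer encoder-decoder, so that localization and classification are trained jointly in a single model.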

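The DETR workflow is as follows:

The input image passes through a convolutional neural network (CNN) backbone, which extracts a feature map.
The feature map is flattened into a sequence, combined with positional encodings, and passed through a stack of encoder layers to obtain a representation of the image features.
Detection is modeled as a sequence-to-sequence style transformation: the decoder takes a fixed set of learned object queries as input and attends to the encoder output.
Inside the decoder, self-attention lets the queries interact with each other, and cross-attention over the encoder output gathers the location and class information of each object.
Finally, a linear layer followed by softmax classifies each decoder output (including a "no object" class), and a small feed-forward network predicts the box coordinates.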
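
The last step above can be sketched in plain Python. This is only an illustration, not the official post-processing code: it assumes each query yields class logits whose final entry is the "no object" class, and a box in normalized (cx, cy, w, h) form. The helper names, the 0.7 threshold, and the two example queries are all made up for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


def decode_predictions(class_logits, boxes, img_w, img_h, threshold=0.7):
    """Turn per-query outputs into (label, score, pixel_box) detections.

    Assumption: the last logit of every query is the "no object" class.
    """
    detections = []
    for logits, box in zip(class_logits, boxes):
        probs = softmax(logits)[:-1]  # drop the "no object" probability
        score = max(probs)
        label = probs.index(score)
        if score >= threshold:  # keep confident queries only
            detections.append((label, score, cxcywh_to_xyxy(box, img_w, img_h)))
    return detections


# Two hypothetical queries over 2 real classes plus "no object".
logits = [[4.0, 0.0, 0.0],   # confident about class 0
          [0.0, 0.0, 5.0]]   # confident that there is no object
boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
dets = decode_predictions(logits, boxes, img_w=640, img_h=480)
print(dets)
```

Only the first query survives the threshold; the second one is absorbed by the "no object" class, which is how a fixed number of queries can describe a variable number of objects.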

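The advantages of DETR include:

End-to-end training: DETR is trained end to end from the raw image to the final detections. It avoids the region proposal generation and feature alignment steps of traditional detectors, which simplifies both model design and the training pipeline.
No hand-tuned object count per image: DETR predicts a fixed-size set of candidates in parallel (100 object queries by default in the official implementation) and assigns the surplus ones to a "no object" class. It can therefore detect however many objects an image contains, up to the number of queries, without anchor boxes or per-image tuning.
Global context: through the Transformer's self-attention, DETR captures relationships between objects at different positions in the image and draws on context from the whole image, which helps accuracy and robustness.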
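
To make the global-context point concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is deliberately simplified: there are no learned query/key/value projections, no multiple heads, and no positional encodings, all of which the real model uses. The function name `self_attention` and the toy tokens are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification: each token serves directly as query, key and value.
    Returns (outputs, attention_weights).
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is a weighted average over all tokens.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, tokens)) for j in range(d)])
    return outputs, weights


# Three toy "feature-map positions" with 2-dimensional features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Every row of the attention matrix has a non-zero weight for every position, so each output mixes information from the whole sequence rather than from a local neighborhood.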

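However, DETR also has some drawbacks:

High computational cost: self-attention over the image features grows quadratically with the number of feature-map positions, so large images demand a lot of compute, and both training and inference are relatively slow.
Weaker small-object detection: DETR's accuracy tends to drop on small objects. The Transformer operates on a low-resolution feature map, so the fine detail needed to localize and classify small objects may be lost.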
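
A little arithmetic shows where both drawbacks come from. The sketch below assumes the backbone downsamples the image by a factor of 32 (typical for the final stage of a ResNet-50) and counts only the entries of one self-attention matrix, not full FLOPs; `attention_cost` is a hypothetical helper name.

```python
import math


def attention_cost(img_h, img_w, stride=32):
    """Return (tokens, attention_entries) for one self-attention map.

    stride=32 is an assumption matching the last stage of a ResNet-50.
    """
    tokens = math.ceil(img_h / stride) * math.ceil(img_w / stride)
    return tokens, tokens * tokens  # every token attends to every token


for side in (640, 1280):
    tokens, entries = attention_cost(side, side)
    print(f"{side}x{side}: {tokens} tokens, {entries:,} attention entries")
```

Doubling the image side quadruples the number of tokens and multiplies the attention matrix by sixteen. At the same stride, an object of about 32x32 pixels occupies roughly a single feature-map cell, which is one way to see why small objects are hard.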

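Now let us compare DETR with the YOLO series, SSD, and other well-known detectors:

The YOLO series (YOLOv2, YOLOv3, and YOLOv4; the original YOLOv1 predicts boxes directly without anchors) and SSD are anchor-based detectors. Their advantages include:

Good real-time performance: with anchor boxes and multi-scale (feature-pyramid-style) feature maps, the YOLO series and SSD reach real-time speed while keeping detection accuracy fairly high.
Relatively good small-object detection: anchor boxes of several sizes and aspect ratios give the YOLO series and SSD a relatively strong ability to detect small objects.
High computational efficiency: compared with DETR's Transformer, the YOLO series and SSD have lower computational cost, so training and inference are faster.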

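However, the YOLO series and SSD also have some drawbacks:

Lower localization accuracy: because they rely on a fixed set of predefined anchors, the YOLO series and SSD localize objects less precisely. For small objects in particular, boxes may be shifted or incomplete.
Difficulty with dense objects: because anchor sizes and positions are fixed, the YOLO series and SSD can struggle when many objects overlap in space, which leads to missed detections or duplicate overlapping boxes.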
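
One concrete way dense scenes go wrong can be shown with non-maximum suppression (NMS), the post-processing step these detectors usually apply to remove duplicate boxes. Linking the dense-object problem to NMS is my own illustration rather than something stated above, and the boxes, scores, and 0.5 threshold are invented for the example.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Boxes 0 and 1 are two different objects standing close together;
# box 2 is far away from both.
boxes = [(0, 0, 100, 100), (20, 0, 120, 100), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

Box 1 overlaps box 0 with an IoU of about 0.67, so it is suppressed even though it belongs to a separate object. DETR sidesteps this step because its set-based predictions do not require NMS.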

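In summary, compared with anchor-based detectors such as the YOLO series and SSD, DETR offers end-to-end training, set prediction without anchor boxes, and global context. Its limitations are computational cost and small-object performance. For scenarios with strict real-time requirements, the YOLO series and SSD may be the better choice; for scenarios that benefit from global context and a simpler, anchor-free pipeline, DETR may be more suitable. The right detector should be chosen by evaluating the specific application and its requirements.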

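First, a look at the overall results:

Next, a look at the dataset:

The annotations need to be converted to COCO format with a script:

Ready-made tutorials for this step are easy to find with a web search.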
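
As a rough illustration of what such a conversion involves, the sketch below turns YOLO-style labels into a COCO-style dictionary. The source annotation format is an assumption on my part (the post does not say which format the dataset uses), and the function name, file name, and category names are hypothetical.

```python
import json


def yolo_to_coco(images, categories):
    """Build a COCO-style dict from YOLO-style labels.

    images: list of dicts with keys file_name, width, height and labels,
            where each label is a line "class cx cy w h" (normalized).
    categories: list of class names; the list index is used as category id.
    """
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for i, n in enumerate(categories)],
    }
    ann_id = 1
    for img_id, img in enumerate(images, start=1):
        img_w, img_h = img["width"], img["height"]
        coco["images"].append({"id": img_id, "file_name": img["file_name"],
                               "width": img_w, "height": img_h})
        for line in img["labels"]:
            cls, cx, cy, w, h = line.split()
            box_w, box_h = float(w) * img_w, float(h) * img_h
            # COCO boxes are [x_top_left, y_top_left, width, height] in pixels.
            x = float(cx) * img_w - box_w / 2
            y = float(cy) * img_h - box_h / 2
            coco["annotations"].append({
                "id": ann_id, "image_id": img_id, "category_id": int(cls),
                "bbox": [x, y, box_w, box_h], "area": box_w * box_h, "iscrowd": 0,
            })
            ann_id += 1
    return coco


# Hypothetical example: one image with one labeled object.
images = [{"file_name": "0001.jpg", "width": 640, "height": 480,
           "labels": ["0 0.5 0.5 0.25 0.5"]}]
coco = yolo_to_coco(images, categories=["phone", "cup"])
print(json.dumps(coco["annotations"][0]))
```

A real script would read the image sizes and label files from disk and write the result with `json.dump`; those parts are omitted because they depend on the dataset layout.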

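Training runs for 100 epochs by default in this setup; the detailed results are as follows:

A screenshot taken when training finished:

The core code for visualizing the training process is as follows: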

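from pathlib import Path, PurePath

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch


def plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt'):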
    '''
    Function to plot specific fields from training log(s). Plots both training and test results.

    :: Inputs - logs = list containing Path objects, each pointing to individual dir with a log file
              - fields = which results to plot from each log file - plots both training and test for each field.
              - ewm_col = optional, which column to use as the exponential weighted smoothing of the plots
              - log_name = optional, name of log file if different than default 'log.txt'.

    :: Outputs - matplotlib plots of results in fields, color coded for each log file.
               - solid lines are training results, dashed lines are test results.

    '''
    func_name = "plot_utils.py::plot_logs"

    # verify logs is a list of Paths (list[Paths]) or single Pathlib object Path,
    # convert single Path to list to avoid 'not iterable' error

    if not isinstance(logs, list):
        if isinstance(logs, PurePath):
            logs = [logs]
            print(f"{func_name} info: logs param expects a list argument, converted to list[Path].")
        else:
            raise ValueError(f"{func_name} - invalid argument for logs parameter.\n \
            Expect list[Path] or single Path obj, received {type(logs)}")

    # Quality checks - verify valid dir(s), that every item in list is Path object, and that log_name exists in each dir
    for log_dir in logs:  # named log_dir to avoid shadowing the builtin dir()
        if not isinstance(log_dir, PurePath):
            raise ValueError(f"{func_name} - non-Path object in logs argument of {type(log_dir)}: \n{log_dir}")
        if not log_dir.exists():
            raise ValueError(f"{func_name} - invalid directory in logs argument:\n{log_dir}")
        # verify log_name exists
        fn = log_dir / log_name
        if not fn.exists():
            print(f"-> missing {log_name}.  Have you gotten to Epoch 1 in training?")
            print(f"--> full path of missing log file: {fn}")
            return

    # load log file(s) and plot
    dfs = [pd.read_json(Path(p) / log_name, lines=True) for p in logs]

    fig, axs = plt.subplots(ncols=len(fields), figsize=(16, 5))

    for df, color in zip(dfs, sns.color_palette(n_colors=len(logs))):
        for j, field in enumerate(fields):
            if field == 'mAP':
                coco_eval = pd.DataFrame(
                    np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]
                ).ewm(com=ewm_col).mean()
                axs[j].plot(coco_eval, c=color)
            else:
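                # smooth only the two columns being plotted; list-valued columns
                # such as test_coco_eval_bbox cannot be interpolated or averaged
                cols = [f'train_{field}', f'test_{field}']
                df[cols].interpolate().ewm(com=ewm_col).mean().plot(
                    ax=axs[j],
                    color=[color] * 2,
                    style=['-', '--']
                )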
    for ax, field in zip(axs, fields):
        ax.legend([Path(p).name for p in logs])
        ax.set_title(field)


def plot_precision_recall(files, naming_scheme='iter'):
    if naming_scheme == 'exp_id':
        # name becomes exp_id
        names = [f.parts[-3] for f in files]
    elif naming_scheme == 'iter':
        names = [f.stem for f in files]
    else:
        raise ValueError(f'not supported {naming_scheme}')
    fig, axs = plt.subplots(ncols=2, figsize=(16, 5))
    for f, color, name in zip(files, sns.color_palette("Blues", n_colors=len(files)), names):
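        # note: recent PyTorch releases default to weights_only=True in torch.load,
        # which may reject this pickled evaluation object; if that happens, pass
        # weights_only=False, and only for files you trust
        data = torch.load(f)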
        # precision is n_iou, n_points, n_cat, n_area, max_det
        precision = data['precision']
        recall = data['params'].recThrs
        scores = data['scores']
        # take precision for all classes, all areas and 100 detections
        precision = precision[0, :, :, 0, -1].mean(1)
        scores = scores[0, :, :, 0, -1].mean(1)
        prec = precision.mean()
        rec = data['recall'][0, :, 0, -1].mean()
        print(f'{naming_scheme} {name}: mAP@50={prec * 100: 05.1f}, ' +
              f'score={scores.mean():0.3f}, ' +
              f'f1={2 * prec * rec / (prec + rec + 1e-8):0.3f}'
              )
        axs[0].plot(recall, precision, c=color)
        axs[1].plot(recall, scores, c=color)

    axs[0].set_title('Precision / Recall')
    axs[0].legend(names)
    axs[1].set_title('Scores / Recall')
    axs[1].legend(names)
    return fig, axs
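
To see what `plot_logs` does with the log before plotting, here is a small dependency-free sketch. It assumes, as the `pd.read_json(..., lines=True)` call above implies, that `log.txt` holds one JSON object per epoch, and it reimplements the exponentially weighted mean in the form pandas uses by default (`adjust=True`). The log values are invented, and `read_log` and `ewm_mean` are hypothetical helper names.

```python
import json


def read_log(lines):
    """Parse JSON-lines text (one JSON object per epoch) into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def ewm_mean(values, com=0.0):
    """Exponentially weighted mean, same formula as pandas ewm(com=...).mean()
    with the default adjust=True. com=0 means no smoothing at all."""
    decay = 1.0 - 1.0 / (1.0 + com)  # weight of each older sample relative to the next
    smoothed, num, den = [], 0.0, 0.0
    for x in values:
        num = x + decay * num    # weighted sum of samples seen so far
        den = 1.0 + decay * den  # sum of the weights
        smoothed.append(num / den)
    return smoothed


# Made-up log contents for three epochs.
log_lines = [
    '{"epoch": 0, "train_class_error": 40.0, "test_class_error": 50.0}',
    '{"epoch": 1, "train_class_error": 20.0, "test_class_error": 35.0}',
    '{"epoch": 2, "train_class_error": 10.0, "test_class_error": 30.0}',
]
records = read_log(log_lines)
train_err = [r["train_class_error"] for r in records]
print(ewm_mean(train_err, com=0))  # raw curve, the default ewm_col=0
print(ewm_mean(train_err, com=1))  # smoothed curve
```

With the default `ewm_col=0` the curves are plotted unsmoothed; a larger value trades responsiveness for a smoother line.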

The results are shown below: