3D LiDAR Object Detection with ADAS and Keypoint Feature Pyramid Network Fusion: Principles and Implementation

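1. Overview

3D LiDAR object detection is a technique for recognizing and localizing objects of interest in three-dimensional space. In autonomous driving systems and advanced spatial analysis, the steady evolution of object detection methods is essential. Because LiDAR measures range directly, 3D LiDAR object detection gives environment perception accurate depth information that camera-only methods have to infer.

Here we take a detailed look at using the Keypoint Feature Pyramid Network (K-FPN) with the KITTI 360 Vision dataset, fusing RGB camera and 3D LiDAR data for autonomous driving, including the full workflow and the training procedure.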

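2. Object Detection in 3D Point Clouds

At its core, 3D object detection identifies and localizes objects in three-dimensional space. Unlike 2D detection, which considers only height and width in the image plane, 3D detection also incorporates depth and so provides a complete spatial understanding. This matters for applications such as autonomous driving, robotics, and augmented reality, where interaction with the environment is three-dimensional.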

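2.1 Human Depth Perception

The basic intuition behind 3D object detection comes from how humans perceive depth. Human vision infers the third dimension from cues such as shading, perspective, and parallax. Similarly, 3D detection algorithms use geometry, shading, and the relative motion of points to discern depth.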

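2.2 Digital Depth Perception

In their paper "3D Convolutional Neural Networks for Human Action Recognition", Shuiwang Ji et al. proposed the 3D CNN. Their model extracts features along both the spatial and temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels.

A 3D environment is usually represented as a point cloud, a set of vertices in a three-dimensional coordinate system. These vertices typically come from structured-light or LiDAR sensors. A key aspect of 3D detection is converting point clouds into a form that can be processed to identify objects. This involves segmentation, which partitions the point cloud into clusters that may represent objects, followed by classification of those clusters into known categories such as cars, pedestrians, or other objects of interest.

The technical challenge is considerable because point cloud data is sparse and variable. Unlike pixels in a 2D image, points in 3D space are unevenly distributed, and their density varies with distance from the sensor. Algorithms such as PointNet and its successor PointNet++ process point clouds directly, learning features that are invariant to the ordering of the points and robust to occlusion and clutter.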
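
To make the idea of ordering invariance concrete, the following is a minimal sketch rather than PointNet itself: a single shared per-point layer followed by a max over points, which is a symmetric function. The function name `global_max_feature` and the random weights are illustrative assumptions.

```python
import numpy as np


def global_max_feature(points, weight, bias):
    """Apply one shared linear layer + ReLU to every point, then max-pool over points."""
    per_point = np.maximum(points @ weight + bias, 0.0)  # shape (N, F)
    return per_point.max(axis=0)                         # shape (F,)


rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))   # hypothetical point cloud: N=128 points, xyz
weight = rng.normal(size=(3, 16))    # hypothetical shared weights
bias = rng.normal(size=16)

feat = global_max_feature(points, weight, bias)
# Shuffling the point order does not change the pooled feature.
feat_shuffled = global_max_feature(points[rng.permutation(128)], weight, bias)
print(np.allclose(feat, feat_shuffled))
```

Because the max is taken over the point axis, any reordering of the input rows yields the same feature vector, which is the property the paragraph above refers to.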

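2.3 What Is Distinctive About Object Detection in 3D Point Clouds

Detecting objects in a 3D point cloud introduces several characteristics that are absent from traditional 2D object detection:

Depth estimation: One of the most notable features is depth estimation, which gives the distance of an object from the sensor. In a point cloud depth is measured directly, whereas in a 2D image it has to be inferred.
Volume estimation: Algorithms can exploit the volumetric nature of the data and account for the actual shape and size of an object. A 2D bounding box, by contrast, only approximates the area the object occupies in the image plane.
6DoF (six degrees of freedom) object pose: 3D detection algorithms not only localize an object but also determine its orientation in space, providing a full 6DoF pose estimate (three for position, three for rotation).
Scale invariance: The detection process can be made invariant to object scale. This is particularly important for LiDAR-based systems, because objects appear at different distances and therefore at different apparent scales.
Temporal continuity in dynamic environments: Advanced 3D object detection systems exploit temporal continuity in dynamic environments. By tracking how the point cloud changes over time, they can predict the trajectory and velocity of moving objects.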

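3. Literature Review

3.1 VoxelNet

Yin Zhou and Oncel Tuzel proposed VoxelNet, an end-to-end learning approach to point-cloud-based 3D object detection. VoxelNet partitions the point cloud into a structured 3D voxel grid and uses a voxel feature encoding layer to turn the points inside each voxel into a unified feature representation. This representation is connected to a region proposal network (RPN) that produces the detections. On the KITTI car detection benchmark, VoxelNet significantly outperformed the LiDAR-based detection methods available at the time, and it also achieved promising results on pedestrians and cyclists, showing that it can learn representations for objects of different kinds.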
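
The voxel partitioning step can be illustrated with a small sketch. This is not VoxelNet's implementation; the function `voxelize`, the voxel size, and the sample points are assumptions chosen for illustration, and the learned voxel feature encoding is left out.

```python
import numpy as np


def voxelize(points, voxel_size, range_min):
    """Group (N, 4) points [x, y, z, intensity] by the voxel that contains them."""
    indices = np.floor((points[:, :3] - range_min) / voxel_size).astype(np.int64)
    groups = {}
    for index, point in zip(indices, points):
        key = tuple(int(i) for i in index)   # (ix, iy, iz) voxel coordinate
        groups.setdefault(key, []).append(point)
    return {key: np.stack(group) for key, group in groups.items()}


points = np.array([[0.1, 0.1, 0.1, 0.5],
                   [0.2, 0.3, 0.1, 0.7],
                   [1.5, 0.2, 0.1, 0.2]], dtype=np.float32)
voxels = voxelize(points, voxel_size=np.array([1.0, 1.0, 1.0]), range_min=np.zeros(3))
for key, group in voxels.items():
    print(key, group.shape)
```

Each voxel ends up with a variable number of points, which is why a per-voxel encoding step is needed before a convolutional network can consume the grid.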

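3.2 BirdNet

Jorge Beltrán et al. introduced BirdNet, a 3D object detection framework based on LiDAR information. Their method first applies a cell encoding to a bird's-eye-view projection of the laser data, then estimates object location and orientation with a convolutional neural network adapted from image processing techniques. A final post-processing stage consolidates the oriented 3D detections. Validated on the KITTI dataset, the framework set a new standard in the field at the time and also generalized across different LiDAR systems, confirming its robustness in real traffic conditions.

3.3 VirConvNet

Hai Wu et al. proposed VirConvNet, an efficient backbone network designed to improve detection performance while keeping the computational load manageable. At its core are two components: StVD (Stochastic Voxel Discard), which strategically removes redundant voxel computation, and NRConv (Noise-Resistant Submanifold Convolution), which encodes voxel features robustly by using both 2D and 3D data. The authors present three variants of their pipeline: VirConv-L for efficiency, VirConv-T for accuracy, and VirConv-S for a semi-supervised approach. Their pipeline ranked at the top of the KITTI car 3D detection leaderboard, with VirConv-S leading and VirConv-L offering fast inference.

3.4 RTM3D

Peixuan Li et al. developed a monocular 3D detection framework capable of efficient and accurate single-shot prediction. Their method drops the dependence on conventional 2D bounding box constraints and instead predicts nine keypoints of the 3D bounding box from a monocular image, using geometric relationships to infer size, position, and orientation in 3D space. The approach proved robust even with noisy keypoint estimates, and its compact architecture makes fast detection possible. Notably, their training scheme requires no external networks or additional supervised data. The framework became the first real-time system for 3D detection from monocular images and set a new performance benchmark on the KITTI dataset.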

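4. Visualizing the Dataset for 3D LiDAR Object Detection

4.1 The KITTI 360 Vision Dataset

The KITTI 360 Vision dataset is used for training here. It is a fairly large dataset, so visualizing the 3D LiDAR data is a useful part of exploratory data analysis (EDA). The original article shows several visualizations from this experiment at this point.

The visualization highlights the three-dimensional representation of the 3D LiDAR data from the sensor. It is equally important to visualize 3D bounding boxes on the RGB camera stream, which is essential for developing advanced driver-assistance systems (ADAS). To do so, first download the dataset and create the directory structure. The following dataset files are needed (the download links are in the original article):

Velodyne point clouds - laser information (29GB)
Training labels of the object dataset (5MB)
Camera calibration matrices of the object dataset (16MB)
Left color images of the object dataset (12GB) - used for visualization

Now arrange the files so that the directory structure looks like this: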

kitti
├── demo
│   └── calib.txt
├── gt_database
├── gt_database_mm
├── ImageSets
│   ├── train.txt
│   ├── test.txt
│   └── valid.txt
├── training
│   ├── image_2
│   ├── label_2
│   ├── calib
│   └── velodyne
├── testing
│   ├── image_2
│   ├── calib
│   └── velodyne
├── kitti_dbinfos_train.pkl
├── kitti_dbinfos_train_mm.pkl
├── kitti_infos_train.pkl
├── kitti_infos_trainval.pkl
├── kitti_infos_val.pkl
└── kitti_infos_test.pkl

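Take a moment to explore the methods of the KittiDataset class defined in kitti_dataset.py in the codebase. The code is available from the code walkthrough section of the original article.

KittiDataset is a custom dataset class for loading and manipulating data from the KITTI 360 Vision dataset. It is tailored to different operating modes (training ('train'), validation ('val'), and testing ('test')) and is configured through the configs argument, which holds settings such as directory paths, input size, and number of classes. It is implemented in the kitti_dataset.py script in the data_process directory.

The class methods and what they do are broken down below: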

def __init__(self, configs, mode='train', lidar_aug=None, hflip_prob=None, num_samples=None):
    self.dataset_dir = configs.dataset_dir
    self.input_size = configs.input_size
    self.hm_size = configs.hm_size
    self.num_classes = configs.num_classes
    self.max_objects = configs.max_objects
    assert mode in ['train', 'val', 'test'], 'Invalid mode: {}'.format(mode)
    self.mode = mode
    self.is_test = (self.mode == 'test')
    sub_folder = 'testing' if self.is_test else 'training'
    self.lidar_aug = lidar_aug
    self.hflip_prob = hflip_prob
    self.image_dir = os.path.join(self.dataset_dir, sub_folder, "image_2")
    self.lidar_dir = os.path.join(self.dataset_dir, sub_folder, "velodyne")
    self.calib_dir = os.path.join(self.dataset_dir, sub_folder, "calib")
    self.label_dir = os.path.join(self.dataset_dir, sub_folder, "label_2")
    split_txt_path = os.path.join(self.dataset_dir, 'ImageSets', '{}.txt'.format(mode))
    self.sample_id_list = [int(x.strip()) for x in open(split_txt_path).readlines()]
    if num_samples is not None:
        self.sample_id_list = self.sample_id_list[:num_samples]
    self.num_samples = len(self.sample_id_list)

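The initializer sets up the dataset by building the paths to the data directories (images, LiDAR, calibration, and labels) and by creating the list of sample IDs for the chosen mode. The split file is resolved as ImageSets/<mode>.txt, so mode 'val' reads val.txt; the directory listing above shows valid.txt, so the file name may need to be adjusted to match. LiDAR augmentation (lidar_aug) and horizontal flipping (hflip_prob) can optionally be applied for data augmentation. If num_samples is given, the dataset is truncated to that length.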

def __len__(self):
    return len(self.sample_id_list)

This method returns the number of samples in the dataset, which lets PyTorch's DataLoader iterate over it correctly.

def __getitem__(self, index):
    if self.is_test:
        return self.load_img_only(index)
    else:
        return self.load_img_with_targets(index)

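This method retrieves a single data point from the dataset. In 'test' mode it calls load_img_only, which returns the input data without labels. In 'train' or 'val' mode it calls load_img_with_targets to obtain the input data together with the corresponding targets.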

def load_img_only(self, index):
    """Load only image for the testing phase"""
    sample_id = int(self.sample_id_list[index])
    img_path, img_rgb = self.get_image(sample_id)
    lidarData = self.get_lidar(sample_id)
    lidarData = get_filtered_lidar(lidarData, cnf.boundary)
    bev_map = makeBEVMap(lidarData, cnf.boundary)
    bev_map = torch.from_numpy(bev_map)
    metadatas = {
        'img_path': img_path,
    }
    return metadatas, bev_map, img_rgb

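This method is used in the testing phase, where no labels are available. Despite its name, it does more than load the image: it filters the LiDAR points to the configured boundary, builds the BEV map from them, and returns the metadata, the BEV map, and the RGB image.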

def load_img_with_targets(self, index):
    """Load images and targets for the training and validation phase"""
    sample_id = int(self.sample_id_list[index])
    img_path = os.path.join(self.image_dir, '{:06d}.png'.format(sample_id))
    lidarData = self.get_lidar(sample_id)
    calib = self.get_calib(sample_id)
    labels, has_labels = self.get_label(sample_id)
    if has_labels:
        labels[:, 1:] = transformation.camera_to_lidar_box(labels[:, 1:], calib.V2C, calib.R0, calib.P2)
    if self.lidar_aug:
        lidarData, labels[:, 1:] = self.lidar_aug(lidarData, labels[:, 1:])
    lidarData, labels = get_filtered_lidar(lidarData, cnf.boundary, labels)
    bev_map = makeBEVMap(lidarData, cnf.boundary)
    bev_map = torch.from_numpy(bev_map)
    hflipped = False
    if np.random.random() < self.hflip_prob:
        hflipped = True
        # C, H, W: flip the BEV map along its width only when hflipped is set
        bev_map = torch.flip(bev_map, [-1])
    targets = self.build_targets(labels, hflipped)
    metadatas = {
        'img_path': img_path,
        'hflipped': hflipped
    }
    return metadatas, bev_map, targets

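This method loads the input data and target labels for training or validation. It applies any specified LiDAR augmentation and, when the random draw falls below hflip_prob, flips the bird's-eye-view (BEV) map horizontally. It also builds the detection targets, including the heatmap, center offsets, dimensions, and direction.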

def get_image(self, idx):
    img_path = os.path.join(self.image_dir, '{:06d}.png'.format(idx))
    img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
    return img_path, img

This method builds the image file path and loads the image with OpenCV, converting it from BGR to RGB.

def get_calib(self, idx):
    calib_file = os.path.join(self.calib_dir, '{:06d}.txt'.format(idx))
    # assert os.path.isfile(calib_file)
    return Calibration(calib_file)

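This method retrieves the calibration data for the given index, which is required to convert between the camera and LiDAR coordinate systems.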

def get_lidar(self, idx):
    lidar_file = os.path.join(self.lidar_dir, '{:06d}.bin'.format(idx))
    # assert os.path.isfile(lidar_file)
    return np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 4)

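This method loads the raw LiDAR data from a binary file and reshapes it into an N x 4 NumPy array, where N is the number of points and the four columns are the x, y, z coordinates and the reflectance intensity.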
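
As a small illustration of this file layout, the sketch below writes two synthetic points to a temporary .bin file and reads them back with the same `np.fromfile(...).reshape(-1, 4)` call used in get_lidar. The point values and the sample id 7 are made up for the example.

```python
import os
import tempfile

import numpy as np

# Two synthetic points: x, y, z, reflectance intensity (illustrative values).
points = np.array([[1.0, 2.0, 0.5, 0.3],
                   [4.0, -1.0, 0.2, 0.9]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Same zero-padded naming scheme as get_lidar: 000007.bin
    lidar_file = os.path.join(tmp_dir, '{:06d}.bin'.format(7))
    points.tofile(lidar_file)  # raw float32 values, no header
    loaded = np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 4)

xyz, intensity = loaded[:, :3], loaded[:, 3]
print(loaded.shape, xyz.shape, intensity.shape)
```

The file carries no header, so the dtype and the four-column layout have to be known in advance; reading with the wrong dtype silently produces garbage rather than an error.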

def get_label(self, idx):
    labels = []
    label_path = os.path.join(self.label_dir, '{:06d}.txt'.format(idx))
    for line in open(label_path, 'r'):
        line = line.rstrip()
        line_parts = line.split(' ')
        obj_name = line_parts[0]  # 'Car', 'Pedestrian',...
        cat_id = int(cnf.CLASS_NAME_TO_ID[obj_name])
        if cat_id <= -99:  # ignore Tram and Misc
            continue
        truncated = int(float(line_parts[1]))  # truncated pixel ratio [0..1]
        occluded = int(line_parts[2])  # 0=visible, 1=partly occluded, 2=fully occluded, 3=unknown
        alpha = float(line_parts[3])  # object observation angle [-pi..pi]
        # xmin, ymin, xmax, ymax
        bbox = np.array([float(line_parts[4]), float(line_parts[5]), float(line_parts[6]), float(line_parts[7])])
        # height, width, length (h, w, l)
        h, w, l = float(line_parts[8]), float(line_parts[9]), float(line_parts[10])
        # location (x,y,z) in camera coord.
        x, y, z = float(line_parts[11]), float(line_parts[12]), float(line_parts[13])
        ry = float(line_parts[14])  # yaw angle (around Y-axis in camera coordinates) [-pi..pi]
        object_label = [cat_id, x, y, z, h, w, l, ry]
        labels.append(object_label)
    if len(labels) == 0:
        labels = np.zeros((1, 8), dtype=np.float32)
        has_labels = False
    else:
        labels = np.array(labels, dtype=np.float32)
        has_labels = True
    return labels, has_labels

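This method reads the object labels from the label file, including attributes such as object type, dimensions, location, and orientation. Classes whose ID is -99 or lower are skipped, and when no usable label remains it returns a single all-zero row with has_labels set to False.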
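
The field order that get_label relies on can be seen more easily on a single line. The sketch below parses one illustrative label line in the same 15-field layout; the helper name `parse_kitti_label_line` and the sample values are assumptions, not part of the codebase.

```python
def parse_kitti_label_line(line):
    """Split one label line into named fields, using the indices from get_label."""
    parts = line.split()
    return {
        'type': parts[0],                                    # 'Car', 'Pedestrian', ...
        'truncated': float(parts[1]),                        # ratio in [0, 1]
        'occluded': int(parts[2]),                           # 0..3
        'alpha': float(parts[3]),                            # observation angle
        'bbox': [float(v) for v in parts[4:8]],              # xmin, ymin, xmax, ymax
        'dimensions_hwl': [float(v) for v in parts[8:11]],   # height, width, length
        'location_xyz': [float(v) for v in parts[11:14]],    # camera coordinates
        'rotation_y': float(parts[14]),                      # yaw around camera Y axis
    }


# Illustrative line, not taken from a specific dataset file.
line = 'Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59'
obj = parse_kitti_label_line(line)
print(obj['type'], obj['dimensions_hwl'], obj['rotation_y'])
```

Note that the location is expressed in camera coordinates, which is why load_img_with_targets converts the boxes to the LiDAR frame before building targets.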

def build_targets(self, labels, hflipped):
    minX = cnf.boundary['minX']
    maxX = cnf.boundary['maxX']
    minY = cnf.boundary['minY']
    maxY = cnf.boundary['maxY']
    minZ = cnf.boundary['minZ']
    maxZ = cnf.boundary['maxZ']
    num_objects = min(len(labels), self.max_objects)
    hm_l, hm_w = self.hm_size
    hm_main_center = np.zeros((self.num_classes, hm_l, hm_w), dtype=np.float32)
    cen_offset = np.zeros((self.max_objects, 2), dtype=np.float32)
    direction = np.zeros((self.max_objects, 2), dtype=np.float32)
    z_coor = np.zeros((self.max_objects, 1), dtype=np.float32)
    dimension = np.zeros((self.max_objects, 3), dtype=np.float32)
    indices_center = np.zeros((self.max_objects), dtype=np.int64)
    obj_mask = np.zeros((self.max_objects), dtype=np.uint8)
    for k in range(num_objects):
        cls_id, x, y, z, h, w, l, yaw = labels[k]
        cls_id = int(cls_id)
        # Invert yaw angle
        yaw = -yaw
        if not ((minX <= x <= maxX) and (minY <= y <= maxY) and (minZ <= z <= maxZ)):
            continue
        if (h <= 0) or (w <= 0) or (l <= 0):
            continue
        bbox_l = l / cnf.bound_size_x * hm_l
        bbox_w = w / cnf.bound_size_y * hm_w
        radius = compute_radius((math.ceil(bbox_l), math.ceil(bbox_w)))
        radius = max(0, int(radius))
        center_y = (x - minX) / cnf.bound_size_x * hm_l  # x --> y (invert to 2D image space)
        center_x = (y - minY) / cnf.bound_size_y * hm_w  # y --> x
        center = np.array([center_x, center_y], dtype=np.float32)
        if hflipped:
            center[0] = hm_w - center[0] - 1
        center_int = center.astype(np.int32)
        if cls_id < 0:
            ignore_ids = [_ for _ in range(self.num_classes)] if cls_id == -1 else [-cls_id - 2]
            # Consider to make mask ignore
            for cls_ig in ignore_ids:
                gen_hm_radius(hm_main_center[cls_ig], center_int, radius)
            hm_main_center[ignore_ids, center_int[1], center_int[0]] = 0.9999
            continue
        # Generate heatmaps for main center
        gen_hm_radius(hm_main_center[cls_id], center, radius)
        # Index of the center
        indices_center[k] = center_int[1] * hm_w + center_int[0]
        # targets for center offset
        cen_offset[k] = center - center_int
        # targets for dimension
        dimension[k, 0] = h
        dimension[k, 1] = w
        dimension[k, 2] = l
        # targets for direction
        direction[k, 0] = math.sin(float(yaw))  # im
        direction[k, 1] = math.cos(float(yaw))  # re
        # im --> -im
        if hflipped:
            direction[k, 0] = -direction[k, 0]
        # targets for depth
        z_coor[k] = z - minZ
        # Generate object masks
        obj_mask[k] = 1
    targets = {
        'hm_cen': hm_main_center,
        'cen_offset': cen_offset,
        'direction': direction,
        'z_coor': z_coor,
        'dim': dimension,
        'indices_center': indices_center,
        'obj_mask': obj_mask,
    }
    return targets

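From the processed labels and the augmentation state, this method builds the target variables used to train the model: a heatmap of object centers, the sub-cell offset of each center, the object dimensions, a direction vector (sine and cosine of the yaw), the z coordinate, the flattened index of each center, and a mask marking which target slots hold real objects.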
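
The center heatmap and offset targets can be sketched in a few lines. This is a simplified stand-in written for illustration, not the codebase's gen_hm_radius or compute_radius: the function `draw_gaussian`, the sigma rule (diameter / 6), and the 8 x 8 grid are assumptions.

```python
import numpy as np


def draw_gaussian(heatmap, center, radius):
    """Splat a Gaussian bump (peak value 1) onto a 2D heatmap, in place."""
    diameter = 2 * radius + 1
    sigma = diameter / 6.0  # assumed spread; chosen for illustration
    ys, xs = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    gaussian = np.exp(-(xs * xs + ys * ys) / (2 * sigma * sigma))

    cx, cy = int(center[0]), int(center[1])
    height, width = heatmap.shape
    # Clip the patch so it stays inside the heatmap borders.
    left, right = min(cx, radius), min(width - cx, radius + 1)
    top, bottom = min(cy, radius), min(height - cy, radius + 1)

    region = heatmap[cy - top:cy + bottom, cx - left:cx + right]
    patch = gaussian[radius - top:radius + bottom, radius - left:radius + right]
    np.maximum(region, patch, out=region)  # keep the larger value where bumps overlap
    return heatmap


heatmap = np.zeros((8, 8), dtype=np.float32)
draw_gaussian(heatmap, center=(3, 4), radius=2)

# Offset and flattened index targets, following the same arithmetic as build_targets.
center = np.array([3.6, 4.2], dtype=np.float32)        # (x, y) in heatmap cells
center_int = center.astype(np.int32)                   # integer cell (3, 4)
offset = center - center_int                           # sub-cell remainder
flat_index = center_int[1] * heatmap.shape[1] + center_int[0]
print(heatmap[4, 3], offset, flat_index)
```

The offset target lets the network recover the precision lost when a continuous center is snapped to an integer heatmap cell.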

def draw_img_with_label(self, index):
    sample_id = int(self.sample_id_list[index])
    img_path, img_rgb = self.get_image(sample_id)
    lidarData = self.get_lidar(sample_id)
    calib = self.get_calib(sample_id)
    labels, has_labels = self.get_label(sample_id)
    if has_labels:
        labels[:, 1:] = transformation.camera_to_lidar_box(labels[:, 1:], calib.V2C, calib.R0, calib.P2)
    if self.lidar_aug:
        lidarData, labels[:, 1:] = self.lidar_aug(lidarData, labels[:, 1:])
    lidarData, labels = get_filtered_lidar(lidarData, cnf.boundary, labels)
    bev_map = makeBEVMap(lidarData, cnf.boundary)
    return bev_map, labels, img_rgb, img_path

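Finally, this utility method gathers what is needed to overlay labels for visualization. It returns the BEV map, the labels converted to the LiDAR frame, the RGB image, and the image path, which is particularly useful for understanding the data and debugging the dataset class.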

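4.2 Analysis of the RGB POV Camera and 3D BEV LiDAR Point Cloud Views

The figure in the original article shows a visualization generated by the KittiDataset class from the previous section.

The top half of that figure shows the standard POV camera view of a road scene, and the bottom half shows the corresponding bird's-eye view (BEV) built from the 3D LiDAR data. Let us look at this visualization more closely:

RGB POV camera view: Objects in the street view are enclosed in 3D bounding boxes that indicate their spatial extent in three dimensions: length, width, and height.
3D BEV LiDAR view: The bottom image is the BEV map built from the LiDAR points. In the BEV map the world is seen from above, with the LiDAR data projected onto a two-dimensional plane. This projection helps in understanding the spatial layout and the relationships between objects without the perspective distortion of a camera image. The red bounding boxes in the BEV correspond to the 3D bounding box annotations in the camera view, flattened onto the 2D plane. They show the position and orientation of each object relative to the vehicle, which usually sits at the center of the concentric arcs. The concentric arcs mark distance intervals from the 3D LiDAR sensor and give a sense of scale and distance for the points and objects in the point cloud.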
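
The projection described above can be sketched as a simple rasterization. This is a hedged illustration and not the codebase's makeBEVMap: the function `make_bev_map`, the boundary values, the grid size, the row/column orientation, and the density normalization are all assumptions made for the example.

```python
import numpy as np


def make_bev_map(points, boundary, height, width):
    """Rasterize (N, 4) points [x, y, z, intensity] into a (3, H, W) BEV map."""
    x, y, z, intensity = points.T
    keep = ((x >= boundary['minX']) & (x < boundary['maxX']) &
            (y >= boundary['minY']) & (y < boundary['maxY']) &
            (z >= boundary['minZ']) & (z < boundary['maxZ']))
    x, y, z, intensity = x[keep], y[keep], z[keep], intensity[keep]

    # Assumed orientation: forward axis x maps to rows, lateral axis y to columns.
    rows = ((x - boundary['minX']) / (boundary['maxX'] - boundary['minX']) * height).astype(np.int64)
    cols = ((y - boundary['minY']) / (boundary['maxY'] - boundary['minY']) * width).astype(np.int64)
    z_norm = ((z - boundary['minZ']) / (boundary['maxZ'] - boundary['minZ'])).astype(np.float32)

    bev = np.zeros((3, height, width), dtype=np.float32)
    np.maximum.at(bev[0], (rows, cols), z_norm)                       # highest point per cell
    np.maximum.at(bev[1], (rows, cols), intensity.astype(np.float32))  # strongest return per cell
    np.add.at(bev[2], (rows, cols), np.float32(1.0))                  # point count per cell
    bev[2] = np.minimum(1.0, np.log1p(bev[2]) / np.log(64.0))         # assumed density scaling
    return bev


# Hypothetical region of interest in metres.
boundary = {'minX': 0.0, 'maxX': 50.0, 'minY': -25.0, 'maxY': 25.0, 'minZ': -2.73, 'maxZ': 1.27}
points = np.array([[10.0, 0.0, 0.0, 0.5],
                   [10.0, 0.0, -1.0, 0.9],
                   [60.0, 0.0, 0.0, 0.1]], dtype=np.float32)  # last point lies outside the region
bev = make_bev_map(points, boundary, height=100, width=100)
print(bev.shape, bev[:, 20, 50])
```

Treating height, intensity, and density as three channels gives the detector an image-like input, so a standard 2D convolutional backbone can be applied to LiDAR data.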

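5. Keypoint Feature Pyramid Network Architecture

The Keypoint Feature Pyramid Network (KFPN), described in detail by Peixuan Li et al. in the RTM3D paper, offers a refined approach to 3D object detection, particularly in autonomous driving scenarios. The architecture used here is designed to process bird's-eye-view (BEV) maps encoded from 3D LiDAR point clouds and to output detections with seven degrees of freedom (7-DOF): position, dimensions, and yaw.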
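
To show what a 7-DOF detection describes geometrically, here is a small sketch that turns a box into its four BEV footprint corners. The helper `bev_corners`, the dictionary layout, and the counterclockwise yaw convention are assumptions for illustration; the codebase may use a different axis or sign convention.

```python
import math


def bev_corners(x, y, length, width, yaw):
    """Return the four (x, y) footprint corners of a box rotated by yaw (radians)."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
        lx, ly = dx * length, dy * width        # corner in the box's own frame
        corners.append((x + lx * cos_y - ly * sin_y,
                        y + lx * sin_y + ly * cos_y))
    return corners


# Seven degrees of freedom: center (x, y, z), size (h, w, l), and yaw.
box = {'x': 10.0, 'y': 2.0, 'z': -1.0, 'h': 1.5, 'w': 2.0, 'l': 4.0, 'yaw': 0.0}
corners = bev_corners(box['x'], box['y'], box['l'], box['w'], box['yaw'])
print(corners)
```

Only x, y, length, width, and yaw affect the BEV footprint; z and height matter when the box is projected back into the camera image.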

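5.1 Key Techniques

Backbone: ResNet-18 and DLA-34 backbones perform the initial image processing, applying a downsampling factor of [value missing in the source] for computational efficiency.
Upsampling and feature concatenation: A series of bilinear interpolations and [kernel size missing in the source] convolutions enrich the feature representation by concatenating the corresponding low-level feature maps at each upsampling stage.
Keypoint feature pyramid: A novel approach to scale-invariant keypoint detection resizes the feature map of every scale to the largest scale for consistent keypoint analysis.
Detection head: A combination of fundamental and optional components, including heatmaps for detecting the main center and the vertices of the 3D bounding box.
Keypoint association: Local offsets are regressed to group keypoints, and a multi-bin method is used for precise yaw estimation, improving the accuracy of 3D LiDAR object detection.

5.2 Backbone

KFPN uses two different structures as its backbone: ResNet-18 and DLA-34. The backbone performs the initial processing of a single RGB input image (denoted [notation missing in the source]). The image is downsampled by a factor of [value missing in the source]; in these networks, which were built for image classification, the maximum downsampling factor is ×32. The backbone plays a central role in feature extraction and in reducing computational complexity.

5.3 Upsampling and Feature Concatenation

After the initial downsampling, the network applies a series of upsampling layers. The process consists of three bilinear interpolations combined with [kernel size missing in the source] convolutional layers. Before each upsampling step, the network concatenates the corresponding low-level feature map and then passes the result through a [kernel size missing in the source] convolutional layer to reduce the channel dimension. After the three upsampling layers the output channels are 256, 128, and 64, respectively. This strategy yields a rich feature representation that covers both high-level and low-level detail of the input.

5.4 Keypoint Feature Pyramid

In a conventional feature pyramid network (FPN), detection at multiple scales is common. For keypoint detection, however, the size of keypoints in the image varies little, so KFPN takes a different approach. It proposes a keypoint feature pyramid for detecting scale-invariant keypoints in point space. The feature map of each scale is resized back to the largest scale, producing the feature maps [notation missing in the source], and a softmax operation is then applied to derive the importance (weight) of each scale. The final scale-space score map [notation missing in the source] is obtained as a linear weighted sum of these feature maps.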
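
One way to read the softmax weighting described above is sketched below: at every location, the resized maps from all scales are weighted by a softmax taken across scales and then summed. This is an interpretation under stated assumptions, not the paper's or the codebase's implementation; the function `soft_scale_fusion` and the toy 2 x 2 maps are illustrative, and the resizing to a common size is assumed to have happened already.

```python
import numpy as np


def soft_scale_fusion(feature_maps):
    """Fuse same-sized (H, W) maps with per-location softmax weights over scales."""
    stacked = np.stack(feature_maps)                          # (L, H, W)
    shifted = stacked - stacked.max(axis=0, keepdims=True)    # for numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)             # softmax across the L scales
    fused = (weights * stacked).sum(axis=0)                   # weighted sum, shape (H, W)
    return fused, weights


# Three hypothetical score maps, already resized to the largest scale.
maps = [np.array([[1.0, 0.0], [0.0, 2.0]]),
        np.array([[1.0, 3.0], [0.0, 0.0]]),
        np.array([[1.0, 0.0], [4.0, 0.0]])]
fused, weights = soft_scale_fusion(maps)
print(fused)
```

Where one scale responds much more strongly than the others, its weight approaches 1 and the fused score follows that scale; where all scales agree, the weights are uniform.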

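5.5 Detection Head

The KFPN detection head consists of three fundamental components and six optional components. They are designed to improve 3D detection accuracy with minimal computational overhead. Inspired by CenterNet, the network uses one keypoint as the main center that connects all features. The heatmap of this main center is defined as [definition missing in the source], where [symbol missing in the source] is the number of object categories. The network also outputs heatmaps of nine perspective points projected from the vertices and center of the 3D bounding box, denoted [notation missing in the source].

5.6 Keypoint Association and Other Components

To associate the keypoints of an object, the network regresses local offsets [notation missing in the source] from the main center. This helps group keypoints that belong to the same object. Other components, such as the center and vertex offsets, the dimensions, and the orientation of the 3D object, are included to provide additional constraints and improve detection performance. Orientation is represented by the yaw angle [symbol missing in the source], and the network uses a multi-bin method to regress the local orientation.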

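6. Code Walkthrough - KFPN

6.1 Training Strategy

Training KFPN for 3D LiDAR object detection follows a strategy that focuses on balancing positive and negative samples. Focal loss is used to address this imbalance, a common way of optimizing the learning process in object detection networks. The whole pipeline is implemented in the train.py script. Let us explore the functions that make up this training pipeline: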
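
As a rough illustration of how a focal loss handles the imbalance between the few center cells and the many background cells, the sketch below implements the penalty-reduced variant popularized by CenterNet. It is an assumption that the codebase's Compute_Loss uses this exact form; the function `center_focal_loss` and the exponents alpha = 2 and beta = 4 are illustrative choices.

```python
import numpy as np


def center_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss on a heatmap; target equals 1 at object centers."""
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = target == 1.0
    neg = ~pos
    # Positives: confident correct predictions are down-weighted by (1 - p)^alpha.
    pos_loss = -np.sum(((1.0 - pred[pos]) ** alpha) * np.log(pred[pos]))
    # Negatives: cells near a center (target close to 1) are penalized less.
    neg_loss = -np.sum(((1.0 - target[neg]) ** beta) * (pred[neg] ** alpha)
                       * np.log(1.0 - pred[neg]))
    num_pos = max(int(pos.sum()), 1)
    return (pos_loss + neg_loss) / num_pos


target = np.zeros((3, 3))
target[1, 1] = 1.0                                   # one object center

good = np.full((3, 3), 0.05)
good[1, 1] = 0.9   # confident and correct
bad = np.full((3, 3), 0.5)
bad[1, 1] = 0.1    # uncertain background, missed center
loss_good = center_focal_loss(good, target)
loss_bad = center_focal_loss(bad, target)
print(loss_good, loss_bad)
```

Dividing by the number of positives keeps the loss scale comparable between frames with one object and frames with many.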

def main_worker(gpu_idx, configs):
    configs.gpu_idx = gpu_idx
    configs.device = torch.device('cpu' if configs.gpu_idx is None else 'cuda:{}'.format(configs.gpu_idx))
    if configs.distributed:
        if configs.dist_url == "env://" and configs.rank == -1:
            configs.rank = int(os.environ["RANK"])
        if configs.multiprocessing_distributed:
            # For multiprocessing distributed training, rank needs to be the
            # global rank among all the processes
            configs.rank = configs.rank * configs.ngpus_per_node + gpu_idx
        dist.init_process_group(backend=configs.dist_backend, init_method=configs.dist_url,
                                world_size=configs.world_size, rank=configs.rank)
        configs.subdivisions = int(64 / configs.batch_size / configs.ngpus_per_node)
    else:
        configs.subdivisions = int(64 / configs.batch_size)
    configs.is_master_node = (not configs.distributed) or (
            configs.distributed and (configs.rank % configs.ngpus_per_node == 0))
    if configs.is_master_node:
        logger = Logger(configs.logs_dir, configs.saved_fn)
        logger.info('>>> Created a new logger')
        logger.info('>>> configs: {}'.format(configs))
        tb_writer = SummaryWriter(log_dir=os.path.join(configs.logs_dir, 'tensorboard'))
    else:
        logger = None
        tb_writer = None
    # model
    model = create_model(configs)
    # load weight from a checkpoint
    if configs.pretrained_path is not None:
        assert os.path.isfile(configs.pretrained_path), "=> no checkpoint found at '{}'".format(configs.pretrained_path)
        model.load_state_dict(torch.load(configs.pretrained_path, map_location='cpu'))
        if logger is not None:
            logger.info('loaded pretrained model at {}'.format(configs.pretrained_path))
    # resume weights of model from a checkpoint
    if configs.resume_path is not None:
        assert os.path.isfile(configs.resume_path), "=> no checkpoint found at '{}'".format(configs.resume_path)
        model.load_state_dict(torch.load(configs.resume_path, map_location='cpu'))
        if logger is not None:
            logger.info('resume training model from checkpoint {}'.format(configs.resume_path))
    # Data Parallel
    model = make_data_parallel(model, configs)
    # Make sure to create optimizer after moving the model to cuda
    optimizer = create_optimizer(configs, model)
    lr_scheduler = create_lr_scheduler(optimizer, configs)
    configs.step_lr_in_epoch = False if configs.lr_type in ['multi_step', 'cosin', 'one_cycle'] else True
    # resume optimizer, lr_scheduler from a checkpoint
    if configs.resume_path is not None:
        utils_path = configs.resume_path.replace('Model_', 'Utils_')
        assert os.path.isfile(utils_path), "=> no checkpoint found at '{}'".format(utils_path)
        utils_state_dict = torch.load(utils_path, map_location='cuda:{}'.format(configs.gpu_idx))
        optimizer.load_state_dict(utils_state_dict['optimizer'])
        lr_scheduler.load_state_dict(utils_state_dict['lr_scheduler'])
        configs.start_epoch = utils_state_dict['epoch'] + 1
    if configs.is_master_node:
        num_parameters = get_num_parameters(model)
        logger.info('number of trained parameters of the model: {}'.format(num_parameters))
    if logger is not None:
        logger.info(">>> Loading dataset & getting dataloader...")
    # Create dataloader
    train_dataloader, train_sampler = create_train_dataloader(configs)
    if logger is not None:
        logger.info('number of batches in training set: {}'.format(len(train_dataloader)))
    if configs.evaluate:
        val_dataloader = create_val_dataloader(configs)
        val_loss = validate(val_dataloader, model, configs)
        print('val_loss: {:.4e}'.format(val_loss))
        return
    for epoch in range(configs.start_epoch, configs.num_epochs + 1):
        if logger is not None:
            logger.info('{}'.format('*' * 40))
            logger.info('{} {}/{} {}'.format('=' * 35, epoch, configs.num_epochs, '=' * 35))
            logger.info('{}'.format('*' * 40))
            logger.info('>>> Epoch: [{}/{}]'.format(epoch, configs.num_epochs))
        if configs.distributed:
            train_sampler.set_epoch(epoch)
        # train for one epoch
        train_one_epoch(train_dataloader, model, optimizer, lr_scheduler, epoch, configs, logger, tb_writer)
        if (not configs.no_val) and (epoch % configs.checkpoint_freq == 0):
            val_dataloader = create_val_dataloader(configs)
            print('number of batches in val_dataloader: {}'.format(len(val_dataloader)))
            val_loss = validate(val_dataloader, model, configs)
            print('val_loss: {:.4e}'.format(val_loss))
            if tb_writer is not None:
                tb_writer.add_scalar('Val_loss', val_loss, epoch)
        # Save checkpoint
        if configs.is_master_node and ((epoch % configs.checkpoint_freq) == 0):
            model_state_dict, utils_state_dict = get_saved_state(model, optimizer, lr_scheduler, epoch, configs)
            save_checkpoint(configs.checkpoints_dir, configs.saved_fn, model_state_dict, utils_state_dict, epoch)
        if not configs.step_lr_in_epoch:
            lr_scheduler.step()
            if tb_writer is not None:
                tb_writer.add_scalar('LR', lr_scheduler.get_lr()[0], epoch)
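    if tb_writer is not None:
        tb_writer.close()

Training strategy (continued). The work done within each epoch is implemented in train_one_epoch:
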
def train_one_epoch(train_dataloader, model, optimizer, lr_scheduler, epoch, configs, logger, tb_writer):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    progress = ProgressMeter(len(train_dataloader), [batch_time, data_time, losses],
                             prefix="Train - Epoch: [{}/{}]".format(epoch, configs.num_epochs))
    criterion = Compute_Loss(device=configs.device)
    num_iters_per_epoch = len(train_dataloader)
    # switch to train mode
    model.train()
    start_time = time.time()
    for batch_idx, batch_data in enumerate(tqdm(train_dataloader)):
        data_time.update(time.time() - start_time)
        metadatas, imgs, targets = batch_data
        batch_size = imgs.size(0)
        global_step = num_iters_per_epoch * (epoch - 1) + batch_idx + 1
        for k in targets.keys():
            targets[k] = targets[k].to(configs.device, non_blocking=True)
        imgs = imgs.to(configs.device, non_blocking=True).float()
        outputs = model(imgs)
        total_loss, loss_stats = criterion(outputs, targets)
        # For torch.nn.DataParallel case
        if (not configs.distributed) and (configs.gpu_idx is None):
            total_loss = torch.mean(total_loss)
        # compute gradient and perform backpropagation
        total_loss.backward()
        if global_step % configs.subdivisions == 0:
            optimizer.step()
            # zero the parameter gradients
            optimizer.zero_grad()
            # Adjust learning rate
            if configs.step_lr_in_epoch:
                lr_scheduler.step()
                if tb_writer is not None:
                    tb_writer.add_scalar('LR', lr_scheduler.get_lr()[0], global_step)
        if configs.distributed:
            reduced_loss = reduce_tensor(total_loss.data, configs.world_size)
        else:
            reduced_loss = total_loss.data
        losses.update(to_python_float(reduced_loss), batch_size)
        # measure elapsed time
        # torch.cuda.synchronize()
        batch_time.update(time.time() - start_time)
        if tb_writer is not None:
            if (global_step % configs.tensorboard_freq) == 0:
                loss_stats['avg_loss'] = losses.avg
                tb_writer.add_scalars('Train', loss_stats, global_step)
        # Log message
        if logger is not None:
            if (global_step % configs.print_freq) == 0:
                logger.info(progress.get_message(batch_idx))
        start_time = time.time()
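
The gradient accumulation in train_one_epoch is governed by configs.subdivisions, which main_worker computes as int(64 / batch_size) in the non-distributed case. The sketch below reproduces only that arithmetic; the helper names `accumulation_steps` and `optimizer_step_indices` are hypothetical.

```python
def accumulation_steps(batch_size, ngpus_per_node=1, target=64):
    """Number of mini-batches accumulated before one optimizer step."""
    return int(target / batch_size / ngpus_per_node)


def optimizer_step_indices(num_iters, subdivisions):
    """Global steps (1-based) on which optimizer.step() would be called."""
    return [step for step in range(1, num_iters + 1) if step % subdivisions == 0]


subdivisions = accumulation_steps(batch_size=16)
steps = optimizer_step_indices(10, subdivisions)
print(subdivisions, steps)
```

Two details follow from the code as shown: the loss is not divided by subdivisions, so the accumulated gradient is a sum over the mini-batches rather than a mean, and a batch size above 64 would make subdivisions zero, in which case the modulo in the training loop would raise a ZeroDivisionError.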

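6.2 Validation

Validation is just as important as training. Its main purpose is to evaluate how the model performs on the validation dataset. The script uses the validate() function for this. Let us look at the function in detail: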

def validate(val_dataloader, model, configs):
    losses = AverageMeter('Loss', ':.4e')
    criterion = Compute_Loss(device=configs.device)
    # switch to evaluation mode
    model.eval()
    with torch.no_grad():
        for batch_idx, batch_data in enumerate(tqdm(val_dataloader)):
            metadatas, imgs, targets = batch_data
            batch_size = imgs.size(0)
            for k in targets.keys():
                targets[k] = targets[k].to(configs.device, non_blocking=True)
            imgs = imgs.to(configs.device, non_blocking=True).float()
            outputs = model(imgs)
            total_loss, loss_stats = criterion(outputs, targets)
            # For torch.nn.DataParallel case
            if (not configs.distributed) and (configs.gpu_idx is None):
                total_loss = torch.mean(total_loss)
            if configs.distributed:
                reduced_loss = reduce_tensor(total_loss.data, configs.world_size)
            else:
                reduced_loss = total_loss.data
            losses.update(to_python_float(reduced_loss), batch_size)
    return losses.avg


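6.3 Parameters

val_dataloader: The data loader that provides batches of validation data.
model: The model being evaluated.
configs: The configuration settings that hold the evaluation parameters, including device information.

6.4 How the Function Works

Loss metric initialization: An AverageMeter object named losses is initialized to track the average loss over the validation dataset. Compute_Loss is initialized with the device specified in the configuration; it computes the loss between the model predictions and the ground truth.
Model evaluation mode: The model is put into evaluation mode with model.eval(). This changes the behavior of layers that act differently during training: dropout is disabled, and batch normalization uses its stored running statistics instead of per-batch statistics, so the model behaves consistently and deterministically during validation. The torch.no_grad() context manager disables gradient computation, which reduces memory use and speeds things up because gradients are not needed for evaluation.
Returning the average loss: After the forward passes are complete, the function returns the average loss over the whole validation dataset, as computed by the losses AverageMeter.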
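
The way validate() weights each batch by its size can be illustrated with a minimal running-average class. This is a simplified stand-in for the codebase's AverageMeter, written for illustration; the class name `RunningAverage` is hypothetical.

```python
class RunningAverage:
    """Track a sample-weighted running mean, in the spirit of losses.update(value, n)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # value is the mean loss of a batch, n is the batch size.
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


meter = RunningAverage()
meter.update(2.0, n=4)   # batch of 4 samples with mean loss 2.0
meter.update(5.0, n=2)   # smaller final batch with mean loss 5.0
print(meter.avg)
```

Weighting by batch size matters when the last batch is smaller than the others; a plain mean of batch means would over-weight it.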

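6.5 Model Inference

This section explores the inference pipeline, which is designed to process and analyze bird's-eye-view (BEV) maps for the 3D LiDAR object detection task.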

if __name__ == '__main__':
    configs = parse_demo_configs()
    # Try to download the dataset for demonstration
    server_url = 'https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data'
    download_url = '{}/{}/{}.zip'.format(server_url, configs.foldername[:-5], configs.foldername)
    download_and_unzip(configs.dataset_dir, download_url)
    model = create_model(configs)
    print('\n\n' + '-*=' * 30 + '\n\n')
    assert os.path.isfile(configs.pretrained_path), "No file at {}".format(configs.pretrained_path)
    model.load_state_dict(torch.load(configs.pretrained_path, map_location='cpu'))
    print('Loaded weights from {}\n'.format(configs.pretrained_path))
    configs.device = torch.device('cpu' if configs.no_cuda else 'cuda:{}'.format(configs.gpu_idx))
    model = model.to(device=configs.device)
    model.eval()
    out_cap = None
    demo_dataset = Demo_KittiDataset(configs)
    with torch.no_grad():
        for sample_idx in range(len(demo_dataset)):
            metadatas, front_bevmap, back_bevmap, img_rgb = demo_dataset.load_bevmap_front_vs_back(sample_idx)
            front_detections, front_bevmap, fps = do_detect(configs, model, front_bevmap, is_front=True)
            back_detections, back_bevmap, _ = do_detect(configs, model, back_bevmap, is_front=False)
            # Draw prediction in the image
            front_bevmap = (front_bevmap.permute(1, 2, 0).numpy() * 255).astype(np.uint8)
            front_bevmap = cv2.resize(front_bevmap, (cnf.BEV_WIDTH, cnf.BEV_HEIGHT))
            front_bevmap = draw_predictions(front_bevmap, front_detections, configs.num_classes)
            # Rotate the front_bevmap
            front_bevmap = cv2.rotate(front_bevmap, cv2.ROTATE_90_COUNTERCLOCKWISE)
            # Draw prediction in the image
            back_bevmap = (back_bevmap.permute(1, 2, 0).numpy() * 255).astype(np.uint8)
            back_bevmap = cv2.resize(back_bevmap, (cnf.BEV_WIDTH, cnf.BEV_HEIGHT))
            back_bevmap = draw_predictions(back_bevmap, back_detections, configs.num_classes)
            # Rotate the back_bevmap
            back_bevmap = cv2.rotate(back_bevmap, cv2.ROTATE_90_CLOCKWISE)
            # merge front and back bevmap
            full_bev = np.concatenate((back_bevmap, front_bevmap), axis=1)
            img_path = metadatas['img_path'][0]
            img_bgr = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2BGR)
            calib = Calibration(configs.calib_path)
            kitti_dets = convert_det_to_real_values(front_detections)
            if len(kitti_dets) > 0:
                kitti_dets[:, 1:] = lidar_to_camera_box(kitti_dets[:, 1:], calib.V2C, calib.R0, calib.P2)
                img_bgr = show_rgb_image_with_boxes(img_bgr, kitti_dets, calib)
                img_bgr = cv2.resize(img_bgr, (cnf.BEV_WIDTH * 2, 375))
            out_img = np.concatenate((img_bgr, full_bev), axis=0)
            write_credit(out_img, (50, 410), text_author='Cre: github.com/maudzung', org_fps=(900, 410), fps=fps)
            if out_cap is None:
                out_cap_h, out_cap_w = out_img.shape[:2]
                fourcc = cv2.VideoWriter_fourcc(*'MJPG')
                out_path = os.path.join(configs.results_dir, '{}_both_2_sides.avi'.format(configs.foldername))
                print('Create video writer at {}'.format(out_path))
                out_cap = cv2.VideoWriter(out_path, fourcc, 30, (out_cap_w, out_cap_h))
            out_cap.write(out_img)
    if out_cap:
        out_cap.release()

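6.6 Notes

Max pooling on the heatmap: During inference a 3×3 max pooling operation is applied to the center heatmap. It keeps the strongest response in each neighborhood and suppresses weaker neighboring responses and noise.
Prediction selection: Only the top 50 predictions whose center confidence is greater than 0.2 are kept, which focuses on the most likely object centers.
Heading angle computation: The yaw angle of each detection is computed with the arctangent of the ratio of the imaginary part to the real part (the sine and cosine outputs), giving the orientation of the detected object.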
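
The three notes above can be combined into a short decoding sketch. It is an illustration under assumptions, not the codebase's decoding routine: the functions `find_peaks` and `decode_yaw` are hypothetical, and the pooling is written with NumPy shifts instead of a framework pooling layer.

```python
import numpy as np


def find_peaks(heatmap, k=50, threshold=0.2):
    """Return up to k (score, row, col) local maxima above threshold, best first."""
    height, width = heatmap.shape
    padded = np.pad(heatmap, 1, mode='constant', constant_values=-np.inf)
    # 3x3 max pooling with stride 1: elementwise max over the nine shifted views.
    pooled = np.max([padded[dy:dy + height, dx:dx + width]
                     for dy in range(3) for dx in range(3)], axis=0)
    # A cell is a peak if it equals the maximum of its own 3x3 neighborhood.
    is_peak = (heatmap == pooled) & (heatmap > threshold)
    rows, cols = np.nonzero(is_peak)
    order = np.argsort(-heatmap[rows, cols])[:k]
    return [(float(heatmap[r, c]), int(r), int(c)) for r, c in zip(rows[order], cols[order])]


def decode_yaw(im, re):
    """Recover the yaw angle from its sine (im) and cosine (re) components."""
    return float(np.arctan2(im, re))


heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9   # true peak
heatmap[1, 2] = 0.6   # neighbor of a stronger cell, suppressed
heatmap[3, 3] = 0.5   # second peak
heatmap[4, 0] = 0.1   # below the confidence threshold
peaks = find_peaks(heatmap)
print(peaks, decode_yaw(1.0, 0.0))
```

Using arctan2 rather than a plain arctangent of the ratio keeps the quadrant information, so the recovered yaw covers the full range from -pi to pi.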

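6.7 Inference Walkthrough

Downloading the demo dataset: The script downloads a smaller sample of the KITTI Vision 360 dataset from the specified URL and unzips it into the configured dataset directory. This step provides the data needed for inference.
Model and weight initialization: The model is created with create_model(configs), and pretrained weights are loaded from configs.pretrained_path so that the model starts from learned features. The model is then moved to the device specified in the configuration (CPU or GPU).
Inference loop: The pipeline iterates over Demo_KittiDataset, which presumably provides the BEV maps and other related data for each sample. For each sample it loads the front and back BEV maps along with the metadata. The do_detect function is called separately on the front and back BEV maps; it performs the actual object detection and returns the detections and the BEV map.
BEV map adjustment: The front and back BEV maps are processed (permuted and resized), and the predictions are drawn on them with draw_predictions. The maps are then rotated into the correct orientation and concatenated to form a full BEV view.
Conversion and calibration: The RGB image is converted to BGR (the format OpenCV normally uses). The detections are converted to real-world values and then, using the calibration data, from the LiDAR frame to the camera frame so that boxes can be drawn on the image. The camera image and the full BEV map are concatenated into the final output image, and the credit text and frames-per-second (FPS) figure are written onto it.
Writing the inference results to video: If it has not been initialized yet, a VideoWriter object is created to write the output to a video file. Each processed frame is written to that file, producing a visualization of the detection process. The video writer is released at the end, which finalizes the file.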

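7. Experiments

7.1 Analysis of Results

The following observations can be made from the inference visualizations obtained in the experiment:

BEV map: Localized objects are detected in the top-down BEV 3D LiDAR depth map generated from the sensor. The front and back views are joined into one complete map.
Three-class 3D object detection: The predefined classes, namely cars, pedestrians, and cyclists, are detected in the inference results. These classes are annotated in advance in the KITTI 360 Vision dataset.
Localization accuracy: Detected objects are visualized with 3D bounding boxes in both modalities, the 2D RGB camera and the 3D LiDAR sensor, and the accuracy of box placement can be observed in both streams.
Real-time performance: The inference pipeline was tested on the same deep learning machine that trained the model, equipped with an NVIDIA RTX 3080 Ti with 12GB of VRAM. In this setting the model sustained 160-180 FPS during real-time inference.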

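7.2 Evaluation Metrics

The previous sections gave a visual impression of the trained model's results. The model's performance still has considerable room for improvement, however. To see where, let us look at the evaluation metrics that were logged with TensorBoard during training.

7.3 Learning Rate

The learning rate (LR) plot shows a stepwise decay schedule that starts slightly below 0.001 and drops to about 0.0001 by step 300. The sharp drops at specific intervals mark the scheduled points at which the LR is reduced substantially. Between these drops the LR stays flat, which lets the model stabilize its learning. The plot reflects a balance between fast initial learning and later fine-tuning, in line with common LR scheduling practice.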

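7.4 Training Loss

In this experiment the KFPN model was trained for a total of 300 epochs. The training loss plot shows several downward trends, with a high initial loss that corresponds to the early learning phase. As training proceeds, all loss metrics, including avg_loss, cen_offset_loss, and total_loss, keep decreasing, which indicates that the model is improving. The loss curves begin to flatten at around 69k steps, suggesting that the model is close to convergence. The combined total_loss also trends downward, reflecting the cumulative effect of optimizing the individual losses.

7.5 Validation Loss

The validation loss plot, on the other hand, shows a sustained upward trend after an initial decrease, which points to successful early learning followed by overfitting. The continued rise after step 50 indicates that the model's ability to generalize is deteriorating. The fluctuations in the loss reflect variability in learning, and the validation loss finally settles at about 2.8695, above its minimum, confirming that performance on unseen data degrades over time.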

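8. Conclusion

This experiment on 3D LiDAR object detection with the Keypoint Feature Pyramid Network (KFPN) model produced several key insights. The model shows strong object localization in the BEV map, combining the front and back views of the 3D LiDAR depth map for full coverage. Detection accuracy is notable: the system identifies cars, pedestrians, and cyclists (the three key classes in the KITTI 360 Vision dataset) and places bounding boxes around them effectively.

In terms of performance, real-time inference tests on an NVIDIA RTX 3080 Ti consistently reached 160-180 FPS, which underlines the model's potential for deployment in applications where fast processing is essential. The training loss observed over 300 epochs points to a successful learning phase, with all loss metrics improving steadily and approaching convergence. The validation loss, by contrast, increases after an initial decrease, indicating possible overfitting after step 50. The gap between training and validation loss shows that, although the model learns the training data effectively, its ability to generalize to new data needs further improvement.

The study and its results are relevant to the development of advanced driver-assistance systems (ADAS) and autonomous navigation systems. The model's effectiveness at accurate and fast object detection opens a path toward safer and more efficient autonomous driving. Looking ahead, addressing overfitting and ensuring that the model generalizes remain priorities, possibly through stronger regularization techniques or adaptive learning rate schedules that improve performance on unseen data.

Original article: https://learnopencv.com/3d-lidar-object-detection/#aioseo-code-walkthrough-kfpn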