[Paper Close Reading] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

Paper: [2304.08876] Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection (arxiv.org)

Code: https://github.com/ChaselTsui/mmrotate-dcfl

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.

1. TL;DR

1.1. Thoughts

(1) Why am I, a brain-science student, reading this? May the world be free of unpaid labor

(2) I noticed it as soon as I started writing the subheadings: the paper is divided really finely. Favorability ++

(3) As a layperson, I feel this paper proposes quite a lot of things

1.2. Paper summary figure

2. Section-by-Section Close Reading

2.1. Abstract

        ①The extreme geometric shapes (tiny size) and limited features (few pixels) of oriented tiny objects cause serious mismatch (inaccurate positional priors?) and imbalance (inaccurate positive-sample features?) issues

        ②They propose a dynamic prior and a coarse-to-fine assigner, together called DCFL

posterior  adj. situated at the back; rear  n. buttocks

2.2. Introduction

        ①Oriented bounding boxes greatly eliminate redundant background area, especially in aerial images

        ②Comparison figure:

where M* denotes the matching function;

green, blue and red boxes are true positive, false positive, and false negative predictions respectively,

the left figure set is static and the right is dynamic

        ③Figure of mismatch and imbalance issues:

each point in the left figure denotes a prior location (that's a lot of prior points... and why are they laid out so regularly? Is this some one-stage detector?)

Does the pie chart mean every box is at some fixed angle? When no box is rotated, the average number of positive samples is 5.2? Or does it mean boxes rotate freely, and it reports how many positive samples boxes at a particular angle get? The pie chart makes no external comparison; it only compares within this figure.

The bar chart shows the average number of positive samples under different anchor sizes

        ④They introduce a dynamic Prior Capturing Block (PCB) as their prior method. On top of this, they use Cross-FPN-layer Coarse Positive Samples (CPS) to assign labels. After that, they re-rank these candidates by predictions (posterior), and represent gt with a finer Dynamic Gaussian Mixture Model (DGMM)

eradicate  vt. to root out; eliminate; put an end to  n. eradicator

2.3. Related Work

2.3.1. Oriented Object Detection

(1)Prior for Oriented Objects

(2)Label Assignment

2.3.2. Tiny Object Detection

(1)Multi-scale Learning

(2)Label Assignment

(3)Context Information

(4)Feature Enhancement

2.4. Method

(1)Overview

        ①For a set of dense priors P\in\mathbb{R}^{W\times H\times C}, where W denotes width, H denotes height, and C denotes the number of shape-information channels (what is this, those points?), map it to D with a Deep Neural Network (DNN):

D=\mathrm{DNN}_{h}(P)

where \mathrm{DNN}_{h} represents the detection head (as a layperson I don't quite get "detection head"... it feels like just a function?);

one part D_{cls}\in\mathbb{R}^{W\times H\times A} of D denotes the classification scores, where A is the number of classes (will the W\times H entries be larger on the layer whose samples are more likely judged positive?);

the other part D_{reg}\in\mathbb{R}^{W\times H\times B} of D denotes the regression outputs, where B is the number of box parameters (a chatbot says box parameters are things like w, h, x, y, a)

        ②In static methods, the positive labels assigned to P are G=\mathcal{M}_{s}(P,GT)

        ③In dynamic methods, the positive label set G integrates posterior information: G={\mathcal M}_{d}(P,D,GT)

        ④The loss function:

\mathcal{L}=\sum_{i=1}^{N_{pos}}\mathcal{L}_{pos}(D_{i},G_{i})+\sum_{j=1}^{N_{neg}}\mathcal{L}_{neg}(D_{j},y_{j})

where N_{pos} and N_{neg} represent the numbers of positive and negative samples, and y_j is the negative label set
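To make the loss in ④ concrete, here is a toy sketch. The per-sample losses are stood in by a squared error (positives) and a simple log loss (negatives); the paper actually uses focal loss and IoU loss, so all names and stand-in losses here are illustrative only.

```python
import numpy as np

def detection_loss(pos_preds, pos_targets, neg_preds, neg_labels):
    """Toy version of L = sum_i L_pos(D_i, G_i) + sum_j L_neg(D_j, y_j)."""
    # stand-in positive loss: squared error against the assigned label G_i
    l_pos = sum((d - g) ** 2 for d, g in zip(pos_preds, pos_targets))
    # stand-in negative loss: penalize confident scores on background samples
    l_neg = sum(-np.log(1.0 - d + 1e-12) for d, _ in zip(neg_preds, neg_labels))
    return float(l_pos + l_neg)
```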

        ⑤Modelling the dynamic counterparts \tilde{D}, {\mathcal M}_{d}, and \tilde{G}:

\tilde{D}=\mathrm{DNN}_{h}(\underbrace{\mathrm{DNN}_{p}(P)}_{\text{Dynamic Prior }\tilde{P}})

\tilde{G}=\mathcal{M}_{d}(\mathcal{M}_{s}(\tilde{P},GT),\tilde{GT})

\mathcal{L}=\sum_{i=1}^{\tilde{N}_{pos}}\mathcal{L}_{pos}(\tilde{D}_{i},\tilde{G}_{i})+\sum_{j=1}^{\tilde{N}_{neg}}\mathcal{L}_{neg}(\tilde{D}_{j},y_{j})

2.4.1. Dynamic Prior

        ①Flexibility may alleviate the mismatch problem

        ②Each prior represents a feature point

        ③The structure of Prior Capturing Block (PCB):

the surrounding information is considered via dilated convolution. Dynamic priors are then captured by a Deformable Convolution Network (DCN). Moreover, the offsets learned from the regression branch guide feature extraction in the classification branch and improve alignment between the two tasks.

        ④To achieve dynamic prior capturing, each prior location \mathbf{p}(x,y) is initialized with each feature point's spatial location \mathbf{s}. In each iteration, the offset set \Delta \mathbf{o} of each prior position is captured to update \mathbf{s}:

\tilde{\mathbf{s}}=\mathbf{s}+st\sum_{i=1}^{n}\Delta\mathbf{o}_{i}/2n

where st denotes the stride of the feature map and n denotes the number of offsets;

a 2D Gaussian distribution \mathcal{N}_{p}(\boldsymbol{\mu}_{p},\boldsymbol{\Sigma}_{p}) is taken as the prior distribution;

the dynamic \tilde{\mathbf{s}} serves as the Gaussian mean vector \boldsymbol{\mu}_{p} (wait, what??);
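The location update in ④ can be sketched in a few lines of numpy. This assumes the offsets arrive as an (n, 2) array (in the paper they are produced by the deformable convolution); the function name is my own.

```python
import numpy as np

def update_prior_location(s, offsets, stride):
    """s_tilde = s + st * sum_i(offset_i) / (2n), the update in 2.4.1 ④."""
    offsets = np.asarray(offsets, dtype=float)   # (n, 2) offsets from the DCN
    n = len(offsets)
    return np.asarray(s, dtype=float) + stride * offsets.sum(axis=0) / (2 * n)
```

For example, a feature point at (0, 0) with stride 8 and two offsets of (1, 1) each moves to (4, 4).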

        ⑤A square \left ( w,h,\theta \right ) is preset on each feature point

        ⑥The covariance matrix:

\Sigma_p=\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}\frac{w^2}{4}&0\\0&\frac{h^2}{4}\end{bmatrix}\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\\\\ =\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}\frac{w}{2}&0\\0&\frac{h}{2}\end{bmatrix}\begin{bmatrix}\frac{w}{2}&0\\0&\frac{h}{2}\end{bmatrix}\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix}\\\\ =RR^T
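The covariance construction in ⑥ is easy to check numerically; a small sketch (function name illustrative):

```python
import numpy as np

def gaussian_cov(w, h, theta):
    """Sigma_p = R_theta · diag(w^2/4, h^2/4) · R_theta^T from a preset square (w, h, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])            # rotation matrix R_theta
    lam = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])  # squared half-extents
    return rot @ lam @ rot.T
```

With theta = 0, w = 4, h = 2 this gives diag(4, 1); and factoring R = R_theta · diag(w/2, h/2) reproduces the \Sigma_p = RR^T form at the end of the derivation.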

dilate  v. to expand; (cause to) swell; enlarge    deformable  adj. able to be deformed; subject to strain

2.4.2. Coarse Prior Matching

        ①For priors, restricting gt to a single FPN layer may cause sub-optimal layer selection, while releasing gt to all layers may slow convergence

        ②Therefore, they propose Cross-FPN-layer Coarse Positive Sample (CPS) candidates, expanding the candidate set to spatial locations near gt and to adjacent FPN layers

        ③The Generalized Jensen-Shannon Divergence (GJSD) between \mathcal{N}_{p}(\boldsymbol{\mu}_{p},\boldsymbol{\Sigma}_{p}) and \mathcal{N}_{g}(\boldsymbol{\mu}_{g},\boldsymbol{\Sigma}_{g}) constructs the CPS:

\mathrm{GJSD}(\mathcal{N}_{p},\mathcal{N}_{g})=(1-\alpha)\mathrm{KL}(\mathcal{N}_{\alpha},\mathcal{N}_{p})+\alpha\mathrm{KL}(\mathcal{N}_{\alpha},\mathcal{N}_{g})

\left\{\begin{matrix} \operatorname{KL}\left(P\|Q\right)=\sum_{x} P\left(x\right)\log\frac{P\left(x\right)}{Q\left(x\right)} \\\\ \operatorname{KL}\left(P\|Q\right)=\int P\left(x\right)\log\frac{P\left(x\right)}{Q\left(x\right)}dx \end{matrix}\right.

which yields a closed-form solution;

where \Sigma_{\alpha}=(\Sigma_{p}\Sigma_{g})_{\alpha}^{\Sigma}=\left((1-\alpha)\Sigma_{p}^{-1}+\alpha\Sigma_{g}^{-1}\right)^{-1};

\begin{aligned} \mu_{\alpha}& =\left(\mu_{p}\mu_{g}\right)_{\alpha}^{\mu} \\ &=\Sigma_{\alpha}\left((1-\alpha)\Sigma_{p}^{-1}\mu_{p}+\alpha\Sigma_{g}^{-1}\mu_{g}\right) \end{aligned}

and since \mathcal{N}_{p} and \mathcal{N}_{g} are treated homogeneously, \alpha =0.5
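Since both KL terms above are between Gaussians, the GJSD has a closed form. A minimal numpy sketch with function names of my own (not the paper's code), using the standard closed-form Gaussian KL:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL( N(mu0, S0) || N(mu1, S1) ) for k-dim Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    d = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def gjsd(mu_p, S_p, mu_g, S_g, alpha=0.5):
    """GJSD(N_p, N_g) = (1-a) KL(N_a||N_p) + a KL(N_a||N_g), Sec. 2.4.2 ③."""
    S_p_inv, S_g_inv = np.linalg.inv(S_p), np.linalg.inv(S_g)
    S_a = np.linalg.inv((1 - alpha) * S_p_inv + alpha * S_g_inv)
    mu_a = S_a @ ((1 - alpha) * S_p_inv @ mu_p + alpha * S_g_inv @ mu_g)
    return ((1 - alpha) * kl_gauss(mu_a, S_a, mu_p, S_p)
            + alpha * kl_gauss(mu_a, S_a, mu_g, S_g))
```

As a sanity check, GJSD is 0 for identical Gaussians and grows as the prior and gt drift apart.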

        ④For each gt, the top K priors with the highest GJSD are chosen (picking the most divergent ones?)

2.4.3. Finer Dynamic Posterior Matching

        ①This section contains two main steps: a posterior re-ranking strategy and a Dynamic Gaussian Mixture Model (DGMM) constraint

        ②The Possibility of becoming a True prediction (PT) for the i^{th} sample D_i is:

PT_i=\frac{1}{2}Cls(D_i)+\frac{1}{2}IoU(D_i,gt_i)

the top Q samples with the highest scores are chosen as Medium Positive Sample (MPS) candidates
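The PT re-ranking in ② can be sketched in a few lines, assuming classification scores and IoUs are already computed per sample (names illustrative):

```python
import numpy as np

def select_mps(cls_scores, ious, q):
    """PT_i = 0.5*Cls(D_i) + 0.5*IoU(D_i, gt_i); keep the top-Q as MPS (2.4.3 ②)."""
    pt = 0.5 * np.asarray(cls_scores) + 0.5 * np.asarray(ious)
    return np.argsort(pt)[::-1][:q]   # indices of the Q highest PT scores
```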

        ③They apply DGMM, which places a geometry center and a semantic center in each object, to filter out distant samples

        ④For a specific instance gt_i, the mean vector \boldsymbol{\mu}_{i,1} of the first Gaussian is the geometry center \left ( cx_i,cy_i \right ), and the \boldsymbol{\mu}_{i,2} derived from MPS denotes the semantic center \left ( sx_i,sy_i \right )

        ⑤Parameterizing an instance:

DGMM_i(s|x,y)=\sum_{m=1}^2w_{i,m}\sqrt{2\pi|\Sigma_{i,m}|}\mathcal{N}_{i,m}(\mu_{i,m},\Sigma_{i,m})

where w_{i,m} denotes the weight of each Gaussian distribution, and the weights sum to 1;

\Sigma_{i,m} equals gt's \boldsymbol{\Sigma}_{g} (what is this? m can be 1 or 2, so would g's covariance serve both the semantic center and the geometry center?)

        ⑥Any sample with DGMM(s|MPS)<e^{-g} is assigned a negative mask
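The DGMM scoring in ⑤ and the negative-mask rule in ⑥ can be sketched as follows (a minimal numpy version of the formula as written above; function names and the scalar form are my own):

```python
import numpy as np

def dgmm_score(xy, weights, mus, sigmas):
    """DGMM_i(s|x,y) = sum_m w_m * sqrt(2*pi*|Sigma_m|) * N(x,y; mu_m, Sigma_m).

    Two components: m=1 the geometry centre, m=2 the semantic centre (2.4.3 ④⑤).
    """
    xy = np.asarray(xy, dtype=float)
    total = 0.0
    for w, mu, S in zip(weights, mus, sigmas):
        d = xy - np.asarray(mu, dtype=float)
        det, S_inv = np.linalg.det(S), np.linalg.inv(S)
        density = np.exp(-0.5 * d @ S_inv @ d) / (2 * np.pi * np.sqrt(det))
        total += w * np.sqrt(2 * np.pi * det) * density
    return total

def is_negative(xy, weights, mus, sigmas, g=2.0):
    """2.4.3 ⑥: samples with DGMM(s|MPS) < e^{-g} get a negative mask."""
    return dgmm_score(xy, weights, mus, sigmas) < np.exp(-g)
```

Note that the sqrt(2π|Σ|) factor rescales each component so its contribution depends only on the Mahalanobis distance to the centre, which is what makes a fixed threshold e^{-g} meaningful.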

2.5.  Experiments

2.5.1. Datasets

        ①Datasets: DOTA-v1.0/v1.5/v2.0, DIOR-R, VisDrone, and MS COCO

        ②Ablation dataset: DOTA-v2.0, which contains the largest number of tiny objects

        ③Comparison datasets: DOTA-v1.0, DOTA-v1.5, DOTA-v2.0, VisDrone2019, MS COCO, and DIOR-R

2.5.2. Implementation Details

        ①Batch size: 4

        ②Framework based: MMDetection and MMRotate

        ③Backbone: ImageNet pre-trained models

        ④Learning rate: 0.005 with SGD

        ⑤Momentum: 0.9

        ⑥Weight decay: 0.0001

        ⑦Default backbone: ResNet-50 with FPN

        ⑧Loss: Focal loss for classifying and IoU loss for regression

        ⑨Data augmentation: random flipping

        ⑩On DOTA-v1.0 and DOTA-v2.0, the official setting is used to crop images to 1024×1024, with an overlap of 200 and 12 epochs

        ⑪On the other datasets, the input size is set to 1024×1024 (overlap 200), 800×800, 1333×800, and 1333×800 for DOTA-v1.5, DIOR-R, VisDrone, and COCO respectively. Epochs are set to 40, 40, 12, and 12 on DOTA-v1.5, DIOR-R, COCO, and VisDrone respectively

2.5.3. Main Results

(1)Results on DOTA series

        ①Comparison table on DOTA-v2.0 OBB:

where red marks the best and blue the second-best performance on each metric

        ②Comparison table on DOTA-v1.0 OBB:

        ③Comparison table on DOTA-v1.5 OBB:

(2)Results on DIOR-R

        ①Comparison table on DIOR-R:

        ②Results on the typical tiny-object classes vehicle, bridge, and windmill:

(3)Results on HBB Datasets

        ①Comparison table on VisDrone, MS COCO and DOTA-v2.0 HBB:

2.5.4. Ablation Study

(1)Effects of Individual Strategy

        ①A prior is employed on each feature point

        ②Individual effectiveness:

(2)Comparisons of Different CPS

        ①Ablation:

(3)Fixed Prior and Dynamic Prior

        ①Ablation:

(4)Detailed Design in PCB

(5)Effects of Parameters

2.6. Analysis

(1)Reconciliation of imbalance problems

(2)Visualization

(3)Speed

2.7. Conclusion

3. Supplementary Knowledge

4. Reference List

Xu, C. et al. (2023) 'Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection', CVPR. doi: https://doi.org/10.48550/arXiv.2304.08876
