Bounding boxes augmentation for object detection

Different annotations formats¶

Bounding boxes are rectangles that mark objects on an image. There are multiple formats of bounding boxes annotations. Each format uses its specific representation of bouning boxes coordinates 每种格式都使用其特定的边界框坐标表示。. Albumentations supports four formats: pascal_vocalbumentationscoco, and yolo .

Let's take a look at each of those formats and how they represent coordinates 坐标 of bounding boxes.

As an example, we will use an image from the dataset named Common Objects in Context. It contains one bounding box that marks a cat. The image width is 640 pixels, and its height is 480 pixels. The width of the bounding box is 322 pixels, and its height is 117 pixels.

 An example image with a bounding box from the COCO dataset

pascal_voc¶

pascal_voc is a format used by the Pascal VOC dataset. Coordinates of a bounding box are encoded with four values in pixels: [x_min, y_min, x_max, y_max]. x_min and y_min are coordinates of the top-left corner of the bounding box. x_max and y_max are coordinates of bottom-right corner of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 420, 462].

albumentations¶

albumentations is similar to pascal_voc, because it also uses four values [x_min, y_min, x_max, y_max] to represent a bounding box. But unlike pascal_vocalbumentations uses normalized values. To normalize values, we divide coordinates in pixels for the x- and y-axis by the width and the height of the image.

Coordinates of the example bounding box in this format are [98 / 640, 345 / 480, 420 / 640, 462 / 480] which are [0.153125, 0.71875, 0.65625, 0.9625].

Albumentations uses this format internally 内部 to work with bounding boxes and augment them.

coco¶

coco is a format used by the Common Objects in Context COCOCOCO dataset.

In coco, a bounding box is defined by four values in pixels [x_min, y_min, width, height]. They are coordinates of the top-left corner along with the width and height of the bounding box.

Coordinates of the example bounding box in this format are [98, 345, 322, 117].

yolo¶

In yolo, a bounding box is represented by four values [x_center, y_center, width, height]x_center and y_center are the normalized coordinates of the center of the bounding box. To make coordinates normalized, we take pixel values of x and y, which marks the center of the bounding box on the x- and y-axis. Then we divide the value of x by the width of the image and value of y by the height of the image. width and height represent the width and the height of the bounding box. They are normalized as well.

Coordinates of the example bounding box in this format are [((420 + 98) / 2) / 640, ((462 + 345) / 2) / 480, 322 / 640, 117 / 480] which are [0.4046875, 0.840625, 0.503125, 0.24375].

Bounding boxes augmentation¶

Just like with images and masks augmentation, the process of augmenting bounding boxes consists of 4 steps.

  1. You import the required libraries.
  2. You define an augmentation pipeline.
  3. You read images and bounding boxes from the disk.
  4. You pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.

Note

Some transforms in Albumentation don't support bounding boxes. If you try to use them you will get an exception. Please refer to this article to check whether a transform can augment bounding boxes.

Step 1. Import the required libraries.¶

import albumentations as A
import cv2

Step 2. Define an augmentation pipeline.¶

Here an example of a minimal declaration of an augmentation pipeline that works with bounding boxes.

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco'))

Note that unlike image and masks augmentation, Compose now has an additional parameter bbox_params. You need to pass an instance of A.BboxParams to that argument. A.BboxParams specifies settings for working with bounding boxes. format sets the format for bounding boxes coordinates.

It can either be pascal_vocalbumentationscoco or yolo. This value is required because Albumentation needs to know the coordinates' source format for bounding boxes to apply augmentations correctly.

Besides formatA.BboxParams supports a few more settings.

Here is an example of Compose that shows all available settings with A.BboxParams:

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', min_area=1024, min_visibility=0.1, label_fields=['class_labels']))

min_area and min_visibility

min_area and min_visibility parameters control what Albumentations should do to the augmented bounding boxes if their size has changed after augmentation. The size of bounding boxes could change if you apply spatial augmentations 空间增强 , for example, when you crop 裁剪 a part of an image or when you resize an image.

min_area is a value in pixels 是以像素为单位的值. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box. So the returned list of augmented bounding boxes won't contain that bounding box.

min_visibility is a value between 0 and 1. If the ratio of the bounding box area after augmentation to the area of the bounding box before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts the most of the bounding box, that box won't be present in the returned list of the augmented bounding boxes.

Here is an example image that contains two bounding boxes. Bounding boxes coordinates are declared using the coco format.

 An example image with two bounding boxes

First, we apply the CenterCrop augmentation without declaring parameters min_area and min_visibility. The augmented image contains two bounding boxes.

 An example image with two bounding boxes after applying augmentation

Next, we apply the same CenterCrop augmentation, but now we also use the min_area parameter. Now, the augmented image contains only one bounding box, because the other bounding box's area after augmentation became smaller than min_area, so Albumentations dropped that bounding box.

 An example image with one bounding box after applying augmentation with 'min_area'

Finally, we apply the CenterCrop augmentation with the min_visibility. After that augmentation, the resulting image doesn't contain any bounding box, because visibility of all bounding boxes after augmentation are below threshold set by min_visibility.

An example image with zero bounding boxes after applying augmentation with 'min_visibility' 

Class labels for bounding boxes¶

Besides coordinates, each bounding box should have an associated class label that tells which object lies inside the bounding box. There are two ways to pass a label for a bounding box.

Let's say you have an example image with three objects: dogcat, and sports ball. Bounding boxes coordinates in the coco format for those objects are [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49].

An example image with 3 bounding boxes from the COCO dataset

1. You can pass labels along with bounding boxes coordinates by adding them as additional values to the list of coordinates.¶

For the image above, bounding boxes with class labels will become [23, 74, 295, 388, 'dog'][377, 294, 252, 161, 'cat'], and [333, 421, 49, 49, 'sports ball'].

Class labels could be of any type: integer, string, or any other Python data type. For example, integer values as class labels will look the following: [23, 74, 295, 388, 18][377, 294, 252, 161, 17], and [333, 421, 49, 49, 37].

 Also, you can use multiple class values for each bounding box, for example [23, 74, 295, 388, 'dog', 'animal'][377, 294, 252, 161, 'cat', 'animal'], and [333, 421, 49, 49, 'sports ball', 'item'].

2.You can pass labels for bounding boxes as a separate list (the preferred way).¶

 For example, if you have three bounding boxes like [23, 74, 295, 388][377, 294, 252, 161], and [333, 421, 49, 49] you can create a separate list with values like ['cat', 'dog', 'sports ball'], or [18, 17, 37] that contains class labels for those bounding boxes. Next, you pass that list with class labels as a separate argument to the transform function. Albumentations needs to know the names of all those lists with class labels to join them with augmented bounding boxes correctly. Then, if a bounding box is dropped after augmentation because it is no longer visible, Albumentations will drop the class label for that box as well. Use label_fields parameter to set names for all arguments in transform that will contain label descriptions for bounding boxes (more on that in Step 4).

Step 3. Read images and bounding boxes from the disk.¶

Read an image from the disk.

image = cv2.imread("/path/to/image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Bounding boxes can be stored on the disk in different serialization formats: JSON, XML, YAML, CSV, etc. So the code to read bounding boxes depends on the actual format of data on the disk.

After you read the data from the disk, you need to prepare bounding boxes for Albumentations.

Albumentations expects that bounding boxes will be represented 表示 as a list of lists. Each list contains information about a single bounding box. A bounding box definition should have at list four elements that represent the coordinates of that bounding box. The actual meaning of those four values depends on the format of bounding boxes (either pascal_vocalbumentationscoco, or yolo). Besides four coordinates, each definition of a bounding box may contain one or more extra values. You can use those extra values to store additional information about the bounding box, such as a class label of the object inside the box. During augmentation, Albumentations will not process those extra values. The library will return them as is along with the updated coordinates of the augmented bounding box 库将按原样返回它们以及增强边界框的更新坐标.

Step 4. Pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.¶

As discussed in Step 2, there are two ways of passing class labels along with bounding boxes coordinates:

1. Pass class labels along with coordinates.¶

So, if you have coordinates of three bounding boxes that look like this:

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

you can add a class label for each bounding box as an additional element of the list along with four coordinates. So now a list with bounding boxes and their coordinates will look the following:

bboxes = [
    [23, 74, 295, 388, 'dog'],
    [377, 294, 252, 161, 'cat'],
    [333, 421, 49, 49, 'sports ball'],
]

or with multiple labels per each bounding box:

bboxes = [
    [23, 74, 295, 388, 'dog', 'animal'],
    [377, 294, 252, 161, 'cat', 'animal'],
    [333, 421, 49, 49, 'sports ball', 'item'],
]

You can use any data type for declaring class labels. It can be string, integer, or any other Python data type.

Next, you pass an image and bounding boxes for it to the transform function and receive the augmented image and bounding boxes.

transformed = transform(image=image, bboxes=bboxes)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']

 Example input and output data for bounding boxes augmentation

2. Pass class labels in a separate argument to transform (the preferred way).¶

 Let's say you have coordinates of three bounding boxes

bboxes = [
    [23, 74, 295, 388],
    [377, 294, 252, 161],
    [333, 421, 49, 49],
]

You can create a separate list that contains class labels for those bounding boxes:

class_labels = ['cat', 'dog', 'parrot']

Then you pass both bounding boxes and class labels to transform. Note that to pass class labels, you need to use the name of the argument that you declared in label_fields when creating an instance of Compose in step 2. In our case, we set the name of the argument to class_labels.

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']

Example input and output data for bounding boxes augmentation with a separate argument for class labels 

Note that label_fields expects a list, so you can set multiple fields that contain labels for your bounding boxes. So if you declare Compose like

transform = A.Compose([
    A.RandomCrop(width=450, height=450),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
], bbox_params=A.BboxParams(format='coco', label_fields=['class_labels', 'class_categories'])))

you can use those multiple arguments to pass info about class labels, like

class_labels = ['cat', 'dog', 'parrot']
class_categories = ['animal', 'animal', 'item']

transformed = transform(image=image, bboxes=bboxes, class_labels=class_labels, class_categories=class_categories)
transformed_image = transformed['image']
transformed_bboxes = transformed['bboxes']
transformed_class_labels = transformed['class_labels']
transformed_class_categories = transformed['class_categories']

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/226998.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

TCP聊天

一、项目创建 二、代码 Client类 package tcp; import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.io.PrintWriter; import java.net.Socket; import java.util.Scanner; public class Client { public sta…

Elasticsearch- 环境-Windows集群部署和环境-Linux单节点部署和Linux集群部署-03

Elasticsearch环境 环境-简介 单机 & 集群 单台 Elasticsearch 服务器提供服务,往往都有最大的负载能力,超过这个阈值,服务器性能就会大大降低甚至不可用,所以生产环境中,一般都是运行在指定服务器集群中。除了…

STM32 cubeMX 呼吸灯实验

文章代码使用 HAL 库。 文章目录 一、1.PWM原理二、LED 原理图三、使用cubemx 配置 led四、PWM 相关函数五、PWM占空比占空比计算六、PWM 呼吸灯重要代码总结 呼吸灯 一、1.PWM原理 PWM全称为脉冲宽度调制(Pulse Width Modulation),是一种常…

软著项目推荐 深度学习验证码识别 - 机器视觉 python opencv

文章目录 0 前言1 项目简介2 验证码识别步骤2.1 灰度处理&二值化2.2 去除边框2.3 图像降噪2.4 字符切割2.5 识别 3 基于tensorflow的验证码识别3.1 数据集3.2 基于tf的神经网络训练代码 4 最后 0 前言 🔥 优质竞赛项目系列,今天要分享的是 &#x…

php循环遍历删除文件下文件和目录

前言 今天在写一个demo的时候需要循环删除目录下文件。如下想删temp下文件和目录。 具体实现 private function deleteDir($dirPath){if (is_dir($dirPath)) {$contents scandir($dirPath);// 如果是空目录if (count($contents) 2) {rmdir($dirPath);return;}// 不是空目录f…

windows MYSQL下载和自定路径安装,以及解决中文乱码问题。

文章讲的很详细,请耐心往下看。 一、mysql下载 下载网址:https://www.mysql.com/downloads/ 表示不登录,直接下载。 以上就把安装包下载完了。下载是8.0.35版本。 二、接下来看怎么安装 1.双击安装包,进行安装。 注意&#x…

【论文阅读笔记】M3Care: Learning with Missing Modalities in Multimodal Healthcare Data

本文介绍了一种名为“MCare”的模型,旨在处理多模态医疗保健数据中的缺失模态问题。这个模型是端到端的,能够补偿病人缺失模态的信息,以执行临床分析。MCare不是生成原始缺失数据,而是在潜在空间中估计缺失模态的任务相关信息&…

【web安全】文件包含漏洞详细整理

前言 菜某的笔记总结,如有错误请指正。 本文用的是PHP语言作为案例 文件包含漏洞的概念 开发者使用include()等函数,可以把别的文件中的代码引入当前文件中执行,而又没有对用户输入的内容进行充分的过滤&#xff0…

算法通关村第十八关-青铜挑战回溯是怎么回事

大家好我是苏麟 , 今天聊聊回溯是怎么个事 . 回溯是最重要的算法思想之一,主要解决一些暴力枚举也搞不定的问题,例如组合、分割、子集、排列,棋盘等。从性能角度来看回溯算法的效率并不高,但对于这些暴力都搞不定的算法能出结果就…

区分node,npm,nvm

目录 一,nodejs二,npm三,nvm 区分node,npm,nvm 几年前学习前端的时候学习的就是htmlcssjs 三件套。 现在只学习这些已经不能满足需要了。 一,nodejs nodejs是编程语言javascript运行时环境。(比…

【复杂gRPC之Java调用go】

1 注意点 一般上来说如果java调用java的话,我们可以使用springcloud来做,而面对这种跨语言的情况下,gRPC就展现出了他的优势。 代码放在这了,请结合前面的go服务器端一起使用 https://gitee.com/guo-zonghao/java-client-grpc /…

阿里云实时数据仓库HologresFlink

1. 实时数仓Hologres特点 专注实时场景:数据实时写入、实时更新,写入即可见,与Flink原生集成,支持高吞吐、低延时、有模型的实时数仓开发,满足业务洞察实时性需求。亚秒级交互式分析:支持海量数据亚秒级交…

量子算力引领未来!玻色量子出席第二届CCF量子计算大会

​8月19日至20日,中国计算机学会(CCF)主办的第二届CCF量子计算大会暨中国量子计算峰会(CQCC 2023)在中国合肥成功举办。本届大会以“量超融合,大国算力”为主题,设有量子计算软件、硬件、应用生…

机器学习应用 | 使用 MATLAB 进行异常检测(上)

异常检测任务,指的是检测偏离期望行为的事件或模式,可以是简单地检测数值型数据中,是否存在远超出正常取值范围的离群值,也可以是借助相对复杂的机器学习算法识别数据中隐藏的异常模式。 在不同行业中,异常检测的典型…

智能优化算法应用:基于材料生成算法无线传感器网络(WSN)覆盖优化 - 附代码

智能优化算法应用:基于材料生成算法无线传感器网络(WSN)覆盖优化 - 附代码 文章目录 智能优化算法应用:基于材料生成算法无线传感器网络(WSN)覆盖优化 - 附代码1.无线传感网络节点模型2.覆盖数学模型及分析3.材料生成算法4.实验参数设定5.算法结果6.参考…

Tomcat头上有个叉叉

问题原因: 这是因为它就是个空的tomcat,并没有导入项目运行 解决方案: war模式:发布模式,正式发布时用,将WEB工程以war包的形式上传到服务器 war exploded模式:开发时用,将WEB工程的文件夹直接…

Navicat 连接 GaussDB分布式的快速入门

Navicat Premium(16.3.3 Windows版或以上)正式支持 GaussDB 分布式数据库。GaussDB分布式模式更适合对系统可用性和数据处理能力要求较高的场景。Navicat 工具不仅提供可视化数据查看和编辑功能,还提供强大的高阶功能(如模型、结构…

【2023年网络安全优秀创新成果大赛专刊】医疗机构临床数据合规共享解决方案(美创科技)

“2023年网络安全优秀创新成果大赛”由中央网信办网络安全协调局指导,中国网络安全产业联盟(CCIA)主办。本次大赛由3场分站赛、3场专题赛、1场大学生创新创业作品赛组成。 在杭州分站赛,美创科技—“医疗机构临床合规共享解决方案…

redis-学习笔记(list)

因为 list 可以头插头删, 尾插尾删, 所以其实更像 C 中的 deque (双端队列) ---- 知道就好, 别乱说, 具体底层编码是啥, 俺也不知道(没注意过) 可以通过组合, 把 list 当作队列 / 栈来用 list 的几种底层编码: ziplist(压缩列表) , linkedlist(链表) , quicklist ziplist 就是将…

docker镜像仓库hub.docker.com无法访问

docker镜像仓库hub.docker.com无法访问 文章主要内容: 介绍dockerhub为什么无法访问解决办法 1 介绍dockerhub为什么无法访问 最近许多群友都询问为什么无法访问Docker镜像仓库,于是我也尝试去访问,结果果然无法访问。 大家的第一反应就是…