Contents
- 1. Task description
- 2. Face detection model
- 3. Full code
- 4. Results
- 5. Library functions used
- 6. References
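1. Task description
Centroid-based tracking of multiple objects, using faces as the example.
Faces are detected with a deep learning model.
Core steps:
Step #1: Accept bounding box coordinates and compute their centroids
Step #2: Compute the Euclidean distance between the new bounding boxes' centroids and those of the existing objects
Step #3: Update the (x, y) coordinates of the existing objects
Step #4: Register new objects
Step #5: Deregister old objects
An old object is deregistered once it has gone unmatched by any new detection for more than N consecutive frames.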
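Steps #1 and #2 can be sketched with NumPy alone. This is a minimal illustration rather than the tracker's own code: the box coordinates are made-up values, and the helper names `box_centroids` and `pairwise_distances` are hypothetical.

```python
import numpy as np

def box_centroids(rects):
    """Return integer (cX, cY) centroids for boxes given as (startX, startY, endX, endY)."""
    rects = np.asarray(rects, dtype=float)
    cx = (rects[:, 0] + rects[:, 2]) / 2.0
    cy = (rects[:, 1] + rects[:, 3]) / 2.0
    return np.stack([cx, cy], axis=1).astype(int)

def pairwise_distances(a, b):
    """Euclidean distance between every row of a (m, 2) and every row of b (p, 2)."""
    diff = a[:, None, :].astype(float) - b[None, :, :].astype(float)
    return np.sqrt((diff ** 2).sum(axis=2))

# Made-up boxes: two tracked faces in the previous frame, two detections now.
old = box_centroids([(10, 10, 50, 50), (200, 100, 260, 160)])
new = box_centroids([(202, 101, 262, 161), (12, 10, 52, 50)])
D = pairwise_distances(old, new)  # D[i, j]: old object i vs. new detection j
print(old)
print(D.argmin(axis=1))  # nearest new detection for each old object
```

Each old object's nearest detection is the one that moved only a few pixels, which is the assumption centroid tracking relies on.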
2. Face detection model
Input: 1x3x300x300
ResNet + SSD; the network has quite a few branches.
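To make the 1x3x300x300 input shape concrete, the NumPy sketch below builds a tensor with that layout (batch, channels, height, width) from a dummy 300x300 BGR image. It only imitates the mean subtraction and axis reordering that cv2.dnn.blobFromImage performs; resizing is omitted, the random image is a placeholder, and the mean values are the ones passed to blobFromImage in the pipeline code.

```python
import numpy as np

# Placeholder for a 300x300 BGR frame in (height, width, channels) order.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)

# Per-channel mean values, copied from the pipeline's blobFromImage call.
mean = np.array([104.0, 177.0, 123.0], dtype=np.float32)

# Subtract the mean, move channels first (HWC -> CHW), then add a batch axis.
blob = (image.astype(np.float32) - mean).transpose(2, 0, 1)[np.newaxis, ...]
print(blob.shape)  # (1, 3, 300, 300)
```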
3. Full code
Centroid tracking code
# import the necessary packages
from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np

class CentroidTracker():
    def __init__(self, maxDisappeared=65):
        # initialize the next unique object ID along with two ordered
        # dictionaries used to keep track of mapping a given object
        # ID to its centroid and number of consecutive frames it has
        # been marked as "disappeared", respectively
        self.nextObjectID = 0
        # counter used to assign each object a unique ID; if an object
        # leaves the frame and does not return within maxDisappeared
        # frames, it gets a new (next) object ID when it reappears
        self.objects = OrderedDict()
        # dictionary with the object ID as key and the centroid (x, y) as value
        self.disappeared = OrderedDict()
        # number of consecutive frames (value) for which a given
        # object ID (key) has been marked as "disappeared"
        # store the number of maximum consecutive frames a given
        # object is allowed to be marked as "disappeared" until we
        # need to deregister the object from tracking
        self.maxDisappeared = maxDisappeared

    def register(self, centroid):
        # when registering an object we use the next available object
        # ID to store the centroid
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1

    def deregister(self, objectID):
        # to deregister an object ID we delete the object ID from
        # both of our respective dictionaries
        del self.objects[objectID]
        del self.disappeared[objectID]

    def update(self, rects):
        # check to see if the list of input bounding box rectangles
        # is empty
        # each rect is (startX, startY, endX, endY)
        if len(rects) == 0:  # nothing was detected
            # loop over any existing tracked objects and mark them
            # as disappeared by incrementing their counters
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1
                # if we have reached a maximum number of consecutive
                # frames where a given object has been marked as
                # missing, deregister it
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)
            # return early as there are no centroids or tracking info
            # to update
            return self.objects
        # initialize an array of input centroids for the current frame
        inputCentroids = np.zeros((len(rects), 2), dtype="int")
        # loop over the bounding box rectangles
        for (i, (startX, startY, endX, endY)) in enumerate(rects):
            # use the bounding box coordinates to derive the centroid
            cX = int((startX + endX) / 2.0)  # x coordinate of the box center
            cY = int((startY + endY) / 2.0)  # y coordinate of the box center
            inputCentroids[i] = (cX, cY)
        # if we are currently not tracking any objects take the input
        # centroids and register each of them
        if len(self.objects) == 0:
            for i in range(0, len(inputCentroids)):
                self.register(inputCentroids[i])
        # otherwise, we are currently tracking objects so we need to
        # try to match the input centroids to existing object
        # centroids
        else:
            # grab the set of object IDs and corresponding centroids
            objectIDs = list(self.objects.keys())
            objectCentroids = list(self.objects.values())
            # compute the distance between each pair of object
            # centroids and input centroids, respectively -- our
            # goal will be to match an input centroid to an existing
            # object centroid
            D = dist.cdist(np.array(objectCentroids), inputCentroids)
            # in order to perform this matching we must (1) find the
            # smallest value in each row and then (2) sort the row
            # indexes based on their minimum values so that the row
            # with the smallest value is at the *front* of the index
            # list
            rows = D.min(axis=1).argsort()
            # next, find the column index of the smallest value in
            # each row and order those column indexes using the
            # previously computed row index list
            cols = D.argmin(axis=1)[rows]
            # in order to determine if we need to update, register,
            # or deregister an object we need to keep track of which
            # of the rows and column indexes we have already examined
            usedRows = set()
            usedCols = set()
            # loop over the combination of the (row, column) index
            # tuples
            for (row, col) in zip(rows, cols):
                # if this existing object (row) or this new centroid
                # (column) has already been matched, ignore it
                if row in usedRows or col in usedCols:
                    continue
                # otherwise, grab the object ID for the current row,
                # set its new centroid, and reset the disappeared
                # counter
                objectID = objectIDs[row]  # ID of the existing object
                self.objects[objectID] = inputCentroids[col]  # move it to the closest new centroid
                self.disappeared[objectID] = 0  # reset the disappeared counter
                # indicate that we have examined each of the row and
                # column indexes, respectively
                usedRows.add(row)
                usedCols.add(col)
            # compute both the row and column index we have NOT yet
            # examined
            unusedRows = set(range(0, D.shape[0])).difference(usedRows)
            unusedCols = set(range(0, D.shape[1])).difference(usedCols)
            # in the event that the number of object centroids is
            # equal or greater than the number of input centroids
            # we need to check and see if some of these objects have
            # potentially disappeared
            if D.shape[0] >= D.shape[1]:
                # loop over the unused row indexes
                for row in unusedRows:
                    # grab the object ID for the corresponding row
                    # index and increment the disappeared counter
                    objectID = objectIDs[row]  # ID of the existing object
                    self.disappeared[objectID] += 1  # disappeared count + 1
                    # check to see if the number of consecutive
                    # frames the object has been marked "disappeared"
                    # for warrants deregistering the object
                    if self.disappeared[objectID] > self.maxDisappeared:
                        self.deregister(objectID)
            # otherwise, if the number of input centroids is greater
            # than the number of existing object centroids we need to
            # register each new input centroid as a trackable object
            else:
                for col in unusedCols:
                    self.register(inputCentroids[col])  # register the new object
        # return the set of trackable objects
        return self.objects
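- self.nextObjectID = 0: the counter used to assign each object a unique ID. If an object leaves the frame and does not return within maxDisappeared frames, it gets a new (next) object ID when it reappears.
- self.objects = OrderedDict(): a dictionary with the object ID as key and the centroid (x, y) coordinates as value.
- self.disappeared = OrderedDict(): holds the number of consecutive frames (value) for which a given object ID (key) has been marked as "lost".
- self.maxDisappeared = maxDisappeared: the number of consecutive frames an object may be marked as "lost/disappeared" before it is deregistered.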
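The class covers initialization, registration, deregistration, and updating.
The core logic is in def update.
If nothing is detected, self.disappeared is incremented by 1 for every current ID.
If no IDs are registered, every centroid in the current frame is registered with a new ID. Otherwise, the distances between the historical centroids and the current frame's centroids are computed, the closest pairs are matched, and each matched ID's centroid is updated. When there are at least as many historical centroids as current ones, self.disappeared is incremented by 1 for each ID whose historical centroid was left unmatched; otherwise, each unmatched centroid in the current frame is registered with a new ID.
Any ID whose count exceeds self.maxDisappeared is deregistered.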
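The disappeared-counter bookkeeping can be sketched in plain Python, without any detection or matching. This is only an illustration under simplifying assumptions: `mark_missing` is a hypothetical helper, the threshold of 2 frames is chosen to keep the trace short, and the single centroid is a made-up value.

```python
from collections import OrderedDict

objects = OrderedDict()      # object ID -> centroid
disappeared = OrderedDict()  # object ID -> consecutive missed frames
max_disappeared = 2          # illustrative threshold, not the tracker's default

def mark_missing(object_id):
    """Count one missed frame; deregister once the count exceeds the threshold."""
    disappeared[object_id] += 1
    if disappeared[object_id] > max_disappeared:
        del objects[object_id]
        del disappeared[object_id]

# Register one object with ID 0, then simulate frames with no detections.
objects[0] = (30, 30)
disappeared[0] = 0

history = []
for frame in range(3):
    mark_missing(0)
    history.append(0 in objects)

print(history)  # [True, True, False]
```

With a threshold of N, the object survives N missed frames and is removed on frame N + 1, which matches the `>` comparison used in update.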
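Computing rows and cols from D is a little convoluted; the annotated example below makes it easier to follow.
"""
D
array([[  2.        , 207.48252939],
       [206.65188119,   1.41421356]])
D.min(axis=1): the minimum of each row
array([2.        , 1.41421356])
rows: the argsort of the row minima, i.e. row indices ordered from smallest minimum to largest; the smallest value lies in row rows[0], the second smallest in row rows[1]
array([1, 0])
D.argmin(axis=1): the index of the minimum along the given axis, i.e. the column index of each row's minimum
array([0, 1])
cols: the column index of each row's minimum, reordered to follow rows
array([1, 0])
"""
Here is another example, where D is not square: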
import numpy as np
D = np.array([[2., 1, 3],
[1.1, 1.41421356, 0.7]])
print(D.min(axis=1))
print(D.argmin(axis=1))
rows = D.min(axis=1).argsort()
cols = D.argmin(axis=1)[rows]
print(rows)
print(cols)
"""
[1. 0.7]
[1 2]
[1 0]
[2 1]
"""
In practice, rows index the previously tracked object IDs and cols index the centroids detected in the current frame.
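To see how rows and cols drive the matching, here is a compact sketch of the same greedy loop as a standalone function. The name `greedy_match` and its return format are hypothetical and used only for illustration; the logic mirrors the loop in update, applied to the 2x3 example above.

```python
import numpy as np

def greedy_match(D):
    """Greedily pair rows (existing objects) with columns (new centroids).

    Rows are visited in order of their smallest distance; a pair is skipped
    if its row or column was already used.
    """
    rows = D.min(axis=1).argsort()
    cols = D.argmin(axis=1)[rows]
    used_rows, used_cols, pairs = set(), set(), []
    for row, col in zip(rows, cols):
        row, col = int(row), int(col)
        if row in used_rows or col in used_cols:
            continue
        pairs.append((row, col))
        used_rows.add(row)
        used_cols.add(col)
    unused_rows = sorted(set(range(D.shape[0])) - used_rows)
    unused_cols = sorted(set(range(D.shape[1])) - used_cols)
    return pairs, unused_rows, unused_cols

D = np.array([[2., 1, 3],
              [1.1, 1.41421356, 0.7]])
pairs, unused_rows, unused_cols = greedy_match(D)
print(pairs)        # [(1, 2), (0, 1)]
print(unused_rows)  # []
print(unused_cols)  # [0]
```

Column 0 is left unmatched; because D has more columns than rows, update would register that centroid as a new object.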
Face detection + tracking + result drawing pipeline
# USAGE
# python object_tracker.py --prototxt deploy.prototxt --model res10_300x300_ssd_iter_140000.caffemodel
# import the necessary packages
from CentroidTracking.centroidtracker import CentroidTracker
import numpy as np
import argparse
import imutils
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", default="deploy.prototxt.txt",
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", default="res10_300x300_ssd_iter_140000.caffemodel",
    help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize our centroid tracker and frame dimensions
ct = CentroidTracker()
(H, W) = (None, None)

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream
print("[INFO] starting video stream...")
vs = cv2.VideoCapture("4.mp4")
index = 0

# loop over the frames from the video stream
while True:
    index += 1
    # read the next frame from the video stream and resize it;
    # stop when no more frames can be read
    ret, frame = vs.read()
    if not ret:
        break
    frame = imutils.resize(frame, width=600)
    (H, W) = frame.shape[:2]
    # construct a blob from the frame, pass it through the network,
    # obtain our output predictions, and initialize the list of
    # bounding box rectangles
    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H),
        (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # (1, 1, 200, 7)
    # example: detections[0][0][0]
    # array([0.        , 1.        , 0.9983138 , 0.418003  , 0.22666326,
    #        0.5242793 , 0.50829136], dtype=float32)
    # index 2 is the score; the last four values are the box's
    # left, top, right, bottom coordinates, normalized to [0, 1]
    rects = []  # left, top, right, bottom coordinates of every kept box
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # filter out weak detections by ensuring the predicted
        # probability is greater than a minimum threshold
        if detections[0, 0, i, 2] > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for
            # the object, then update the bounding box rectangles list
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            rects.append(box.astype("int"))
            # draw a bounding box surrounding the object so we can
            # visualize it
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 0, 255), 2)
    # update our centroid tracker using the computed set of bounding
    # box rectangles
    objects = ct.update(rects)
    # loop over the tracked objects
    for (objectID, centroid) in objects.items():
        # draw both the ID of the object and the centroid of the
        # object on the output frame
        text = "ID {}".format(objectID)
        cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 0, 255), -1)
    # show the output frame
    cv2.imshow("Frame", frame)
    # cv2.imwrite(f"./frame4/{str(index).zfill(3)}.jpg", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
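In the output of the network's forward pass, index 2 of each detection is the score, and the last four values are the left, top, right, and bottom coordinates.
The core coordinate update happens in objects = ct.update(rects)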
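The decoding step can be tried without a model by feeding it a hand-made array in the same (1, 1, N, 7) layout. In this sketch the two detection rows, the frame size, and the helper name `decode_detections` are assumptions for illustration; only the indexing mirrors the pipeline code.

```python
import numpy as np

def decode_detections(detections, width, height, min_conf=0.5):
    """Keep detections above min_conf and scale their boxes to pixel coordinates."""
    rects = []
    for i in range(detections.shape[2]):
        if detections[0, 0, i, 2] > min_conf:
            box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])
            rects.append(box.astype("int"))
    return rects

# Hand-made output: one confident face and one weak detection.
fake = np.array([[[
    [0.0, 1.0, 0.99, 0.25, 0.25, 0.50, 0.50],
    [0.0, 1.0, 0.10, 0.60, 0.60, 0.90, 0.90],
]]], dtype=np.float32)

rects = decode_detections(fake, width=600, height=400)
print([r.tolist() for r in rects])  # [[150, 100, 300, 200]]
```

The resulting list has the (startX, startY, endX, endY) form that ct.update expects.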
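4. Results
Single face
tracking-face1
Multiple faces
tracking-face2
Multiple faces with some missed detections; the IDs still persist for a long time (this is governed by maxDisappeared; 50 frames is the figure given here, while the CentroidTracker listing defaults to 65)
tracking-face3
Multiple faces
tracking-face4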
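5. Library functions used
scipy.spatial.distance.cdist is a function in SciPy's spatial.distance module that computes the distance between every pair of points drawn from two input arrays. It is particularly useful on large datasets, where a full distance matrix between many points is needed.
Basic usage:
from scipy.spatial.distance import cdist
# suppose XA and XB are two 2-D arrays in which each row holds one point's coordinates
XA = ... # array of shape (m, n): m points, n coordinate dimensions
XB = ... # array of shape (p, n): p points in the other set
# compute the distance between every pair of points from XA and XB
# the metric parameter selects the distance measure; the default is 'euclidean'
D = cdist(XA, XB, metric='euclidean')
In this example, D is an array of shape (m, p), where D[i, j] is the distance between the i-th point of XA and the j-th point of XB.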
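cdist supports many distance metrics, selected with the metric parameter. Besides the default Euclidean distance, the options include Manhattan distance ('cityblock'), Chebyshev distance ('chebyshev'), Minkowski distance ('minkowski', which takes an additional p value), cosine distance ('cosine', defined as 1 minus the cosine similarity of the two vectors), and Hamming distance ('hamming', the fraction of components that differ, typically used with boolean or integer arrays).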
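For intuition, the quantities behind a few of these metric names can be written out for a single pair of vectors in NumPy. These are the textbook definitions rather than SciPy's implementation, and the vectors u and v are made-up values.

```python
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 0.0, 4.0])

# Manhattan ('cityblock'): sum of absolute differences
cityblock = np.abs(u - v).sum()
# Chebyshev: largest absolute difference
chebyshev = np.abs(u - v).max()
# Minkowski with p = 3
minkowski_p3 = (np.abs(u - v) ** 3).sum() ** (1 / 3)
# Cosine distance: 1 minus the cosine similarity
cosine = 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
# Hamming: fraction of components that differ
hamming = np.mean(u != v)

print(cityblock, chebyshev, minkowski_p3, cosine, hamming)
```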
6. References
- https://github.com/gopinath-balu/computer_vision
- https://pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/
- Link: https://pan.baidu.com/s/1UX_HmwwJLtHJ9e5tx6hwOg?pwd=123a (extraction code: 123a)
- Object Tracking (7): Simple Object Tracking with OpenCV
- https://pixabay.com/zh/videos/woman-phone-business-casual-154833/