0%

目标跟踪-DeepSort

简介

Sample online and realtime tracking (SORT)利用了帧和帧之间的关系,用卡尔曼滤波和匈牙利算法来进行多目标跟踪。但是SORT的ID switches次数很高,因为SORT在跟踪目标动作的可预测性很高的情况下性能比较好,但在有遮挡的场景表现不太好。

DeepSORT结合了目标的动作状态和图像特征信息。具体是利用CNN来分辨行人。

算法细节

跟踪处理与状态估计

目标的状态空间为:image-20220427115037637

其中u和v是bbox的中心店坐标,y是宽高比例,h是高,后面四个值是前面四个值在图像坐标上的加速度。我们使用了匀速运动和线性观测模型的卡尔曼滤波器。

每个轨道k都会预测最后成功匹配上目标的那一帧到下一次匹配到目标的那一帧之间的帧数a_k, 当匹配到时a_k清0,当a_k超过一个阈值A_max时,就会认为这个轨道k已经离开了画面,从轨道匹配池中去掉。当一个新的目标没办法匹配上任何一个k时,就会新建一个轨道。当一个新轨道的前三帧没有全部匹配上时,就删除这个轨道。

对象分配问题

为了考虑运动信息,我们使用预测的卡尔曼状态和新到的测量值之间的(平方)Mahalanobis距离:

image-20220427120402875

第i个track的在度量空间中的分布image-20220427135149852

第j个bbox的检测结果image-20220427135222368

通过计算image-20220427135346383

如果b=1,则第i个track和第j个检测结果是匹配的,这里t设计的阈值为9.4877

因为Mahalanobis距离在目标运动变化性较低的时候效果更好,而卡尔曼滤波对新的状态的预测也是一个粗略估计,再加上摄像机也有可能会发生抖动,所以在有遮挡时,Mahalanobis距离的效果可能不太好。

因此引入第二个metric来帮助跟踪任务进行对象分配。

为每个检测结果d_j构造一个外形描述向量r_j,且r_j的L1范数为1。同时保存一个track上最后L_k个r向量,所以我们第二个metric是比较第i个track和第j个检测结果的cosine距离:

image-20220427140300284

同上,也有image-20220427140408605

在实际使用时,我们使用CNN来计算这个r向量

最后将这两个b值进行统一:

image-20220427140815315

级联匹配(Matching Cascade)

解决的问题:当一个目标因为被遮挡等原因而失去追踪时,卡尔曼滤波器会增加预测目标位置的不确定性,从而导致预测的目标分布更分散,观测可能性(observation likelihood)也会分散,更难被匹配上。直觉上应该增加物体到track判断距离的阈值来应对这种情形。但是反直觉的是当有两个track来匹配同一个检测目标时,Mahalanobis距离会偏向于预测不确定性更大的track,因为这种操作会显著减少任何检测到映射track平均值的标准偏差距离。这是一个不太合理的行为,因为这样会导致tracks更加碎片化,增加了track的不稳定性。

因此我们提出级联匹配来优先匹配出现频率更高的目标,来应对我们发现的匹配相关性中的概率分散问题。

image-20220427143045821

image-20220427143554243是指有n帧没有检测结果匹配上的track。

第七行的意思是从当前帧的检测结果中找与已有track中cost值最小的检测目标

如果这个目标满足b值条件,就匹配上相应的track

外观描述向量(Deep Appearance Descriptor)

是一个基于CNN模型

image-20220427144143537

实验结果如下:

在这里插入图片描述

代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class KalmanFilter(object):
"""
A simple Kalman filter for tracking bounding boxes in image space.

The 8-dimensional state space

x, y, a, h, vx, vy, va, vh

contains the bounding box center position (x, y), aspect ratio a, height h,
and their respective velocities.

Object motion follows a constant velocity model. The bounding box location
(x, y, a, h) is taken as direct observation of the state space (linear
observation model).

"""

def __init__(self):
ndim, dt = 4, 1.

# Create Kalman filter model matrices.
self.a_motion_mt = np.eye(2 * ndim, 2 * ndim) # 8 * 8的对角线矩阵
for i in range(ndim):
self._motion_mat[i, ndim + i] = dt
self._update_mat = np.eye(ndim, 2 * ndim)

# Motion and observation uncertainty are chosen relative to the current
# state estimate. These weights control the amount of uncertainty in
# the model. This is a bit hacky.
self._std_weight_position = 1. / 20
self._std_weight_velocity = 1. / 160

这里self.a_motion_mt如下:

1
2
3
4
5
6
7
8
array([[1., 0., 0., 0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]])

self._update_mat如下

1
2
3
4
array([[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.]])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
def initiate(self, measurement):
"""
Create track from unassociated measurement.

Args:
measurement (ndarray): Bounding box coordinates (x, y, a, h) with
center position (x, y), aspect ratio a, and height h.

Returns:
The mean vector (8 dimensional) and covariance matrix (8x8
dimensional) of the new track. Unobserved velocities are
initialized to 0 mean.
"""
mean_pos = measurement
mean_vel = np.zeros_like(mean_pos)
mean = np.r_[mean_pos, mean_vel] # 这里是给4维位置相关参数配上4个加速度,初始化为0

std = [
2 * self._std_weight_position * measurement[3],
2 * self._std_weight_position * measurement[3], 1e-2,
2 * self._std_weight_position * measurement[3],
10 * self._std_weight_velocity * measurement[3],
10 * self._std_weight_velocity * measurement[3], 1e-5,
10 * self._std_weight_velocity * measurement[3]
]
covariance = np.diag(np.square(std)) # 8 * 8矩阵
return mean, covariance

def predict(self, mean, covariance):
"""
Run Kalman filter prediction step.

Args:
mean (ndarray): The 8 dimensional mean vector of the object state
at the previous time step.
covariance (ndarray): The 8x8 dimensional covariance matrix of the
object state at the previous time step.

Returns:
The mean vector and covariance matrix of the predicted state.
Unobserved velocities are initialized to 0 mean.
"""
std_pos = [
self._std_weight_position * mean[3], self._std_weight_position *
mean[3], 1e-2, self._std_weight_position * mean[3]
]
std_vel = [
self._std_weight_velocity * mean[3], self._std_weight_velocity *
mean[3], 1e-5, self._std_weight_velocity * mean[3]
]
motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))

#mean = np.dot(self._motion_mat, mean)
mean = np.dot(mean, self._motion_mat.T)
covariance = np.linalg.multi_dot(
(self._motion_mat, covariance, self._motion_mat.T)) + motion_cov

return mean, covariance

def project(self, mean, covariance):
"""
Project state distribution to measurement space.

Args
mean (ndarray): The state's mean vector (8 dimensional array).
covariance (ndarray): The state's covariance matrix (8x8 dimensional).

Returns:
The projected mean and covariance matrix of the given state estimate.
"""
std = [
self._std_weight_position * mean[3], self._std_weight_position *
mean[3], 1e-1, self._std_weight_position * mean[3]
]
innovation_cov = np.diag(np.square(std))

mean = np.dot(self._update_mat, mean)
covariance = np.linalg.multi_dot((self._update_mat, covariance,
self._update_mat.T))
return mean, covariance + innovation_cov

def multi_predict(self, mean, covariance):
"""
Run Kalman filter prediction step (Vectorized version).

Args:
mean (ndarray): The Nx8 dimensional mean matrix of the object states
at the previous time step.
covariance (ndarray): The Nx8x8 dimensional covariance matrics of the
object states at the previous time step.

Returns:
The mean vector and covariance matrix of the predicted state.
Unobserved velocities are initialized to 0 mean.
"""
std_pos = [
self._std_weight_position * mean[:, 3], self._std_weight_position *
mean[:, 3], 1e-2 * np.ones_like(mean[:, 3]),
self._std_weight_position * mean[:, 3]
]
std_vel = [
self._std_weight_velocity * mean[:, 3], self._std_weight_velocity *
mean[:, 3], 1e-5 * np.ones_like(mean[:, 3]),
self._std_weight_velocity * mean[:, 3]
]
sqr = np.square(np.r_[std_pos, std_vel]).T

motion_cov = []
for i in range(len(mean)):
motion_cov.append(np.diag(sqr[i]))
motion_cov = np.asarray(motion_cov)

mean = np.dot(mean, self._motion_mat.T)
left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2))
covariance = np.dot(left, self._motion_mat.T) + motion_cov

return mean, covariance

def update(self, mean, covariance, measurement):
"""
Run Kalman filter correction step.

Args:
mean (ndarray): The predicted state's mean vector (8 dimensional).
covariance (ndarray): The state's covariance matrix (8x8 dimensional).
measurement (ndarray): The 4 dimensional measurement vector
(x, y, a, h), where (x, y) is the center position, a the aspect
ratio, and h the height of the bounding box.

Returns:
The measurement-corrected state distribution.
"""
projected_mean, projected_cov = self.project(mean, covariance)

chol_factor, lower = scipy.linalg.cho_factor(
projected_cov, lower=True, check_finite=False)
kalman_gain = scipy.linalg.cho_solve(
(chol_factor, lower),
np.dot(covariance, self._update_mat.T).T,
check_finite=False).T
innovation = measurement - projected_mean

new_mean = mean + np.dot(innovation, kalman_gain.T)
new_covariance = covariance - np.linalg.multi_dot(
(kalman_gain, projected_cov, kalman_gain.T))
return new_mean, new_covariance

def gating_distance(self,
mean,
covariance,
measurements,
only_position=False,
metric='maha'):
"""
Compute gating distance between state distribution and measurements.
A suitable distance threshold can be obtained from `chi2inv95`. If
`only_position` is False, the chi-square distribution has 4 degrees of
freedom, otherwise 2.

Args:
mean (ndarray): Mean vector over the state distribution (8
dimensional).
covariance (ndarray): Covariance of the state distribution (8x8
dimensional).
measurements (ndarray): An Nx4 dimensional matrix of N measurements,
each in format (x, y, a, h) where (x, y) is the bounding box center
position, a the aspect ratio, and h the height.
only_position (Optional[bool]): If True, distance computation is
done with respect to the bounding box center position only.
metric (str): Metric type, 'gaussian' or 'maha'.

Returns
An array of length N, where the i-th element contains the squared
Mahalanobis distance between (mean, covariance) and `measurements[i]`.
"""
mean, covariance = self.project(mean, covariance)
if only_position:
mean, covariance = mean[:2], covariance[:2, :2]
measurements = measurements[:, :2]

d = measurements - mean
if metric == 'gaussian':
return np.sum(d * d, axis=1)
elif metric == 'maha':
cholesky_factor = np.linalg.cholesky(covariance)
z = scipy.linalg.solve_triangular(
cholesky_factor,
d.T,
lower=True,
check_finite=False,
overwrite_b=True)
squared_maha = np.sum(z * z, axis=0)
return squared_maha
else:
raise ValueError('invalid distance metric')