简介

Sample online and realtime tracking (SORT)利用了帧和帧之间的关系，用卡尔曼滤波和匈牙利算法来进行多目标跟踪。但是SORT的ID switches次数很高，因为SORT在跟踪目标动作的可预测性很高的情况下性能比较好，但在有遮挡的场景表现不太好。

DeepSORT结合了目标的动作状态和图像特征信息。具体是利用CNN来分辨行人。

算法细节

跟踪处理与状态估计

目标的状态空间为：

其中u和v是bbox的中心店坐标，y是宽高比例，h是高，后面四个值是前面四个值在图像坐标上的加速度。我们使用了匀速运动和线性观测模型的卡尔曼滤波器。

每个轨道k都会预测最后成功匹配上目标的那一帧到下一次匹配到目标的那一帧之间的帧数a_k, 当匹配到时a_k清0，当a_k超过一个阈值A_max时，就会认为这个轨道k已经离开了画面，从轨道匹配池中去掉。当一个新的目标没办法匹配上任何一个k时，就会新建一个轨道。当一个新轨道的前三帧没有全部匹配上时，就删除这个轨道。

对象分配问题

为了考虑运动信息，我们使用预测的卡尔曼状态和新到的测量值之间的（平方）Mahalanobis距离：

第i个track的在度量空间中的分布

第j个bbox的检测结果

通过计算

如果b=1，则第i个track和第j个检测结果是匹配的，这里t设计的阈值为9.4877

因为Mahalanobis距离在目标运动变化性较低的时候效果更好，而卡尔曼滤波对新的状态的预测也是一个粗略估计，再加上摄像机也有可能会发生抖动，所以在有遮挡时，Mahalanobis距离的效果可能不太好。

因此引入第二个metric来帮助跟踪任务进行对象分配。

为每个检测结果d_j构造一个外形描述向量r_j，且r_j的L1范数为1。同时保存一个track上最后L_k个r向量，所以我们第二个metric是比较第i个track和第j个检测结果的cosine距离：

同上，也有

在实际使用时，我们使用CNN来计算这个r向量

最后将这两个b值进行统一：

级联匹配（Matching Cascade）

解决的问题：当一个目标因为被遮挡等原因而失去追踪时，卡尔曼滤波器会增加预测目标位置的不确定性，从而导致预测的目标分布更分散，观测可能性（observation likelihood）也会分散，更难被匹配上。直觉上应该增加物体到track判断距离的阈值来应对这种情形。但是反直觉的是当有两个track来匹配同一个检测目标时，Mahalanobis距离会偏向于预测不确定性更大的track，因为这种操作会显著减少任何检测到映射track平均值的标准偏差距离。这是一个不太合理的行为，因为这样会导致tracks更加碎片化，增加了track的不稳定性。

因此我们提出级联匹配来优先匹配出现频率更高的目标，来应对我们发现的匹配相关性中的概率分散问题。

是指有n帧没有检测结果匹配上的track。

第七行的意思是从当前帧的检测结果中找与已有track中cost值最小的检测目标

如果这个目标满足b值条件，就匹配上相应的track

外观描述向量（Deep Appearance Descriptor）

是一个基于CNN模型

实验结果如下：

代码

class KalmanFilter(object):
    """
    A simple Kalman filter for tracking bounding boxes in image space.

    The 8-dimensional state space

        x, y, a, h, vx, vy, va, vh

    contains the bounding box center position (x, y), aspect ratio a, height h,
    and their respective velocities.

    Object motion follows a constant velocity model. The bounding box location
    (x, y, a, h) is taken as direct observation of the state space (linear
    observation model).

    """

    def __init__(self):
        ndim, dt = 4, 1.

        # Create Kalman filter model matrices.
        self.a_motion_mt = np.eye(2 * ndim, 2 * ndim) # 8 * 8的对角线矩阵
        for i in range(ndim):
            self._motion_mat[i, ndim + i] = dt
        self._update_mat = np.eye(ndim, 2 * ndim)

        # Motion and observation uncertainty are chosen relative to the current
        # state estimate. These weights control the amount of uncertainty in
        # the model. This is a bit hacky.
        self._std_weight_position = 1. / 20
        self._std_weight_velocity = 1. / 160

这里self.a_motion_mt如下：

array([[1., 0., 0., 0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.]])

self._update_mat如下

array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.]])

def initiate(self, measurement):
    """
    Create track from unassociated measurement.

    Args:
        measurement (ndarray): Bounding box coordinates (x, y, a, h) with
            center position (x, y), aspect ratio a, and height h.

    Returns:
        The mean vector (8 dimensional) and covariance matrix (8x8
        dimensional) of the new track. Unobserved velocities are 
        initialized to 0 mean.
    """
    mean_pos = measurement
    mean_vel = np.zeros_like(mean_pos)
    mean = np.r_[mean_pos, mean_vel] # 这里是给4维位置相关参数配上4个加速度，初始化为0

    std = [
        2 * self._std_weight_position * measurement[3],
        2 * self._std_weight_position * measurement[3], 1e-2,
        2 * self._std_weight_position * measurement[3],
        10 * self._std_weight_velocity * measurement[3],
        10 * self._std_weight_velocity * measurement[3], 1e-5,
        10 * self._std_weight_velocity * measurement[3]
    ]
    covariance = np.diag(np.square(std)) # 8 * 8矩阵
    return mean, covariance

def predict(self, mean, covariance):
    """
    Run Kalman filter prediction step.

    Args:
        mean (ndarray): The 8 dimensional mean vector of the object state
            at the previous time step.
        covariance (ndarray): The 8x8 dimensional covariance matrix of the
            object state at the previous time step.

    Returns:
        The mean vector and covariance matrix of the predicted state. 
        Unobserved velocities are initialized to 0 mean.
    """
    std_pos = [
        self._std_weight_position * mean[3], self._std_weight_position *
        mean[3], 1e-2, self._std_weight_position * mean[3]
    ]
    std_vel = [
        self._std_weight_velocity * mean[3], self._std_weight_velocity *
        mean[3], 1e-5, self._std_weight_velocity * mean[3]
    ]
    motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))

    #mean = np.dot(self._motion_mat, mean)
    mean = np.dot(mean, self._motion_mat.T)
    covariance = np.linalg.multi_dot(
        (self._motion_mat, covariance, self._motion_mat.T)) + motion_cov

    return mean, covariance

def project(self, mean, covariance):
    """
    Project state distribution to measurement space.

    Args
        mean (ndarray): The state's mean vector (8 dimensional array).
        covariance (ndarray): The state's covariance matrix (8x8 dimensional).

    Returns:
        The projected mean and covariance matrix of the given state estimate.
    """
    std = [
        self._std_weight_position * mean[3], self._std_weight_position *
        mean[3], 1e-1, self._std_weight_position * mean[3]
    ]
    innovation_cov = np.diag(np.square(std))

    mean = np.dot(self._update_mat, mean)
    covariance = np.linalg.multi_dot((self._update_mat, covariance,
                                      self._update_mat.T))
    return mean, covariance + innovation_cov

def multi_predict(self, mean, covariance):
    """
    Run Kalman filter prediction step (Vectorized version).
    
    Args:
        mean (ndarray): The Nx8 dimensional mean matrix of the object states
            at the previous time step.
        covariance (ndarray): The Nx8x8 dimensional covariance matrics of the
            object states at the previous time step.

    Returns:
        The mean vector and covariance matrix of the predicted state.
        Unobserved velocities are initialized to 0 mean.
    """
    std_pos = [
        self._std_weight_position * mean[:, 3], self._std_weight_position *
        mean[:, 3], 1e-2 * np.ones_like(mean[:, 3]),
        self._std_weight_position * mean[:, 3]
    ]
    std_vel = [
        self._std_weight_velocity * mean[:, 3], self._std_weight_velocity *
        mean[:, 3], 1e-5 * np.ones_like(mean[:, 3]),
        self._std_weight_velocity * mean[:, 3]
    ]
    sqr = np.square(np.r_[std_pos, std_vel]).T

    motion_cov = []
    for i in range(len(mean)):
        motion_cov.append(np.diag(sqr[i]))
    motion_cov = np.asarray(motion_cov)

    mean = np.dot(mean, self._motion_mat.T)
    left = np.dot(self._motion_mat, covariance).transpose((1, 0, 2))
    covariance = np.dot(left, self._motion_mat.T) + motion_cov

    return mean, covariance

def update(self, mean, covariance, measurement):
    """
    Run Kalman filter correction step.

    Args:
        mean (ndarray): The predicted state's mean vector (8 dimensional).
        covariance (ndarray): The state's covariance matrix (8x8 dimensional).
        measurement (ndarray): The 4 dimensional measurement vector
            (x, y, a, h), where (x, y) is the center position, a the aspect
            ratio, and h the height of the bounding box.

    Returns:
        The measurement-corrected state distribution.
    """
    projected_mean, projected_cov = self.project(mean, covariance)

    chol_factor, lower = scipy.linalg.cho_factor(
        projected_cov, lower=True, check_finite=False)
    kalman_gain = scipy.linalg.cho_solve(
        (chol_factor, lower),
        np.dot(covariance, self._update_mat.T).T,
        check_finite=False).T
    innovation = measurement - projected_mean

    new_mean = mean + np.dot(innovation, kalman_gain.T)
    new_covariance = covariance - np.linalg.multi_dot(
        (kalman_gain, projected_cov, kalman_gain.T))
    return new_mean, new_covariance

def gating_distance(self,
                    mean,
                    covariance,
                    measurements,
                    only_position=False,
                    metric='maha'):
    """
    Compute gating distance between state distribution and measurements.
    A suitable distance threshold can be obtained from `chi2inv95`. If
    `only_position` is False, the chi-square distribution has 4 degrees of
    freedom, otherwise 2.
    
    Args:
        mean (ndarray): Mean vector over the state distribution (8
            dimensional).
        covariance (ndarray): Covariance of the state distribution (8x8
            dimensional).
        measurements (ndarray): An Nx4 dimensional matrix of N measurements,
            each in format (x, y, a, h) where (x, y) is the bounding box center
            position, a the aspect ratio, and h the height.
        only_position (Optional[bool]): If True, distance computation is 
            done with respect to the bounding box center position only.
        metric (str): Metric type, 'gaussian' or 'maha'.

    Returns
        An array of length N, where the i-th element contains the squared
        Mahalanobis distance between (mean, covariance) and `measurements[i]`.
    """
    mean, covariance = self.project(mean, covariance)
    if only_position:
        mean, covariance = mean[:2], covariance[:2, :2]
        measurements = measurements[:, :2]

    d = measurements - mean
    if metric == 'gaussian':
        return np.sum(d * d, axis=1)
    elif metric == 'maha':
        cholesky_factor = np.linalg.cholesky(covariance)
        z = scipy.linalg.solve_triangular(
            cholesky_factor,
            d.T,
            lower=True,
            check_finite=False,
            overwrite_b=True)
        squared_maha = np.sum(z * z, axis=0)
        return squared_maha
    else:
        raise ValueError('invalid distance metric')

Upperlan

目标跟踪-DeepSort

简介

算法细节

跟踪处理与状态估计

对象分配问题

级联匹配（Matching Cascade）

外观描述向量（Deep Appearance Descriptor）

代码