Towards Real-Time Multi-Object Tracking
Paper reading notes
Abstract
Traditional MOT strategies follow the tracking-by-detection paradigm and consist of:
- detection model
- appearance embedding model
- data association
The shortcomings of traditional MOT strategies:
- poor efficiency
In this paper, the authors propose a new method that allows detection and appearance embedding to be learned in a shared model (a single-shot detector). Furthermore, they propose a simple and fast association method.
code
1 Introduction
MOT: predicting trajectories of multiple targets in video sequences.
tracking-by-detection, i.e. SDE (Separate Detection and Embedding):
- Detection: localize targets (detector).
- Association: link detections into trajectories (re-ID model).
- Problem: inefficient, because two separate models must be run.
Solution: Integrate the two tasks into a single network (Faster R-CNN).
JDE (jointly learns the Detector and Embedding model):
- Training data: six publicly available pedestrian detection and person search datasets are combined into a unified multi-label dataset.
- Architecture: FPN
- Loss: anchor classification, box regression and embedding learning (using task-dependent uncertainty).
- A simple and fast association algorithm.
(Figure: comparison of MOT paradigms)
2 Related Work
3 Joint Learning of Detection and Embedding
3.1 Problem Settings
Training dataset:
$\{I, B, y\}_{i=1}^{N}$
where
- $I \in R^{c\times h\times w}$: image frame;
- $B \in R^{k\times 4}$: bounding boxes, with $k$ the number of targets;
- $y \in Z^{k}$: identity labels.

JDE predicts $\hat{B}$ and $\hat{F} \in R^{\hat{k}\times D}$, where $D$ is the embedding dimension.
3.2 Architecture Overview
(Figure: JDE architecture)
Each dense prediction head has size $(6A+D)\times H\times W$, where $A$ is the number of anchors and $D$ the embedding dimension:
- bounding box classification: $2A\times H\times W$;
- bounding box regression coefficients: $4A\times H\times W$;
- embedding: $D\times H\times W$.
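The head layout above can be sketched as a single convolution whose output channels are split into the three parts. This is only an illustration: the channel counts, feature-map size, and the 256-channel input are assumed values, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Illustrative values (not taken from the paper's config):
A, D = 4, 512          # anchors per scale, embedding dimension
H, W = 19, 34          # spatial size of one FPN feature map

# One dense prediction head: a conv producing (6A + D) channels
head = nn.Conv2d(in_channels=256, out_channels=6 * A + D,
                 kernel_size=3, padding=1)

x = torch.randn(1, 256, H, W)                       # one FPN level
out = head(x)                                       # (1, 6A+D, H, W)
cls, reg, emb = torch.split(out, [2 * A, 4 * A, D], dim=1)
# cls: box classification, reg: regression coefficients, emb: embeddings
```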
3.3 Learning to Detect
The detection branch of JDE is similar to a standard RPN, except:
- all anchors are set to an aspect ratio of 1:3;
- an anchor with IoU > 0.5 w.r.t. a ground-truth box is labeled foreground;
- an anchor with IoU < 0.4 w.r.t. the ground truth is labeled background.
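A minimal NumPy sketch of the labeling rule above. The "ignore anchors whose IoU falls between the two thresholds" behavior is an assumption (the notes only state the two thresholds):

```python
import numpy as np

def iou(box, boxes):
    # box: (4,) as [x1, y1, x2, y2]; boxes: (N, 4)
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, fg_thr=0.5, bg_thr=0.4):
    # 1 = foreground (IoU > 0.5), 0 = background (IoU < 0.4),
    # -1 = ignored in between (assumed behavior)
    best_iou = np.array([iou(a, gt_boxes).max() for a in anchors])
    labels = np.full(len(anchors), -1)
    labels[best_iou > fg_thr] = 1
    labels[best_iou < bg_thr] = 0
    return labels

anchors = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [0, 0, 4.5, 10]])
gt = np.array([[0, 0, 10, 10]])
print(label_anchors(anchors, gt))   # -> [ 1  0 -1]
```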
Loss:
- foreground/background classification loss $\ell_\alpha$ (cross-entropy);
- bounding box regression loss $\ell_\beta$ (smooth-L1).
3.4 Learning Appearance Embeddings
The triplet loss is abandoned because of:
- its huge sampling space;
- unstable training.
Instead, the cross-entropy loss $\ell_{CE}$ is used, treating embedding learning as identity classification.
(Figure: cross-entropy loss formulation)
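A sketch of what this cross-entropy formulation can look like: embeddings gathered at foreground anchors are mapped to identity logits by a shared linear classifier and trained with standard cross-entropy. The classifier, the number of identities, and the batch layout are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_ids = 1000   # number of distinct identities in training data (illustrative)
D = 512          # embedding dimension (assumed)

classifier = nn.Linear(D, num_ids)   # shared FC mapping embeddings to ID logits
criterion = nn.CrossEntropyLoss()

emb = torch.randn(8, D)                   # embeddings at 8 foreground anchors
ids = torch.randint(0, num_ids, (8,))     # their ground-truth identity labels
loss = criterion(classifier(emb), ids)    # the embedding loss (scalar)
```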
3.5 Automatic Loss Balancing
The total loss can be written as follows:
$$\mathcal{L}_{\text{total}} = \sum_{i=1}^{M} \sum_{j=\alpha,\beta,\gamma} w_{j}^{i} \mathcal{L}_{j}^{i}$$
where $M$ is the number of prediction heads and $w_j^i$, $i=1,\dots,M$, $j=\alpha,\beta,\gamma$ are loss weights.
A simple way to determine the loss weights:
- let $w_\alpha^i = w_\beta^i$;
- let $w_{\alpha/\gamma/\beta}^1 = \dots = w_{\alpha/\gamma/\beta}^M$;
- search the remaining two independent loss weights for the best performance.
Task-dependent uncertainty (used to learn the weights automatically):
$$\mathcal{L}_{\text{total}} = \sum_{i=1}^{M} \sum_{j=\alpha,\beta,\gamma} \frac{1}{2}\left(\frac{1}{e^{s_{j}^{i}}} \mathcal{L}_{j}^{i} + s_{j}^{i}\right)$$
where the $s_j^i$ are learnable task-dependent uncertainty parameters.
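A minimal PyTorch sketch of this uncertainty-weighted loss with learnable $s_j^i$ (note $1/e^{s} = e^{-s}$); the initialization and tensor layout are assumptions:

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """L_total = sum_{i,j} 0.5 * (exp(-s_ij) * L_ij + s_ij),
    with one learnable scale s_ij per (head, task) pair."""
    def __init__(self, num_heads=3, num_tasks=3):
        super().__init__()
        # s = 0 at init, i.e. every loss starts with weight 0.5
        self.s = nn.Parameter(torch.zeros(num_heads, num_tasks))

    def forward(self, losses):
        # losses: (num_heads, num_tasks) tensor of per-head, per-task losses
        return (0.5 * (torch.exp(-self.s) * losses + self.s)).sum()

crit = UncertaintyWeightedLoss()
total = crit(torch.ones(3, 3))   # with s = 0: 0.5 * 9 = 4.5
```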
Background article: Kendall et al., "Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics".
multi-task loss: $\mathcal{L}_{total} = \sum_{i} w_i \mathcal{L}_i$
Model performance is extremely sensitive to weight selection.
In Bayesian modelling, there are two main types of uncertainty:
- Epistemic uncertainty: due to lack of training data.
- Aleatoric uncertainty: can be explained away given the ability to observe all explanatory variables with increasing precision. It can be divided into:
  - data-dependent (heteroscedastic) uncertainty: depends on the input data;
  - task-dependent (homoscedastic) uncertainty: stays constant for all input data and varies between different tasks.
Multi-task loss function based on maximising the Gaussian likelihood with homoscedastic uncertainty:
- $f^W(x)$: output of a neural network with weights $W$ on input $x$.
- For a regression task: $p(\mathbf{y} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x})) = \mathcal{N}(\mathbf{f}^{\mathbf{W}}(\mathbf{x}), \sigma^{2})$; the mean is given by the model output.
- For a classification task: $p(\mathbf{y} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x})) = \mathrm{softmax}(\mathbf{f}^{\mathbf{W}}(\mathbf{x}))$.
- With multiple model outputs, we can factorise over the outputs: $p(\mathbf{y}_{1}, \ldots, \mathbf{y}_{K} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x})) = p(\mathbf{y}_{1} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x})) \cdots p(\mathbf{y}_{K} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x}))$, where the $\mathbf{y}_{k}$ are the outputs of the different tasks.
- Scaled version of the softmax: $p(\mathbf{y} \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x}), \sigma) = \operatorname{Softmax}\left(\frac{1}{\sigma^{2}} \mathbf{f}^{\mathbf{W}}(\mathbf{x})\right)$
- The log-likelihood: $\log p(\mathbf{y}=c \mid \mathbf{f}^{\mathbf{W}}(\mathbf{x}), \sigma) = \frac{1}{\sigma^{2}} f_{c}^{\mathbf{W}}(\mathbf{x}) - \log \sum_{c^{\prime}} \exp\left(\frac{1}{\sigma^{2}} f_{c^{\prime}}^{\mathbf{W}}(\mathbf{x})\right)$
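A quick numeric check of the scaled softmax (NumPy): the log-likelihood formula above equals the log of the $\sigma$-scaled softmax, and a larger $\sigma$ flattens the distribution; the logits used here are arbitrary.

```python
import numpy as np

def log_scaled_softmax(f, c, sigma):
    # log p(y = c | f, sigma) for the sigma-scaled softmax
    z = f / sigma ** 2
    return z[c] - np.log(np.sum(np.exp(z)))

f = np.array([2.0, 1.0, 0.1])   # arbitrary logits

# Sanity check against the unscaled softmax (sigma = 1)
p = np.exp(f) / np.exp(f).sum()
assert abs(log_scaled_softmax(f, 0, sigma=1.0) - np.log(p[0])) < 1e-9

# Larger sigma flattens the distribution, lowering the top class's likelihood
ll_sharp = log_scaled_softmax(f, 0, sigma=1.0)
ll_flat = log_scaled_softmax(f, 0, sigma=3.0)
assert ll_flat < ll_sharp < 0
```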
3.6 Online Association
A tracklet is described with an appearance state $e_i$ and a motion state $m_i = (x, y, \gamma, h, \dot x, \dot y, \dot\gamma, \dot h)$:
- $x, y$: bounding box center position;
- $h$: bounding box height;
- $\gamma$: bounding box aspect ratio;
- $\dot x, \dot y, \dot\gamma, \dot h$: the corresponding velocities.
For an incoming frame, compute the appearance affinity matrix $A_e$ using cosine similarity between embeddings and the motion affinity matrix $A_m$ using Mahalanobis distance.
linear assignment:
- Hungarian algorithm;
- cost matrix: $C = \lambda A_e + (1-\lambda) A_m$.
The matched tracklet's $m_i$ is updated by the Kalman filter, and $e_i$ is updated as $e_i^t = \alpha e_i^{t-1} + (1-\alpha) f_i^t$, where $f_i^t$ is the embedding of the assigned detection.
Finally, observations not assigned to any tracklet are initialized as new tracklets if they appear in 2 consecutive frames. A tracklet is terminated if it has not been updated within the most recent 30 frames.
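The association and update steps above can be sketched with SciPy's Hungarian solver. Here $A_e$ and $A_m$ are treated as cost matrices (lower = more similar), and the values of $\lambda$ and the EMA momentum $\alpha$ are assumed for illustration, not taken from the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(A_e, A_m, lam=0.98):
    # Fused cost C = lam * A_e + (1 - lam) * A_m; A_e, A_m are cost
    # matrices (lower = more similar). lam = 0.98 is an assumed value.
    C = lam * A_e + (1 - lam) * A_m
    rows, cols = linear_sum_assignment(C)       # Hungarian algorithm
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

def update_embedding(e_prev, f_new, alpha=0.9):
    # EMA update of the tracklet appearance state:
    # e_t = alpha * e_{t-1} + (1 - alpha) * f_t   (alpha assumed)
    return alpha * e_prev + (1 - alpha) * f_new

A = np.array([[0.1, 0.9],
              [0.9, 0.1]])        # 2 tracklets x 2 detections
matches = associate(A, A)         # -> [(0, 0), (1, 1)]
```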
Notes:
- paradigm /ˈpærədaɪm/: exemplar, model
- SDE: Separate Detection and Embedding
- JDE: jointly learns the Detector and Embedding model
- epistemic /ˌepɪˈstiːmɪk/: relating to knowledge
- aleatoric /ˈeɪliətəri/: depending on chance
- explanatory variables: variables that explain the observed outcome
- heteroscedastic /ˌhetərəʊskɪˈdæstɪk/: having non-constant variance
- homoscedastic /ˌhəʊməʊskɪˈdæstɪk/: having constant variance