Infrared Technology, 2018, 40(1): 47. Published online: 2018-03-21
A Multi-Domain Convolutional Neural Network Visual Tracking Algorithm Combined with Multiple Templates
A Multidomain CNN that Integrates Multiple Models in a Tree Structure for Visual Tracking
Keywords: visual tracking; deep learning; Convolutional Neural Network (CNN); multi-domain learning; multiple models
Abstract (translated from Chinese)
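To adapt to changes in target appearance during visual tracking and to improve the robustness of the tracking algorithm, this paper builds on the convolutional neural network (CNN) and combines multi-domain learning with multi-template management to propose a visual tracking algorithm, Multi-Domain CNNs with Multiple Models in a tree structure, which manages multiple templates in a tree. First, a CNN with a multi-domain structure is pretrained on a large amount of video data with labeled target positions, so that the convolutional layers can extract features suited to the tracking task. Then, during tracking, the fully connected layers are fine-tuned to fit the tracked target, and target templates from different time periods are stored and managed in a tree structure, yielding a template tree. The template tree is used to jointly evaluate the frame under test and estimate the target position. Finally, new templates are added to the template tree according to certain rules, completing the template update. Experiments show that the algorithm adapts well to changes in target appearance during tracking, and that the multiple templates suppress the template drift that a CNN produces during tracking.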
Abstract
To track a target whose appearance changes and to improve the robustness of visual tracking, we propose a convolutional neural network (CNN)-based algorithm that combines a multidomain learning framework with multiple models stored in a tree structure. First, the multidomain CNN is pretrained on many videos annotated with tracking ground truth, so that its convolutional layers can extract features appropriate for visual tracking. During tracking, the fully connected layers are fine-tuned online to fit the target appearance, and the resulting target appearance models are managed in a tree structure. The model tree is then used to estimate the target's state in each new frame. Finally, a new model is added to the tree along one of its paths, which completes the model update. Experiments show that the algorithm adapts well when a target abruptly changes its appearance. Furthermore, the model tree mitigates the model drift that arises during online learning with the CNN.
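The tree-based model management described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the scoring or update rules, so the reliability-weighted average, the parent-selection rule, and every name here (`ModelNode`, `ModelTree`, `make_scorer`) are assumptions made for illustration. A toy one-dimensional scoring function stands in for the CNN with fine-tuned fully connected layers.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for a fine-tuned CNN: maps a candidate to a score.
Scorer = Callable[[float], float]


def make_scorer(center: float) -> Scorer:
    """Toy appearance model: the score peaks at 1.0 when candidate == center."""
    return lambda candidate: 1.0 / (1.0 + abs(candidate - center))


@dataclass(eq=False)
class ModelNode:
    scorer: Scorer
    parent: Optional["ModelNode"] = None
    reliability: float = 1.0
    children: List["ModelNode"] = field(default_factory=list)


class ModelTree:
    def __init__(self, root_scorer: Scorer) -> None:
        self.nodes: List[ModelNode] = [ModelNode(root_scorer)]

    def score(self, candidate: float) -> float:
        # Assumed rule: reliability-weighted average over all stored models.
        total = sum(n.reliability for n in self.nodes)
        return sum(n.reliability * n.scorer(candidate) for n in self.nodes) / total

    def estimate(self, candidates: Sequence[float]) -> float:
        # The target state is the candidate with the highest combined score.
        return max(candidates, key=self.score)

    def add_model(self, scorer: Scorer, target: float) -> ModelNode:
        # Assumed rule: attach the new model to the node that scores the
        # current target highest; the child's reliability is capped by the
        # parent's reliability and by that parent's score on the target.
        parent = max(self.nodes, key=lambda n: n.scorer(target))
        reliability = min(parent.reliability, parent.scorer(target))
        node = ModelNode(scorer, parent=parent, reliability=reliability)
        parent.children.append(node)
        self.nodes.append(node)
        return node


tree = ModelTree(make_scorer(10.0))
first_estimate = tree.estimate([8.0, 10.0, 13.0])
new_node = tree.add_model(make_scorer(12.0), target=11.0)
second_estimate = tree.estimate([10.0, 12.0])
```

In this toy setup, the newer model receives a lower weight than the root because the root scored the target only moderately when the new model was added, so older, more reliable models keep a say in later estimates. That is one plausible way multiple models could limit drift; the paper's actual rules may differ.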
WANG Pengxiang, GUO Jingbin, TAN Wenbin, LI Xingfei. A Multidomain CNN that Integrates Multiple Models in a Tree Structure for Visual Tracking[J]. Infrared Technology, 2018, 40(1): 47.