[1] Karsch K, Liu C, Kang S B. Depth transfer: Depth extraction from video using non-parametric sampling[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.
[2] Konrad J, Wang M, Ishwar P, et al. Learning-based, automatic 2D-to-3D image and video conversion[J]. IEEE Transactions on Image Processing, 2013, 22(9): 3485-3496.
[3] Kong N, Black M J. Intrinsic depth: Improving depth transfer with intrinsic images[C]. IEEE International Conference on Computer Vision, 2015: 3514-3522.
[4] Saxena A, Chung S H, Ng A Y. 3D depth reconstruction from a single still image[J]. International Journal of Computer Vision, 2008, 76(1): 53-69.
[5] Xi Lin, Sun Shaoyuan, Li Linna, et al. Depth estimation from monocular infrared images based on SVM model[J]. Laser & Infrared, 2012, 42(11): 1311-1315.
[6] Xu Lu, Zhao Haitao, Sun Shaoyuan. Monocular infrared image depth estimation based on deep convolutional neural networks[J]. Acta Optica Sinica, 2016, 36(7): 0715002.
[7] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]. Advances in Neural Information Processing Systems, 2014: 2366-2374.
[8] Liu F Y, Shen C H, Lin G S. Deep convolutional neural fields for depth estimation from a single image[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 5162-5170.
[9] Zhang G F, Jia J Y, Hua W, et al. Robust bilayer segmentation and motion/depth estimation with a handheld camera[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(3): 603-617.
[10] Akhter I, Sheikh Y, Khan S, et al. Trajectory space: A dual representation for nonrigid structure from motion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(7): 1442-1456.
[11] Ha H, Im S, Park J, et al. High-quality depth from uncalibrated small motion clip[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 5413-5421.
[12] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]. Advances in Neural Information Processing Systems, 2012: 1097-1105.
[13] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[C]. International Conference on Learning Representations, 2015: 1-14.
[14] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[15] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580-587.
[16] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[17] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[18] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[19] Cho K, van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]. Conference on Empirical Methods in Natural Language Processing, 2014: 1724-1734.
[20] Chung J, Gülçehre C, Cho K, et al. Gated feedback recurrent neural networks[C]. International Conference on Machine Learning, 2015: 2067-2075.
[21] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3431-3440.
[22] Kingma D, Ba J. Adam: A method for stochastic optimization[C]. International Conference on Learning Representations, 2015.