Laser & Optoelectronics Progress, 2018, 55(5): 050007, published online: 2018-09-11

Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery
Author affiliation
Rocket Force University of Engineering, Xi'an, Shaanxi 710025, China
Cite this paper

Zhenqiang Bao, Aihua Li, Zhigao Cui, Meng Yuan. Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery[J]. Laser & Optoelectronics Progress, 2018, 55(5): 050007. (in Chinese)

References

[1] Roberts L G. Machine perception of three-dimensional solids[M]. Cambridge: Massachusetts Institute of Technology, 1965: 31-39.

[2] Barrow H G, Tenenbaum J M. Interpreting line drawings as three-dimensional surfaces[J]. Artificial Intelligence, 1981, 17: 75-116.

[3] Tian Y B, Bai J, Huang Z. Depth estimation with a panoramic stereo imaging system[J]. Acta Optica Sinica, 2013, 33(6): 0611002. (in Chinese)

[4] Flack J, Fox S. Rapid 2D-to-3D conversion[C]. SPIE, 2002, 4660: 78-86.

[5] Chen S E, Williams L. View interpolation for image synthesis[C]. Conference on Computer Graphics and Interactive Techniques, 1993: 279-288.

[6] Fitzgibbon A, Wexler Y, Zisserman A. Image-based rendering using image-based priors[J]. International Journal of Computer Vision, 2005, 63(2): 141-151.

[7] Seitz S M, Dyer C R. View morphing[C]. Conference on Computer Graphics and Interactive Techniques, 1996: 21-30.

[8] Zitnick C L, Kang S B, Uyttendaele M, et al. High-quality video view interpolation using a layered representation[C]. ACM Transactions on Graphics, 2004, 23(3): 600-608.

[9] Ladický L, Häne C, Pollefeys M. Learning the matching function[J]. Computer Science, 2015: arXiv preprint.

[10] Zbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of Machine Learning Research, 2016, 17(65): 1-32.

[11] Xu L, Zhao H T, Sun S Y. Monocular infrared image depth estimation based on deep convolutional neural networks[J]. Acta Optica Sinica, 2016, 36(7): 0715002. (in Chinese)

[12] Wu S C, Zhao H T, Sun S Y. Depth estimation from monocular infrared video based on bi-recursive convolutional neural network[J]. Acta Optica Sinica, 2017, 37(12): 1215003. (in Chinese)

[13] Mayer N, Ilg E, Hausser P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 4040-4048.

[14] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 824-840.

[15] Liu F, Shen C, Lin G, et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024-2039.

[16] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]. International Conference on Neural Information Processing Systems, 2014: 2366-2374.

[17] Shi J, Pollefeys M. Pulling things out of perspective[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2014: 89-96.

[18] Li B, Shen C, Dai Y, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1119-1127.

[19] Laina I, Rupprecht C, Belagiannis V, et al. Deeper depth prediction with fully convolutional residual networks[C]. Fourth IEEE International Conference on 3D Vision, 2016: 239-248.

[20] Li B, Shen C, Dai Y, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1119-1127.

[21] Fan X, Zheng K, Lin Y, et al. Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015, 8753: 1347-1355.

[22] Ummenhofer B, Zhou H, Uhrig J, et al. DeMoN: depth and motion network for learning monocular stereo[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 5622-5631.

[23] Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 2215-2223.

[24] Liu B, Gould S, Koller D. Single image depth estimation from predicted semantic labels[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2010: 1253-1260.

[25] Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end learning of geometry and context for deep stereo regression[C]. 16th IEEE International Conference on Computer Vision, 2017: 66-75.

[26] Tulsiani S, Zhou T, Efros A A, et al. Multi-view supervision for single-view reconstruction via differentiable ray consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 209-217.

[27] Bell A J, Sejnowski T J. The "independent components" of natural scenes are edge filters[J]. Vision Research, 1997, 37(23): 3327-3338.

[28] Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition[J]. Biological Cybernetics, 1988, 59(4/5): 291-294.

[29] Olshausen B A, Field D J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images[J]. Nature, 1996, 381(6583): 607-609.

[30] Salakhutdinov R, Hinton G. Deep Boltzmann machines[J]. Journal of Machine Learning Research, 2009, 5(2): 1967-2006.

[31] Gadelha M, Maji S, Wang R. Shape generation using spatially partitioned point clouds[J]. Computer Science, 2016: arXiv:1707.06267.

[32] Rezende D J, Eslami S M A, Mohamed S, et al. Unsupervised learning of 3D structure from images[J]. Advances in Neural Information Processing Systems, 2016: 4997-5005.

[33] Yan X, Yang J, Yumer E, et al. Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision[J]. Advances in Neural Information Processing Systems, 2016: 1696-1704.

[34] Jayaraman D, Grauman K. Learning image representations tied to ego-motion[C]. IEEE International Conference on Computer Vision, 2015: 1413-1421.

[35] Kendall A, Grimes M, Cipolla R. PoseNet: a convolutional network for real-time 6-DOF camera relocalization[C]. IEEE International Conference on Computer Vision, 2015: 2938-2946.

[36] Agrawal P, Carreira J, Malik J. Learning to see by moving[C]. IEEE International Conference on Computer Vision, 2015: 37-45.

[37] Garg R, Vijay K B G, Carneiro G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]. 14th European Conference on Computer Vision, 2016, 9912: 740-756.

[38] Kendall A, Cipolla R. Geometric loss functions for camera pose regression with deep learning[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6555-6564.

[39] Flynn J, Snavely K, Neulander I, et al. Deepstereo: learning to predict new views from real world imagery: US20160335795[P]. 2018-03-13.

[40] Xie J, Girshick R, Farhadi A. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks[C]. 14th European Conference on Computer Vision, 2016, 9908: 842-857.

[41] Godard C, Aodha O M, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6602-6611.

[42] Konda K, Memisevic R. Learning visual odometry with a convolutional network[C]. International Conference on Computer Vision Theory and Applications, 2015: 486-490.

[43] Handa A, Bloesch M, Pătrăucean V, et al. gvnn: neural network library for geometric computer vision[C]. 14th European Conference on Computer Vision, 2016, 9915: 67-82.

[44] Zhao Y, Liu G L, Tian G H, et al. A survey of visual SLAM based on deep learning[J]. Robot, 2017, 39(6): 889-896. (in Chinese)

[45] Wang S, Clark R, Wen H, et al. DeepVO: towards end-to-end visual odometry with deep recurrent convolutional neural networks[C]. IEEE International Conference on Robotics and Automation, 2017: 2043-2050.

[46] Li R, Wang S, Long Z, et al. UnDeepVO: monocular visual odometry through unsupervised deep learning[J]. Computer Science, 2017: arXiv:1709.06841.

[47] Vijayanarasimhan S, Ricco S, Schmid C, et al. … motion from video[J]. Computer Science, 2017: arXiv:1704.07804.

[48] Gadelha M, Maji S, Wang R. 3D shape induction from 2D views of multiple objects[J]. Computer Science, 2016: arXiv:1612.05872.

[49] Arora R, Livescu K. Multi-view learning with supervision for transformed bottleneck features[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014: 2499-2503.

[50] Shotton J, Glocker B, Zach C, et al. Scene coordinate regression forests for camera relocalization in RGB-D images[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2013: 2930-2937.

[51] Zhou T, Brown M, Snavely N, et al. Unsupervised learning of depth and ego-motion from video[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6612-6619.

