Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery
Zhenqiang Bao, Aihua Li, Zhigao Cui, Meng Yuan. Research Progress of Deep Learning in Visual Localization and Three-Dimensional Structure Recovery[J]. Laser & Optoelectronics Progress, 2018, 55(5): 050007.
[1] Roberts L G. Machine perception of three-dimensional solids[M]. Cambridge: Massachusetts Institute of Technology, 1965: 31-39.
[2] Barrow H G, Tenenbaum J M. Interpreting line drawings as three-dimensional surfaces[J]. Artificial Intelligence, 1981, 17: 75-116.
[3] Tian Y B, Bai J, Huang Z. Depth information estimation based on a panoramic annular stereo imaging system[J]. Acta Optica Sinica, 2013, 33(6): 0611002.
[4] Flack J, Fox S. Rapid 2D-to-3D conversion[C]. SPIE, 2002, 4660: 78-86.
[5] Chen S E, Williams L. View interpolation for image synthesis[C]. Conference on Computer Graphics and Interactive Techniques, 1993: 279-288.
[6] Fitzgibbon A, Wexler Y, Zisserman A. Image-based rendering using image-based priors[J]. International Journal of Computer Vision, 2005, 63(2): 141-151.
[7] Seitz S M, Dyer C R. View morphing[C]. Conference on Computer Graphics and Interactive Techniques, 1996: 21-30.
[8] Zitnick C L, Kang S B, Uyttendaele M, et al. High-quality video view interpolation using a layered representation[C]. ACM Transactions on Graphics, 2004, 23(3): 600-608.
[9] Ladický L, Häne C, Pollefeys M. Learning the matching function[J]. Computer Science, 2015: arXiv preprint.
[10] Zbontar J, LeCun Y. Stereo matching by training a convolutional neural network to compare image patches[J]. Journal of Machine Learning Research, 2016, 17(65): 1-32.
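[11] Xu L, Zhao H T, Sun S Y. Monocular infrared image depth estimation based on deep convolutional neural networks[J]. Acta Optica Sinica, 2016, 36(7): 0715002.
[12] Wu S C, Zhao H T, Sun S Y. Monocular infrared video depth estimation based on bidirectional recurrent convolutional neural networks[J]. Acta Optica Sinica, 2017, 37(12): 1215003.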
[13] Mayer N, Ilg E, Hausser P, et al. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 4040-4048.
[14] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 824-840.
[15] Liu F, Shen C, Lin G, et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024-2039.
[16] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network[C]. International Conference on Neural Information Processing Systems, 2014: 2366-2374.
[17] Shi J, Pollefeys M. Pulling things out of perspective[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2014: 89-96.
[18] Li B, Shen C, Dai Y, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1119-1127.
[19] Laina I, Rupprecht C, Belagiannis V, et al. Deeper depth prediction with fully convolutional residual networks[C]. Fourth IEEE International Conference on 3D Vision, 2016: 239-248.
[20] Li B, Shen C, Dai Y, et al. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015: 1119-1127.
[21] Fan X, Zheng K, Lin Y, et al. Combining local appearance and holistic view: dual-source deep neural networks for human pose estimation[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2015, 8753: 1347-1355.
[22] Ummenhofer B, Zhou H, Uhrig J, et al. DeMoN: depth and motion network for learning monocular stereo[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 5622-5631.
[23] Kuznietsov Y, Stuckler J, Leibe B. Semi-supervised deep learning for monocular depth map prediction[C]. 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017: 2215-2223.
[24] Liu B, Gould S, Koller D. Single image depth estimation from predicted semantic labels[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2010: 1253-1260.
[25] Kendall A, Martirosyan H, Dasgupta S, et al. End-to-end learning of geometry and context for deep stereo regression[C]. 16th IEEE International Conference on Computer Vision, 2017: 66-75.
[26] Tulsiani S, Zhou T, Efros A A, et al. Multi-view supervision for single-view reconstruction via differentiable ray consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 209-217.
[27] Bell A J, Sejnowski T J. The "independent components" of natural scenes are edge filters[J]. Vision Research, 1997, 37(23): 3327-3338.
[28] Bourlard H, Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition[J]. Biological Cybernetics, 1988, 59(4/5): 291-294.
[29] Olshausen B A, Field D J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images[J]. Nature, 1996, 381(6583): 607-609.
[30] Salakhutdinov R, Hinton G. Deep Boltzmann machines[J]. Journal of Machine Learning Research, 2009, 5(2): 1967-2006.
[31] Gadelha M, Maji S, Wang R. Shape generation using spatially partitioned point clouds[J]. Computer Science, 2017: arXiv:1707.06267.
[32] Rezende D J, Eslami S M A, Mohamed S, et al. Unsupervised learning of 3D structure from images[J]. Advances in Neural Information Processing Systems, 2016: 4997-5005.
[33] Yan X, Yang J, Yumer E, et al. Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision[J]. Advances in Neural Information Processing Systems, 2016: 1696-1704.
[34] Jayaraman D, Grauman K. Learning image representations tied to ego-motion[C]. IEEE International Conference on Computer Vision, 2015: 1413-1421.
[35] Kendall A, Grimes M, Cipolla R. PoseNet: a convolutional network for real-time 6-DOF camera relocalization[C]. IEEE International Conference on Computer Vision, 2015: 2938-2946.
[36] Agrawal P, Carreira J, Malik J. Learning to see by moving[C]. IEEE International Conference on Computer Vision, 2015: 37-45.
[37] Garg R, Vijay K B G, Carneiro G, et al. Unsupervised CNN for single view depth estimation: geometry to the rescue[C]. 14th European Conference on Computer Vision, 2016, 9912: 740-756.
[38] Kendall A, Cipolla R. Geometric loss functions for camera pose regression with deep learning[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6555-6564.
[39] Flynn J, Snavely K, Neulander I, et al. Deepstereo: learning to predict new views from real world imagery: US20160335795[P]. 2018-03-13.
[40] Xie J, Girshick R, Farhadi A. Deep3D: fully automatic 2D-to-3D video conversion with deep convolutional neural networks[C]. 14th European Conference on Computer Vision, 2016, 9908: 842-857.
[41] Godard C, Mac Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6602-6611.
[42] Konda K, Memisevic R. Learning visual odometry with a convolutional network[C]. International Conference on Computer Vision Theory and Applications, 2015: 486-490.
[43] Handa A, Bloesch M, Pătrăucean V, et al. gvnn: neural network library for geometric computer vision[C]. 14th European Conference on Computer Vision, 2016, 9915: 67-82.
[44] Zhao Y, Liu G L, Tian G H, et al. A survey of visual SLAM based on deep learning[J]. Robot, 2017, 39(6): 889-896.
[45] Wang S, Clark R, Wen H, et al. DeepVO: towards end-to-end visual odometry with deep recurrent convolutional neural networks[C]. IEEE International Conference on Robotics and Automation, 2017: 2043-2050.
[46] Li R, Wang S, Long Z, et al. UnDeepVO: monocular visual odometry through unsupervised deep learning[J]. Computer Science, 2017: arXiv:1709.06841.
[47] Vijayanarasimhan S, Ricco S, Schmid C, et al. SfM-Net: learning of structure and motion from video[J]. Computer Science, 2017: arXiv:1704.07804.
[48] Gadelha M, Maji S, Wang R. 3D shape induction from 2D views of multiple objects[J]. Computer Science, 2016: arXiv:1612.05872.
[49] Arora R, Livescu K. Multi-view learning with supervision for transformed bottleneck features[C]. IEEE International Conference on Acoustics, Speech and Signal Processing, 2014: 2499-2503.
[50] Shotton J, Glocker B, Zach C, et al. Scene coordinate regression forests for camera relocalization in RGB-D images[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2013: 2930-2937.
[51] Zhou T, Brown M, Snavely N, et al. Unsupervised learning of depth and ego-motion from video[C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6612-6619.