1 August 2022 Research on unsupervised monocular depth estimation based on hybrid ViT
Chunyan Wei, Qingyu Zhang, Xiaosen Tian, Qingxia Li, Zhigang Jin
Proceedings Volume 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022); 1225702 (2022) https://doi.org/10.1117/12.2639574
Event: 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 2022, Guangzhou, China
Abstract
Depth estimation from 2D images is widely regarded as an important technology for environmental perception in autonomous driving, but it still suffers from many issues. From an application perspective, this work proposes an unsupervised monocular depth estimation method based on a hybrid ViT to improve accuracy and reduce cost. Specifically, convolution and transformer techniques are combined in the encoder to extract fine-grained features. The decoder fuses multi-scale features to generate multi-scale disparity maps. The loss is then calculated on the multi-scale and full-resolution disparity maps, and image reconstruction is achieved under stereo constraints. Experiments on the KITTI dataset indicate that, compared with previous works, this method improves on both the error and accuracy metrics: accuracy is 3.4% higher than the baseline, boundaries are clearer, artifacts are fewer, and the depth maps are of higher quality. The results show that the combination of hybrid encoding, a multi-scale decoder and a full-resolution loss has a significant effect on depth estimation, most notably the hybrid encoding.
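The abstract does not spell out the loss terms, so the following is only a minimal sketch of how a stereo reconstruction loss evaluated at full resolution could look. It assumes rectified stereo pairs, a plain L1 photometric error, nearest-neighbour upsampling, and grayscale images stored as nested lists; all function names (`warp_right_to_left`, `upsample_nearest`, `full_resolution_loss`) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image or disparity map, stored row by row


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Reconstruct the left view by sampling the right view at x - d(x, y).

    Samples are linearly interpolated along each row and clamped at the border.
    """
    height, width = len(right), len(right[0])
    left_hat = []
    for y in range(height):
        row = []
        for x in range(width):
            src = min(max(x - disparity[y][x], 0.0), width - 1.0)
            x0 = int(src)                # floor, since src is non-negative
            x1 = min(x0 + 1, width - 1)
            w = src - x0                 # interpolation weight of the right neighbour
            row.append((1.0 - w) * right[y][x0] + w * right[y][x1])
        left_hat.append(row)
    return left_hat


def upsample_nearest(disp: Image, factor: int) -> Image:
    """Upsample a low-resolution disparity map to full resolution.

    Disparities are measured in pixels, so values are scaled by the same factor.
    """
    return [[disp[y // factor][x // factor] * factor
             for x in range(len(disp[0]) * factor)]
            for y in range(len(disp) * factor)]


def l1_photometric(a: Image, b: Image) -> float:
    """Mean absolute intensity difference between two equally sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def full_resolution_loss(left: Image, right: Image,
                         scales: List[Tuple[Image, int]]) -> float:
    """Average the reconstruction error over all scales at full resolution.

    Each entry of `scales` is (disparity map, downsampling factor of that map).
    """
    losses = []
    for disp, factor in scales:
        full_disp = upsample_nearest(disp, factor)
        left_hat = warp_right_to_left(right, full_disp)
        losses.append(l1_photometric(left, left_hat))
    return sum(losses) / len(losses)


# Toy stereo pair: the left view is the right view shifted by 2 pixels.
H, W = 4, 8
right_img = [[10.0 * x for x in range(W)] for _ in range(H)]
left_img = [[10.0 * max(x - 2, 0) for x in range(W)] for _ in range(H)]

full_disp = [[2.0] * W for _ in range(H)]            # full resolution, 2 px
half_disp = [[1.0] * (W // 2) for _ in range(H // 2)]  # half resolution, 1 px

loss = full_resolution_loss(left_img, right_img, [(full_disp, 1), (half_disp, 2)])
print(loss)
```

In this toy setup both disparity maps encode the true 2-pixel shift, so the reconstructed left view matches the real one. Published unsupervised stereo methods typically add further terms such as SSIM, disparity smoothness, or left-right consistency; whether this paper does so is not stated in the abstract.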
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chunyan Wei, Qingyu Zhang, Xiaosen Tian, Qingxia Li, and Zhigang Jin "Research on unsupervised monocular depth estimation based on hybrid ViT", Proc. SPIE 12257, 4th International Conference on Information Science, Electrical, and Automation Engineering (ISEAE 2022), 1225702 (1 August 2022); https://doi.org/10.1117/12.2639574
KEYWORDS
Transformers
Convolution
Image restoration
Machine learning
Cameras
Artificial neural networks
Computer vision technology