Presentation + Paper
Towards real-time monocular depth estimation for mobile systems
20 June 2021
Abstract
Autonomous driving is becoming part of everyday life thanks to the development of advanced driver-assistance systems (ADAS) that support drivers in their driving tasks. This rapid development is largely driven by the higher safety levels these systems can guarantee for the vehicles that travel on our roads every day. At the heart of every application in this area lies the perception of the environment, which guides the vehicle's behavior. This holds for autonomous driving as well as for any application in which a system moves in the 3D real world, such as robotics and augmented reality. An effective 3D perception system is therefore necessary to accurately localize the objects that compose the scene and to reconstruct it as a 3D model. This problem is often addressed with LIDAR sensors, which provide accurate 3D perception and high robustness under unfavorable light and weather conditions. However, these sensors are generally expensive and therefore not a suitable choice for low-cost vehicles and robots. Moreover, they must be mounted in specific positions, so they cannot be integrated into a car without altering both its appearance and its aerodynamics. In addition, their output is point cloud data whose structure is not easily handled by the deep learning models that promise outstanding results in similar predictive tasks. For these reasons, some applications prefer to rely on other sensors, such as RGB cameras, to obtain 3D perception. Classic approaches in this direction are based on stereo cameras, RGB-D cameras, and structure from motion, which generally reconstruct the scene with less accuracy than LIDAR but still produce acceptable results. In recent years, several approaches have been proposed in the literature to estimate depth from a monocular camera by leveraging deep learning models. Some of these methods follow a supervised approach [1, 2], but they rely on annotated datasets that are labor-intensive to collect in practice. Other works [3, 4] instead use a self-supervised training procedure based on the reprojection error. Despite their good performance, most of the proposed approaches use very deep neural networks that are power- and resource-hungry and need high-end GPUs to produce results in real time; consequently, they cannot be used on systems with power and computational constraints. In this work, we propose a new approach based on a standard CNN from the literature, originally designed for image segmentation and built to have low resource requirements. For training, we adopt knowledge distillation, using an off-the-shelf pre-trained network as the teacher. We perform large-scale experiments to compare our results qualitatively and quantitatively against the baselines, and we present a detailed study of inference times on both general-purpose and mobile architectures.
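To make the distillation idea mentioned in the abstract concrete, the sketch below shows one plausible training step: a frozen, off-the-shelf teacher depth network produces per-pixel targets that a lightweight student regresses. The toy architecture, the L1 distillation loss, the input resolution, and all names (LightweightDepthNet, distillation_step) are illustrative assumptions, not the authors' actual implementation.

# Hedged sketch of knowledge-distillation training for monocular depth (PyTorch).
# Everything here is a placeholder, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightDepthNet(nn.Module):
    """Toy encoder-decoder standing in for a lightweight segmentation-style CNN."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # normalized disparity
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def distillation_step(student, teacher, images, optimizer):
    """One training step: regress the student's depth toward the teacher's prediction."""
    with torch.no_grad():                       # the teacher is frozen
        target_depth = teacher(images)
    pred_depth = student(images)
    loss = F.l1_loss(pred_depth, target_depth)  # simple per-pixel distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    student = LightweightDepthNet()
    teacher = LightweightDepthNet()             # placeholder for a pre-trained teacher
    teacher.eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    batch = torch.rand(2, 3, 128, 416)          # dummy RGB batch (KITTI-like size)
    print("distillation loss:", distillation_step(student, teacher, batch, optimizer))

In practice the teacher would be a pre-trained monocular depth model and the loss could include scale-invariant or gradient terms; the L1 loss above is only the simplest choice that illustrates the teacher-student relationship.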
Conference Presentation
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yashar Deldjoo, Tommaso Di Noia, Eugenio Di Sciascio, Gaetano Pernisco, Vito Renò, and Ettore Stella "Towards real-time monocular depth estimation for mobile systems", Proc. SPIE 11785, Multimodal Sensing and Artificial Intelligence: Technologies and Applications II, 117850J (20 June 2021); https://doi.org/10.1117/12.2596031
KEYWORDS
3D modeling, RGB color model, Cameras, Sensors, Data modeling, LIDAR, Motion models