Leveraging zero-shot models for enhanced mixed reality systems
22 November 2024
Conference Poster
Abstract
Mixed Reality (MR) systems, which blend the physical and digital worlds, have advanced significantly in recent years. However, accurately detecting and recognizing objects in diverse, dynamic environments remains a key challenge, and traditional methods often struggle to adapt to the variability and complexity of real-world scenes. This paper addresses that challenge with zero-shot models, specifically a Large Multimodal Model (LMM) that has shown promising capabilities in Visual Question Answering (VQA), object detection, and segmentation mask generation.

Zero-shot capabilities, meaning the ability to identify objects without task-specific training on them, are a potential game-changer for MR systems. However, these capabilities can be limited in specific domains, so fine-tuning the model is recommended for optimal performance.

This paper presents a method for fine-tuning the LMM for MR systems, with a focus on improving object detection and recognition in diverse environments. The approach is demonstrated in a case study on object detection in MR environments, a domain where foundation models typically do not perform well.

Results show significant improvements in the performance of the MR system, with the fine-tuned LMM demonstrating superior object detection and recognition capabilities. This research opens up new possibilities for the application of zero-shot models in MR, paving the way for more immersive, interactive, and accurate mixed reality experiences. The implications of this research extend beyond MR, offering insights into how zero-shot models can be optimized for various specific domains.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Maksim Sorokin, Dmitry Zhdanov, Andrey Zhdanov, and Madina Sinetova "Leveraging zero-shot models for enhanced mixed reality systems", Proc. SPIE 13239, Optoelectronic Imaging and Multimedia Technology XI, 1323927 (22 November 2024); https://doi.org/10.1117/12.3044527
KEYWORDS
Mixed reality
Systems modeling
3D modeling
Education and training
Machine learning
Data modeling
Point clouds
