Paper
1 December 2023 Visual-language modal hybrid tracking algorithm
Long Cheng, Rui Li
Proceedings Volume 12940, Third International Conference on Control and Intelligent Robotics (ICCIR 2023); 1294037 (2023) https://doi.org/10.1117/12.3010594
Event: Third International Conference on Control and Intelligent Robotics (ICCIR 2023), 2023, Sipsongpanna, China
Abstract
Visual and linguistic modalities provide complementary information for many computer vision tasks. This paper proposes a novel approach that improves the SiamBAN tracking algorithm by integrating the visual and language modalities. We introduce a visual-language (VL) modal mixer, which combines visual features with language representations to enhance tracking performance. Specifically, we use a language model to extract semantic features from language descriptions and align them with the visual features through a linear layer. The VL modal mixer is implemented with the Hadamard product operator, which preserves spatial information. The mixed features are then fused with the visual features through a residual connection to retain fine-grained visual details. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art accuracy and robustness. Our work contributes to the advancement of multimodal tracking algorithms and opens up new possibilities for integrating visual and linguistic cues in computer vision tasks.
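The three steps named in the abstract (linear alignment, Hadamard mixing, residual fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `vl_modal_mixer`, the tensor shapes, the random weights, and in particular the choice to broadcast the aligned language vector over all spatial positions are assumptions made only for illustration, since the abstract does not specify them.

```python
import numpy as np


def vl_modal_mixer(visual, language, weight, bias):
    """Hypothetical sketch of mixing a language embedding into a visual map.

    visual:   (C, H, W) feature map from a visual backbone (assumed shape)
    language: (D,) sentence embedding from a language model (assumed shape)
    weight:   (C, D) linear-layer weight, bias: (C,) linear-layer bias
    """
    # Linear layer: project the language embedding to the visual channel count.
    aligned = weight @ language + bias           # shape (C,)
    # Hadamard (element-wise) product. The channel vector is broadcast over
    # every spatial position (an assumption), so the H x W layout is unchanged.
    mixed = visual * aligned[:, None, None]      # shape (C, H, W)
    # Residual connection: add the original visual features back.
    return visual + mixed


# Toy inputs with illustrative sizes only.
rng = np.random.default_rng(0)
C, H, W, D = 4, 3, 3, 8
visual = rng.standard_normal((C, H, W))
language = rng.standard_normal(D)
weight = rng.standard_normal((C, D)) * 0.1
bias = np.zeros(C)

fused = vl_modal_mixer(visual, language, weight, bias)
print(fused.shape)  # (4, 3, 3)
```

Under these assumptions the output has the same shape as the visual feature map, and a zero projection returns the visual features unchanged, which is the sense in which the residual path retains visual detail.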
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Long Cheng and Rui Li "Visual-language modal hybrid tracking algorithm", Proc. SPIE 12940, Third International Conference on Control and Intelligent Robotics (ICCIR 2023), 1294037 (1 December 2023); https://doi.org/10.1117/12.3010594
KEYWORDS
Detection and tracking algorithms
Visualization
Feature fusion
Semantics
Feature selection
Optical tracking
Video