Paper | 19 July 2024
Dual-encoder-based image-text fusion algorithm
Min Xia, Zhonghai Wu
Proceedings Volume 13213, International Conference on Image Processing and Artificial Intelligence (ICIPAI 2024); 132130X (2024) https://doi.org/10.1117/12.3035185
Event: International Conference on Image Processing and Artificial Intelligence (ICIPAI 2024), 2024, Suzhou, China
Abstract
Many sectors face the challenge of effectively representing knowledge in documents that contain multiple images closely tied to text, and of enabling models to understand the relationship between those images and the text. Contrastive Language-Image Pre-training (CLIP) and Bootstrapping Language-Image Pre-training (BLIP) acquire the ability to understand image-text relationships through large-scale pre-training. CLIP considers not only images and their related text but also contrasts images against large amounts of unrelated text, improving its ability to generalize the relationship between images and related text. BLIP strengthens its understanding of complex image-text relationships through pre-training and fine-tuning on matched image-text pairs. This paper presents an image-text fusion algorithm based on CLIP and BLIP, which gives an accurate and consistent picture of image-text relevance by fully exploiting CLIP's image-text generalization capacity and BLIP's capacity for understanding complex image-text relationships.
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Min Xia and Zhonghai Wu "Dual-encoder-based image-text fusion algorithm", Proc. SPIE 13213, International Conference on Image Processing and Artificial Intelligence (ICIPAI 2024), 132130X (19 July 2024); https://doi.org/10.1117/12.3035185
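The abstract does not give implementation details, but the fusion it describes, combining CLIP's contrastive image-text similarity with BLIP's image-text matching (ITM) head, can be sketched as follows. This is a minimal illustration using the publicly available Hugging Face checkpoints openai/clip-vit-base-patch32 and Salesforce/blip-itm-base-coco; the fusion weight alpha and the score normalization are assumptions for illustration, not the authors' method.

```python
# Hedged sketch: fuse CLIP similarity with a BLIP image-text matching (ITM) score.
# Checkpoints, normalization, and the fusion weight `alpha` are illustrative
# assumptions; the paper does not specify its implementation.
import torch
from PIL import Image
from transformers import (
    CLIPModel, CLIPProcessor,
    BlipForImageTextRetrieval, BlipProcessor,
)

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
blip_model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco")
blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")

@torch.no_grad()
def fused_relevance(image: Image.Image, text: str, alpha: float = 0.5) -> float:
    """Weighted fusion of CLIP cosine similarity and BLIP ITM match probability."""
    # CLIP: cosine similarity between image and text embeddings, mapped to [0, 1].
    c_in = clip_proc(text=[text], images=image, return_tensors="pt", padding=True)
    c_out = clip_model(**c_in)
    img_emb = c_out.image_embeds / c_out.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = c_out.text_embeds / c_out.text_embeds.norm(dim=-1, keepdim=True)
    clip_score = ((img_emb @ txt_emb.T).item() + 1.0) / 2.0

    # BLIP: probability that the pair matches, taken from the ITM head.
    b_in = blip_proc(images=image, text=text, return_tensors="pt")
    itm_logits = blip_model(**b_in).itm_score  # shape (1, 2): [no-match, match]
    blip_score = torch.softmax(itm_logits, dim=-1)[0, 1].item()

    # Convex combination of the two scores (alpha is an assumed hyperparameter).
    return alpha * clip_score + (1.0 - alpha) * blip_score
```

In use, such a fused score could rank candidate captions for an image, taking the highest-scoring text as the most relevant pairing.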
KEYWORDS
Image fusion
Education and training
Visual process modeling
Data modeling
Design
Image processing
Image segmentation