Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms
Bingyin Tang, Fan Feng
Abstract

We introduce a method for efficient and expressive high-resolution image synthesis that combines variational autoencoders (VAEs) with transformers using sparse attention (SA) mechanisms. The VAE establishes a context-rich vocabulary of image constituents, capturing intricate image features more effectively than traditional techniques. We then employ SA mechanisms within our transformer model, improving computational efficiency on the long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model integrates both nonspatial and spatial information and incorporates temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate the method's effectiveness in semantically guided synthesis of megapixel images. These findings establish the method as a significant contribution to high-resolution image synthesis.
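The paper's own implementation is not reproduced here, but the core efficiency idea the abstract describes, restricting each transformer query to a sparse subset of keys so that attention cost no longer grows quadratically with sequence length, can be illustrated with a minimal sketch. The windowed pattern, window size, and function names below are our own assumptions for illustration, not details from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_sparse_attention(q, k, v, window=4):
    """Each query attends only to keys within +/- `window` positions,
    reducing the cost from O(n^2) to O(n * window) per sequence."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scaled dot-product scores
        out[i] = softmax(scores) @ v[lo:hi]       # weighted sum of local values
    return out

# Toy example: a sequence of 16 tokens with 8-dimensional embeddings,
# standing in for a (much longer) sequence of VAE-encoded image constituents.
rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(3, n, d))
y = local_sparse_attention(q, k, v, window=4)
print(y.shape)  # (16, 8)
```

With `window` set to the full sequence length, this reduces to ordinary dense attention; shrinking the window trades global context for the linear scaling that makes megapixel-length token sequences tractable.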

© 2024 SPIE and IS&T
Bingyin Tang and Fan Feng "Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms," Journal of Electronic Imaging 33(3), 033002 (2 May 2024). https://doi.org/10.1117/1.JEI.33.3.033002
Received: 1 January 2024; Accepted: 11 April 2024; Published: 2 May 2024
KEYWORDS: Transformers, Data modeling, Image processing, Performance modeling, Super resolution, Image quality, Education and training