Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms
Bingyin Tang, Fan Feng
Abstract

We introduce a method for efficient and expressive high-resolution image synthesis that combines variational autoencoders (VAEs) with transformers using sparse attention (SA) mechanisms. The VAE establishes a context-rich vocabulary of image constituents, capturing intricate image features more effectively than traditional techniques. We then employ SA mechanisms within our transformer model, improving computational efficiency on the long sequences inherent to high-resolution images. Extending beyond traditional conditional synthesis, our model integrates both nonspatial and spatial information and incorporates temporal dynamics, enabling sequential image synthesis. Through rigorous experiments, we demonstrate the method's effectiveness in semantically guided synthesis of megapixel images. These findings establish the method as a significant contribution to high-resolution image synthesis.
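The paper's own implementation is not reproduced here, but the core efficiency idea the abstract describes, restricting each transformer query to a sparse subset of keys so that attention cost no longer grows quadratically with sequence length, can be illustrated with a minimal sketch. The windowed pattern, window size, and function names below are our own assumptions for illustration, not details from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_sparse_attention(q, k, v, window=4):
    """Each query attends only to keys within +/- `window` positions,
    reducing the cost from O(n^2) to O(n * window) per sequence."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # scaled dot-product scores
        out[i] = softmax(scores) @ v[lo:hi]       # weighted sum of local values
    return out

# Toy example: a sequence of 16 tokens with 8-dimensional embeddings,
# standing in for a (much longer) sequence of VAE-encoded image constituents.
rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(3, n, d))
y = local_sparse_attention(q, k, v, window=4)
print(y.shape)  # (16, 8)
```

With `window` set to the full sequence length, this reduces to ordinary dense attention; shrinking the window trades global context for the linear scaling that makes megapixel-length token sequences tractable.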

© 2024 SPIE and IS&T
Bingyin Tang and Fan Feng "Efficient and expressive high-resolution image synthesis via variational autoencoder-enriched transformers with sparse attention mechanisms," Journal of Electronic Imaging 33(3), 033002 (2 May 2024). https://doi.org/10.1117/1.JEI.33.3.033002
Received: 1 January 2024; Accepted: 11 April 2024; Published: 2 May 2024
KEYWORDS: Transformers, Data modeling, Image processing, Performance modeling, Super resolution, Image quality, Education and training