In this study, we approach the problem of image captioning with cycle-consistent generative adversarial networks (CycleGANs). Because CycleGANs learn mappings between domains and exploit the duality of the two mappings, strengthening each via a cycle consistency loss, they show great promise for jointly learning image captioning and image synthesis, and thereby for building a stronger image captioning framework. Historically, cycle consistency loss rested on the premise that an input should undergo little to no change when mapped to another domain and back to its original; image captioning challenges this premise because the mapping between images and captions is many-to-many in both directions. TextCycleGAN overcomes this obstacle by enforcing cycle consistency in the feature space and is thereby able to perform well on both image captioning and image synthesis. We demonstrate its capability as an image captioning framework and discuss how its model architecture makes this possible.
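For reference, the original CycleGAN formulation (Zhu et al., 2017) penalizes round-trip reconstruction error in the input space. With generators $G: X \rightarrow Y$ and $F: Y \rightarrow X$, the cycle consistency loss takes the form

\[
\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big].
\]

Because a many-to-many image-caption mapping makes exact round-trip reconstruction ill-posed, a feature-space variant, sketched here as an illustration rather than the paper's exact formulation, instead compares encoder features $\phi(\cdot)$ of the input and its reconstruction, e.g. $\mathbb{E}_{x}\big[\lVert \phi(F(G(x))) - \phi(x) \rVert_1\big]$, so that semantically equivalent but non-identical reconstructions incur little penalty.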