ImageVista: Training-Free Text-to-Image Generation with Multilingual Input Text

Resource Type: Conference
Authors: Kaushar, Shamina; Agarwal, Yash; Saha, Anirban; Pramanik, Dipanjan; Das, Nabanita; Sadhukhan, Bikash
Source: 2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), pp. 1357-1363, Jan. 2024
Subject: Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Image synthesis
Navigation
Pipelines
Semantics
Refining
Generative adversarial networks
User experience
Generative Adversarial Network (GAN)
Contrastive Language-Image Pretraining (CLIP)
Augmented Contrastive Language-Image Pretraining (AugCLIP)
Frechet Inception Distance (FID)
Inception Score (IS)
Vector Quantized Generative Adversarial Network (VQGAN)
Language

Online Access

Full Text (IEEE)

Abstract

This study presents a text-to-image generation approach based on the contrastive language-image pretraining (CLIP) + generative adversarial network (GAN) paradigm, which optimizes semantic relevance within a pretrained GAN's latent space. The method enables zero-shot generation and can be adapted to different generators. To address the challenges of CLIP score optimization within the GAN domain, this study introduces the FuseDream pipeline. The pipeline improves image quality through three components: the AugCLIP score, an optimization strategy for efficiently navigating nonconvex landscapes, and a composite generation technique for mitigating data bias. FuseDream produces high-quality images featuring diverse objects, backgrounds, and artistic styles based on textual prompts. Notably, it achieves top-tier Inception Score (IS) and Frechet Inception Distance (FID) results on the MS COCO dataset. Code modifications are implemented to enhance FuseDream's efficacy in text-to-image synthesis, contributing to its versatility and robustness. Overall, the study demonstrates the effectiveness of the proposed approach, particularly in overcoming challenges related to CLIP score optimization within the GAN framework. The FuseDream pipeline combines optimization strategies with diverse generation techniques to achieve strong results in image synthesis.
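
The idea in the abstract of optimizing a latent code against an augmentation-averaged CLIP score can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's implementation: `toy_generator` and `toy_image_encoder` are hypothetical stand-ins for a frozen pretrained GAN and the CLIP image encoder, additive noise stands in for real image augmentations, and a simple hill-climbing search replaces the optimization strategy the abstract mentions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def toy_generator(z):
    # Hypothetical stand-in for a frozen pretrained GAN: latent -> "image".
    return [math.tanh(2.0 * a) for a in z]


def toy_image_encoder(img):
    # Hypothetical stand-in for the CLIP image encoder (identity map here).
    return list(img)


def augment(img, rng, noise=0.1):
    # Stand-in for a random image augmentation: small additive jitter.
    return [p + rng.gauss(0.0, noise) for p in img]


def aug_score(z, text_emb, rng, n_aug=8):
    """Average similarity to the text embedding over random augmentations."""
    img = toy_generator(z)
    sims = [
        cosine(toy_image_encoder(augment(img, rng)), text_emb)
        for _ in range(n_aug)
    ]
    return sum(sims) / n_aug


def optimize_latent(text_emb, dim, steps=300, step_size=0.2, seed=0):
    """Hill-climb the latent code; only the latent changes, never the models."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best = aug_score(z, text_emb, rng)
    history = [best]
    for _ in range(steps):
        candidate = [a + rng.gauss(0.0, step_size) for a in z]
        score = aug_score(candidate, text_emb, rng)
        if score > best:  # accept only improving proposals
            z, best = candidate, score
        history.append(best)
    return z, history


if __name__ == "__main__":
    # Hypothetical 4-dimensional "text embedding" for a prompt.
    z, history = optimize_latent([1.0, -1.0, 0.5, 0.0], dim=4)
    print(f"score: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the generator and encoders stay fixed and only the latent code is searched, no training is involved, which is what makes this family of methods training-free. Averaging over augmentations is meant to make the score harder to satisfy with a single fragile input.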
