This study presents a text-to-image generation approach built on the contrastive language-image pretraining (CLIP) + generative adversarial network (GAN) paradigm, which optimizes semantic relevance, as measured by CLIP, within the latent space of a pretrained GAN. The method supports zero-shot generation and can be used with various generators. To address the challenges of CLIP score optimization within the GAN domain, this study introduces the FuseDream pipeline. The pipeline improves image quality through three components: the AugCLIP score, an optimization strategy for efficiently navigating nonconvex landscapes, and a composite generation technique for mitigating data bias. FuseDream produces high-quality images featuring diverse objects, backgrounds, and artistic styles based on textual prompts. It achieves top-tier Inception Score and Fréchet Inception Distance (FID) results on the MS COCO dataset. Code modifications are implemented to enhance FuseDream's efficacy in text-to-image synthesis, contributing to its versatility and robustness. Overall, the results demonstrate the effectiveness of the proposed approach, particularly in overcoming challenges related to CLIP score optimization within the GAN framework. The FuseDream pipeline thus offers a comprehensive solution that combines optimization strategies and diverse generation techniques for high-quality image synthesis.
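To make the idea of an augmentation-averaged score concrete, the following is a minimal sketch rather than the FuseDream implementation. It assumes that a score of this kind averages the image-text cosine similarity over random augmentations of the image. The encoder `toy_encoder`, the augmentation `random_augment`, and all sizes are hypothetical stand-ins chosen so the example runs without CLIP or a GAN.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def random_augment(image, rng, max_shift=4):
    """Hypothetical augmentation: random circular shift plus a random horizontal flip."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1]
    return out


def augmented_score(image, text_emb, encode_image, rng, n_aug=16):
    """Average image-text similarity over n_aug random augmentations of the image."""
    scores = [
        cosine(encode_image(random_augment(image, rng)), text_emb)
        for _ in range(n_aug)
    ]
    return float(np.mean(scores))


# Toy stand-in for an image encoder (an assumption, not CLIP):
# a fixed random linear projection of the flattened image.
rng = np.random.default_rng(0)
H, W, D = 16, 16, 8
proj = rng.standard_normal((H * W, D))


def toy_encoder(image):
    return image.reshape(-1) @ proj


image = rng.random((H, W))           # placeholder for a generated image
text_emb = rng.standard_normal(D)    # placeholder for a text embedding
score = augmented_score(image, text_emb, toy_encoder, rng)
print(f"augmentation-averaged score: {score:.4f}")
```

In a real pipeline, the stand-in encoder would be replaced by CLIP's image encoder and the score would be maximized with respect to the GAN latent code. Averaging over augmentations is intended to make the score harder to inflate with image changes that do not survive small perturbations.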