This study presents a novel text-driven method for robust portrait editing in sparse regions of the latent space. With the development of GANs and the introduction of powerful models such as StyleGAN, text-driven image generation and editing have made great progress in recent years, yet text-guided facial image generation still falls short in certain situations. Our model combines two pretrained models, CLIP2Latent and StyleGAN2, to conduct a preliminary exploration of this task. The latent code of the input portrait is edited and manipulated in the StyleGAN latent space via a CLIP-based text-driven generation module. Promising results are obtained, especially in sparse regions of the generator's latent space and when many attributes are changed simultaneously.
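The core idea above, driving a latent code toward a text description by minimizing a CLIP-style loss, can be sketched as a simple optimization loop. This is a minimal illustration only: `clip_loss`, its gradient, and the latent dimensionality are toy stand-ins, not the paper's actual CLIP2Latent or StyleGAN2 components; real systems backpropagate through CLIP and the generator instead of using an analytic toy gradient.

```python
import numpy as np

# Toy stand-in for CLIP-guided latent editing. In the real pipeline the loss
# measures CLIP similarity between the generated image and the text prompt;
# here a quadratic surrogate keeps the example self-contained and runnable.
rng = np.random.default_rng(0)
w_target = rng.normal(size=512)  # hypothetical latent implied by the text prompt


def clip_loss(w):
    # Placeholder for the CLIP image-text loss (assumption, not the paper's loss).
    return float(np.sum((w - w_target) ** 2))


def clip_loss_grad(w):
    # Analytic gradient of the toy loss; a real system would obtain this by
    # backpropagating through CLIP and the StyleGAN2 generator.
    return 2.0 * (w - w_target)


def edit_latent(w_init, lr=0.05, steps=200):
    """Gradient descent on the latent code: the text-driven module nudges the
    input portrait's latent toward the region matching the description."""
    w = w_init.copy()
    for _ in range(steps):
        w -= lr * clip_loss_grad(w)
    return w


w0 = rng.normal(size=512)  # latent code of the input portrait
w_edited = edit_latent(w0)
```

Because the update is applied directly to the latent code, the same loop can in principle edit several attributes at once, provided the prompt encodes all of them, which is the regime the abstract highlights.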