Human-machine dialogue is one of the most challenging tasks in natural language processing and a foundation for a future society in which humans and machines coexist. Current generative dialogue models based on deep learning tend to produce generic responses that are monotonous and carry little meaningful information, and existing research pays relatively little attention to emotional factors. To address these shortcomings, an emotional dialogue generation model based on Transformer and the conditional variational autoencoder (CVAE) is proposed, which aims to increase the diversity of response content and to embed emotional factors in the generated responses. The model uses a Transformer to extract semantic features from the text sequence, making better use of the sequence's semantic information. To increase the diversity of the responses, the latent variable of the CVAE is introduced into the decoder. In addition, to strengthen the model's capacity for empathy, an emotion perception encoder is used to encode the user's emotional information, and a pre-trained BERT-based emotion classification model is proposed to detect the emotion implicit in the utterance. Experiments show that the proposed model has stronger generation ability, produces responses with more diverse content, and exhibits stronger empathy.
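
The role of the CVAE latent variable can be illustrated with a minimal sketch. The code below is not the model described above; it only shows the standard CVAE ingredients under the common assumption of diagonal Gaussian prior and recognition (posterior) distributions: a latent vector is sampled with the reparameterization trick, and a KL term keeps the posterior close to the prior. The function names `reparameterize` and `kl_diag_gaussians`, and the use of plain Python lists instead of tensors, are illustrative choices and do not come from the paper.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick).

    mu and logvar are lists of equal length; logvar holds log(sigma^2).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical posterior and prior parameters for a 3-dimensional latent.
    mu_q, logvar_q = [0.2, -0.1, 0.5], [-1.0, -0.5, 0.0]
    mu_p, logvar_p = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    z = reparameterize(mu_q, logvar_q, rng)
    print("sampled z:", z)
    print("KL term:", kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p))
```

In a full model of this kind, a sampled `z` would typically be passed to the decoder together with the context representation, so that different samples can yield different responses to the same input; how the proposed model injects it into the Transformer decoder is not specified in this abstract.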