Maintaining logical consistency throughout a conversation remains one of the most fundamental and challenging problems for current dialogue generation systems. In addition, the scarcity of open-source annotated persona-based dialogue datasets can leave models with insufficient training data. Moreover, the time and memory required by the attention mechanism have grown prohibitively large. To address these problems, we propose a vertical-structure model built on BERT with a sentence-embedding method. The model first generates a raw response from the sentence embeddings of the context and persona, and then revises the raw response according to the persona. An understanding task is further designed to give the BERT decoder stronger revision ability. Considering the differences between the generation and understanding models, three input methods are designed for the respective parts of the model. Comparative experimental results on publicly available datasets are presented.
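The generate-then-revise pipeline described above can be sketched schematically. This is a minimal toy illustration, not the paper's actual architecture: the mean-pooling, the single linear layer in `generate_raw_response`, the blending in `revise_response`, and all function names are hypothetical stand-ins for the BERT-based components.

```python
import numpy as np

def sentence_embedding(token_embeddings: np.ndarray) -> np.ndarray:
    """Pool per-token vectors into one fixed-size sentence vector (mean pooling,
    a common choice; the paper's exact pooling may differ)."""
    return token_embeddings.mean(axis=0)

def generate_raw_response(context_emb: np.ndarray, persona_emb: np.ndarray,
                          W: np.ndarray) -> np.ndarray:
    """Stage 1 (hypothetical): map the concatenated context and persona
    embeddings to a raw-response representation with one linear layer."""
    return W @ np.concatenate([context_emb, persona_emb])

def revise_response(raw_emb: np.ndarray, persona_emb: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
    """Stage 2 (hypothetical): blend the raw response with the persona
    embedding, standing in for the persona-conditioned revision step."""
    return (1.0 - alpha) * raw_emb + alpha * persona_emb

rng = np.random.default_rng(0)
d = 8
context_tokens = rng.normal(size=(5, d))  # 5 context token vectors
persona_tokens = rng.normal(size=(3, d))  # 3 persona token vectors

ctx = sentence_embedding(context_tokens)
per = sentence_embedding(persona_tokens)
W = rng.normal(size=(d, 2 * d))

raw = generate_raw_response(ctx, per, W)    # raw response representation
final = revise_response(raw, per)           # persona-revised representation
```

Operating on pooled sentence embeddings rather than full token sequences is what lets this kind of design sidestep part of the attention cost over long contexts.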