Depression is a common mental illness and the second leading cause of disability worldwide. Traditional diagnosis of depression requires direct communication with patients and depends on their subjective cooperation, which is costly in staff, resources, and time. The growing volume of user data on social media and advances in natural language processing make computer-aided diagnosis feasible. This enables more objective analysis and offers a new approach to diagnosing depression. We propose a multimodal model based on EmoBERTa and the Transformer, together with a text preprocessing method tailored to a specific pre-trained model and application context. The model uses a user's behavioral information and past tweets to detect whether that user has depression. It achieves state-of-the-art performance on a publicly available multimodal Twitter dataset.