Emotional Reaction Intensity (ERI) estimation is an important task in multimodal scenarios, with applications in medicine, safe driving, and other fields. In this paper, we propose a solution to the ERI challenge of the fifth Affective Behavior Analysis in-the-wild (ABAW) competition: a dual-branch multi-output regression model. A spatial attention mechanism is used to better extract visual features, and Mel-Frequency Cepstral Coefficients (MFCCs) serve as acoustic features. A Temporal Encoder, composed of a Temporal Convolutional Network and a Transformer Encoder, captures the temporal relationships among features. In addition, a method named modality dropout is used when fusing the multimodal features. Our approach achieves a Pearson’s Correlation Coefficient of 0.4439 on the validation set and 0.4380 on the test set, ranking second on the final leaderboard.
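
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a Pearson's Correlation Coefficient can be computed for a multi-output regression model. It assumes that the reported score is the mean of the per-output correlations; that averaging rule, and the function names `pearson` and `mean_pearson`, are illustrative assumptions and are not taken from the text above.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Return 0.0 for a constant sequence, where the coefficient is undefined.
    return cov / denom if denom > 0 else 0.0


def mean_pearson(preds, targets):
    """Average the per-output Pearson coefficients.

    preds and targets are lists of rows: one row per sample,
    one column per regression output (assumed layout).
    """
    num_outputs = len(preds[0])
    scores = [
        pearson([row[k] for row in preds], [row[k] for row in targets])
        for k in range(num_outputs)
    ]
    return sum(scores) / num_outputs


if __name__ == "__main__":
    # Toy data: two outputs, three samples (values are made up).
    preds = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
    targets = [[0.2, 0.1], [0.6, 0.5], [1.0, 0.9]]
    print(mean_pearson(preds, targets))
```

In this toy example the first output is perfectly correlated with its target and the second is perfectly anti-correlated, so the averaged score is approximately zero. This shows that a mean over outputs can hide large differences between individual outputs.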