Spatiotemporal modeling is the key element of group activity recognition. Most existing methods infer the relationships among individuals from spatiotemporal information while ignoring inter-frame information. This paper proposes a dual-channel feature processing model. In the first channel, a temporal semantic relation subgraph network is proposed to explore the relationships between video frames. In the other channel, a Personal Feature Walk Module (PFWM) is designed to capture the dynamic information of individual features within video frames. The outputs of the two channels are combined to classify the group activity. In addition, a Backbone Exciting Module (BEM) is used to excite the backbone's motion-sensitive channels from three aspects: space-time, channel, and motion. A Mixed Pooling Module (MPM) enables the network to obtain information over a wide range of receptive fields more efficiently and enhances the extraction of basic features of individuals in video frames. The model is evaluated on the most widely used datasets in the field of group activity recognition and achieves excellent results, demonstrating its effectiveness.