The goal of group activity prediction is to infer a group activity involving multiple individuals before it has been completely executed. Previous methods focus on capturing pair-wise relationships between individuals but neglect group-wise interactions, which can provide global guidance from a macroscopic perspective. To explore group-wise interactions, we propose a Group Residual Module (GRM), which constructs a virtual leader node to summarize the group representation and uses a bidirectional message passing mechanism to bridge the group and its individuals. To capture spatial and temporal correlations jointly, we propose a Spatial-Temporal Group Residual Network composed of spatial GRMs and temporal GRMs. Unlike existing methods that obtain additional information from the complete activity execution, the temporal GRMs use temporal masks that force our network to extract as much discriminative information as possible from the observed activity sequence. Experimental results show that our network achieves state-of-the-art performance on the Volleyball Dataset and the Collective Activity Dataset.
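As a rough illustration of the two ideas named above, the following NumPy sketch shows one way a virtual leader node with bidirectional message passing and a temporal mask could look. It is not the authors' implementation. The function names (`group_residual_step`, `causal_mask`, `masked_temporal_mean`), the mean pooling, the `tanh` projections, and the lower-triangular form of the mask are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def group_residual_step(x, w_up, w_down):
    """One hypothetical group-residual update.

    x:      (N, D) features of N individuals.
    w_up:   (D, D) projection for the individuals -> leader direction.
    w_down: (D, D) projection for the leader -> individuals direction.
    Returns the updated (N, D) features and the (D,) leader feature.
    """
    # Individuals -> leader: summarize the group in one virtual node
    # (mean pooling is an assumption, not taken from the paper).
    leader = np.tanh(x.mean(axis=0) @ w_up)
    # Leader -> individuals: send the group summary back to every
    # individual and add it as a residual.
    message = np.tanh(leader @ w_down)
    return x + message, leader


def causal_mask(t):
    """Lower-triangular (T, T) mask: frame i may use frames 0..i only.

    Assumed form of a temporal mask; it keeps unobserved future frames
    from influencing earlier ones.
    """
    return np.tril(np.ones((t, t), dtype=bool))


def masked_temporal_mean(seq, mask):
    """Average each frame over the frames its mask row allows.

    seq:  (T, D) per-frame features.
    mask: (T, T) boolean mask, mask[i, j] = True if frame i may use frame j.
    """
    weights = mask.astype(float)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ seq
```

In this sketch every individual receives the same group-level message, so the residual carries only global context. Pair-wise relations would have to come from a separate component.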