Action recognition is a challenging problem in human-robot interaction. In this paper, a residual activation fish-shaped network is proposed for action recognition. It consists of three parts: a fish tail, a fish body, and a fish head. A multi-feature input model is constructed that combines physiognomic Cartesian motion features with intrinsic geometric features, which can eliminate the influence of changes in the camera's depth of field and observation orientation. An extended residual convolution structure is designed to use global information to refine coupled useful sub-features and to learn structured semantic representations of the skeleton in each frame. Experimental results show that the proposed method achieves accuracies of 80.96% on the JHMDB dataset, 95.02% on SHREC14, and 93.16% on SHREC28. In addition, a human-robot interaction experiment verifies the effectiveness of the proposed action recognition method.
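The abstract does not say how the geometric and motion features are computed. The sketch below is a minimal illustration under stated assumptions and is not the authors' method. It assumes that each frame is a list of 3D joint coordinates. It assumes that a "geometric" feature could be the set of pairwise joint distances, normalized by the largest distance. Pairwise distances are unchanged by rotating or translating the whole skeleton, and the normalization removes overall scale, such as the scale change from moving closer to the camera. It also assumes that a "Cartesian motion" feature could be the frame-to-frame displacement of each joint. The function names `pairwise_distance_features` and `motion_features` are hypothetical.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]


def pairwise_distance_features(frame: Frame) -> List[float]:
    """Hypothetical geometric feature: upper-triangle pairwise joint distances.

    Distances do not change under rotation or translation of the skeleton;
    dividing by the largest distance also removes global scale.
    """
    feats = []
    n = len(frame)
    for i in range(n):
        for j in range(i + 1, n):
            feats.append(math.dist(frame[i], frame[j]))
    if not feats:
        return feats
    scale = max(feats)
    if scale == 0:
        # Degenerate skeleton (all joints coincide): nothing to normalize.
        return feats
    return [d / scale for d in feats]


def motion_features(sequence: List[Frame]) -> List[List[Joint]]:
    """Hypothetical motion feature: per-joint displacement between consecutive frames."""
    return [
        [tuple(c - p for p, c in zip(prev_j, cur_j)) for prev_j, cur_j in zip(prev, cur)]
        for prev, cur in zip(sequence, sequence[1:])
    ]


if __name__ == "__main__":
    # A toy 3-joint skeleton, and the same skeleton rotated 90 degrees about z,
    # scaled by 2, and translated by (5, 5, 5).
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    transformed = [(5.0, 5.0, 5.0), (5.0, 7.0, 5.0), (1.0, 5.0, 5.0)]
    print(pairwise_distance_features(original))
    print(pairwise_distance_features(transformed))  # same values as above
    print(motion_features([original, transformed]))
```

Both calls to `pairwise_distance_features` print the same normalized distances. This illustrates why distance-based geometric features are insensitive to viewpoint and camera distance. The raw displacements from `motion_features` are not rotation-invariant, which is one reason to combine the two kinds of feature.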