Service robots have been widely deployed in indoor scenes, but their interaction capabilities based on action recognition have developed slowly. In this work, we focus on human action recognition for service robots. We observe that existing action recognition datasets are rarely captured from a robot's perspective and do not focus on human-robot interaction. We therefore present a multi-modal visual dataset, THU-HRIA, captured from the perspective of a service robot and covering a total of eight human daily and interactive actions. Based on this dataset, we generate training data through data partitioning and augmentation, select several state-of-the-art graph convolutional networks (GCNs), improve their accuracy through transfer learning, and compare the impact of various factors on recognition performance. Guided by the experimental results, we design a prototype action recognition system that performs end-to-end real-time action recognition on a laptop. We further propose a time-frame-constrained sampling strategy for this system and demonstrate its feasibility through experiments.
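The abstract names a time-frame-constrained sampling strategy without detailing it. As a rough illustration only, the sketch below shows one plausible buffering scheme for real-time skeleton input to a GCN; all class names, window lengths, and frame counts here are hypothetical and not taken from the paper.

```python
from collections import deque
import time

import numpy as np


class TimeConstrainedSampler:
    """Hypothetical sketch: buffer skeleton frames within a bounded time
    window and uniformly subsample a fixed-length clip for a GCN model."""

    def __init__(self, max_window_s=3.0, clip_len=64):
        self.max_window_s = max_window_s  # keep only frames newer than this
        self.clip_len = clip_len          # number of frames the model expects
        self.buffer = deque()             # stores (timestamp, skeleton) pairs

    def push(self, skeleton, timestamp=None):
        """Add one skeleton frame (e.g., a (joints, 3) array) to the buffer."""
        t = time.monotonic() if timestamp is None else timestamp
        self.buffer.append((t, skeleton))
        # Evict frames that fall outside the allowed time window.
        while self.buffer and t - self.buffer[0][0] > self.max_window_s:
            self.buffer.popleft()

    def sample_clip(self):
        """Return a (clip_len, joints, 3) array, or None if too few frames."""
        if len(self.buffer) < self.clip_len:
            return None
        # Spread clip_len indices uniformly over the buffered frames so the
        # sampled clip never spans more than max_window_s seconds of motion.
        idx = np.linspace(0, len(self.buffer) - 1, self.clip_len).astype(int)
        return np.stack([self.buffer[i][1] for i in idx])
```

In such a scheme, the time constraint bounds how stale the oldest sampled frame can be, while the fixed clip length matches the temporal input size that skeleton-based GCNs typically require.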