Graph convolutional networks (GCNs) have emerged as dominant methods for skeleton-based action recognition. However, they still suffer from two problems: neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of the graph topology to solve the first problem, but the second has yet to be fully explored. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome these limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. Building on this block, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), that combines the complementary advantages of GCNs (i.e., local topology, temporal dynamics, and hierarchy) and Transformers (i.e., global context and dynamic attention). HGCT is lightweight and computationally efficient. Quantitative analysis demonstrates the superiority and good interpretability of HGCT.
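
To make the three steps of the DSTT block concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: attention is parameter-free (queries, keys, and values are the input features themselves, with no learned projections), the spatial and temporal branches are combined by summation, and local enhancement is approximated by averaging the features of skeleton neighbors. The names `self_attention`, `dstt_like_block`, and `neighbors` are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the tokens themselves;
    a trained block would apply learned linear projections first.
    """
    channels = len(tokens[0])
    scale = math.sqrt(channels)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens))
                    for c in range(channels)])
    return out


def dstt_like_block(x, neighbors):
    """Apply a DSTT-like block to x, where x[t][v] is the feature vector
    of joint v at frame t and neighbors[v] lists the joints adjacent to v."""
    num_frames, num_joints = len(x), len(x[0])
    channels = len(x[0][0])

    # (i) Disentanglement: handle the joint axis and the frame axis separately.
    # (ii) Global attention along each axis.
    spatial = [self_attention(x[t]) for t in range(num_frames)]
    temporal = [self_attention([x[t][v] for t in range(num_frames)])
                for v in range(num_joints)]  # indexed as temporal[v][t]

    # (iii) Local enhancement (assumed form): mean of skeleton-neighbor features.
    out = []
    for t in range(num_frames):
        frame = []
        for v in range(num_joints):
            nbrs = neighbors[v] or [v]  # fall back to the joint itself
            local = [sum(x[t][u][c] for u in nbrs) / len(nbrs)
                     for c in range(channels)]
            # Assumption: the three branches are fused by simple summation.
            frame.append([s + p + l for s, p, l
                          in zip(spatial[t][v], temporal[v][t], local)])
        out.append(frame)
    return out


# Toy input: 2 frames, 3 joints on a chain (0-1-2), 2 channels per joint.
x = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
y = dstt_like_block(x, neighbors)
```

The output `y` has the same frames-by-joints-by-channels layout as `x`. Attending along each axis separately costs on the order of T·V² + V·T² pairwise scores for T frames and V joints, rather than (T·V)² for joint spatiotemporal attention, which is one plausible reading of why a disentangled design can remain lightweight.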