Crowd counting is generally formulated as a pixel-wise regression task and has broad application prospects. To generate high-quality density maps, most recent works adopt long skip connections or encoder-decoder architectures, which aggregate features through step-by-step upsampling and direct fusion of contextual information. Despite their strong results, these designs overlook two negative side effects: feature misalignment and semantic imbalance. To address these problems, we propose a deep architecture for crowd counting called the Feature Refinement Network (FRNet). FRNet consists of two major components: the Inter-dimensional Refinement Module (IRM) and the Semantic Refinement Module (SRM). The IRM combines discriminative triple attention with transformation point offsets to jointly repair the severe spatial misalignment between adjacent features and enhance their descriptive ability. Meanwhile, the SRM explicitly learns the semantic interdependence among features of different levels to relieve the semantic imbalance caused by coarse context fusion. Extensive experiments are conducted on multiple crowd counting datasets: ShanghaiTech, UCF-QNRF, NWPU-Crowd, and JHU-CROWD++. The results indicate that FRNet outperforms state-of-the-art methods.
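To make the pixel-wise regression formulation concrete, the following minimal sketch shows how a density map can encode a crowd count: each annotated head contributes one normalized Gaussian kernel, so summing the map recovers the number of people. This is an illustration only and not FRNet or its training pipeline; the function names, kernel size, sigma, and head coordinates are all assumptions chosen for the example.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def density_map(height, width, heads, size=7, sigma=1.5):
    """Build a density map by placing one normalized kernel per head.

    `heads` is a list of (row, col) annotations (hypothetical values here).
    Kernels that overlap the image border are clipped, which would lose
    some mass; the example keeps all heads away from the border.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    dmap[y][x] += kernel[dy][dx]
    return dmap


def count_from_density(dmap):
    """The estimated count is the sum (discrete integral) of the map."""
    return sum(sum(row) for row in dmap)


# Three hypothetical head annotations in a 32 x 32 image.
heads = [(10, 10), (12, 14), (20, 25)]
dmap = density_map(32, 32, heads)
estimated_count = count_from_density(dmap)
print(round(estimated_count, 6))
```

A counting network regresses such a map from the image, so spatial misalignment or coarse fusion of features directly affects where, and how much, density is predicted at each pixel.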