Few-shot scene classification aims to develop models that can quickly adapt to new scenes using only a few labeled samples that are unseen during training. In recent years, convolutional neural networks (CNNs) have made significant advances in few-shot remote sensing image scene classification. However, most existing approaches rely solely on high-level embeddings of remote sensing images to learn similarity relations, neglecting the intrinsic hierarchical representations that can be crucial for distinguishing scenes with substantial interclass similarity. To address this limitation, we propose a novel few-shot scene classification method for remote sensing images, the hierarchical-relation network (HiReNet), which leverages the hierarchical features of a query sample and its corresponding support sample to learn discriminative representations. HiReNet consists of an embedding network and a relation network: the embedding network employs a Siamese architecture to extract representations, while the relation network uses these representations for classification. Within the relation network, we introduce a hierarchical relation learning (HRL) structure to capture the hierarchical relations between query and support samples. To extract stronger features, we further introduce a feature aggregation module that concatenates multilevel features and employs channel attention to reweight them. Experimental results demonstrate the superior performance of HiReNet compared with several state-of-the-art few-shot scene classification methods.
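The feature aggregation step described above (concatenating multilevel features, then reweighting channels with attention) can be illustrated with a minimal NumPy sketch. The squeeze-and-excitation-style attention, the reduction ratio, and the randomly initialized weights below are assumptions for illustration; the abstract does not specify the exact attention mechanism or layer sizes.

```python
import numpy as np

def channel_attention(features, reduction=4):
    """Reweight channels of a (C, H, W) feature map.

    Uses a squeeze-and-excitation-style gate: global average pooling
    per channel, a small bottleneck MLP, and a sigmoid. The MLP weights
    here are random placeholders (in practice they would be learned).
    """
    c = features.shape[0]
    # Squeeze: global average pooling per channel -> vector of shape (C,)
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with placeholder (untrained) weights
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0.0)            # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # sigmoid -> (C,)
    # Broadcast per-channel gates over the spatial dimensions
    return features * gates[:, None, None]

def aggregate(multilevel_feats):
    """Concatenate multilevel feature maps along the channel axis,
    then reweight the combined channels with attention."""
    concat = np.concatenate(multilevel_feats, axis=0)
    return channel_attention(concat)
```

For example, aggregating two hypothetical 8-channel feature maps of spatial size 4x4 yields a single attention-reweighted 16-channel map of the same spatial size.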