Single object tracking (SOT) based on deep learning has advanced rapidly. Nevertheless, these methods seldom take spectral information into consideration and therefore do not exploit the spectral properties of objects. Since spectral imaging can potentially distinguish a particular object more discriminatively, a more effective model is needed to exploit spectral information. To study the advantages of hyperspectral images for extracting spectral features of the target in a tracking task, this work proposes a novel Spectral-spatial-aware Transformer Fusion Network (SSATFN) for hyperspectral single object tracking, which efficiently combines spectral and spatial features in the template and search region branches. Specifically, the method focuses on multiband feature fusion through Hyperspectral Transformer Self-attention (HTSA) and Hyperspectral Transformer Cross-attention (HTCA). Meanwhile, a multi-scale spectral feature fusion auxiliary branch is used for template enhancement. Finally, we present an online tracking learning network that fine-tunes the last two layers of the feature extraction network. Quantitative experiments conducted on a close-up hyperspectral video dataset verify that the proposed SSATFN achieves promising tracking performance compared with other state-of-the-art trackers.