Fusing a low-resolution hyperspectral image (LrHSI) with an auxiliary high-resolution multispectral image (HrMSI) is an effective way to obtain a high-resolution HSI (HrHSI). However, existing data-driven fusion methods rely heavily on large sets of training triplets, which severely limits their practicality in real scenarios. To address this deficiency, we propose a new unsupervised network for HSI-MSI fusion that is optimized using only the observed HSI-MSI pair and therefore removes this restriction. Specifically, our network reformulates the fusion problem as a spectral mapping learning problem and consists of three steps: latent information learning, spectral mapping learning, and target generation. The first step extracts latent model information, which serves as the foundation of our method. Based on the knowledge derived from this step, we design an adaptive fusion module with spectral progressive learning to establish the spectral relationship. Finally, the HrHSI is generated with the learned spectral mapping. Experiments on simulated data confirm the effectiveness of our method.
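To make the spectral-mapping reformulation concrete, the following is a minimal sketch and not the proposed network. It rests on several assumptions that are not stated in the abstract: the spatial degradation is plain block averaging, the spectral response function (SRF) is known rather than learned in the first step, and an ordinary linear least-squares fit stands in for the adaptive fusion module. All function names (`spatial_degrade`, `block_srf`, `fit_spectral_mapping`, `generate_hr_hsi`) and all sizes are hypothetical. The idea illustrated is that a mapping from MSI spectra to HSI spectra can be fitted at low resolution, where both are available from the observed pair alone, and then applied to the HrMSI.

```python
import numpy as np


def spatial_degrade(img, scale):
    """Block-average an (H, W, B) image by `scale` (assumed degradation model)."""
    h, w, b = img.shape
    return img.reshape(h // scale, scale, w // scale, scale, b).mean(axis=(1, 3))


def block_srf(n_hsi_bands, n_msi_bands):
    """Hypothetical SRF: each MSI band averages a contiguous group of HSI bands."""
    srf = np.zeros((n_hsi_bands, n_msi_bands))
    groups = np.array_split(np.arange(n_hsi_bands), n_msi_bands)
    for j, idx in enumerate(groups):
        srf[idx, j] = 1.0 / len(idx)
    return srf


def fit_spectral_mapping(lr_hsi, srf):
    """Fit a linear MSI-to-HSI spectral mapping using only the LrHSI.

    The LrHSI is spectrally degraded to a low-resolution MSI, which gives
    (MSI spectrum, HSI spectrum) training pairs without any ground truth.
    A least-squares fit is an assumption standing in for a learned network.
    """
    lr_msi = lr_hsi @ srf
    inputs = lr_msi.reshape(-1, srf.shape[1])
    targets = lr_hsi.reshape(-1, srf.shape[0])
    mapping, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return mapping


def generate_hr_hsi(hr_msi, mapping):
    """Apply the fitted spectral mapping to every HrMSI pixel."""
    return hr_msi @ mapping


# Synthetic scene (assumption): 3 endmembers mixed linearly, 31 HSI bands,
# 4 MSI bands, spatial scale factor 4.
rng = np.random.default_rng(0)
H, W, C, c, k, s = 32, 32, 31, 4, 3, 4
abundances = rng.dirichlet(np.ones(k), size=H * W).reshape(H, W, k)
endmembers = rng.random((k, C))
X = abundances @ endmembers          # ground-truth HrHSI, used only for checking

srf = block_srf(C, c)
lr_hsi = spatial_degrade(X, s)       # observed LrHSI
hr_msi = X @ srf                     # observed HrMSI

mapping = fit_spectral_mapping(lr_hsi, srf)
X_hat = generate_hr_hsi(hr_msi, mapping)
print(X_hat.shape, float(np.abs(X_hat - X).max()))
```

The reconstruction in this toy case is essentially exact only because the synthetic scene has fewer endmembers than MSI bands, so a linear mapping suffices. Real scenes generally violate this, which is one reason to replace the least-squares fit with a learned, nonlinear mapping.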