Multi-view hash representation learning is a crucial technique for multimedia retrieval. However, current methods integrate the features of multiple views poorly, which limits retrieval accuracy. Most studies rely on fusion strategies such as weighted sums or concatenation, which fail to capture the complementarity and consistency among different views. To address these problems, we propose a novel Deep Fusion Multi-View Hashing (DFMVH) method. For the first time, we introduce multi-view hierarchical central learning, which effectively resolves the complementarity and consistency issues in multi-view fusion. In addition, we design a Transformer-based multi-view feature fusion module and verify its superiority through experiments. Our results show that DFMVH outperforms state-of-the-art methods on the benchmark datasets MIR-Flickr25K, NUS-WIDE, and MS COCO, achieving a mean average precision (mAP) improvement of up to 12.15%.
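To make the Transformer-based fusion idea concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes each view's feature vector is projected to a common width, the per-view tokens are fed through a standard Transformer encoder so that attention mixes information across views, and a final linear layer with tanh produces relaxed hash codes. All class names, dimensions, and pooling choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiViewTransformerFusion(nn.Module):
    """Hypothetical sketch of Transformer-based multi-view feature fusion for hashing."""
    def __init__(self, view_dims, d_model=512, n_heads=8, n_layers=2, hash_bits=64):
        super().__init__()
        # Project each view (e.g., image CNN features, text features) to a common width.
        self.projections = nn.ModuleList([nn.Linear(d, d_model) for d in view_dims])
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.hash_head = nn.Linear(d_model, hash_bits)

    def forward(self, views):
        # views: list of tensors, one per view, each of shape (batch, view_dim)
        tokens = torch.stack(
            [proj(v) for proj, v in zip(self.projections, views)], dim=1
        )  # (batch, n_views, d_model)
        fused = self.encoder(tokens)      # attention lets views exchange information
        pooled = fused.mean(dim=1)        # simple mean pooling over view tokens (assumption)
        return torch.tanh(self.hash_head(pooled))  # relaxed binary codes in (-1, 1)

# Illustrative usage: two views with made-up dimensions, 64-bit codes.
model = MultiViewTransformerFusion(view_dims=[4096, 1386], hash_bits=64)
codes = model([torch.randn(8, 4096), torch.randn(8, 1386)])
print(codes.shape)  # torch.Size([8, 64])
```

At inference time such relaxed codes would typically be binarized with the sign function; the hierarchical central learning objective described in the abstract is not modeled in this sketch.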