In image fusion, most deep learning models rely on attention mechanisms as the backbone for feature extraction, which often leads to a large number of model parameters. Inspired by ConvNeXt, we propose a ConvNeXt-based image fusion algorithm that extracts features with multiple Residual ConvNeXt Blocks. Each Residual ConvNeXt Block consists mainly of depthwise convolution layers with 7 × 7 kernels, which effectively capture the spatial information of the image. To verify the effectiveness and superiority of the proposed network, we select 21 pairs of visible and infrared images from the TNO dataset as the test set and compare it comprehensively with seven other networks. The experimental results show that our network achieves the best performance on several evaluation metrics and produces subjectively clearer visual results.
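
As a rough illustration of why a depthwise 7 × 7 convolution is light on parameters, the sketch below counts the learnable parameters of a standard convolution and of its depthwise counterpart using the usual formula (weights per group, plus one bias per output channel). The helper name `conv2d_params` and the channel width of 64 are assumptions made for this example only; they are not taken from the proposed network.

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count the learnable parameters of a 2-D convolution layer.

    Each output channel sees in_channels // groups input channels,
    so the weight tensor holds (in_channels // groups) * out_channels * k * k values.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


channels = 64  # hypothetical feature width, chosen only for illustration

# Standard 7x7 convolution: every output channel mixes all input channels.
standard = conv2d_params(channels, channels, 7)

# Depthwise 7x7 convolution: groups == channels, one filter per channel.
depthwise = conv2d_params(channels, channels, 7, groups=channels)

print(f"standard 7x7:  {standard} parameters")
print(f"depthwise 7x7: {depthwise} parameters")
```

Under these assumed settings the depthwise layer needs roughly 1/63 of the parameters of the standard layer, while keeping the same 7 × 7 spatial receptive field per channel.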