Convolutional neural networks (CNNs) have demonstrated dominant performance in many fields in recent years. A notable trend is that CNN architectures are becoming deeper and more complex, which means more parameters, larger memory footprints and longer inference times. These costs restrict their use in resource-constrained settings, such as embedded systems and mobile devices. Here, we propose a dual-backbone architecture that combines two backbones of different depths. Each backbone contains between 1 and 5 convolutional layers, and the features extracted by the two backbones are fused via summation before the first fully connected layer. The deeper backbone enlarges the receptive field by stacking more convolutional layers, whereas the shallower backbone extracts lower-level features, better preserves location information and provides finer details. The well-known FMNIST and CIFAR-10 datasets were selected to evaluate the network's performance. The experimental results show that dual-backbone networks with different depths outperform VGGs, ResNeXt and DenseNet on almost all evaluation metrics.
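
As a rough illustration of why backbone depth changes the receptive field, and of what fusion via summation requires, the following minimal sketch may help. It is not the authors' implementation. The 3x3 kernel, the stride of 1, and the absence of pooling or dilation are assumptions made here for illustration, and the names receptive_field and fuse_by_summation are hypothetical helpers.

```python
def receptive_field(num_layers, kernel_size=3, stride=1):
    """Receptive field (in input pixels, along one axis) of stacked conv layers.

    Assumption: every layer uses the same kernel size and stride, with no
    dilation and no pooling in between. These values are illustrative only.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent outputs of a layer
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def fuse_by_summation(deep_features, shallow_features):
    """Element-wise sum of two flattened feature vectors.

    Summation only works when both backbones produce features of the same
    size, so a mismatch is reported instead of silently truncated.
    """
    if len(deep_features) != len(shallow_features):
        raise ValueError("summation fusion needs feature vectors of equal length")
    return [d + s for d, s in zip(deep_features, shallow_features)]


if __name__ == "__main__":
    # Depths 1..5 match the range of convolutional layers per backbone.
    for depth in range(1, 6):
        print(depth, receptive_field(depth))
    # Toy feature vectors standing in for the two backbone outputs.
    print(fuse_by_summation([1.0, 2.0], [0.5, 0.5]))
```

Under these assumptions, a 1-layer backbone sees a 3-pixel-wide window while a 5-layer backbone sees an 11-pixel-wide window, which is consistent with the shallow branch keeping finer spatial detail and the deep branch capturing wider context. The length check reflects a general constraint of summation fusion: both branches must yield features of the same shape before they can be added.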