Traditional methods for detecting respiratory diseases have limitations such as invasiveness, high cost, and dosage restrictions, so developing non-invasive, cost-effective, and convenient methods for lung audio classification is of practical importance. Because publicly available lung audio datasets are limited and imbalanced, data augmentation is applied to balance the dataset and reduce the risk of overfitting. To characterize lung audio comprehensively and accurately, time-frequency analysis converts each recording into spectrogram, mel-spectrogram, and MFCC representations, which are fused as the input to a neural network. A depthwise separable convolutional neural network with residual connections is proposed, and an attention mechanism is incorporated to strengthen the network's focus on salient features for lung audio classification. The model is evaluated on the ICBHI 2017 lung sound dataset. On the six-class pathological audio classification task, it achieves 88.5% accuracy, 87.9% sensitivity, 89.4% precision, and an F1 score of 88.3%, demonstrating practical value and promising prospects for lung audio recognition.
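To make the architectural idea concrete, below is a minimal NumPy sketch of one residual depthwise separable block with channel attention. It assumes a squeeze-and-excitation style attention gate and illustrative tensor shapes; the paper's exact layer configuration, weight shapes, and attention variant are not specified here, so this is a sketch of the general technique, not the authors' implementation.

```python
import numpy as np

def depthwise_separable_block(x, dw_kernels, pw_weights, se_w1, se_w2):
    """One residual depthwise separable block with channel attention (sketch).

    x          : (C, H, W) input feature map
    dw_kernels : (C, k, k) one spatial filter per channel (depthwise conv)
    pw_weights : (C, C)    1x1 convolution mixing channels (pointwise conv)
    se_w1      : (r, C)    squeeze-and-excitation bottleneck weights
    se_w2      : (C, r)    squeeze-and-excitation expansion weights
    """
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))

    # Depthwise convolution: each channel is filtered independently.
    dw = np.empty_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                dw[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * dw_kernels[c])

    # Pointwise (1x1) convolution mixes information across channels.
    pw = np.tensordot(pw_weights, dw, axes=([1], [0]))
    pw = np.maximum(pw, 0.0)  # ReLU

    # Channel attention (squeeze-and-excitation style): global average pool,
    # bottleneck MLP, sigmoid gate, then rescale each channel.
    s = pw.mean(axis=(1, 2))                       # squeeze: (C,)
    e = np.maximum(se_w1 @ s, 0.0)                 # excitation bottleneck
    gate = 1.0 / (1.0 + np.exp(-(se_w2 @ e)))      # per-channel gate in (0, 1)
    attended = pw * gate[:, None, None]

    # Residual connection: add the block input back to its output.
    return x + attended
```

In a full model the fused spectrogram, mel-spectrogram, and MFCC features would enter as the channel dimension of `x`, and several such blocks would be stacked before a classification head.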