Face videos manipulated with deep learning models are now widely spread on social media, violating personal privacy and potentially threatening public security. In this study, we begin by examining the essential differences between real and fake faces in order to improve the generalisability of the model. We find that manipulation artefacts typically vary in size and appear at different locations in the image. To extract these multiscale artefacts and enlarge the receptive field of the downsampling layers, we introduce atrous spatial pyramid pooling (ASPP). To address the drawbacks of ASPP, we design a cross-level attention (CLA) module that fuses the output of the ASPP block with backbone features. The CLA module allows the network to focus on locally manipulated regions without destroying other features of the model. Experimental results on the large, publicly available facial manipulation database FaceForensics++ confirm the effectiveness of the proposed method.
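To make the ASPP idea concrete, the sketch below implements a minimal 1-D version of atrous (dilated) convolution and a toy "pyramid" that runs several dilation rates in parallel. This is purely illustrative, not the paper's implementation: the actual ASPP block applies parallel 2-D atrous convolutions inside a CNN, and the function names here (`atrous_conv1d`, `aspp_1d`) are invented for this example.

```python
import numpy as np

def atrous_conv1d(x, kernel, rate):
    """Atrous (dilated) convolution: kernel taps are spaced `rate` apart,
    enlarging the receptive field without adding parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective receptive field
    out = np.zeros(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 3)):
    """Toy spatial pyramid: parallel branches at different rates capture
    artefacts at multiple scales; outputs are cropped to a common length
    and stacked (a real ASPP block would concatenate feature maps)."""
    outs = [atrous_conv1d(x, kernel, r) for r in rates]
    n = min(len(o) for o in outs)
    return np.stack([o[:n] for o in outs])

x = np.arange(10, dtype=float)          # toy 1-D "feature map"
kernel = np.array([1.0, 1.0, 1.0])

# rate=1 sums 3 consecutive samples; rate=2 sums samples i, i+2, i+4,
# so the same 3-tap kernel covers a 5-sample span
print(atrous_conv1d(x, kernel, rate=1))
print(atrous_conv1d(x, kernel, rate=2))
print(aspp_1d(x, kernel).shape)         # one row per dilation rate
```

The key property shown here is that increasing the rate widens the span each output sees while the parameter count stays fixed, which is why ASPP can pick up artefacts of different sizes with one shared backbone.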