This paper investigates fine-grained 3D face reconstruction. Recent methods that combine a 3D morphable model (3DMM) with a mix of low-level losses for unsupervised or weakly supervised learning have made progress on this task. Building on them, we propose a more flexible framework that couples a 3DMM parameterized model with an end-to-end self-supervised system, so that the original structure is preserved while detail is learned directly in 3D space. Residual learning is introduced to encourage high-quality reconstruction. Evaluations on popular benchmarks show that our approach attains performance comparable to the state of the art. In addition, we apply the framework to a reconstruction system for interaction.
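
As a rough illustration of how a parameterized 3DMM shape and a learned residual might fit together, the following minimal sketch uses the standard linear 3DMM form (a mean shape plus a weighted sum of basis shapes) and adds per-vertex offsets on top. The function names, array sizes, and random data are hypothetical and chosen only for illustration; the residual here is a random stand-in for what a network would predict, and this is not necessarily the formulation used in this paper.

```python
import numpy as np

def coarse_shape(mean, basis, coeffs):
    """Linear 3DMM: mean shape plus a weighted sum of basis shapes.

    mean: (3N,), basis: (3N, K), coeffs: (K,). Returns an (N, 3) vertex array.
    """
    return (mean + basis @ coeffs).reshape(-1, 3)

def refine(coarse, residual):
    """Residual step (assumed form): add per-vertex 3D offsets to the coarse mesh."""
    return coarse + residual

# Toy sizes and random data, purely hypothetical.
rng = np.random.default_rng(0)
n_vertices, n_coeffs = 5, 3
mean = rng.normal(size=3 * n_vertices)
basis = rng.normal(size=(3 * n_vertices, n_coeffs))
coeffs = rng.normal(size=n_coeffs)

# Stand-in for offsets that a detail network would predict.
residual = 0.01 * rng.normal(size=(n_vertices, 3))

coarse = coarse_shape(mean, basis, coeffs)
fine = refine(coarse, residual)
```

Under this assumed form, a zero residual leaves the coarse 3DMM mesh unchanged, so the detail branch only needs to model the deviation from the parameterized shape.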