With the advance of global digital transformation, deep learning-based intelligent methods for extracting key information from semi-structured documents, such as receipts and invoices, have become essential for ensuring business stability, data security, and improved work efficiency. This paper provides a detailed review of deep learning-based techniques for information extraction, comprising a systematic introduction, hierarchical analysis, method comparison, and a summary with expectations for future development. The review begins by describing the defining characteristics of semi-structured documents and introducing the research background, application areas, and technical challenges of information extraction from such documents. It then gives an overview of two developmental stages, i.e., the shift from traditional information extraction to deep learning-based information extraction, followed by a discussion of technical architecture and method classification that elaborates on key technologies in terms of typical datasets, detection and recognition, and information reduction. Lastly, the paper summarizes the prospects and development of the field. Future research will focus on making algorithms more universal and lightweight, as well as on improving information protection capabilities and the diversity of datasets.