GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized discourse state
- Resource Type
- Working Paper
- Authors
- Bian, Junyi; Huang, Xiaodi; Zhou, Hong; Zhu, Shanfeng
- Subject
- Computer Science - Computation and Language
Extracting summaries from long documents can be regarded as sentence classification that uses the structural information of the documents. How to use such structural information to summarize a document remains challenging. In this paper, we propose GoSum, a novel graph and reinforcement learning based extractive model for long-paper summarization. In particular, GoSum encodes sentence states in reinforcement learning by building a heterogeneous graph for each input document at different discourse levels. The edges in the graph reflect the discourse hierarchy of the document, which restrains semantic drift across section boundaries. We evaluate GoSum on two scientific article summarization datasets: PubMed and arXiv. The experimental results demonstrate that GoSum achieves state-of-the-art results compared with strong extractive and abstractive baselines. The ablation studies further validate that the performance of GoSum benefits from the use of discourse information.
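
As a rough illustration of the idea of a per-document heterogeneous graph over discourse levels, the sketch below builds a toy graph with two node types (sections and sentences). The node types, edge types, the fully connected section layer, and the function name `build_discourse_graph` are assumptions made for illustration only; the abstract does not specify the exact graph construction, and this is not the authors' implementation.

```python
from itertools import combinations


def build_discourse_graph(sections):
    """Build a toy heterogeneous graph from a document.

    `sections` is a list of sections, each a list of sentence strings.
    Returns (nodes, edges), where nodes are (id, type, payload) tuples
    and edges are (src_id, dst_id, edge_type) tuples.
    All names and edge types here are hypothetical.
    """
    nodes = []
    edges = []
    section_ids = []
    for sec_idx, sentences in enumerate(sections):
        sec_id = f"sec{sec_idx}"
        nodes.append((sec_id, "section", sec_idx))
        section_ids.append(sec_id)
        for sent_idx, text in enumerate(sentences):
            sent_id = f"{sec_id}/sent{sent_idx}"
            nodes.append((sent_id, "sentence", text))
            # Assumption: each sentence links only to its own section,
            # so information crosses section boundaries via section nodes.
            edges.append((sent_id, sec_id, "sentence-section"))
    # Assumption: section nodes are pairwise connected.
    for a, b in combinations(section_ids, 2):
        edges.append((a, b, "section-section"))
    return nodes, edges


doc = [
    ["Intro sentence 1.", "Intro sentence 2."],
    ["Method sentence 1."],
    ["Result sentence 1.", "Result sentence 2."],
]
nodes, edges = build_discourse_graph(doc)
```

In this sketch, a sentence can only reach sentences in other sections through its section node, which is one simple way to make the graph structure mirror a document's section hierarchy.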