A Novel Summarization-based Approach for Feature Reduction Enhancing Text Classification Accuracy
- Resource Type
- Authors
- J. J. C. Prasad Yadav; S. Rahamat Basha; J. Keziya Rani
- Source
- Engineering, Technology & Applied Science Research, Vol 9, Iss 6 (2019)
- Subject
- text classification
Computer science
dimension reduction
Feature extraction
Feature selection
Similarity (network science)
lcsh:Technology (General)
Feature (machine learning)
feature clustering
lcsh:T58.5-58.64
lcsh:Information technology
Dimensionality reduction
summarization
Automatic summarization
Constraint (information theory)
lcsh:TA1-2040
lcsh:T1-995
Artificial intelligence
lcsh:Engineering (General). Civil engineering (General)
Natural language processing
Sentence
- Language
- English
Automatic summarization is the process of shortening a single document (single-document summarization) or a set of documents (multi-document summarization). This paper proposes a new feature selection method for the nearest neighbor classifier that summarizes the original training documents using a sentence importance measure. For single-document summarization, our approach scores each sentence with two measures: the frequency of its terms and its similarity to the other sentences. All sentences are ranked by these scores, and the top-ranked sentences (subject to a threshold constraint) are selected for the summary. The summary of each document in the corpus is then stored as a new document, and these new documents are used in the summarization evaluation process.
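The abstract does not give the exact scoring formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It scores each sentence by (a) the mean document-level frequency of its terms and (b) its mean cosine similarity to the other sentences, normalizes both to [0, 1], and combines them with a weight `alpha`. It keeps the top `ratio` fraction of sentences as the threshold constraint, then classifies a query with a 1-nearest-neighbor rule over the summaries. The function names `summarize` and `nearest_neighbor`, the parameters `ratio` and `alpha`, the regex tokenizer, and the combination rule are all hypothetical choices for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    # Lowercased alphabetic tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(text, ratio=0.5, alpha=0.5):
    # Hypothetical summarizer: keep the top `ratio` fraction of sentences.
    sents = sentences(text)
    n = len(sents)
    if n <= 1:
        return text
    vecs = [Counter(tokens(s)) for s in sents]
    doc_tf = Counter(tokens(text))
    # Measure 1: mean document-level frequency of the sentence's terms.
    tf_score = [sum(doc_tf[t] * c for t, c in v.items()) / max(1, sum(v.values()))
                for v in vecs]
    # Measure 2: mean cosine similarity to every other sentence.
    sim_score = [sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / (n - 1)
                 for i in range(n)]

    def norm(xs):
        m = max(xs)
        return [x / m if m else 0.0 for x in xs]

    scores = [alpha * a + (1 - alpha) * b
              for a, b in zip(norm(tf_score), norm(sim_score))]
    keep = max(1, math.ceil(ratio * n))
    # Stable sort: ties keep their original sentence order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:keep]
    # Emit the selected sentences in their original order.
    return " ".join(sents[i] for i in sorted(top))


def nearest_neighbor(train, query):
    # train: list of (summary_text, label); 1-NN by cosine similarity.
    q = Counter(tokens(query))
    best = max(train, key=lambda pair: cosine(Counter(tokens(pair[0])), q))
    return best[1]


train_docs = [
    ("Cats purr. Cats chase mice. The weather was cold.", "pets"),
    ("Stocks rose today. Investors bought stocks. The weather was cold.", "finance"),
]
train = [(summarize(text), label) for text, label in train_docs]
print(train[0][0])                                # Cats purr. Cats chase mice.
print(nearest_neighbor(train, "Mice fear cats."))  # pets
```

In this toy corpus the shared, low-scoring sentence "The weather was cold." is dropped from both summaries. The training vocabulary therefore shrinks from 13 to 9 terms, which is the kind of feature reduction the approach aims for before nearest neighbor classification.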