Semantic Textual Similarity using Machine Learning and Conceptual Relatedness
- Authors
- Hira Javed; Shivam Varshney; Priyanka Sharma
- Source
- SSRN Electronic Journal.
- Subject
- Computer science
Pearson product-moment correlation coefficient
SemEval
n-gram
Character (mathematics)
Semantic similarity
Similarity (network science)
Artificial intelligence
Equivalence (measure theory)
Natural language processing
Word (computer architecture)
- ISSN
- 1556-5068
Large amounts of data are available today, too much to be stored on physical devices. This data contains a huge amount of redundant information that could be grouped together and categorized. We present a system that gives the degree of equivalence between two statements, i.e., their Semantic Textual Similarity (STS). Given two textual fragments, the goal of the system is to determine their semantic similarity, i.e., how similar they are in meaning. Our system uses four measures of text similarity: (1) word n-gram overlap, (2) character n-gram overlap, (3) semantic overlap, and (4) conceptual overlap. Using these measures as features, it trains a support vector regression model on SemEval STS data. Evaluation is done using the Pearson correlation coefficient.
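
As an illustration of the first two measures and of the evaluation metric, the following is a minimal sketch in plain Python. The abstract does not specify how overlap is normalized, so the Jaccard ratio used here is an assumption, as are the whitespace tokenization, the lowercasing, and all function names (`word_ngrams`, `char_ngrams`, `overlap`, `pearson`); it is not the authors' implementation.

```python
import math
from typing import Sequence, Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Set of word n-grams, using lowercased whitespace tokens (an assumption)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def char_ngrams(text: str, n: int) -> Set[str]:
    """Set of character n-grams over the lowercased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def overlap(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Two of the four features for one sentence pair.
s1 = "A man is playing a guitar"
s2 = "A man is playing a violin"
features = [
    overlap(word_ngrams(s1, 1), word_ngrams(s2, 1)),  # word unigram overlap
    overlap(char_ngrams(s1, 3), char_ngrams(s2, 3)),  # character trigram overlap
]
```

In a full pipeline, feature vectors like `features` (extended with the semantic and conceptual overlap scores) would be passed to a support vector regressor, for example scikit-learn's `sklearn.svm.SVR`, and `pearson` would compare the model's predictions against the gold SemEval similarity scores.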