Research on Tibetan Part-of-Speech Tagging Based on Transformer
- Resource Type
- Conference
- Authors
- Xiangxiu, Cairang; Qun, Nuo; Renqing, Nuobu; Nyima, Trashi; Zhao, Qijun
- Source
- 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML), Jul. 2022, pp. 315-320
- Subject
- Computing and Processing
- Signal Processing and Analysis
- Semantics
- Information processing
- Transforms
- Tagging
- Feature extraction
- Transformers
- Natural language processing
- Pre-trained language model
- Transformer
- Tibetan part-of-speech tagging
- Language
- Abstract
- Tibetan part-of-speech tagging faces key sources of ambiguity, such as polysemy and out-of-vocabulary words, and the recurrent neural networks commonly applied to the task cannot model long Tibetan texts. To address these problems, this paper proposes a hybrid ELMo and Transformer-encoder model for Tibetan part-of-speech tagging, which resolves the polysemy and out-of-vocabulary issues that static word vectors leave unsolved. ELMo is a bidirectional pre-trained model that extracts Tibetan word features in both the forward and backward directions. The pre-trained word embeddings are combined with the Transformer's self-attention mechanism to extract the semantic features of Tibetan sentences and predict the part-of-speech tag of each word. This method improves the Tibetan part-of-speech tagging task and achieves about 97% accuracy on the dataset used in this paper.
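The architecture described in the abstract, contextual bidirectional embeddings feeding a Transformer encoder that emits one tag per token, can be sketched in PyTorch as follows. This is a minimal structural sketch, not the authors' implementation: the BiLSTM here is a stand-in for ELMo's forward/backward language-model features, and the vocabulary size, tag-set size, and layer dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridPosTagger(nn.Module):
    """Sketch of an ELMo + Transformer-encoder POS tagger.

    A BiLSTM stands in for ELMo's bidirectional contextual features
    (the real model would use pre-trained ELMo embeddings); a
    Transformer encoder then applies self-attention over the whole
    sentence before a per-token tag classifier.
    """
    def __init__(self, vocab_size=1000, num_tags=30,
                 d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Forward + backward halves concatenate back to d_model.
        self.bilstm = nn.LSTM(d_model, d_model // 2,
                              batch_first=True, bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.classifier = nn.Linear(d_model, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq_len, d_model)
        x, _ = self.bilstm(x)       # bidirectional contextual features
        x = self.encoder(x)         # self-attention over the sentence
        return self.classifier(x)   # (batch, seq_len, num_tags) scores

model = HybridPosTagger()
# Batch of 2 sentences, 7 tokens each, with random illustrative ids.
logits = model(torch.randint(0, 1000, (2, 7)))
print(tuple(logits.shape))  # one score vector per token
```

Taking an argmax over the last dimension of `logits` yields the predicted tag index for every token; in the paper's setting these logits would instead be trained against gold Tibetan POS labels.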