On the Rate of Convergence of a Classifier Based on a Transformer Encoder
- Resource Type
- Periodical
- Authors
- Gurevych, I.; Kohler, M.; Sahin, G.G.
- Source
- IEEE Transactions on Information Theory (IEEE Trans. Inform. Theory). 68(12):8139-8155, Dec. 2022
- Subject
- Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Transformers
Convergence
Pattern recognition
Natural language processing
Encoding
Electronic mail
Deep learning
Curse of dimensionality
transformer
classification
rate of convergence
- Language
- English
- ISSN
- 0018-9448
1557-9654
Pattern recognition based on a high-dimensional predictor is considered. A classifier is defined which is based on a Transformer encoder. The rate of convergence of the misclassification probability of the classifier towards the optimal misclassification probability is analyzed. It is shown that this classifier is able to circumvent the curse of dimensionality provided the a posteriori probability satisfies a suitable hierarchical composition model. Furthermore, the difference between the Transformer classifiers theoretically analyzed in this paper and the ones used in practice today is illustrated by means of classification problems in natural language processing.
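The quantity whose rate of convergence the abstract refers to can be written out in standard notation. The following is a sketch only: the symbols ($f_n$, $\mathcal{D}_n$, $\eta$, $f^*$) and the choice of binary labels $Y \in \{0,1\}$ are assumptions made here for illustration and may differ from the notation used in the paper.

```latex
% Assumed notation: (X,Y) is a random pair with Y in {0,1},
% D_n is the training sample, and f_n is the classifier built from D_n.
\[
  \eta(x) = \mathbf{P}\{Y = 1 \mid X = x\}
  \qquad \text{(a posteriori probability)}
\]
\[
  f^*(x) =
  \begin{cases}
    1, & \eta(x) > \tfrac{1}{2}, \\
    0, & \text{otherwise},
  \end{cases}
  \qquad
  \mathbf{P}\{f^*(X) \neq Y\} = \min_{f} \mathbf{P}\{f(X) \neq Y\}
\]
\[
  \underbrace{\mathbf{P}\{f_n(X) \neq Y \mid \mathcal{D}_n\}
  - \mathbf{P}\{f^*(X) \neq Y\}}_{\text{excess misclassification probability}}
  \;\ge\; 0
\]
```

In this notation, a rate of convergence describes how fast the excess misclassification probability tends to zero as the sample size $n$ grows. Circumventing the curse of dimensionality means that this rate does not deteriorate with the dimension of $X$ when $\eta$ has the assumed hierarchical composition structure.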