Query language modeling for voice search
- Resource Type
- Conference
- Authors
- Chelba, C.; Schalkwyk, J.; Brants, T.; Ha, V.; Harb, B.; Neveitt, W.; Parada, C.; Xu, P.
- Source
- 2010 IEEE Spoken Language Technology Workshop (SLT), pp. 127-132, Dec. 2010
- Subject
- Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
General Topics for Engineers
Vocabulary
Data models
Smoothing methods
USA Councils
Training data
Speech recognition
Transducers
language modeling
voice search
query stream
- Language
- English
- Abstract
The paper presents an empirical exploration of google.com query stream language modeling. We describe the normalization of the typed query stream, which results in out-of-vocabulary (OoV) rates below 1% for a one-million-word vocabulary. We present a comprehensive set of experiments that guided the design decisions for a voice search service. In the process, we rediscovered a lesser-known interaction between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that hints at non-stationarity of the query stream, as well as strong dependence on various English locales: USA, Britain and Australia.
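
For readers unfamiliar with the OoV rate quoted in the abstract, the sketch below shows one common way to compute it: build a vocabulary from the most frequent words in training queries, then measure the fraction of test word tokens that fall outside it. This is an illustration under simplifying assumptions, not the paper's pipeline. The function names `build_vocabulary` and `oov_rate`, the whitespace tokenization, and the toy queries are all hypothetical, and the paper's query normalization step is not reproduced here.

```python
from collections import Counter


def build_vocabulary(training_queries, max_size):
    """Return the max_size most frequent whitespace-separated words (assumed tokenization)."""
    counts = Counter(word for query in training_queries for word in query.split())
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(test_queries, vocabulary):
    """Return the fraction of test word tokens that are not in the vocabulary."""
    tokens = [word for query in test_queries for word in query.split()]
    if not tokens:
        return 0.0
    missing = sum(1 for word in tokens if word not in vocabulary)
    return missing / len(tokens)


# Toy data, invented for illustration only.
train = ["weather new york", "new york pizza", "pizza near me"]
test = ["weather near boston", "new pizza"]

vocab = build_vocabulary(train, max_size=3)  # keeps "new", "york", "pizza"
rate = oov_rate(test, vocab)                 # 3 of 5 test tokens are outside the vocabulary
print(f"OoV rate: {rate:.1%}")
```

In this toy setup the vocabulary is capped at three words, so the rate is high; with a vocabulary on the order of one million words and normalized queries, the abstract reports rates below 1%.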