Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method
- Resource Type
- Conference
- Authors
- Rusinol, Marcal; Aldavert, David; Toledo, Ricardo; Llados, Josep
- Source
- 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 63-67, Sep. 2011
- Subject
- Computing and Processing
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
General Topics for Engineers
Visualization
Semantics
Indexing
Image segmentation
Large scale integration
Hidden Markov models
Feature extraction
Word Spotting
Heterogeneous Document Collections
Dense SIFT Features
Latent Semantic Indexing
- ISSN
- 1520-5363
2379-2140
In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
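The retrieval stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each patch is already described by a bag-of-visual-words histogram, and it uses random counts in place of histograms built from quantized SIFT descriptors. The function names (`lsi_fit`, `lsi_project`, `rank_patches`), the vocabulary size of 20, and the 4 latent dimensions are hypothetical choices made for illustration. Latent semantic indexing is approximated here by a plain truncated SVD, followed by cosine-similarity ranking against a query patch.

```python
import numpy as np


def lsi_fit(histograms, k):
    """Return a (n_words x k) basis from a truncated SVD of the patch matrix.

    histograms: array of shape (n_patches, n_words), one row per patch.
    """
    _, _, vt = np.linalg.svd(histograms, full_matrices=False)
    # Keep the k right singular vectors with the largest singular values.
    return vt[:k].T


def lsi_project(histograms, basis):
    """Project histograms into the latent space and L2-normalise each row."""
    z = np.atleast_2d(histograms) @ basis
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, 1e-12)


def rank_patches(query_hist, patch_hists, basis):
    """Rank patches by cosine similarity to the query in the latent space."""
    q = lsi_project(query_hist, basis)
    p = lsi_project(patch_hists, basis)
    scores = (p @ q.T).ravel()
    order = np.argsort(-scores)  # best match first
    return order, scores


# Synthetic stand-in data (assumption): 30 patches, 20 visual words.
rng = np.random.default_rng(0)
hists = rng.integers(1, 10, size=(30, 20)).astype(float)

basis = lsi_fit(hists, k=4)
# Use patch 5 as the query; it should match itself with similarity ~1.
order, scores = rank_patches(hists[5], hists, basis)
```

In a segmentation-free setting the rows of `hists` would come from patches sampled densely over whole page images rather than from pre-segmented words, so the ranked patches localise candidate occurrences of the query directly on the page.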