Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.