Since an image can be perceived by customers in a few seconds, it is an effective advertising medium that is favored by advertisers. Baidu, as one of the leading search companies in the world, receives billions of text queries per day. Presenting attractive images that capture customers' attention is the core task of Baidu image advertising. Traditionally, query-to-image search is tackled by matching the text query against the image title. Nevertheless, title-based image search relies on high-quality image titles, which are difficult to obtain or unavailable in some cases. A more reliable solution is to understand the image content and conduct content-based query-to-image retrieval. In this paper, we introduce a text-image cross-modal retrieval for advertising (TIRA) model, which has been launched in Baidu image advertising. The proposed TIRA model is built upon the widely used image classification model ResNet and the state-of-the-art NLP model BERT. It aims to bridge the modality gap by mapping images and texts into the same feature space. Meanwhile, we propose to use a contrastive loss to train the TIRA model, which consistently outperforms existing methods based on a pairwise loss or triplet loss. Since the proposed TIRA model directly conducts content-based query-to-image and image-to-query retrieval, and does not rely on high-quality image titles, it significantly enhances search flexibility. The TIRA model has been deployed in the image2X and query2X frameworks of Baidu image advertising. Since its launch, TIRA has achieved considerable improvements in click-through rate (CTR) and cost per mille (CPM), which brings a substantial revenue increase for advertisers.
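To make the training objective concrete, the sketch below shows one common way a symmetric in-batch contrastive loss can be applied to paired image and text embeddings in a shared feature space; the function name, temperature value, and normalization choice are illustrative assumptions and not necessarily the exact formulation used in TIRA.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.05):
    """In-batch contrastive loss over paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors produced by the image and text
    encoders (e.g., ResNet and BERT heads projected to a shared dimension).
    The i-th image and i-th text form a positive pair; all other items in
    the batch serve as negatives. The temperature is an assumed value.
    """
    # L2-normalize so dot products become cosine similarities in the shared space.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries correspond to positive pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric objective covering both retrieval directions:
    # image-to-query (rows) and query-to-image (columns).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

Unlike a pairwise or triplet loss, which compares each anchor against a small number of sampled negatives, this kind of objective contrasts every positive pair against all other items in the batch, which is one plausible reason a contrastive formulation can train a dual-encoder retrieval model more effectively.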