Synthetic aperture radar (SAR) imagery has emerged as a promising data source for automatic ship detection, owing to its capability of imaging targets under all-day, all-weather conditions. Because labeled SAR images are scarce, previous research usually adopts models pre-trained on optical data to support SAR ship detectors, and has made significant progress. Nonetheless, two fundamental issues persist: (i) SAR images suffer from severe intrinsic speckle noise and contain fewer texture features; (ii) strong class-discriminative biases in imaging perspective and geometry are introduced when the pre-trained model is transferred. In this paper, we propose a multimodal contrastive pre-training framework (MCPF) that boosts automatic ship detection by exploiting both intra- and cross-modal correlations between SAR and optical images. Specifically, MCPF implements a multimodal contrastive learning strategy through two components: a cross-modal contrastive module (CCM) and an intra-modal contrastive module (ICM). CCM pulls matched SAR-optical pairs together while pushing non-matched pairs apart by measuring the similarity between their latent representations. ICM maximizes agreement between differently augmented views of the same example within each modality. To the best of our knowledge, MCPF is the first work to apply multimodal contrastive representation learning to automatic ship detection. Extensive experiments on benchmark datasets demonstrate that MCPF outperforms existing methods and achieves state-of-the-art performance.
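The pull-together/push-apart behavior described for CCM (and, applied within one modality, for ICM) is the standard symmetric InfoNCE objective. The sketch below is an illustrative NumPy implementation under that assumption, not the paper's actual code; the function name `info_nce` and the temperature value are our own choices.

```python
import numpy as np

def _cross_entropy_diag(logits):
    # Row-wise cross-entropy where the target of row i is column i,
    # i.e. the matched pair sits on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    z_a, z_b: (N, d) arrays; row i of z_a and row i of z_b form a
    positive pair, and all other rows in the batch act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    za = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    zb = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = za @ zb.T / temperature  # (N, N) similarity matrix
    # Average both directions (e.g. SAR->optical and optical->SAR)
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits.T))
```

In this reading, CCM would evaluate `info_nce(sar_embeddings, optical_embeddings)` on matched SAR-optical pairs, while ICM would evaluate it on two augmented views of the same batch within a single modality.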