Current multimodal deep learning approaches rarely exploit the dependencies inherent in multiple labels explicitly, even though such dependencies are crucial for multimodal multi-label classification. In this paper, we propose a multimodal deep learning approach for multi-label classification. Specifically, we introduce deep networks for feature representation learning and construct classifiers with an objective function constrained by dependencies among both labels and modalities. We further propose an effective training algorithm that learns the deep networks and classifiers jointly. Thus, we explicitly leverage the relations among labels and modalities to facilitate multimodal multi-label classification. Experiments on multi-label classification and cross-modal retrieval on the Pascal VOC and LabelMe datasets demonstrate the effectiveness of the proposed approach.
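To make the idea of an objective constrained by label and modality dependencies concrete, the sketch below shows one hypothetical form such a loss could take: a per-modality multi-label classification loss, a label-dependency regularizer driven by a label co-occurrence matrix, and a cross-modal consistency term. The function name, weights `alpha`/`beta`, and the exact regularizer forms are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def multimodal_multilabel_loss(f_img, f_txt, W, y, C, alpha=0.1, beta=0.1):
    """Illustrative joint objective (a sketch, not the paper's exact loss):
    per-modality multi-label logistic loss, plus a label-dependency term
    weighted by a label co-occurrence matrix C, plus a cross-modal
    consistency term tying the two modalities' label scores together."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    s_img = f_img @ W  # label scores from the image-modality features
    s_txt = f_txt @ W  # label scores from the text-modality features

    # Multi-label binary cross-entropy, averaged over the two modalities.
    def bce(s):
        p = np.clip(sigmoid(s), 1e-7, 1 - 1e-7)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    cls = 0.5 * (bce(s_img) + bce(s_txt))

    # Label-dependency term: scores of frequently co-occurring labels
    # (large entries of C) are encouraged to agree.
    dep = 0.0
    for s in (s_img, s_txt):
        diff = s[:, :, None] - s[:, None, :]  # pairwise label score gaps
        dep += np.mean(C[None] * diff ** 2)

    # Modality-dependency term: both modalities should score labels
    # consistently for the same sample.
    cons = np.mean((s_img - s_txt) ** 2)

    return cls + alpha * dep + beta * cons
```

In an end-to-end system the features `f_img` and `f_txt` would come from the deep networks and `W` would be the classifier weights, with all of them updated jointly by gradient descent on this objective.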