Multi-modal entity alignment aims to identify equivalent entities between two multi-modal knowledge graphs and plays an essential role in integrating knowledge from different data sources. However, most previous works directly adopt simple concatenation or weighted sums as their fusion strategy, ignoring inter-modal interactions between entities. Because these works encode each modality with a separate encoder and allow no information exchange across modalities, each uni-modal feature space contains highly similar but non-equivalent entities. These entities act as potential noise during feature fusion, and since this work follows the conventional one-to-one alignment constraint, such noise inevitably impairs the model's performance. This paper proposes IMILEA, an Inter-modal Interactions Learning based Entity Alignment approach, which explores inter-modal interactions to reduce the harm caused by this potential noise during the feature fusion process. In addition, this paper proposes a negative sample weighting strategy that improves the model's performance by increasing its attention to hard-to-distinguish negative samples. Experiments on two public datasets show that the proposed model achieves state-of-the-art performance.
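The negative sample weighting idea can be illustrated with a minimal sketch: in a margin-based alignment loss, negatives that lie closer to the anchor entity (i.e., harder to distinguish) receive larger weights via a softmax over negated distances. The function names, the softmax-over-distances weighting, and the margin formulation below are assumptions for illustration, not IMILEA's actual loss.

```python
import math


def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_negative_loss(anchor, positive, negatives, margin=1.0, temperature=1.0):
    """Hypothetical hard-negative-weighted margin loss (not IMILEA's exact formulation).

    Harder negatives (smaller distance to the anchor) get larger softmax
    weights, so the model pays more attention to them.
    """
    d_pos = dist(anchor, positive)
    d_negs = [dist(anchor, n) for n in negatives]
    # Softmax over negated distances: closer (harder) negatives -> larger weights.
    m = max(-d / temperature for d in d_negs)
    exps = [math.exp(-d / temperature - m) for d in d_negs]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Standard margin term per negative, then a weighted sum instead of a plain mean.
    per_neg = [max(0.0, margin + d_pos - d) for d in d_negs]
    return sum(w * p for w, p in zip(weights, per_neg))
```

With uniform weights this reduces to an ordinary margin loss; the softmax weighting simply re-allocates the same loss budget toward the hard-to-distinguish negatives.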