The availability of satellite imagery and the surge in popularity of machine learning approaches in remote sensing have created numerous opportunities to study deforestation detection. However, data-driven deep learning methods require large amounts of labeled data to achieve acceptable performance, and labeled datasets remain limited in both quantity and quality, often requiring several years of data acquisition. In this work, we investigate generalization across the sensor modalities of optical satellites for deforestation detection. We argue that exploiting characteristics shared across satellite data, even when acquired by different on-board sensors, can significantly reduce the amount of labeled data required. To this end, we explore transfer learning and observe that a pre-trained neural network outperforms a network trained from scratch.
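The transfer-learning idea above can be sketched minimally as follows. This is not the paper's implementation: the frozen random projection is a hypothetical stand-in for a backbone pre-trained on a source sensor, and the synthetic arrays stand in for a small labeled target-sensor dataset. Only a lightweight classification head is trained, which is the mechanism by which pre-training reduces the labeled-data requirement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained backbone: a frozen feature
# extractor whose weights would normally come from a source sensor.
W_backbone = rng.normal(size=(16, 8))  # frozen, never updated


def features(x):
    # Frozen forward pass: ReLU over a fixed linear projection.
    return np.maximum(x @ W_backbone, 0.0)


# Small synthetic "target-sensor" dataset (illustration only).
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only a lightweight logistic-regression head on top of the
# frozen features -- the essence of fine-tuning with limited labels.
w = np.zeros(8)
b = 0.0
lr = 0.1
for _ in range(500):
    h = features(X)
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid predictions
    grad = p - y                            # logistic-loss gradient
    w -= lr * h.T @ grad / len(y)
    b -= lr * grad.mean()

p = 1.0 / (1.0 + np.exp(-(features(X) @ w + b)))
acc = ((p > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

In a real experiment, the frozen projection would be replaced by the convolutional encoder of a network pre-trained on imagery from another optical sensor, and some or all of its layers could also be unfrozen for fine-tuning.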