This work presents a spatio-temporal Conditional Random Field (CRF) model for crop recognition from multi-temporal remote sensing image sequences. The association potential at each image site is based on the class posterior probabilities computed by a Random Forest (RF) classifier from the features at that site. A contrast-sensitive Potts model smooths the labels in the spatial domain, whereas the interactions in the temporal domain are modeled using expert knowledge about the possible class transitions between adjacent epochs. The model was tested for crop mapping in two subtropical areas in Brazil: the municipality of Ipuã, São Paulo, using a sequence of 9 Landsat images, and the municipality of Campo Verde, Mato Grosso, using a sequence of 14 Sentinel-1 images. Compared with a mono-temporal CRF approach, the experiments showed significant improvements in the accumulated F1 score per class of up to 50% for the 8 classes of the optical data set and up to 75% for the 11 classes of the SAR data set.
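As a rough illustration of the three potentials named above, the following Python sketch writes each one as an energy term (lower is better). It is not the authors' implementation: the negative-log form of the association term, the exponential form of the contrast-sensitive Potts term, the parameters `weight`, `sigma2` and `penalty`, and the transition table `ALLOWED_TRANSITIONS` with its class names are all assumptions chosen for illustration.

```python
import math

def association_energy(posterior, eps=1e-9):
    """Unary term: negative log of a class posterior, e.g. one produced by
    a Random Forest classifier (assumed form; eps avoids log(0))."""
    return -math.log(max(posterior, eps))

def potts_energy(label_i, label_j, feat_i, feat_j, weight=1.0, sigma2=1.0):
    """Spatial term: one common contrast-sensitive Potts form (assumed).
    Equal neighbor labels cost nothing; differing labels are penalized,
    and the penalty shrinks as the feature contrast between sites grows."""
    if label_i == label_j:
        return 0.0
    d2 = sum((a - b) ** 2 for a, b in zip(feat_i, feat_j))
    return weight * math.exp(-d2 / (2.0 * sigma2))

# Hypothetical expert table of class transitions allowed between adjacent
# epochs; the class names are placeholders, not taken from the paper.
ALLOWED_TRANSITIONS = {
    ("soybean", "soybean"),
    ("soybean", "maize"),
    ("maize", "maize"),
    ("maize", "bare_soil"),
}

def temporal_energy(label_prev, label_next,
                    allowed=ALLOWED_TRANSITIONS, penalty=10.0):
    """Temporal term: no cost for an allowed transition, a fixed
    (assumed) penalty for any other transition."""
    return 0.0 if (label_prev, label_next) in allowed else penalty

# Total energy of one site with a single spatial neighbor and a single
# preceding epoch (a toy configuration, for illustration only).
total = (
    association_energy(0.8)
    + potts_energy("soybean", "maize", [0.2, 0.4], [0.3, 0.1])
    + temporal_energy("soybean", "soybean")
)
```

In a full model these terms would be summed over all sites, spatial neighbor pairs, and adjacent epochs, and the label configuration would be found by an inference algorithm that this sketch does not cover.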