In-vehicle systems can impose high cognitive load that impairs driving performance. Interfaces that detect cognitive load and adapt to it may alleviate these effects. Previous research has explored machine learning models that classify drivers’ cognitive load from physiological signals, but most studies trained and tested on data from the same participants (i.e., within-driver partitioning), which raises concerns about generalizability and practical feasibility. In this paper, we evaluated widely used models by training and testing them on data from different participants (i.e., across-drivers partitioning), and compared them with a more recent model that is effective for time-series data, the recurrent neural network (RNN). A driving simulator dataset was used to classify two levels of cognitive load (external cognitive secondary task vs. no task). All models performed better with within-driver partitioning. The RNN outperformed the other models, with mean accuracies of 88.1% and 85.6% under within-driver and across-drivers partitioning, respectively.
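To make the distinction between the two partitioning schemes concrete, the following is a minimal sketch, not the procedure used in this study. It assumes each sample carries a driver identifier. The function names, the 20% hold-out fraction, and the toy data are hypothetical and chosen only for illustration. Random shuffling within a driver is also an assumption; the exact splitting protocol is not specified here.

```python
import random
from collections import defaultdict


def within_driver_split(driver_ids, test_fraction=0.2, seed=0):
    """Hold out a fraction of every driver's samples.

    Each driver contributes samples to both the training and the test set.
    """
    rng = random.Random(seed)
    by_driver = defaultdict(list)
    for idx, driver in enumerate(driver_ids):
        by_driver[driver].append(idx)

    train, test = [], []
    for indices in by_driver.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


def across_drivers_split(driver_ids, test_fraction=0.2, seed=0):
    """Hold out entire drivers.

    No driver appears in both the training and the test set.
    """
    rng = random.Random(seed)
    drivers = sorted(set(driver_ids))
    rng.shuffle(drivers)
    n_test = max(1, round(len(drivers) * test_fraction))
    held_out = set(drivers[:n_test])

    train = [i for i, d in enumerate(driver_ids) if d not in held_out]
    test = [i for i, d in enumerate(driver_ids) if d in held_out]
    return train, test


# Hypothetical toy data: 5 drivers with 10 samples each.
driver_ids = [driver for driver in range(5) for _ in range(10)]

w_train, w_test = within_driver_split(driver_ids)
a_train, a_test = across_drivers_split(driver_ids)

# Within-driver: every driver is represented in the test set.
print(sorted({driver_ids[i] for i in w_test}))
# Across-drivers: test drivers are disjoint from training drivers.
print({driver_ids[i] for i in a_test} & {driver_ids[i] for i in a_train})
```

Under the across-drivers scheme the classifier is evaluated only on drivers it has never seen, which is the situation a deployed in-vehicle system would face with a new user.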