In educational research, user-simulation interaction is gaining importance because it provides key insights into the effectiveness of simulation-based learning and immersive technologies. A common approach to studying user-simulation interaction is to manually analyze participant behavior, either in real time or from video recordings, which is a tedious process. Surveys and questionnaires are also commonly used, but they are open to subjectivity and provide only qualitative data. The tool proposed in this paper, Environmental Detection for User-Simulation Interaction Measurement (EDUSIM), is a publicly available video analytics tool that takes screen-recorded videos of participants interacting with a simulated environment as input and outputs statistics on the time spent in predefined areas of interest within the simulation model. The tool uses machine learning, namely multi-class convolutional neural networks, to provide an efficient, automated process for extracting such navigation data. EDUSIM also implements a binary classification model to flag imperfect input, such as video frames that fall outside the specified simulation environment. To assess the efficacy of the tool, we implement a set of immersive simulation-based learning (ISBL) modules in an undergraduate database course, where learners record their screens as they interact with a simulation to complete their ISBL assignments. We then use EDUSIM to analyze the collected videos and compare the tool’s outputs with the expected results obtained by manually analyzing the videos.
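To make the output described above concrete, the following minimal sketch shows how per-frame classifications could be aggregated into time spent per area of interest. It is an illustration under stated assumptions, not EDUSIM's actual implementation: the function name `time_per_area`, the area names, the `"flagged"` key, and the sampling rate are all hypothetical, and the per-frame labels are assumed to come from classifiers of the kind described (a label of `None` stands for a frame flagged as outside the simulation environment).

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def time_per_area(frame_labels: Iterable[Optional[str]], fps: float) -> Dict[str, float]:
    """Convert per-frame area labels into seconds spent in each area.

    Assumption: each sampled frame has already been assigned an area label
    by a multi-class classifier, or None if a binary classifier flagged it
    as outside the simulation. Flagged frames are reported under the
    hypothetical key "flagged" rather than being silently dropped.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    counts = Counter("flagged" if label is None else label for label in frame_labels)
    # Each frame represents 1 / fps seconds of video.
    return {area: n / fps for area, n in counts.items()}


# Hypothetical 10-frame clip sampled at 2 frames per second.
labels = ["lobby", "lobby", "lobby", None,
          "warehouse", "warehouse", "warehouse", "warehouse",
          None, "lobby"]
summary = time_per_area(labels, fps=2.0)
print(summary)
```

Keeping flagged frames as a separate category, rather than discarding them, makes it possible to report how much of each recording was unusable alongside the per-area times.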