In a multiple-input multiple-output (MIMO) system, the base station allows the number of users accessing the system to exceed the maximum number of users it can support simultaneously. To eliminate inter-user interference and make full use of the system resources, the base station must periodically select a subset of the admitted users and send data only to the users in that subset. This process is called user scheduling. Existing greedy user scheduling algorithms require the channel state information (CSI) of all admitted users in order to perform user selection. In a MIMO system operating in time division duplex (TDD) mode, obtaining this CSI means that every admitted user must send a pilot sequence, so these algorithms produce massive pilot contamination and reduce the users' achievable data rates. In this paper, we propose a learning-aided user scheduling algorithm for TDD MIMO systems. The proposed algorithm formulates the user scheduling problem as a multi-play multi-armed bandit (MP-MAB) problem and uses Thompson sampling (TS) to find the optimal scheduled user subset. It only needs to estimate the channel matrix of the scheduled users; the unscheduled users are not required to send pilot sequences. By reducing the number of users that participate in pilot training, the proposed algorithm mitigates the pilot contamination incurred by user scheduling. Numerical results show that, by reducing pilot contamination, the proposed algorithm outperforms the existing user scheduling algorithms.
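To illustrate the MP-MAB formulation, the following is a minimal sketch of multi-play Thompson sampling in a deliberately simplified setting. It is not the proposed algorithm itself. The sketch assumes Bernoulli rewards with Beta posteriors, for example whether a scheduled user's slot counts as "good", and it does not model channel estimation or pilot contamination. The function name `mp_thompson_sampling` and the parameters `success_prob`, `num_plays`, and `num_rounds` are hypothetical and chosen only for this example. The point it shows is that only the scheduled users are observed in each round, so feedback from unscheduled users is never needed.

```python
import numpy as np


def mp_thompson_sampling(success_prob, num_plays, num_rounds, seed=0):
    """Multi-play Thompson sampling with Beta-Bernoulli posteriors.

    success_prob: hypothetical per-user probability of a "good" slot
                  (unknown to the scheduler; used only to simulate feedback).
    num_plays:    number of users scheduled in each round.
    num_rounds:   number of scheduling rounds to simulate.
    Returns how often each user was scheduled and the posterior means.
    """
    rng = np.random.default_rng(seed)
    success_prob = np.asarray(success_prob, dtype=float)
    num_users = success_prob.size

    # Beta(1, 1) prior for every user (assumption of this sketch).
    alpha = np.ones(num_users)  # 1 + observed successes
    beta = np.ones(num_users)   # 1 + observed failures
    counts = np.zeros(num_users, dtype=int)

    for _ in range(num_rounds):
        # Draw one posterior sample per user.
        theta = rng.beta(alpha, beta)
        # Schedule the num_plays users with the largest samples.
        scheduled = np.argsort(theta)[-num_plays:]
        # Feedback is observed for the scheduled users only.
        rewards = (rng.random(num_plays) < success_prob[scheduled]).astype(float)
        # Update the posteriors of the scheduled users.
        alpha[scheduled] += rewards
        beta[scheduled] += 1.0 - rewards
        counts[scheduled] += 1

    return counts, alpha / (alpha + beta)
```

Calling `mp_thompson_sampling([0.9, 0.9, 0.1, 0.1, 0.1], num_plays=2, num_rounds=2000)` simulates five admitted users with two scheduled per round. In such a well-separated toy case the scheduling counts would be expected to concentrate on the two users with the highest success probability.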