In an adaptive OFDM system, the transmitter's waveform parameters are adjusted to match the wireless channel. Most adaptation algorithms rely on prior channel information and channel estimation results. In this paper, a new adaptive OFDM scheme based on the Thompson sampling (TS) algorithm is proposed that requires no channel knowledge. To find the optimal waveform in an unknown channel, the proposed algorithm formulates the problem as a multi-armed bandit (MAB) reinforcement learning model, which uses receiver feedback at each frame to explore the unknown reward distribution. Simulation results show that the proposed algorithm converges rapidly, reaching the best performance within 200 frames.
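
The TS-based MAB loop described above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on assumptions that the abstract does not state: each arm is taken to be one candidate waveform configuration, the receiver feedback is modeled as a binary per-frame success indicator (for example an ACK), and each arm's reward is given a Beta(1, 1) prior. The function name `thompson_sampling` and the success probabilities in the usage lines are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(success_probs, n_frames=200, seed=0):
    """Beta-Bernoulli Thompson sampling over a set of candidate waveforms.

    success_probs: ASSUMED per-arm frame success probabilities. They stand in
                   for the unknown channel and are hidden from the learner,
                   which only sees the binary feedback they generate.
    Returns (pulls, alpha, beta): per-arm selection counts and Beta posteriors.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    alpha = [1] * n_arms  # Beta(1, 1) prior: 1 + number of observed successes
    beta = [1] * n_arms   # Beta(1, 1) prior: 1 + number of observed failures
    pulls = [0] * n_arms

    for _ in range(n_frames):
        # Draw one sample from each arm's posterior and transmit with the
        # waveform whose sampled success probability is largest.
        samples = [rng.betavariate(alpha[k], beta[k]) for k in range(n_arms)]
        arm = max(range(n_arms), key=lambda k: samples[k])

        # Simulated receiver feedback for this frame (assumed binary reward).
        reward = 1 if rng.random() < success_probs[arm] else 0

        # Posterior update for the selected arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    # Hypothetical success probabilities for four candidate waveforms.
    pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8, 0.6], n_frames=200)
    print("selections per waveform:", pulls)
```

In this sketch, exploration comes from the randomness of the posterior samples rather than from a separate exploration parameter: as feedback accumulates, the posteriors narrow and the selections concentrate on the waveform with the highest observed success rate.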