Efficient implementation of Tomlinson-Harashima precoding (THP) is crucial in massive multiple-input-multiple-output (MIMO) systems with a large number of antennas at the base station (BS) serving many user equipments (UEs). To address the high computational complexity of THP, in this paper, we first propose novel THP update algorithms that can avoid recomputing the THP filters when a new UE arrives or departs. Specifically, by using the Gram-Schmidt process and a series of Givens matrices, the THP filters are computed without full matrix operations. Then we extend the THP update algorithms to a more general scenario when multiple multi-antenna UEs arrive or depart. In this case, the proposed algorithms use both direct and iterative approaches. Moreover, the computational complexity of the proposed algorithms is derived and compared with that of the conventional THP. Finally, to further align with the practical scenario, we analyze and derive the approximate close-form expressions for the sum achievable rate of the proposed algorithms under imperfect channel state information (CSI). Simulation results are provided to illustrate the effectiveness of the proposed algorithms. The impact of quasi-static fading and slow time-varying scenarios with imperfect CSI on the communication performance of the proposed algorithms is also evaluated.