Tensors are naturally suited to representing high-dimensional data, and tensor train decomposition is an effective method for processing high-dimensional tensors. It is widely used in many fields, such as recommender systems, data completion, and dimensionality reduction. However, experiments show that the traditional tensor train decomposition method is only suitable for small-scale data: as the data volume grows, the traditional algorithm can no longer process it efficiently. Therefore, this paper improves the traditional tensor train decomposition algorithm. Focusing on the most critical step of the algorithm, the SVD, we first partition the matrix into column blocks; then, taking the storage characteristics of the cache into account, we launch multiple threads to process the different submatrix blocks, with each thread running the one-sided Jacobi algorithm, thereby parallelizing the SVD of the matrix. We carry out performance comparison experiments on simulated tensor data. The experimental results demonstrate that this method exhibits good scalability and can greatly improve the speed of tensor train decomposition.
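To make the kernel being parallelized concrete, the following is a minimal sketch of the one-sided Jacobi SVD on which the proposed method is built. It is not the authors' implementation: the function name, tolerance, and sweep limit are illustrative assumptions, and it runs sequentially over all column pairs. The paper's method instead partitions the columns into blocks and assigns the blocks to threads; since rotations on disjoint column pairs are independent, they can be applied concurrently.

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """Thin SVD of A (m x n, m >= n) via one-sided Jacobi rotations.

    Illustrative sequential sketch; the paper applies the same rotations
    to column blocks processed in parallel by multiple threads.
    """
    U = np.array(A, dtype=float)
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for i in range(n - 1):
            for j in range(i + 1, n):
                a = U[:, i] @ U[:, i]          # squared norm of column i
                b = U[:, j] @ U[:, j]          # squared norm of column j
                c = U[:, i] @ U[:, j]          # inner product of the pair
                if abs(c) <= tol * np.sqrt(a * b):
                    continue                   # pair already orthogonal
                converged = False
                # Rotation that zeroes the inner product of columns i and j
                zeta = (b - a) / (2.0 * c)
                sgn = 1.0 if zeta >= 0 else -1.0
                t = sgn / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / np.sqrt(1.0 + t * t)
                sn = cs * t
                G = np.array([[cs, sn], [-sn, cs]])
                U[:, [i, j]] = U[:, [i, j]] @ G  # rotate the column pair
                V[:, [i, j]] = V[:, [i, j]] @ G  # accumulate right vectors
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)            # singular values = column norms
    U = U / np.where(sigma > 0, sigma, 1.0)      # normalize columns to get U
    order = np.argsort(sigma)[::-1]              # sort singular values descending
    return U[:, order], sigma[order], V[:, order]

# Quick check of the identity A = U diag(sigma) V^T on random data
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 6))
U, s, V = one_sided_jacobi_svd(A)
assert np.allclose(U * s @ V.T, A)
```

Because each rotation touches only two columns, the one-sided Jacobi method is a natural fit for the column-block parallelization the abstract describes: each thread can sweep over pairs inside its own block, which also keeps its working set small and cache-friendly.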