Multivariate long-term forecasting is ubiquitous in many domains, such as meteorology, hydrology, and finance. Over long horizons, however, the joint distribution of non-stationary time series drifts increasingly, making long-term prediction more challenging. Previous studies have tended to design novel architectures that add inductive bias to capture the overall trend and reduce the non-stationarity of the series, thereby lowering prediction error; these methods, however, disregard the correlations among multiple time series. Moreover, most forecasting models are optimized with the mean squared error (MSE) loss, which reduces average prediction error but cannot capture abrupt local changes. This paper proposes a Transformer-based model, called Koopformer, which for the first time combines a Koopman neural operator with spatial-temporal attention to tackle these challenges. Specifically, we design a Koopman neural operator to address the distribution shift caused by non-stationarity: it uses a learned Koopman space and globally shared features to model the time series in a less non-linear way and to obtain robust initial predictions, giving the model a stronger inductive bias. Second, we propose a new spatial-temporal attention mechanism that models both temporal and cross-variable dependencies in multiple time series and accounts for the effects of local temporal and multivariate correlations on the prediction results. We further propose a new loss function that considers both the pointwise difference between the prediction and the ground truth and their similarity in shape, yielding more accurate and realistic forecasts. Evaluations on five benchmark datasets show that Koopformer outperforms both traditional and state-of-the-art methods, producing more precise and realistic predictions. We will open-source the code upon acceptance of the paper.
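The abstract does not specify how the shape-aware loss is defined; as a minimal sketch of the general idea, the following hypothetical Python function combines a pointwise MSE term with a shape term that compares first differences (local trends) of the prediction and the ground truth. The function name, the `alpha` weighting parameter, and the choice of first differences as the shape measure are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def shape_aware_loss(pred, target, alpha=0.5):
    """Illustrative loss (assumed form, not the paper's definition):
    a weighted sum of pointwise MSE and a 'shape' term that penalizes
    mismatched local trends via first differences."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Pointwise term: standard mean squared error.
    mse = np.mean((pred - target) ** 2)
    # Shape term: MSE of first differences; zero when the two series
    # move in lockstep, even if offset by a constant.
    shape = np.mean((np.diff(pred) - np.diff(target)) ** 2)
    return (1.0 - alpha) * mse + alpha * shape
```

Under this sketch, a prediction that is a constant offset of the ground truth incurs only the pointwise penalty, while one that misses an abrupt local change is additionally penalized by the shape term.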