Traffic prediction is a crucial task in intelligent transportation systems, as it can help manage and alleviate traffic congestion effectively. However, due to the complexity and uncertainty of traffic systems, accurate traffic prediction remains a challenging problem. The central challenge is to model traffic dynamics along both the temporal and spatial dimensions while respecting and exploiting the spatial and temporal heterogeneity of traffic data. To address this challenge, this paper proposes a new Transformer-based approach for traffic prediction. Specifically, to accurately model complex spatial correlations, we design a spatial Transformer layer combined with clustering, which reduces computational complexity and mitigates the risk of over-fitting. To model dynamic nonlinear temporal correlations, we introduce dilated attention, which benefits from a global receptive field conducive to long-term prediction. To validate the effectiveness of the proposed model, we conduct experiments on four real-world traffic datasets. The experimental results demonstrate that our model outperforms state-of-the-art baselines. Furthermore, we conduct comparative experiments to demonstrate that both the spatial clustering module and the dilated attention module contribute to the overall improvement in the model's performance.