Multi-object tracking methods face two challenges in data association: the semantic conflict between the detection and re-identification (re-ID) tasks, and the difficulty of capturing long-range dependencies with convolutional operations. To address both, we propose STAsMOT, a single-stage, real-time multi-object tracking method based on the Swin Transformer. First, a feature extractor that combines a Swin Transformer with a Feature Pyramid Network (FPN) uses attention mechanisms to extract deep features for all targets, which alleviates the difficulty of capturing long-range dependencies. Next, cross-correlation and self-correlation attention modules, together with a triple-attention module, efficiently decouple the fused features from the backbone network and distribute them to the detection and re-identification branches. The detection branch uses an anchor-free design. We evaluate our approach on the publicly available MOT16 and MOT17 datasets and compare it with state-of-the-art methods. The results indicate that our method achieves superior multi-object tracking performance while running in real time and remaining robust in complex scenarios.
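
The abstract states only that the detection branch is anchor-free and does not give its exact form. As an illustration of what anchor-free output can look like, the sketch below assumes a center-point formulation, in which each feature-map cell predicts a center score and a box size. This is an assumption for illustration, not the STAsMOT head itself; the names `decode_centers`, `heatmap`, `sizes`, `stride`, and `threshold` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def decode_centers(
    heatmap: List[List[float]],
    sizes: List[List[Tuple[float, float]]],
    stride: int = 4,          # assumed downsampling factor of the feature map
    threshold: float = 0.5,   # assumed minimum center score
) -> List[Box]:
    """Turn a center-score map into boxes without using any anchor boxes."""
    rows, cols = len(heatmap), len(heatmap[0])
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Keep only local maxima of the 3x3 neighbourhood
            # (a simple stand-in for non-maximum suppression).
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if score < max(neighbours):
                continue
            w, h = sizes[r][c]
            # Map the cell centre back to input-image pixel coordinates.
            cx, cy = (c + 0.5) * stride, (r + 0.5) * stride
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


# Toy 3x3 score map with a single confident centre in the middle cell.
example_heatmap = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.2],
]
example_sizes = [[(8.0, 16.0)] * 3 for _ in range(3)]
print(decode_centers(example_heatmap, example_sizes))
# → [(2.0, -2.0, 10.0, 14.0, 0.9)]
```

Because every box is read directly from a location and its predicted size, no anchor shapes need to be tuned. In a joint detection and re-ID tracker, the same location can also index the identity embedding used for data association.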