Unmanned Aerial Vehicles (UAVs) play an important role in the development of smart maritime applications. However, UAV images are crowded with small, densely packed ships, which reduces ship detection accuracy. To address this problem, this paper proposes an improved YOLOv5, SA-YOLOv5, to detect ships accurately under UAV vision, and combines it with DeepSORT to realize ship tracking. First, we add a detection layer in the feature fusion stage to make full use of shallow features, which carry rich detail information. Then, coordinate attention is introduced into YOLOv5 so that the network focuses on the more important feature information. Test results show that the accuracy, recall and average precision of the proposed SA-YOLOv5 are improved by 3.4%, 0.3% and 1.0%, respectively, compared with YOLOv5. Finally, DeepSORT is used as the tracker to realize real-time ship tracking under UAV vision.
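To make the coordinate attention step more concrete, the following is a minimal sketch in plain Python of how such a block typically reweights a feature map: it pools along the height and width directions separately, passes the joint descriptor through a shared 1x1 transform, and produces one gate per row and one per column. It is not the implementation used in this paper. The function names, the nested-list tensor layout, the weight arguments `w_reduce`, `w_h` and `w_w`, the ReLU nonlinearity and the omission of batch normalization are all assumptions made for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def conv1x1(weights, feats):
    """Hypothetical helper: 1x1 convolution over a 1-D sequence.

    weights: [out_channels][in_channels], feats: [in_channels][length].
    Returns [out_channels][length].
    """
    length = len(feats[0])
    return [
        [sum(w * feats[c][i] for c, w in enumerate(row)) for i in range(length)]
        for row in weights
    ]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Illustrative coordinate attention over a feature map x of shape [C][H][W].

    w_reduce: [M][C] shared reduction weights (assumed)
    w_h, w_w: [C][M] weights producing the row and column gates (assumed)
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Directional average pooling: one descriptor per row and one per column.
    pool_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    pool_w = [
        [sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
        for c in range(C)
    ]

    # Concatenate along the spatial axis, then apply the shared 1x1 transform.
    # ReLU stands in for the nonlinearity; batch normalization is omitted.
    joint = [pool_h[c] + pool_w[c] for c in range(C)]
    mid = [[max(0.0, v) for v in row] for row in conv1x1(w_reduce, joint)]

    # Split back into the height part and the width part.
    mid_h = [row[:H] for row in mid]
    mid_w = [row[H:] for row in mid]

    # One sigmoid gate per (channel, row) and per (channel, column).
    a_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, mid_h)]
    a_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, mid_w)]

    # Reweight every position by its row gate and its column gate.
    return [
        [[x[c][i][j] * a_h[c][i] * a_w[c][j] for j in range(W)] for i in range(H)]
        for c in range(C)
    ]
```

Because the gates are computed per row and per column rather than per channel only, the reweighting keeps positional information along both axes, which is the property that makes this kind of attention attractive for small, densely packed targets.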