Object detection is a classic problem in computer vision, and one of its main bottlenecks is the fusion of multi-scale features. In this paper, we systematically study the design choices of neural network architectures for real-time object detection and propose Align-Yolact, which improves instance segmentation accuracy. First, we propose a weighted bounding box, which improves bounding-box localization accuracy. Second, we add a bi-directional feature pyramid network to the feature fusion stage, which improves mask quality and accuracy on small objects. Owing to these optimizations and better backbones, we achieve state-of-the-art results in both detection efficiency and accuracy.
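To make the multi-scale fusion step concrete, the following is a minimal sketch of one fusion node in a bi-directional feature pyramid. It assumes the fast normalized fusion rule popularized by EfficientDet's BiFPN; the abstract does not state which fusion rule Align-Yolact uses, so this is an illustration under that assumption rather than the method itself. The function name, the flattened toy inputs, and the `eps` value are hypothetical.

```python
from typing import List, Sequence


def fast_normalized_fusion(features: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Fuse same-shaped feature vectors with non-negative, normalized weights.

    Assumption: EfficientDet-style fast normalized fusion,
    out = sum_i(w_i * f_i) / (eps + sum_j w_j), with w_i clipped at zero.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    if len({len(f) for f in features}) != 1:
        raise ValueError("inputs must already be resized to a common shape")
    # ReLU keeps every (normally learnable) weight non-negative.
    w = [max(0.0, x) for x in weights]
    # eps avoids division by zero when all weights are clipped to zero.
    norm = eps + sum(w)
    # Weighted sum at each position, divided by the weight total.
    return [sum(wi * f[i] for wi, f in zip(w, features)) / norm
            for i in range(len(features[0]))]


# Two toy feature maps flattened to vectors (hypothetical values), standing in
# for a top-down feature and a lateral feature at the same pyramid level.
p_top_down = [1.0, 2.0, 3.0]
p_lateral = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p_top_down, p_lateral], weights=[1.0, 1.0])
```

In a full network the weights would be trained parameters and the inputs would be resampled feature maps followed by a convolution; the sketch keeps only the weighting logic, which lets each node decide how much each resolution contributes.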