Most visual SLAM systems are highly robust in static scenes. When dynamic objects are present, however, these systems tend to treat them as part of the static background during pose estimation, which reduces pose-estimation accuracy. Based on YOLOv5 and Mask R-CNN, this paper proposes a method that excludes dynamic objects from pose estimation, improving the localization and mapping performance of SLAM systems in dynamic scenes. First, YOLOv5 performs a preliminary detection of dynamic objects to obtain their bounding boxes. These boxes are then used as the input to Mask R-CNN to obtain masks of the dynamic objects. Finally, the results of feature extraction are combined with the mask information, and the point features lying in regions marked as dynamic objects are eliminated. The remaining point features are used as input for accurate pose estimation. The method is evaluated on the standard KITTI dataset, and the experimental results show that it achieves higher pose-estimation accuracy than the original ORB-SLAM2.
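The final elimination step, discarding point features that fall inside dynamic-object masks, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the paper's implementation (ORB-SLAM2 itself is written in C++). It assumes that keypoints are (x, y) pixel coordinates with pixel centers at integer values, and that the per-instance Mask R-CNN outputs have already been thresholded into boolean masks the size of the image. The function names `union_masks` and `filter_dynamic_keypoints` are hypothetical.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]  # (x, y) pixel coordinates of a feature point
Mask = Sequence[Sequence[bool]]  # mask[row][col] is True on a dynamic object


def union_masks(instance_masks: Sequence[Mask]) -> List[List[bool]]:
    """Merge per-instance boolean masks (same size, at least one) into one mask."""
    first = instance_masks[0]
    return [
        [any(m[r][c] for m in instance_masks) for c in range(len(first[0]))]
        for r in range(len(first))
    ]


def filter_dynamic_keypoints(keypoints: Sequence[Keypoint],
                             dynamic_mask: Mask) -> List[Keypoint]:
    """Keep only keypoints whose pixel is not marked as dynamic."""
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    static: List[Keypoint] = []
    for x, y in keypoints:
        # Round sub-pixel coordinates to the nearest pixel center.
        col, row = int(round(x)), int(round(y))
        # Assumption for this sketch: points outside the image are dropped.
        if not (0 <= row < height and 0 <= col < width):
            continue
        if not dynamic_mask[row][col]:
            static.append((x, y))
    return static


# Usage: a 4x6 image with one dynamic object covering rows 1-2, cols 2-4.
instance = [[1 <= r <= 2 and 2 <= c <= 4 for c in range(6)] for r in range(4)]
mask = union_masks([instance])
points = [(0.0, 0.0), (3.0, 1.0), (4.4, 2.2), (5.0, 3.0), (10.0, 1.0)]
static_points = filter_dynamic_keypoints(points, mask)
print(static_points)  # → [(0.0, 0.0), (5.0, 3.0)]
```

Only the surviving points would then be passed on to pose estimation. Taking the union of the instance masks first keeps the per-keypoint test to a single lookup, regardless of how many dynamic objects were detected.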