Advances in microelectromechanical systems (MEMS) and the Internet of Things (IoT) have enabled a wide range of smartphone-based applications. However, existing navigation methods that rely on these low-cost MEMS sensors cannot provide sufficiently accurate positioning for location-based applications across diverse environments. Technical limitations, such as severe signal attenuation, reflections, blockages, error accumulation, and low image quality, degrade the performance of the global navigation satellite system (GNSS), the inertial navigation system (INS), and the camera. To mitigate these limitations, especially for indoor vehicle navigation, we first analyze the performance of the existing fusion algorithm and then propose a semantic proximity update (SPU), based on a pretrained real-time object-detection model, to enhance the integration of GNSS, INS, and visual INS (VINS). SPU detects geo-referenced objects and combines them with relative movement to infer the absolute position. The proposed INS/GNSS/VINS/SPU scheme maintains acceptable long-term accuracy in both indoor and outdoor environments. It requires only smartphone sensors and thus incurs no additional cost for users. Experimental results indicate that the horizontal and 3-D positioning errors of this scheme were 51.6% and 86.8% lower, respectively, than those of a conventional integration.