Visual perception is crucial for autonomous driving: it detects the dynamic and static objects around the vehicle that downstream tasks such as obstacle avoidance and navigation depend on. LiDAR-based methods suffer from costly sensors and heavy computational demands. Camera-based approaches are more cost-effective, but producing a bird's-eye view (BEV) requires a perspective transformation that often introduces image distortions. To address this, we propose the IPM-Transformer, a framework that integrates the geometric model of Inverse Perspective Mapping (IPM) with data-driven deep learning to mitigate these distortions and improve the quality of the perspective transformation. Building on it, we develop a framework that reconstructs a local BEV map, covering both road layout and vehicle occupancy, from surround-view images alone. Our model achieves state-of-the-art (SOTA) results for vehicle occupancy on public datasets and across all categories on our released HuanYu dataset.
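For context, the classical IPM geometry that the framework builds on maps points on the flat ground plane (Z = 0) to image pixels through a homography, s [u, v, 1]^T = K [r1 r2 t] [X, Y, 1]^T, which can be inverted to warp an image into a BEV grid. A minimal sketch under assumed conditions, using a pinhole camera with placeholder intrinsics/extrinsics and OpenCV for the warp; the calibration values, BEV resolution, and `front_camera.png` path are all illustrative, not taken from the paper:

```python
import numpy as np
import cv2

# --- Placeholder calibration; illustrative values, not the paper's ---
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])        # pinhole intrinsics

theta = np.deg2rad(10.0)                     # camera pitched 10 deg toward the road
R_axes = np.array([[1.0, 0.0,  0.0],         # world (X right, Y fwd, Z up)
                   [0.0, 0.0, -1.0],         #   -> camera (x right, y down, z fwd)
                   [0.0, 1.0,  0.0]])
R_pitch = np.array([[1.0, 0.0,            0.0],
                    [0.0, np.cos(theta), -np.sin(theta)],
                    [0.0, np.sin(theta),  np.cos(theta)]])
R = R_pitch @ R_axes                         # world -> camera rotation
C = np.array([0.0, 0.0, 1.5])                # camera 1.5 m above the ground plane
t = -R @ C                                   # so that x_cam = R @ x_world + t

# For ground-plane points (Z = 0), projection reduces to a homography:
#   s [u, v, 1]^T = K [r1 r2 t] [X, Y, 1]^T
H_ground2img = K @ np.column_stack((R[:, 0], R[:, 1], t))

# Affine map from BEV pixels to metric ground coordinates: 0.05 m/px,
# X in [-10, 10] m, Y in [0, 20] m ahead of the vehicle (row 0 = farthest).
res, bev_w, bev_h = 0.05, 400, 400
A = np.array([[res,  0.0, -bev_w / 2 * res],
              [0.0, -res,  bev_h * res],
              [0.0,  0.0,  1.0]])
H_bev2img = H_ground2img @ A                 # BEV pixel -> image pixel

img = cv2.imread("front_camera.png")         # hypothetical input frame
# WARP_INVERSE_MAP: the matrix is read as a dst -> src (BEV -> image) mapping.
bev = cv2.warpPerspective(img, H_bev2img, (bev_w, bev_h),
                          flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
```

This purely geometric warp is exact only where the flat-ground assumption holds; anything above the ground plane (vehicles, poles) is smeared toward the horizon, which is precisely the distortion the learned component of the IPM-Transformer is meant to compensate for.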