Autonomous vehicles are expected to save almost half a million lives between 2035 and 2045. Moreover, since 90% of accidents are caused by humans, 9% by weather and road conditions, and only 1% by vehicular failures, autonomous vehicles will make traffic much safer, drastically decreasing the number of accidents. To perceive surrounding objects and the environment, autonomous vehicles rely on sensor systems such as cameras, LiDARs, radars, and sonars. Traditionally, decision fusion is performed: each sensor's data is first processed individually, and the processed information from the different sensors is then combined. In contrast to this traditional decision fusion of processed information, raw data fusion extracts information jointly from the raw data of all sensors, providing higher reliability and accuracy in object and environment perception. This paper proposes an improved sensor fusion framework based on You Only Look Once (YOLO) that jointly processes the raw data from cameras and LiDARs. To validate our framework, we use the KITTI dataset, generated by the Karlsruhe Institute of Technology in partnership with the Toyota Technological Institute using two cameras and a Velodyne laser scanner. The proposed raw data fusion framework outperforms the traditional decision fusion framework with a 5% gain in vehicle detection performance.
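
To make the raw-fusion idea concrete, the sketch below illustrates one common way of combining camera and LiDAR raw data before detection: LiDAR points are projected into the camera image plane using KITTI calibration matrices and stacked with the RGB channels as a sparse depth channel, producing a 4-channel input that a modified YOLO backbone could consume. This is a minimal, hypothetical illustration under assumed KITTI calibration conventions (Tr_velo_to_cam, R0_rect, P2); the function names and the specific fusion scheme are not taken from the paper.

```python
import numpy as np

def project_lidar_to_image(points, Tr_velo_to_cam, R0_rect, P2):
    """Project LiDAR points (N, 3) into the left camera image plane using
    KITTI calibration matrices (Tr_velo_to_cam: 3x4, R0_rect: 3x3, P2: 3x4)."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])   # (N, 4) homogeneous LiDAR coords
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)                   # (3, N) rectified camera coords
    in_front = cam[2, :] > 0.1                                   # keep points in front of the camera
    cam = cam[:, in_front]
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])         # (4, N) homogeneous camera coords
    img = P2 @ cam_h                                             # (3, N) projective image coords
    uv = img[:2, :] / img[2, :]                                  # pixel coordinates (u, v)
    depth = cam[2, :]                                            # depth along the camera z-axis
    return uv.T, depth

def build_rgbd_input(image, points, Tr_velo_to_cam, R0_rect, P2):
    """Fuse raw camera and LiDAR data into a 4-channel (RGB + sparse depth)
    tensor, a simple form of raw data fusion for a YOLO-style detector."""
    h, w, _ = image.shape
    depth_map = np.zeros((h, w), dtype=np.float32)
    uv, depth = project_lidar_to_image(points, Tr_velo_to_cam, R0_rect, P2)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)              # keep projections inside the image
    depth_map[v[valid], u[valid]] = depth[valid]
    if depth_map.max() > 0:
        depth_map /= depth_map.max()                             # normalize sparse depth to [0, 1]
    rgb = image.astype(np.float32) / 255.0
    return np.dstack([rgb, depth_map])                           # (H, W, 4) raw-fusion input
```

In a decision fusion baseline, by contrast, the image and the point cloud would each be fed to a separate detector and only the resulting bounding boxes would be merged; the sketch above instead merges the sensors' raw measurements before any detection is performed.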