Fusing raw data from different automotive sensors for real-world environment perception remains challenging because the sensors differ in representation and data format. In this work, we propose a novel method termed High Dimensional Frustum PointNet for 3D object detection in the context of autonomous driving. Motivated by the goals of data diversity and lossless data processing, our deep learning approach directly and jointly uses the raw data from the camera, LiDAR, and radar. In more detail, given 2D region proposals and classifications from camera images, a high-dimensional convolution operator captures local features from a point cloud enhanced with color and temporal information. Radars are used as adaptive plug-in sensors to refine object detection performance. As shown by an extensive evaluation on the nuScenes 3D detection benchmark, our network outperforms most previous methods.
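
To make the frustum step concrete, the following minimal sketch shows one way a 2D region proposal could select LiDAR points and attach color and temporal channels to them. It is an illustration under stated assumptions, not the implementation evaluated in this work: it assumes a pinhole camera model, points already expressed in the camera frame with z pointing forward, and per-point colors and time offsets supplied by the caller. The function name `frustum_points` and all parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the camera frame (assumed)
Color = Tuple[float, float, float]   # (r, g, b) sampled for the point (assumed)
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def frustum_points(
    points: Sequence[Point],
    colors: Sequence[Color],
    time_offsets: Sequence[float],
    box: Box,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, ...]]:
    """Keep the points whose image projection falls inside a 2D box.

    Each kept point is returned as a 7-dimensional tuple
    (x, y, z, r, g, b, dt), i.e. geometry plus color plus time offset.
    """
    u_min, v_min, u_max, v_max = box
    selected = []
    for (x, y, z), rgb, dt in zip(points, colors, time_offsets):
        if z <= 0.0:
            # Points behind the camera cannot lie in the viewing frustum.
            continue
        # Pinhole projection onto the image plane (assumed camera model).
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            selected.append((x, y, z, *rgb, dt))
    return selected
```

In this sketch the 2D box defines the frustum implicitly: a point belongs to the frustum exactly when it lies in front of the camera and projects inside the box. A learned operator would then consume the resulting 7-dimensional points.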