One of the challenges of RGB-D-based robotic grasping is how to make full use of two complementary, heterogeneous data sources, RGB and depth images, while maintaining real-time performance. Existing robotic grasping methods mainly extract information from single-modality images or use a one-branch network to process RGB and depth images together. These methods do not fully fuse the effective information of the two modalities, which limits their robustness to interference. In this work, we propose the Attention Dense Fusion Network (ADFNet), a novel RGB-D based robotic grasping system that directly predicts the optimal grasp pose from RGB-D images. Our system uses a two-branch network with a heterogeneous framework that processes the RGB and depth information separately and retains the original structure of each data source. After dense fusion networks are applied at different scales, the high-dimensional features of the RGB and depth branches are embedded and fused at the pixel level. In addition, we incorporate attention mechanisms to suppress irrelevant background regions of the feature maps and enhance salient features, which improves prediction accuracy. We conduct qualitative and quantitative experiments on the standard Cornell Grasp Dataset. The results show that ADFNet improves prediction accuracy to 98.9%, outperforming existing methods while ensuring real-time performance.
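Since the abstract only outlines the architecture, the following is a minimal, hypothetical PyTorch sketch of the ideas it names: separate RGB and depth encoder branches, pixel-level fusion at multiple scales, channel attention to suppress background features, and dense per-pixel grasp-map heads. All module names, channel widths, and head definitions are illustrative assumptions, not the paper's actual ADFNet implementation.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative stand-in
    for the attention mechanism described in the abstract)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Reweight channels, down-weighting background-dominated responses.
        return x * self.gate(x)


class TwoBranchFusionNet(nn.Module):
    """Hypothetical two-branch RGB/depth network with multi-scale
    pixel-level fusion; channel sizes are arbitrary choices."""
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )
        # Separate encoders keep the original structure of each modality:
        # a 3-channel RGB branch and a 1-channel depth branch.
        self.rgb = nn.ModuleList([block(3, channels[0]),
                                  block(channels[0], channels[1]),
                                  block(channels[1], channels[2])])
        self.depth = nn.ModuleList([block(1, channels[0]),
                                    block(channels[0], channels[1]),
                                    block(channels[1], channels[2])])
        # Per-scale 1x1 convolutions fuse concatenated features pixel-wise.
        self.fuse = nn.ModuleList([nn.Conv2d(2 * c, c, 1) for c in channels])
        self.attn = ChannelAttention(channels[-1])
        # Dense heads for grasp quality, angle (as sin/cos), and gripper width.
        self.quality = nn.Conv2d(channels[-1], 1, 1)
        self.angle = nn.Conv2d(channels[-1], 2, 1)
        self.width = nn.Conv2d(channels[-1], 1, 1)

    def forward(self, rgb, depth):
        fused = None
        for rgb_layer, depth_layer, fuse in zip(self.rgb, self.depth, self.fuse):
            rgb = rgb_layer(rgb)
            depth = depth_layer(depth)
            # Pixel-level fusion at this scale, injected back into both
            # branches so later scales see the fused representation.
            fused = fuse(torch.cat([rgb, depth], dim=1))
            rgb = rgb + fused
            depth = depth + fused
        fused = self.attn(fused)  # emphasize salient, grasp-relevant features
        return self.quality(fused), self.angle(fused), self.width(fused)


if __name__ == "__main__":
    net = TwoBranchFusionNet()
    q, a, w = net(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
    print(q.shape, a.shape, w.shape)  # dense per-pixel grasp maps
```

In a sketch like this, the highest-quality pixel in the output map would be selected as the grasp center, with the angle and width maps read out at that location; how ADFNet actually decodes its grasp pose is specified in the paper body, not the abstract.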