Malicious traffic detection is an important task in network security: it protects the target network from privacy leakage and service paralysis. The complexity of networks and the hierarchical structure of network traffic, i.e., byte, packet, and flow, mean that traffic carries diverse information. Most existing work uses only a single feature or only statistical features and therefore cannot learn network traffic from multiple perspectives. We call this limitation shortsightedness, and it causes important information in the traffic to be lost. Moreover, once multiple features have been obtained, fusing them effectively remains an open problem. In this paper, we propose Multiple Features Fusion with Attention Networks (MFFAN). Following the hierarchical structure of network traffic, we extract byte, packet, and statistical features from the original traffic files so that the model learns traffic from multiple perspectives and overcomes shortsightedness. To fuse the features effectively, we use self-attention to learn the intra-feature relationships within each feature and co-attention to learn the inter-feature relationships between features. We conduct experiments on the ISCXIDS2012 and CICIDS2017 datasets, and the results show that our model fuses multiple features effectively and achieves high accuracy.
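The two attention steps described above can be illustrated with a minimal NumPy sketch. It is not the MFFAN architecture itself. The abstract does not specify projections, pooling, or dimensions, so everything below is an assumption made for illustration: the feature shapes, the use of plain scaled dot-product attention without learned weights, the choice to let each feature attend to the concatenation of the other two, and the mean-pooling before concatenation. The names `self_attention`, `co_attention`, and `fused` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = query.shape[-1]
    weights = softmax(query @ key.T / np.sqrt(d), axis=-1)
    return weights @ value, weights


def self_attention(x):
    # Intra-feature: elements of one feature attend to each other.
    return attention(x, x, x)


def co_attention(x, y):
    # Inter-feature: elements of x attend to elements of y.
    return attention(x, y, y)


# Toy inputs (assumed shapes): sequences of d-dimensional embeddings.
rng = np.random.default_rng(0)
d = 8
byte_feat = rng.normal(size=(16, d))    # 16 byte-level embeddings
packet_feat = rng.normal(size=(5, d))   # 5 packet-level embeddings
stat_feat = rng.normal(size=(3, d))     # 3 statistical-feature embeddings
features = {"byte": byte_feat, "packet": packet_feat, "stat": stat_feat}

# Step 1: self-attention within each feature.
refined = {name: self_attention(x)[0] for name, x in features.items()}

# Step 2: co-attention between features. As an assumption, each feature
# attends to the concatenation of the other two, then is mean-pooled.
fused_parts = []
for name, x in refined.items():
    others = np.concatenate(
        [v for k, v in refined.items() if k != name], axis=0
    )
    attended, _ = co_attention(x, others)
    fused_parts.append(attended.mean(axis=0))

# Fused representation: one pooled vector per feature, concatenated.
fused = np.concatenate(fused_parts)
print(fused.shape)
```

In a trained model, the queries, keys, and values would normally pass through learned linear projections, and the fused vector would feed a classifier. Both are omitted here to keep the sketch focused on the intra-feature and inter-feature attention steps.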