On dataset biases in a learning system with minimum a priori information for intrusion detection
- Resource Type
- Conference
- Authors
- Kayacik, H.G.; Zincir-Heywood, A.N.; Heywood, M.I.
- Source
- Proceedings. Second Annual Conference on Communication Networks and Services Research, 2004, pp. 181-189
- Subject
- Communication, Networking and Broadcast Technologies
Computing and Processing
Learning systems
Intrusion detection
Machine learning
System testing
Intelligent networks
Communication networks
Self organizing feature maps
Fingerprint recognition
Payloads
Neurons
- Language
A critical design decision in the construction of intrusion detection systems is often the selection of features describing the characteristics of the data being learnt. Selecting features often requires a priori or expert knowledge and may introduce specific attack biases, intended or otherwise. To limit such a priori information, summarized network connections from the DARPA 98 Lincoln Labs dataset are employed for training and testing a data-driven learning architecture. The learning architecture is composed of a hierarchy of self-organizing feature maps. Such a scheme is entirely unsupervised; the quality of the intrusion detection system is therefore directly influenced by the quality of the dataset. Dataset biases are investigated through three different dataset partitions: 10% KDD (default training dataset); normal connections alone; 50/50 mix of attack and normal. The three resulting intrusion detection systems appear to be competitive with the alternative cluster-based data-mining approaches.
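The three training partitions named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `make_partitions`, the `"normal"` label string, the toy records, and the choice to build the 50/50 mix by downsampling the larger class are all assumptions made for illustration. Labels are used here only to split the data, since the abstract describes the learning scheme itself as unsupervised.

```python
import random


def make_partitions(records, seed=0):
    """Build three hypothetical training partitions from (features, label) pairs."""
    rng = random.Random(seed)
    normal = [r for r in records if r[1] == "normal"]
    attack = [r for r in records if r[1] != "normal"]

    # Assumption: the 50/50 mix is formed by downsampling the larger class
    # to the size of the smaller one. The abstract does not say how it was built.
    n = min(len(normal), len(attack))
    balanced = rng.sample(normal, n) + rng.sample(attack, n)
    rng.shuffle(balanced)

    return {
        "default": list(records),   # stands in for the default training set
        "normal_only": normal,      # normal connections alone
        "balanced": balanced,       # 50/50 mix of attack and normal
    }


# Toy records (made up): 3 normal connections and 7 attack connections.
toy = [((i,), "normal") for i in range(3)] + [((i,), "smurf") for i in range(7)]
parts = make_partitions(toy)
```

Training one detector per partition and comparing the results on a common test set would then show how much each partition's class balance shapes the learned model.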