Not all DGAs are Born the Same – Improving Lexicographic based Detection of DGA Domains through AI/ML
- Resource Type
- Conference
- Authors
- Aravena, L. Torrealba; Casas, P.; Bustos-Jimenez, J.; Capdehourat, G.; Findrik, M.
- Source
- 2023 7th Network Traffic Measurement and Analysis Conference (TMA), pp. 1-4, Jun. 2023
- Subject
- Communication, Networking and Broadcast Technologies
Access lists
Frequency-domain analysis
Current measurement
Botnet
Telecommunication traffic
Machine learning
Feature extraction
DGA Detection
n-grams
Lexicographic Analysis
DNS
Machine Learning
- Language
Timely identification of DNS queries to Domain Generation Algorithm (DGA) domains is crucial to limit malware propagation and its potential impact, and in particular to prevent coordinated botnet activity. We explore an approach for swift detection of DGA-generated domains that analyzes lexicographic features derived exclusively from the domain name as observed in a DNS query. We propose a reputation-based scoring system for domain names, based on the co-occurrence frequency of $n$-grams with respect to a whitelist of well-known benign domains. We further extract meaningful features from domain names and apply machine learning techniques to enhance detection performance. Experimental results on detecting 25 different families of DGA domains show that combining reputation scores with other basic lexicographic features largely outperforms current state-of-the-art approaches.
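The abstract does not give the exact scoring formula, so the following is only a minimal sketch of one way an $n$-gram reputation score against a whitelist could be computed. The function names (`ngrams`, `build_counts`, `reputation_score`), the log-scaled averaging, and the toy whitelist are all assumptions made for illustration, not the authors' method. Domain names are assumed to be passed as a single label without the TLD.

```python
import math
from collections import Counter


def ngrams(name, n):
    """Return the character n-grams of a domain label, in order."""
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def build_counts(whitelist, n):
    """Count how often each n-gram occurs across the benign whitelist."""
    counts = Counter()
    for domain in whitelist:
        counts.update(ngrams(domain.lower(), n))
    return counts


def reputation_score(domain, counts, n):
    """Hypothetical score: mean log-scaled whitelist frequency of the
    domain's n-grams. N-grams never seen in the whitelist contribute 0,
    so random-looking names tend to score low."""
    grams = ngrams(domain.lower(), n)
    if not grams:
        return 0.0
    return sum(math.log1p(counts[g]) for g in grams) / len(grams)


# Toy whitelist (illustrative only; a real list would hold many thousands
# of well-known benign domains).
whitelist = ["google", "facebook", "wikipedia", "youtube"]
counts = build_counts(whitelist, 2)

benign_like = reputation_score("googlebook", counts, 2)
random_like = reputation_score("xqzvkjwp", counts, 2)
```

In this sketch a name built from bigrams common in the whitelist scores higher than a random-looking string. Such a score could then be combined with other simple lexicographic features (for example length or digit ratio) as input to a classifier, in the spirit of what the abstract describes.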