Released: February 14, 2022
Language: Python
License: MPL-2.0
NeDaGen generates realistic datasets for training and evaluating a Network Intrusion Detection System (NIDS) by simulating a realistic network infrastructure and real-world attacks.
CONTEXT AND BACKGROUND

Contemporary Network Intrusion Detection Systems (NIDS) typically incorporate artificial intelligence techniques such as machine learning and deep learning. Training such systems requires representative, commonly accepted network-based datasets. Such datasets are rare, however, and frequently lack sufficient traffic volume and attack diversity. A fundamental cause of this scarcity is that reliable datasets often cannot be published because of privacy concerns. As a result, most publicly available datasets either do not reflect current attack techniques or have been anonymized, which leaves metadata missing or incorrect. Training a NIDS on such suboptimal data is generally not very effective.

NeDaGen (Network traffic Dataset Generator) is a flexible, extensible tool that generates labelled network traffic datasets by simulating a realistic infrastructure and real-world attacks. It can build user-defined networks and simulate both benign and malicious network traffic, producing a labelled dataset suitable for evaluating a NIDS.
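To make the idea of a labelled dataset concrete, here is a minimal sketch of how individual traffic records could be tagged as benign or malicious. It assumes, hypothetically, that each record carries source and destination addresses and that the adversary container produces no benign traffic of its own, so any record involving an adversary address is labelled malicious. The `FlowRecord` fields, the `label_records` helper and the optional `technique` field are illustrative assumptions, not NeDaGen's actual schema or labelling logic.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterable, List, Optional


@dataclass
class FlowRecord:
    """Hypothetical per-connection record; NeDaGen's real JSON schema may differ."""
    timestamp: float
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str


def label_records(records: Iterable[FlowRecord], attacker_ips: Iterable[str],
                  technique: Optional[str] = None) -> List[dict]:
    """Label records that involve an adversary address as malicious, others as benign."""
    attackers = set(attacker_ips)
    labelled = []
    for rec in records:
        entry = asdict(rec)
        malicious = rec.src_ip in attackers or rec.dst_ip in attackers
        entry["label"] = "malicious" if malicious else "benign"
        # Attach the ATT&CK technique ID (if given) to malicious records only.
        entry["technique"] = technique if malicious else None
        labelled.append(entry)
    return labelled


if __name__ == "__main__":
    records = [
        FlowRecord(0.0, "10.0.1.10", "10.0.2.5", 443, "tcp"),  # e.g. a dev client
        FlowRecord(1.5, "10.0.3.66", "10.0.2.5", 22, "tcp"),   # adversary host
    ]
    # T1046 (Network Service Discovery) is used purely as an example technique ID.
    print(json.dumps(label_records(records, ["10.0.3.66"], technique="T1046"), indent=2))
```

Address-based labelling is the simplest option; a real generator might also use attack start and end times, for instance to handle hosts that carry both benign and malicious traffic.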

SOFTWARE

NeDaGen is a Python tool that uses Containerlab to deploy containerized networks and atomic-operator to execute MITRE ATT&CK techniques. It creates a heterogeneous enterprise network consisting of internal, external and demilitarized zone (DMZ) subnets. The internal network contains administration, development and operations clients that run user-specified scripts to generate the corresponding benign network traffic. Through atomic-operator, user-supplied definition files can be used to simulate a malicious threat agent, which introduces malicious network traffic into the dataset. The adversary is configurable and can be placed in any of the subnets. The resulting dataset is delivered as packet captures or as JavaScript Object Notation (JSON).
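The network layout described above can be sketched as a Containerlab topology definition. The following minimal sketch builds a Python dictionary that mirrors the layout of a Containerlab topology file (`name`, `topology.nodes` with `kind` and `image`, and `topology.links` with `endpoints`), with a configurable subnet for the adversary. The central router and per-subnet gateway layout, the node names, the `alpine:latest` image and the `build_topology` helper are all hypothetical; NeDaGen's actual topology may differ, and IP addressing and routing are omitted.

```python
import json
from collections import defaultdict

# Hypothetical default layout; NeDaGen's real node names and subnets may differ.
DEFAULT_SUBNETS = {
    "internal": ["admin", "dev", "ops"],
    "dmz": ["web"],
    "external": ["client"],
}


def build_topology(name, subnets=None, attacker_subnet="external",
                   image="alpine:latest"):
    """Return a dict shaped like a Containerlab topology file."""
    subnets = dict(DEFAULT_SUBNETS if subnets is None else subnets)
    if attacker_subnet not in subnets:
        raise ValueError(f"unknown subnet: {attacker_subnet!r}")
    # Add the adversary to the chosen subnet without mutating the caller's lists.
    subnets[attacker_subnet] = list(subnets[attacker_subnet]) + ["attacker"]

    nodes = {"router": {"kind": "linux", "image": image}}
    links = []
    # Next free data interface per node; eth0 is Containerlab's management interface.
    next_if = defaultdict(lambda: 1)

    def connect(a, b):
        links.append({"endpoints": [f"{a}:eth{next_if[a]}", f"{b}:eth{next_if[b]}"]})
        next_if[a] += 1
        next_if[b] += 1

    for subnet, hosts in subnets.items():
        gateway = f"{subnet}-gw"
        nodes[gateway] = {"kind": "linux", "image": image}
        connect("router", gateway)  # each subnet hangs off a central router
        for host in hosts:
            node = f"{subnet}-{host}"
            nodes[node] = {"kind": "linux", "image": image}
            connect(gateway, node)

    return {"name": name, "topology": {"nodes": nodes, "links": links}}


if __name__ == "__main__":
    print(json.dumps(build_topology("nedagen-demo", attacker_subnet="dmz"), indent=2))
```

The dictionary could be written out as YAML (for example with PyYAML's `yaml.safe_dump`) and deployed with `containerlab deploy -t <file>`. Data interfaces are numbered from `eth1` because Containerlab reserves `eth0` for its management network.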

SOURCE PROJECT

NeDaGen originated as a student assignment issued by the University of Amsterdam and supervised by TNO.
