Deep learning (DL) has revolutionized machine learning across many domains, but conventional DL methods often demand substantial amounts of labeled data. Semi-supervised learning (SSL) offers an effective alternative by incorporating unlabeled data, which is cheaper and easier to obtain. DL has shown promise as a component of modern network intrusion detection systems (NIDS), yet most research in this field focuses on fully supervised learning. Moreover, recent SSL algorithms that rely on data augmentation do not perform optimally “out of the box” because suitable augmentation schemes for packet-level network traffic data are lacking. This paper introduces a novel data augmentation scheme tailored to packet-level network traffic datasets and uses it to present a comprehensive analysis of multiple SSL algorithms for multi-class network traffic detection in a few-shot learning scenario. We find that even relatively simple approaches such as vanilla pseudo-labeling can achieve an F1-score within 5% of fully supervised learning methods while using less than 2% of the labeled data.