Graduation Term

2021

Degree Name

Master of Science (MS)

Department

School of Information Technology: Information Systems

Committee Chair

Yongning Tang

Abstract

Machine learning has shown strong potential for improving the performance of Intrusion Detection Systems (IDS). In a machine learning based IDS, the problem is commonly formulated as supervised classification: training datasets are used to train a selected model to learn how network features relate to different types of network traffic (i.e., benign traffic or a specific type of network attack). Each training dataset usually includes a large number of data samples, and each sample contains many network features together with its associated traffic type, called the label. Most recent studies focus on developing better machine learning models to achieve higher IDS performance. Very little research has examined the quality of training datasets, especially how mislabeling affects the performance of a machine learning based IDS.

In this thesis, we focus on the mislabeling issue in a machine learning based IDS. We first show the impact of mislabeling on the performance of such an IDS. Then, we propose a new algorithm called Heuristic Mislabel Identification (HMI), based on Data Shapley [6], to identify mislabels in training datasets. Based on different mislabeling scenarios, HMI heuristically and iteratively divides a training dataset into multiple groups to narrow down the location or range of mislabels. We evaluated our method using a widely adopted IDS training dataset, CICIDS2017. The evaluation results show that HMI can identify 84% of random mislabels and 78% of mislabels from a single data source. The precision in both experiments is 100%, which means that every suspect group contains mislabeled samples.
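To illustrate the general idea of valuing groups of training data, the following minimal sketch computes exact Shapley values for four groups of a toy dataset and flags groups with negative value as suspects. It is not the thesis's HMI algorithm. The 1-nearest-neighbour classifier, the one-dimensional synthetic data, the group layout, and the names `knn1_accuracy`, `group_shapley`, and `true_label` are all assumptions made for illustration.

```python
from itertools import permutations


def knn1_accuracy(train, valid):
    """Accuracy of a 1-nearest-neighbour classifier; 0.0 if train is empty."""
    if not train:
        return 0.0
    correct = 0
    for x, y in valid:
        # Predict the label of the closest training point.
        _, predicted = min(train, key=lambda p: abs(p[0] - x))
        if predicted == y:
            correct += 1
    return correct / len(valid)


def group_shapley(groups, valid):
    """Exact Shapley value per group, using validation accuracy as utility.

    Averages each group's marginal contribution over every ordering of the
    groups, so it is only practical for a handful of groups.
    """
    names = list(groups)
    orders = list(permutations(names))
    values = {name: 0.0 for name in names}
    for order in orders:
        train, previous = [], 0.0
        for name in order:
            train = train + groups[name]
            current = knn1_accuracy(train, valid)
            values[name] += (current - previous) / len(orders)
            previous = current
    return values


def true_label(x):
    """Hypothetical ground truth for the toy data: class 1 when x >= 5."""
    return 1 if x >= 5 else 0


# Toy training data (assumed): four groups, each covering the feature range.
groups = {
    g: [(i + 0.1 * g, true_label(i + 0.1 * g)) for i in range(10)]
    for g in range(4)
}
# Simulate a mislabeled data source by flipping every label in group 2.
groups[2] = [(x, 1 - y) for x, y in groups[2]]

# Clean validation data (assumed to be trusted).
valid = [(i + off, true_label(i + off)) for i in range(10) for off in (0.03, 0.22)]

values = group_shapley(groups, valid)
suspects = [g for g, v in values.items() if v < 0]
print(suspects)  # [2]
```

In this toy setting only the flipped group lowers validation accuracy when it is added to the training data, so it is the only group with a negative value. A real dataset such as CICIDS2017 would need a stronger model and an approximate Shapley estimate, since exact enumeration grows factorially with the number of groups.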

Access Type

Thesis-Open Access

Recommended Citation

Li, Bofan, "Identifying Mislabeling in Machine Learning Based Intrusion Detection System" (2021). Theses and Dissertations. 1494.
https://ir.library.illinoisstate.edu/etd/1494

DOI

https://doi.org/10.30707/ETD2021.20220215070317590203.999987


