Date of Award


Document Type


Degree Name

Master of Science (MS)


School of Information Technology: Information Systems

First Advisor

Yongning Tang


The amount of data that is being generated continues to rapidly grow in size and complexity. Frameworks such as Apache Hadoop and Apache Spark are evolving at a rapid rate as organizations are building data driven applications to gain competitive advantages. Data analytics frameworks decomposes our problems to build applications that are more than just inference and can help make predictions as well as prescriptions to problems in real time instead of batch processes.

Information Security is becoming more important to organizations as the Internet and cloud technologies become more integrated with their internal processes. The number of attacks and attack vectors has been increasing steadily over the years. Border defense measures (e.g. Intrusion Detection Systems) are no longer enough to identify and stop attackers.

Data driven information security is not a new approach to solving information security; however there is an increased emphasis on combining heterogeneous sources to

gain a broader view of the problem instead of isolated systems. Stitching together multiple alerts into a cohesive system can increase the number of True Positives.

With the increased concern of unknown insider threats and zero-day attacks, identifying unknown attack vectors becomes more difficult. Previous research has shown that with as little as 10 commands it is possible to identify a masquerade attack against a user's profile.

This thesis is going to look at a data driven information security architecture that relies on both behavioral analysis of SSH profiles and bad actor data collected from an SSH honeypot to identify bad actor attack vectors. Honeypots should collect only data from bad actors; therefore have a high True Positive rate. Using Apache Spark and Apache Hadoop we can create a real time data driven architecture that can collect and analyze new bad actor behaviors from honeypot data and monitor legitimate user accounts to create predictive and prescriptive models. Previously unidentified attack vectors can be cataloged for review.


Imported from ProQuest Sanders_ilstu_0092N_10487.pdf


Page Count