The Impact of Trace Generalization on Anomaly Detection Systems

S. S. Murtaza, A. Hamou-Lhadj, M. Couture, "The Impact of Trace Generalization on Anomaly Detection Systems", To Appear in Telecommunication Systems Journal, Springer.

Prior research on anomaly detection has focused mainly on applying different algorithms and heuristics to improve the accuracy of the detection process. However, despite the variety of algorithms used, high false positive rates, large trace sizes, and long processing times remain the main issues in anomaly detection. This means that the problem lies not only in the choice of algorithm but also in the characteristics of the dataset. In this paper, we investigate whether removing contiguous repetitions of system calls from traces (a form of trace generalization) affects the accuracy (false positive rate and true positive rate) and performance of anomaly detection algorithms. We apply the sliding window algorithm and a Hidden Markov Model (HMM) both to traces from which contiguous repetitions of system calls have been removed and to the original traces that contain them. When applied to the University of New Mexico (UNM) dataset, our findings show that removing contiguous repetitions yields a 40% reduction in trace size and a 10-50% gain in trace processing time (parsing and model building). For the sliding window algorithm, we found that removing contiguous repetitions can significantly reduce false positives on the UNM dataset. For HMM, however, the false positive rate remains the same. These findings suggest that removing contiguous repetitions of system calls from traces could benefit anomaly detection. We recognize, however, that further experiments with other datasets and different types of attacks are needed to confirm our findings.
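To make the idea of trace generalization concrete, here is an illustrative Python sketch that collapses runs of identical consecutive system calls and feeds the result to a simple sliding-window detector. It is not the authors' implementation. The function names, the window length of 3, and the toy traces are hypothetical. The detector is an assumed minimal stide-style variant: it stores every fixed-length window seen in normal traces and counts each unseen window in a test trace as a mismatch.

```python
from itertools import groupby


def remove_contiguous_repetitions(trace):
    """Collapse each run of identical consecutive system calls into a single call."""
    # groupby yields one (key, group) pair per run of equal adjacent items.
    return [call for call, _ in groupby(trace)]


def build_normal_db(traces, window=3):
    """Store every length-`window` sequence observed in the normal (training) traces.

    Hypothetical stide-style database; window=3 is an arbitrary illustrative choice.
    """
    db = set()
    for trace in traces:
        for i in range(len(trace) - window + 1):
            db.add(tuple(trace[i:i + window]))
    return db


def count_mismatches(trace, db, window=3):
    """Count windows in `trace` that never appeared in the normal database."""
    return sum(
        1
        for i in range(len(trace) - window + 1)
        if tuple(trace[i:i + window]) not in db
    )


if __name__ == "__main__":
    # Hypothetical toy traces: the test trace is legitimate but loops on read().
    normal = ["open", "read", "close"]
    test = ["open", "read", "read", "read", "close"]

    raw_db = build_normal_db([normal])
    print("raw mismatches:", count_mismatches(test, raw_db))  # 3

    gen_db = build_normal_db([remove_contiguous_repetitions(normal)])
    gen_test = remove_contiguous_repetitions(test)
    print("generalized trace:", gen_test)  # ['open', 'read', 'close']
    print("generalized mismatches:", count_mismatches(gen_test, gen_db))  # 0
```

In this toy example, a test trace that differs from the training trace only in how many times `read` repeats produces three mismatching windows in raw form and none after repetitions are collapsed. The generalized trace is also shorter (3 calls instead of 5). This illustrates how removing contiguous repetitions can both shrink traces and lower false positives for a sliding-window detector. It does not reproduce the paper's measurements.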