Title
Detecting Trojans Using Data Mining Techniques
Keywords
Data Mining; Disassembly; Principal Component Analysis; Random Forest; Support Vector Machines; Trojan Detection
Abstract
A trojan horse is a program that surreptitiously performs its operation under the guise of a legitimate program. Traditional signature-based approaches to detecting these programs pose little danger to new and unseen samples whose signatures are not available. The focus of malware research is shifting from matching signature patterns to identifying the malicious behavior that malware displays. This paper presents the novel idea of extracting variable-length instruction sequences that can distinguish trojans from clean programs using data mining techniques. The analysis is facilitated by the program control flow information contained in the instruction sequences. Based on general statistics gathered from these instruction sequences, we formulated detection as a binary classification problem and built random forest, bagging, and support vector machine classifiers. Our approach showed a 94.0% detection rate on novel trojans whose data was not used in the model building process. © Springer-Verlag Berlin Heidelberg 2008.
Publication Date
12-1-2009
Publication Title
Communications in Computer and Information Science
Volume
20
Pages
400-411
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
Copyright Status
Unknown
Scopus ID
78649845943 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/78649845943
STARS Citation
Siddiqui, Muazzam; Wang, Morgan C.; and Lee, Joohan, "Detecting Trojans Using Data Mining Techniques" (2009). Scopus Export 2000s. 11350.
https://stars.library.ucf.edu/scopus2000/11350