Keywords

Data Mining, Malware Detection, Machine Learning, Classification, Instruction Sequences, Signature Extraction, Predictive Modeling, Supervised Learning, Unsupervised Learning, Feature Selection, Feature Reduction

Abstract

This research investigates the use of data mining methods for detecting malware (malicious programs) and proposes a framework as an alternative to traditional signature-based detection. Traditional approaches that rely on signatures fail to detect new and unknown malware, for which no signatures are available. We present a data mining framework to detect malicious programs. We collected, analyzed, and processed several thousand malicious and clean programs to identify the most effective features and to build models that classify a given program as malware or clean. Our research is closely related to information retrieval and classification techniques and borrows a number of ideas from those fields. We used a vector space model to represent the programs in our collection. Our data mining framework comprises two distinct classes of experiments. The first class consists of supervised learning experiments that used a dataset of several thousand malicious and clean program samples to train, validate, and test an array of classifiers. In the second class of experiments, we proposed using sequential association analysis for feature selection and automatic signature extraction. In our experiments, we achieved a detection rate as high as 98.4% and a false positive rate as low as 1.9% on novel malware.
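The abstract does not specify the exact features or term weighting used. As a minimal sketch, assuming each program has been disassembled into an opcode sequence, that overlapping opcode n-grams serve as the "terms" of the vector space model, and that TF-IDF is the weighting scheme (all assumptions, not the dissertation's confirmed method), the program representation and the two reported metrics (detection rate and false positive rate) could be computed as follows. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(opcodes, n=2):
    """Return overlapping n-grams of an opcode sequence as space-joined strings."""
    return [" ".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(programs, n=2):
    """Map each program (a list of opcodes) to a sparse TF-IDF dict over n-gram terms.

    Assumption: TF is the raw count normalized by the program's total n-gram count,
    and IDF is log(N / df) with no smoothing.
    """
    docs = [Counter(ngrams(p, n)) for p in programs]
    num_docs = len(docs)
    # Document frequency: in how many programs each n-gram appears.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append({
            term: (count / total) * math.log(num_docs / df[term])
            for term, count in doc.items()
        })
    return vectors


def detection_metrics(y_true, y_pred):
    """Return (detection rate, false positive rate); label 1 = malware, 0 = clean.

    Assumes both classes occur in y_true, otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN).
    return tp / (tp + fn), fp / (fp + tn)
```

For example, `detection_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])` returns `(0.75, 0.25)`: three of four malware samples are detected and one of four clean samples is misclassified. An n-gram that occurs in every program receives an IDF of log(1) = 0, so it contributes nothing to distinguishing malware from clean programs.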

Notes

If this is your thesis or dissertation and you want to learn how to access it, or if you would like more information about readership statistics, contact us at STARS@ucf.edu

Graduation Date

2008

Advisor

Wang, Morgan

Degree

Doctor of Philosophy (Ph.D.)

College

College of Sciences

Degree Program

Modeling and Simulation

Format

application/pdf

Identifier

CFE0002303

URL

http://purl.fcla.edu/fcla/etd/CFE0002303

Language

English

Release Date

September 2008

Length of Campus-only Access

None

Access Status

Doctoral Dissertation (Open Access)

STARS Citation

Siddiqui, Muazzam, "Data Mining Methods For Malware Detection" (2008). Electronic Theses and Dissertations. 3709.
https://stars.library.ucf.edu/etd/3709

Included in

Categorical Data Analysis Commons

Electronic Theses and Dissertations

Data Mining Methods For Malware Detection
