ORCID

0009-0002-8144-5302

Keywords

Natural Language Processing (NLP), Android App Permissions, Android Security and Privacy, Malware Detection and Classification, Machine Learning

Abstract

Malicious applications continue to pose significant privacy and security risks within the Android ecosystem, often exploiting user permissions and obscuring data collection practices behind opaque privacy policies. To address these challenges, this dissertation presents a comprehensive framework for enhancing Android app security through dataset construction, permission behavior analysis, and privacy policy classification. The framework systematically investigates key issues such as dataset fidelity, permission misuse, model performance, and the alignment between declared and actual data practices. Through a combination of empirical analysis and machine learning techniques, this work advances the development of more transparent and secure mobile applications. The first part introduces Troid, a large-scale Android dataset of 5,028 applications labeled using VirusTotal and enriched with metadata and static features. This study identifies limitations in existing malware datasets and proposes improvements in labeling, metadata integration, and malware family classification to support reproducible Android security research. The second part leverages Troid to analyze Android permission behaviors through a longitudinal and genre-based study of how permissions are requested across benign and malicious apps. Using semantic categorization and association rule mining, it uncovers frequent permission combinations and highlights trends in usage tied to monetization strategies and privacy risks. The third part also builds on Troid by presenting a machine learning approach to classifying privacy policy segments based on the permissions they describe. By training transformer-based models such as BERT, RoBERTa, and DistilBERT, and combining them via ensemble learning, the study bridges the gap between privacy policies and app behaviors, offering a scalable method for automated policy auditing. 
Together, these three studies form a unified framework that addresses critical gaps in Android security research. The findings offer practical tools for malware detection, permission transparency, and privacy compliance, supporting the development of safer mobile ecosystems and informing future regulatory and technical interventions.

Completion Date

2025

Semester

Summer

Committee Chair

David Mohaisen

Degree

Doctor of Philosophy (Ph.D.)

College

College of Engineering and Computer Science

Department

Department of Computer Science and Engineering

Format

PDF

Identifier

DP0029507

Language

English

Document Type

Thesis

Campus Location

Orlando (Main) Campus
