Android Malware Detection using LLM
Karunarathna Rajapakshe Mudiyanselage, Madura (2025-07-31)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2025080881601
Abstract
Android’s widespread adoption has made it a prime target for mobile malware, which threatens user privacy, financial security, and device security. Traditional malware detection approaches, such as signature-based and heuristic algorithms, frequently fail to keep up with the rapidly evolving nature of Android malware. To address this issue, this thesis presents a static-analysis-based malware detection system that uses fine-tuned transformer models, notably BERT, to classify Android apps. The system extracts permissions from AndroidManifest.xml and API call sequences from smali code, which are then treated as textual features suitable for language model input. Three BERT-based classifiers were trained: one on permissions, one on API calls, and one on a combined feature set. The final classification decision is made by ensemble majority voting. Experimental results on the CIC-AndMal2017 dataset indicate that the combined-feature model outperformed both single-feature models and traditional baselines, with an accuracy of 92% and an F1-score of 0.915. The system was deployed as a real-time detection service with a FastAPI backend and a React-based web frontend, allowing for easy malware investigation. This research illustrates the feasibility of using large language models for static malware detection and provides a scalable framework for incorporating future dynamic or hybrid analysis methods.
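The feature extraction and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the regular expression, and the label strings are hypothetical, and it assumes the manifest has already been decoded to plain-text XML (manifests inside an APK are stored in a binary format) and that the smali code is available as text. The BERT classifiers themselves are omitted; the voting function only shows how three per-model labels could be combined.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical pattern for smali invoke instructions, e.g.
#   invoke-virtual {v0}, Landroid/telephony/SmsManager;->sendTextMessage(...)V
# Captures the class descriptor and the method name.
INVOKE_RE = re.compile(r"invoke-[\w/]+ \{[^}]*\}, (L[^;]+;)->([\w$<>]+)\(")


def extract_permissions(manifest_xml: str) -> list[str]:
    """Return permission names from a decoded (plain-text) manifest."""
    root = ET.fromstring(manifest_xml)
    return [
        elem.attrib[ANDROID_NS + "name"]
        for elem in root.iter("uses-permission")
        if ANDROID_NS + "name" in elem.attrib
    ]


def extract_api_calls(smali_text: str) -> list[str]:
    """Return 'Lclass;->method' strings in order of appearance."""
    return [f"{cls}->{method}" for cls, method in INVOKE_RE.findall(smali_text)]


def to_text(tokens: list[str]) -> str:
    """Join tokens into one string that a text tokenizer could consume."""
    return " ".join(tokens)


def majority_vote(labels: list[str]) -> str:
    """Return the most common label among the per-model predictions."""
    return Counter(labels).most_common(1)[0][0]
```

With three voters and two classes a tie cannot occur, so the sketch does not define a tie-breaking rule.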