Design and Development of a Human-Centered Explainable Malware Classification System Using XAI and LLMs

Ali, Ghazanfar

2026-05-19

Master's thesis

Information Technology

Ali_Ghazanfar_Thesis.pdf

4.21 MB

Open access

This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.

Downloads: 42

Permanent address

https://urn.fi/URN:NBN:fi-fe2026060158884

Abstract

This thesis explores the interpretability challenges of AI-based cybersecurity systems. Artificial intelligence (AI) has significantly improved malware detection compared with traditional signature-based approaches. However, AI-based detectors often operate as “black boxes”: they give no rationale for their outputs, which makes the results difficult to trust. Security professionals need clear reasoning to make informed decisions, while non-technical users need simple explanations to understand the outcomes. To address this gap, this research proposes a human-centered explainable AI (XAI) framework that combines a classification layer with traditional XAI techniques such as LIME and SHAP. In the final layer, a large language model (LLM) generates clear and interpretable explanations for human users. The experiments use a balanced subset of the EMBER-2018 dataset of Windows Portable Executable (PE) files, provided in JSONL format. The data extraction phase yielded 618 interpretable static features. Six models were implemented in the classification layer; XGBoost performed best, with 97.0% accuracy and an ROC-AUC score of 0.997. In the XAI layer, LIME and SHAP identified the compilation timestamp and high entropy as among the most important features. The LLM-based explanation layer uses lightweight local models (llama3.2:3b and deepseek-r1:1.5b) that take the top XAI features and a structured knowledge base as input and convert these technical features into clear, human-understandable explanations for security analysts, security managers, and end users. Because the entire system runs locally without relying on external cloud services, it enhances data security and eliminates API usage costs.
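
The hand-off from the XAI layer to the LLM layer described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the feature names, the knowledge-base notes, the audience instructions, the sign convention, and the attribution values are all hypothetical placeholders chosen for illustration. The sketch ranks features by absolute attribution, attaches a background note to each, and assembles a prompt tailored to one of the three audiences.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge base: feature name -> plain-language background note.
KNOWLEDGE_BASE: Dict[str, str] = {
    "compile_timestamp": "PE header build time; implausible values can indicate tampering.",
    "section_entropy_max": "High entropy often points to packed or encrypted content.",
}

# Hypothetical instructions, one per user group named in the abstract.
AUDIENCE_STYLE: Dict[str, str] = {
    "security_analyst": "Give technical detail and name each feature.",
    "security_manager": "Summarize the risk and the recommended action.",
    "end_user": "Use plain language and avoid jargon.",
}


def top_features(attributions: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k features with the largest absolute attribution."""
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def build_prompt(label: str, probability: float, attributions: Dict[str, float],
                 audience: str, k: int = 3) -> str:
    """Assemble a text prompt from the prediction, top features, and knowledge base."""
    if audience not in AUDIENCE_STYLE:
        raise ValueError(f"unknown audience: {audience}")
    lines = [
        f"The classifier labeled this file as {label} (probability {probability:.2f}).",
        "Most influential features:",
    ]
    for name, value in top_features(attributions, k):
        # Assumed sign convention: positive values push toward 'malicious'.
        direction = "toward malicious" if value > 0 else "toward benign"
        note = KNOWLEDGE_BASE.get(name, "No background note available.")
        lines.append(f"- {name}: {value:+.2f} ({direction}). {note}")
    lines.append(f"Audience: {audience}. {AUDIENCE_STYLE[audience]}")
    lines.append("Explain the verdict using only the information above.")
    return "\n".join(lines)


# Placeholder attribution values for illustration only, not results from the thesis.
example = {"compile_timestamp": 0.42, "section_entropy_max": 0.31,
           "num_imports": -0.12, "file_size": 0.03}
prompt = build_prompt("malicious", 0.93, example, "end_user")
print(prompt)
```

In a setup like the one the abstract describes, the resulting string would be passed to a locally hosted model such as llama3.2:3b. The abstract does not say how the models are invoked, so that step is left out here.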

Keywords: malware classification; explainable AI; machine learning; cybersecurity; large language models; LIME; SHAP; human-centered AI
