Vulnerability Analysis of Open-source Software using LLMs: A Case Study of cJSON with Comparative Evaluation

Handun Kuttige, Shashika

(29.07.2025)

Harshani_Shashika_Thesis.pdf (1.107Mb)

This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.

Access: closed

The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2025080581130

Abstract

The increasing utilization of Large Language Models (LLMs) in software engineering has provided novel opportunities for static code analysis and automated vulnerability detection. This thesis evaluates the capability of contemporary LLMs to identify security flaws in the widely used C-based parsing library cJSON in a systematic and reproducible manner.

Key objectives are to assess LLM performance in detecting known vulnerabilities in static source code of Open-Source Software (OSS) libraries, to compare the effectiveness of model architecture and prompt engineering methods, and to cross-check results against a ground-truth collection of certified vulnerabilities.

Five different prompt templates, each mirroring a distinct security analyst profile, were used to test four different LLMs (Llama, Gemma, Qwen, and DeepSeek). The output of every model was evaluated for accuracy and precision in detecting null pointer dereference and buffer overflow vulnerabilities.

The evaluation reveals a significant discrepancy in detection capability across models and prompts. Although some models occasionally detected high-impact vulnerabilities, all of them showed poor precision, produced excessive false positives, and uniformly failed to localize vulnerabilities correctly. Prompt design was the most critical determinant of output quality: specific prompts consistently produced more accurate and useful output than others. This finding demonstrates that model performance is not solely a question of architecture but also depends on task definition, a factor of practical significance when applying LLMs in secure software development. The findings indicate that current LLMs are unreliable as independent static analysis tools and require significant human interaction, even though prompt design can reduce noise in some models. This thesis provides a reproducible setup for testing LLMs in software security and offers suggestions for future research and development on automated vulnerability discovery.

Collections

Master's theses, diploma theses, and advanced studies theses (restricted visibility) [5310]