Security Analysis of LLM-Generated Web API Backends
| dc.contributor.author | Khan, Abdul Ali | |
| dc.contributor.department | fi=Tietotekniikan laitos|en=Department of Computing| | |
| dc.contributor.faculty | fi=Teknillinen tiedekunta|en=Faculty of Technology| | |
| dc.contributor.studysubject | fi=Tietotekniikka|en=Information and Communication Technology| | |
| dc.date.accessioned | 2026-04-29T22:47:15Z | |
| dc.date.issued | 2026-03-26 | |
| dc.description.abstract | The adoption of Large Language Models (LLMs) in software engineering is changing how code is written, but the security implications for complex systems remain unclear. Previous research has primarily evaluated the security of LLM-generated code on isolated code snippets. However, this narrow scope cannot capture the security risks that emerge in integrated web API backends. To address this, we designed a benchmarking framework derived from a data-driven triangulation of Stack Overflow discussions, GitHub implementations and OWASP security risks. This yielded five representative tasks: Authentication, Role-Based Access Control, File Uploads, Payment Processing and Webhook Handling. We evaluated three state-of-the-art models (GPT-5.2, DeepSeek V3.2 and Gemini 2.5 Pro) using a multi-layered assessment methodology combining static application security testing (SAST), dynamic application security testing (DAST), and manual penetration testing. Scanning the generated APIs for vulnerabilities with SAST tools mainly revealed configuration-level issues. However, manual penetration testing identified mass assignment exposures, insecure execution ordering, and server-side request forgery vectors. The model outputs also pointed to a disconnect between functional correctness and secure logic. The model that demonstrated high build success (DeepSeek V3.2) produced the most vulnerable code in our trials, introducing severe logic flaws such as broken object-level authorization (BOLA). By contrast, the model that struggled most with syntax (Gemini 2.5 Pro) defaulted to safer but less functional implementations. This thesis terms this pattern the human-in-the-loop paradox, in which syntactically sound and well-structured code generated by an LLM may conceal deep architectural vulnerabilities that are not revealed by build success or surface-level inspection. These findings indicate that relying on build success or static analysis alone may create an illusion of correctness. Based on the findings from the literature review and security analysis experiments, the thesis presents recommendations to help individuals and organizations use LLM-generated API backends more effectively. We suggest that code produced by current LLMs should not be treated as a trusted draft, but rather as untrusted input from an external system, much like user input in a web form. An illustrative sketch of one such flaw appears after this record. | |
| dc.format.extent | 144 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/60085 | |
| dc.identifier.urn | URN:NBN:fi-fe2026041628161 | |
| dc.language.iso | eng | |
| dc.rights | fi=Julkaisu on tekijänoikeussäännösten alainen. Teosta voi lukea ja tulostaa henkilökohtaista käyttöä varten. Käyttö kaupallisiin tarkoituksiin on kielletty.|en=This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.| | |
| dc.rights.accessrights | avoin | |
| dc.subject | large language models | |
| dc.subject | software security | |
| dc.subject | web development | |
| dc.subject | security vulnerabilities | |
| dc.subject | static analysis | |
| dc.subject | dynamic analysis | |
| dc.subject | systematic literature review | |
| dc.title | Security Analysis of LLM-Generated Web API Backends | |
| dc.type.ontasot | fi=Diplomityö|en=Master's thesis| | |
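
To make the abstract's central finding concrete, the sketch below illustrates the kind of flaw it calls broken object-level authorization (BOLA): an endpoint that builds, type-checks and returns correct responses, yet never verifies that the requested object belongs to the caller. This is an illustrative assumption, not code from the thesis; the FastAPI framework, the Invoice model, the current_user_id dependency and the /invoices routes are all hypothetical choices made for this example.

```python
from fastapi import Depends, FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class Invoice(BaseModel):
    id: int
    owner_id: int
    amount: float

# In-memory stand-in for a database: two invoices owned by two different users.
INVOICES = {
    1: Invoice(id=1, owner_id=100, amount=49.90),
    2: Invoice(id=2, owner_id=200, amount=120.00),
}

def current_user_id() -> int:
    # Placeholder for real authentication; assume the request token maps to user 100.
    return 100

# Vulnerable pattern: the handler builds and returns valid JSON, yet any
# authenticated user can fetch any invoice by guessing its id (BOLA).
@app.get("/invoices/{invoice_id}")
def get_invoice(invoice_id: int, user_id: int = Depends(current_user_id)) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise HTTPException(status_code=404, detail="Not found")
    return invoice  # missing ownership check: user 100 can read user 200's invoice

# Hardened variant: object-level authorization enforced on every lookup.
@app.get("/secure/invoices/{invoice_id}")
def get_invoice_secure(invoice_id: int, user_id: int = Depends(current_user_id)) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != user_id:
        # Same 404 for "missing" and "not yours", so object existence is not leaked.
        raise HTTPException(status_code=404, detail="Not found")
    return invoice
```

In this sketch, a request for /invoices/2 as user 100 succeeds in the vulnerable route but returns 404 in the hardened one. Nothing in the vulnerable handler is syntactically or configurationally wrong, which is why SAST tools and build success tend to miss this class of flaw, mirroring the abstract's point that functional correctness is a poor proxy for secure logic.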