Agentic AI for Autonomous Risk Triage and False Positive Reduction in Static Security Analysis

University of Turku
Department of Computing, Faculty of Technology
Master's Degree Programme in Information and Communication Technology
Cyber Security Engineering
June 2025
Author: Sumit Bakane
Supervisors: Saku Lindroos (University of Turku), Tahir Mohammad (University of Turku)

The originality of this thesis has been checked in accordance with the University of Turku quality assurance system using the Turnitin OriginalityCheck service.

UNIVERSITY OF TURKU
Department of Computing, Faculty of Technology
Sumit Bakane: Designing Agentic AI for Autonomous Risk Triage and False Positive Reduction in Static Security Analysis
Master's Degree Programme in Information and Communication Technology, 88 p.
Cyber Security Engineering
June 2025

Static security analysis techniques are widely adopted in modern software development to detect flaws early in the software development cycle. These tools, however, have poor risk prioritization abilities, fragmented results, and high false-positive rates, which result in substantial manual effort and delayed or even ignored fixes. Additionally, as software development increasingly relies on automated security tools, managing high volumes of false alerts has become a critical challenge. By integrating rule-based heuristics and large language model-based reasoning in a modular, agent-based framework, this thesis proposes a novel Agentic AI system that aims to address these limitations. The system normalizes the output of various static analysis tools, such as GitLeaks (secret detection), OWASP Dependency-Check (SCA), and Semgrep (SAST), into one schema before sending it through a pipeline of autonomous agents. This normalization process makes the approach tool-agnostic, as any security tool can be integrated with the Agentic AI architecture through the normalization layer.
By employing a multi-layered architecture within CI/CD pipelines, the system processes raw outputs into actionable insights, significantly improving developer efficiency and trust in security tools. Empirical evaluation demonstrates a 35.29% reduction in false positives, alongside improved risk prioritization and user engagement through a web-based dashboard. In addition, the system aligns with popular secure development guidelines, including OWASP SAMM, BSIMM, and NIST SSDF, ensuring standards compliance, auditability, and traceability. Overall, this research contributes to the field of application security by providing a scalable, intelligent solution that aligns with industry standards and enhances the overall security posture of software development lifecycles.

Keywords: Static Security Analysis, False-Positive Reduction, Agentic AI, Application Security, Agentic AI in Application Security, Autonomous Risk Triage

Contents

1 Introduction 1
  1.1 Motivation 2
  1.2 Problem statement 3
  1.3 Research question 4
  1.4 Research objective 4
  1.5 Scope of the work 5
  1.6 Structure of the thesis 6
2 Literature review 8
  2.1 Secure Software Development Lifecycle (SSDLC) 8
    2.1.1 Security involved in the phases of SDLC 9
    2.1.2 Importance of early-stage security (Shift-Left approach) 11
    2.1.3 Role of AppSec in modern development environments 12
  2.2 Risk Management in Software Development 14
    2.2.1 Introduction to threat modeling 14
    2.2.2 Risk Triage: Risk Assessment and Risk Prioritization 16
    2.2.3 How inaccurate prioritization leads to wasted resources 17
  2.3 Static Security Analysis Techniques 17
    2.3.1 Fundamentals of static code analysis (SAST and SCA) 18
    2.3.2 Importance of SBOM in open-source security 18
    2.3.3 Benefits of Static Security Analysis over Dynamic Testing 19
    2.3.4 Limitations of static approaches compared to dynamic methods 20
  2.4 Security Scanning Tools and Their Challenges 21
    2.4.1 Overview of common and open-source tools 21
    2.4.2 Use cases: SAST, SCA, and security scanner tools 22
    2.4.3 Problem of fragmented outputs and tool interoperability 23
  2.5 False Positives in Static Analysis 24
    2.5.1 Impact on developer workflows and security posture 26
    2.5.2 Existing approaches to reduce false positives 27
  2.6 AI and LLMs in AppSec 28
    2.6.1 Applications of AI in cybersecurity 29
    2.6.2 Introduction to LLMs and their potential in AppSec 31
  2.7 Agentic AI: Concepts and Use Cases 32
    2.7.1 Agentic AI use cases in software engineering, security, and automation 33
    2.7.2 Potential of combining LLMs with agentic architectures 36
  2.8 Security Standards and Risk Benchmarking 37
    2.8.1 Secure-SDLC Frameworks and Standards 40
    2.8.2 OWASP Top 10 and how static tools align with it 43
    2.8.3 Importance of standardized scoring in prioritization 45
  2.9 Research Gap Analysis 47
    2.9.1 Identified gaps 47
    2.9.2 Justification for approach as proposed in this thesis 47
3 Methodology 49
  3.1 Proposed Solution Architecture 49
    3.1.1 Overview of Functional Layers 49
  3.2 Agent Architecture and Reasoning Logic 52
    3.2.1 Decision Making Flow 52
  3.3 Tool and Dataset Selection 54
    3.3.1 Alignment with Industry Standards and Frameworks 55
  3.4 Evaluation Metrics 56
    3.4.1 False Positive Reduction Rate (FPRR) 56
    3.4.2 Triage Accuracy 56
    3.4.3 Time Efficiency 57
4 Implementation 58
    4.0.1 Technology Stack and Tooling 58
  4.1 Input Pipeline 60
    4.1.1 Folder Structure and Scanning Setup 61
    4.1.2 Secrets and Configuration Management 63
    4.1.3 Normalization and Data Preparation 63
    4.1.4 Summary of Input Pipeline Workflow 64
  4.2 Agentic AI Core 64
    4.2.1 Triage Orchestrator 67
    4.2.2 LLM Communication Layer 67
    4.2.3 False Positive Detection Agent 68
    4.2.4 Risk Scoring Agent 69
    4.2.5 Risk Explainer Agent 70
    4.2.6 Remediation Agent 71
    4.2.7 Shared Memory and Feedback Loop 71
  4.3 Output Pipeline and Web Dashboard 72
    4.3.1 Secure Result Transmission 73
    4.3.2 API Server and Database Integration 74
    4.3.3 Dashboard Rendering and Visualization 74
    4.3.4 System Integration and Security 75
5 Results 78
  5.1 Testing Dataset and Criteria 78
  5.2 Overview of the Performance Evaluation Process 79
  5.3 False Positive Reduction Results 80
  5.4 Risk Statistics and Classification Summary 80
  5.5 Evaluation of Risk Prioritization Accuracy 82
6 Conclusion 83
  6.1 Summary of Key Contributions 83
  6.2 Revisiting Research Questions 84
    6.2.1 Research Question 1 (RQ1) 84
    6.2.2 Research Question 2 (RQ2) 85
    6.2.3 Research Question 3 (RQ3) 85
    6.2.4 Research Question 4 (RQ4) 86
  6.3 Limitations of the Study 86
  6.4 Future Work 86
References 89

List of Figures

2.1 Phases of Software Development Lifecycle (SDLC) 9
2.2 Phases of Secure Software Development Lifecycle (SSDLC) 10
2.3 Evolution of AppSec in Modern Software Development 13
2.4 Components of SBOM 19
3.1 Proposed System Architecture 51
3.2 Agentic AI Architecture 53
3.3 Security Tools for Pipelines 55
4.1 GitHub Actions Pipeline (triggered) 61
4.2 Semgrep GHA Configuration 61
4.3 OWASP DC GHA Configuration 62
4.4 Gitleaks GHA Configuration 62
4.5 Data flow Diagram of the system 65
4.6 Implemented Agentic AI Architecture 66
4.7 Output Pipeline Data Flow 72
4.8 Structure of result_.json file 73
4.9 Output Redirect in GHA Summary 74
4.10 Frontend WebApp: Noesiz Dashboard Page-1 75
4.11 Frontend WebApp: Noesiz Dashboard Page-2 76
4.12 Output Pipeline Configuration 77

List of Tables

2.1 Overview of STRIDE Threat Model 15
2.2 Overview of DREAD Threat Model 15
2.3 Open-source SAST Tools 21
2.4 Open-source SCA Tools 22
2.5 Secret Scanning Tools 22
2.6 Key Takeaways from Secure SDLC Standards and Frameworks 43
2.7 Overview of OWASP Top 10 (2021 Edition) 44
2.8 SAST and SCA supporting OWASP Categories 45
4.1 Infrastructure and Platform used for Implementation 59
4.2 Programming Languages and Frameworks used in Architecture 59
4.3 Security Tools used in Architecture 59
4.4 APIs and External Sources in use by the Agentic AI 60
4.5 Threshold for CVE-based Risk Scoring 70
5.1 Summary of results 79
5.2 False-Positive Reduction by Tool 81
5.3 Tool-wise Final True Positives and Risk Distribution 81

List of acronyms

AI Artificial Intelligence
API Application Programming Interface
AppSec Application Security
CI/CD Continuous Integration and Continuous Deployment
CIS Center for Internet Security
CSIRT Computer Security Incident Response Team
CVE Common Vulnerabilities and Exposures
CVSS Common Vulnerability Scoring System
DAST Dynamic Application Security Testing
ENISA European Union Agency for Cybersecurity
EPSS Exploit Prediction Scoring System
EUVD European Union Vulnerability Database
FP False Positive
GHA GitHub Actions
IAST Interactive Application Security Testing
IDEs Integrated Development Environments
KPIs Key Performance Indicators
LLMs Large Language Models
NIST National Institute of Standards and Technology
NLP Natural Language Processing
NVD National Vulnerability Database
OASIS Organization for the Advancement of Structured Information Standards
OWASP Open Worldwide Application Security Project
PoC Proof-of-Concept
RAG Retrieval-Augmented Generation
SAMM Software Assurance Maturity Model
SARIF Static Analysis Results Interchange Format
SAST Static Application Security Testing
SBOM Software Bill of Materials
SCA Software Composition Analysis
SDLC Software Development Life Cycle
SMEs Small and Medium-sized Enterprises
SOC Security Operations Center
SSDLC Secure Software Development Lifecycle
TP True Positive

1 Introduction

In this age of technology, software supports almost every facet of modern life, ranging from communication and finance to healthcare and critical national infrastructure. As of February 2025, around 5.56 billion people, representing 67.9% of the world's population, were internet users, as reported by Statista [1]. Moreover, almost 90% of organizations worldwide are actively undergoing some kind of technological revamp, indicating that digital transformation has become a universal goal [2].
In addition, the European Union has set two ambitious goals for 2030: first, over 90% of SMEs should attain at least a basic level of digital intensity; second, 75% of businesses should adopt advanced technologies such as Cloud Computing, Big Data Analytics, or Artificial Intelligence (AI) [3]. These data-backed trends clearly point to a generational transition towards a world that is increasingly dependent on digital applications, that is, on software.

This heavy dependency on software also raises concerns about its security. Robust Application Security (AppSec) becomes essential as societies grow more reliant on complex software systems, and it is becoming a crucial component of resilience and trust in this age of technology. As system complexity increases, it creates more room for vulnerabilities, and as digital threats evolve in complexity and volume, it is now essential for individuals, businesses, and governments to ensure secure software development.

1.1 Motivation

In the software development life cycle (SDLC), incorporating security practices in the early phases, commonly referred to as "Shifting Security Left", is considered essential. According to the National Institute of Standards and Technology (NIST), identifying and fixing security issues early in the SDLC saves resources, including cost, effort, and time, compared to post-deployment fixes [4]. This has also given rise to the principle of "Secure-by-Design" and to embedding security practices from the very beginning of the SDLC [5]. The shift-left approach has made software composition analysis (SCA) and static application security testing (SAST) technologies very important, as they help identify vulnerabilities in the earlier phases, even before the software is deployed. However, these tools are far from perfect.
Their tendency to generate a large number of false positives is a frequent problem in the AppSec domain [6]. False positives are alerts flagged as security issues that are actually irrelevant or not exploitable. This overload not only consumes time and causes fatigue, but can also obscure genuinely critical vulnerabilities [7]. It is usually the responsibility of the security and development teams to go through these results and filter them manually. The main problem with manual triage is that it is time-consuming, inefficient, and prone to errors [8].

The rise of reasoning-based automation, autonomous AI agents, and large language models (LLMs) presents a unique opportunity to redesign this triage process. Autonomous systems that can understand, filter, and reason about security results are referred to as Agentic AI, and they can significantly reduce the workload of human operators. Such systems can also help prioritize genuine risks, filter out irrelevant noise, and deliver results in a more unified and understandable way.

This thesis is motivated by the need to enhance how security issues are triaged and managed. The aim is to greatly lower false positives, save developer time, and improve overall visibility into software security threats by utilizing an Agentic AI system leveraging LLMs.

1.2 Problem statement

Despite significant advancements in static security analysis and vulnerability detection tools, software development and security engineering teams continue to deal with three interrelated and persistent challenges that undermine effective security implementation.

First is the high volume of false positives. SAST tools often generate an excessive number of alerts, a large share of which are false positives [9].
These false positives force developers and security engineers to spend excessive time manually verifying issues, which leads to alert fatigue, wasted effort, and an increased likelihood of overlooking real threats, especially under tight deadlines [10].

Second is time-intensive, manual risk triage. The overall process of assessing the findings produced by security tools and separating false positives from genuine risks takes a lot of time [9]. Since manual triage relies mostly on individual expertise and reasoning, it not only slows down the development cycle but also adds inconsistencies across workflows [10].

Third is the lack of unified visibility across tools. Multiple SAST, SCA, and CI/CD-based security scanning solutions are used by many development teams across organizations working under a modern development lifecycle (i.e., using CI/CD). However, these tools produce results with differing priorities, formats, and categorizations. This leads to a lack of visibility across the whole security landscape, making it challenging to pinpoint the most important problems or efficiently track remediation progress [11].

1.3 Research question

To address the challenges outlined in the problem statement, this thesis aims to answer the following research questions:

1. Research Question 1 (RQ1): How can reasoning agents and LLMs be used to differentiate between true and false positives in static analysis results?

2. Research Question 2 (RQ2): How can results from multiple security tools be combined and standardized to present a unified and consistent view of risks?

3. Research Question 3 (RQ3): How can a web-based interface be designed to efficiently support developer workflows, offering clear visualization of triaged risks and actionable remediation insights?

4.
Research Question 4 (RQ4): To what extent can such a system reduce overall triage time, minimize manual effort, and improve developer trust in and engagement with security tools?

1.4 Research objective

To tackle the pressing issues presented by noisy and unreliable results from static security analysis tools, this thesis introduces an Agentic AI framework that combines LLMs, multi-step reasoning, and self-directed decision-making. The study defines the following four core objectives:

• To investigate how LLMs and reasoning agents may be combined to analyse security scan results and reliably differentiate between false positives and true positives in static security tools.

• To build a unified triage framework that uses an agentic reasoning pipeline and a shared memory model to aggregate, normalise, and present outputs from various static analysis tools in a consistent and organised manner.

• To design and develop a secure, production-ready web interface that provides remediation guidance, severity breakdowns, and contextual risk insights, efficiently supporting developer workflows and improving the overall usability and clarity of the triaged results.

• To evaluate the system in terms of its ability to reduce false positives, decrease manual risk triage time, and enhance developer trust in security tools by conducting empirical testing on real-world vulnerable codebases and comparing the results with manual analysis.

1.5 Scope of the work

The scope of this research is focused on providing a practical and research-driven contribution to the field of AppSec by demonstrating how Agentic AI can transform the risk triage process, particularly in static security analysis. The core contributions of this work are as follows:

1.
Design of an Agentic AI-based Solution for Security Triage: This research proposes and implements an AI agent that automatically filters and prioritizes security issues identified by static analysis tools, using large language models (LLMs) and multi-step reasoning.

2. Proof-of-Concept (PoC) System for False Positive Reduction: To demonstrate the AI system's ability to reduce false positives in real or simulated SAST/SCA tool outputs, a fully working prototype is created, allowing developers to concentrate on true risks.

3. Interactive Dashboard for Enhanced Threat Visibility: The thesis delivers a web-based application that provides actionable insights and remediation methods, distinguishes true from false positives, and visualizes triaged outcomes, all in one centralized platform.

4. Unified Integration of Multi-Tool Outputs: An approach has been built to handle the common problem of fragmented visibility in multi-tool setups by processing and normalizing outputs from several static analysis tools.

5. Empirical Evaluation: Key performance indicators (KPIs) such as the false positive reduction rate and time savings are used to assess the effectiveness of the system.

1.6 Structure of the thesis

This thesis is organized into six chapters, each contributing to a comprehensive understanding of the research, design, implementation, and evaluation of an Agentic AI system for autonomous risk triage in static security analysis.

Chapter 1: Introduction - provides an overview of the research context and the motivation behind it, identifies the core problems, and outlines the research questions and objectives. The scope and contributions of the thesis are also presented.

Chapter 2: Literature Review - highlights the foundational concepts of secure software development, static analysis, security testing, and application security.
It also addresses the issue of false positives, thoroughly examines the role of AI and LLMs in cybersecurity, and identifies gaps in existing solutions.

Chapter 3: Research Methodology - presents the proposed Agentic AI solution, including the system and agent architecture, data sources, reasoning logic, and the evaluation framework. The selected security tools, datasets, and alignment with established security standards are also described.

Chapter 4: System Implementation - describes the practical implementation of the prototype system. This covers the ingestion, processing, and classification of security findings and the display of filtered outputs through a web-based dashboard.

Chapter 5: Results - presents the evaluation of the system's performance. It includes false positive reduction statistics, classification accuracy, and a comparison between manual and agentic triage.

Chapter 6: Conclusion - summarizes key findings, addresses the research questions, discusses limitations of the current approach, and explores potential directions for future enhancements.

Statement on the usage of AI: This thesis is the original work of the author. All content has been written by the author, using information from various scientific and academic sources that have been properly cited. The thesis includes a working prototype, and during its development the author encountered some programming-related errors. AI tools such as ChatGPT and GitHub Copilot were utilised to resolve these issues. Apart from this, the entire process, from conceptualisation to the development and deployment of the prototype, was carried out by the author alone.

2 Literature review

This chapter highlights the fundamental concepts of secure software development, static analysis, security testing, and AppSec. Along with addressing the problem of false positives, it also carefully investigates the role of AI and LLMs in cybersecurity.
2.1 Secure Software Development Lifecycle (SSDLC)

The Software Development Lifecycle (SDLC) is a structured process used by development teams to create software systems from scratch to completion. Traditional SDLC models such as Waterfall, Agile, and DevOps place a strong emphasis on phases like requirements gathering, design, development (which includes coding and building), deployment and testing, and finally maintenance. Figure 2.1 shows the different phases of the SDLC with brief information about each phase. These traditional SDLCs are effective at managing the delivery of functional software, but they often lack security considerations, leaving security as a post-development concern or an afterthought [12].

Figure 2.1: Phases of Software Development Lifecycle (SDLC)

To mitigate these security risks, the Secure Software Development Lifecycle (SSDLC) was introduced. It embeds security elements into each phase of the SDLC, which makes the SSDLC a proactive approach rather than a reactive one, unlike the traditional SDLC. The Open Worldwide Application Security Project (OWASP) states that, as part of the standard workflow, an SSDLC includes common practices such as code scanning, threat modeling, secure design review, and security testing [13].

Due to the growing complexity of software systems and the ever-evolving threat landscape, modern software engineering has gone through a paradigm shift, as evidenced by the switch from SDLC to SSDLC. The objective is not only to create functional software, but to ensure that the software is secure by design.

2.1.1 Security involved in the phases of SDLC

In an SSDLC, security is not a one-time checkpoint but rather a continuous process integrated into each phase of the SDLC. As seen in Figure 2.2, every stage of the SDLC has a distinctive set of security procedures and objectives embedded into it.
The security components associated with each phase are discussed below:

Figure 2.2: Phases of Secure Software Development Lifecycle (SSDLC)

• Requirements Phase: All security requirements should be gathered at this stage, including requirements based on business, legal, and compliance needs. Well-defined functional and non-functional requirements, covering data protection, access control, and authentication, can also be added depending on whether a new feature is being introduced or an existing one is being changed. A number of maturity models are available, including the Software Assurance Maturity Model (SAMM), the Building Security In Maturity Model (BSIMM), and the Capability Maturity Model Integration (CMMI), to support structured approaches to capturing and managing security needs. These also help teams improve security during the requirements phase [14].

• Design Phase: Here, threat modeling is applied to identify potential security flaws in the architecture. Threat models such as STRIDE (widely used), DREAD (useful for risk scoring), and PASTA (useful in complex environments), together with techniques like DFD-based analysis, help evaluate risk exposure and design appropriate countermeasures before coding begins [12], [15].

• Development and Building Phase: Secure coding guidelines can be followed at this stage to avoid common software defects such as buffer overflows, injection attacks, and insecure APIs. Additionally, to ensure that secure coding standards are followed, static testing tools (SAST, SCA) should be used [12], [15].

• Testing and Deployment Phase: In addition to functionality testing, security testing methods such as DAST and fuzz testing ensure the software is resilient against known attack vectors, as they scan for vulnerabilities at application runtime.
Moreover, Center for Internet Security (CIS) benchmarks can be followed to ensure that the deployment or production environment stays safe from threats; this includes configuration scanning, verification of security controls, and policy-level security rules such as access management [12], [15].

• Maintenance Phase: Security does not end at deployment; software security needs to be maintained even after production. To handle vulnerabilities found after a release, regular security patching, monitoring, and incident response procedures must be in place [12], [15].

The SSDLC approach of integrating security practices at every stage of software development comes with several advantages. One of the main advantages is the reduced cost of patching vulnerabilities, along with increased risk visibility throughout the SDLC. Fixing a security issue can be up to 100 times cheaper when it is found early in the SDLC, which supports the idea of a fully integrated SSDLC [16].

2.1.2 Importance of early-stage security (Shift-Left approach)

The "Shift-Left" security concept centres on incorporating security procedures into software development as early as possible, i.e., shifting security left so that security is considered before and during each stage of the SDLC. In the past, security assessments were mostly conducted after deployment or during testing, which resulted in high remediation costs and delayed releases [12]. The Shift-Left methodology promotes integrating security into the planning, design, and development stages in order to identify vulnerabilities early on, which simplifies the fixing process, takes less time, and is less expensive.

According to an IBM System Science Institute report, the cost of fixing a vulnerability in the design stage is around one-sixth of what it would be in post-production [16]. This shows the operational and financial benefits of early security adoption.
The Shift-Left approach may also help security and development teams work together more efficiently, as it automates secure coding checks and reduces the need for reactive security reviews. Developers are able to receive real-time security feedback using tools like pre-commit hooks, static analyzers, and scanners integrated into their Integrated Development Environments (IDEs) or CI/CD pipelines, transforming security into a continuous and proactive discipline [17].

2.1.3 Role of AppSec in modern development environments

AppSec is now a core pillar of modern software engineering, having evolved from a specialized post-development task. In recent DevSecOps and Agile contexts, where continuous deployment and rapid iterations are standard procedures, AppSec acts as a facilitator rather than a bottleneck.

Initially, security was mostly network-focused, depending on firewalls and other perimeter defenses and trusting that applications were secure inside internal networks. The rise of web applications in the early 2000s uncovered application-layer vulnerabilities and resulted in a reactive security posture focused on patching misconfigurations and security issues after deployment. Organizations started to realize how ineffective this strategy was by the early 2010s, which led to a wide shift towards proactive security integration in the development process. As part of an overall effort to incorporate security from the very start, secure SDLC processes such as threat modeling, code reviews, and static analysis were developed. In the mid-2010s, with the emergence of DevSecOps, powered by DevOps and Agile, AppSec adapted to continuous delivery by integrating automated security tools into CI/CD pipelines and encouraging shared accountability, fostering collaboration between development and operations teams [12]. Figure 2.3 examines the phases of this evolution as well as the limitations and focus areas of each stage.
Figure 2.3: Evolution of AppSec in Modern Software Development

In modern AppSec practice, development teams, security engineers, and operations teams work closely together to deliver efficient, robust, and secure applications. This involves automating policy enforcement, enabling developer-first tools, and integrating security scanners into CI/CD pipelines, which helps in the early detection and fixing of vulnerabilities. The overall goal is to create secure-by-default systems without compromising development speed. As a result, AppSec is now a proactive, collaborative, and continuous process that supports the creation of secure and scalable software rather than a reactive gatekeeper function.

2.2 Risk Management in Software Development

In software development, risk management is a key process that helps in identifying, assessing, and mitigating potential issues that may impact the success, security, or functionality of a software system. Threat modeling is an important component of this process, involving the systematic identification of potential security threats, vulnerabilities, and attack vectors within the system. After threats have been identified, risk assessment is carried out to determine each threat's impact and likelihood, enabling teams to understand their severity. Based on this analysis, prioritization helps in efficient resource allocation by addressing the most important threats first. Together, these practices ensure that potential problems are managed proactively, reducing the likelihood of project failure or security breaches.

2.2.1 Introduction to threat modeling

Threat modeling is one of the fundamental methods in secure software design and a recommended security practice in the SDLC design phase.
Commonly used threat models such as STRIDE, DREAD, and PASTA systematically identify, list, and prioritize potential risks that could damage the system if left unresolved.

Microsoft created the widely used threat model known as STRIDE, which evaluates a system's architecture against six threat categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege [18]. Table 2.1 lists these six primary threats, their definitions, and the corresponding security controls and mitigation strategies [19]. This model is generally used to qualitatively evaluate possible risks based on a system's architecture and data flow.

DREAD stands for Damage Potential, Reproducibility, Exploitability, Affected Users, and Discoverability, which can be read as five questions about each potential threat; Table 2.2 describes each category briefly. It is a methodology that ranks threats by assigning severity and priority levels to the identified threats. DREAD is a quantitative approach, as it uses numerical scores to assess and compare risks [20].

Threat | Definition | Security Control | Mitigation Strategy
Spoofing | Falsely posing as someone or something other than oneself. | Authentication | Enforce strong authentication and avoid storing or exposing secrets.
Tampering | Making changes to data on disk, on the network, in memory, or elsewhere. | Integrity | Use integrity checks and secure authorization to prevent unauthorized changes.
Repudiation | Not taking accountability for an action; the claim may or may not be true. | Non-Repudiation | Enable accountability with digital signatures, timestamps, and audit logs.
Information Disclosure | Giving information to an unapproved entity. | Confidentiality | Encrypt data, limit access via authorization, and avoid unnecessary secret storage.
Denial of Service | Exhausting the resources required to provide services. | Availability | Mitigate with authentication controls, request filtering, and QoS mechanisms.
Elevation of Privilege | Permitting someone to perform unauthorized tasks. | Authorization | Enforce least privilege and restrict access to the minimal necessary permissions.

Table 2.1: Overview of STRIDE Threat Model

Category | Description
Damage Potential | To what extent might this threat harm the system?
Reproducibility | Is this threat easy to replicate?
Exploitability | How much experience and effort is needed to exploit the threat?
Affected Users | How many users could be affected if the threat is realized?
Discoverability | How simple is it to identify the vulnerability?

Table 2.2: Overview of DREAD Threat Model

Process for Attack Simulation and Threat Analysis (PASTA) is a risk-centric threat model. It operates at a more strategic level, covering the threat modeling process from threat analysis through to mitigation strategies. Because it integrates attacker modeling with business risk, it is both a qualitative and a quantitative threat model [21].

Teams can use these models to identify attack surfaces, assess likelihoods, and develop mitigation mechanisms early in the development lifecycle. The overall objective is to understand what might go wrong and then proactively design countermeasures before the software is built.

2.2.2 Risk Triage: Risk Assessment and Risk Prioritization

Risk triage is the practice of categorizing and prioritizing security threats according to their potential impact and likelihood of exploitation [22]. By evaluating the severity of vulnerabilities reported by tools or manual reviews, it determines which software security issues need immediate attention.
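Scoring schemes like DREAD (Section 2.2.1) make such triage concrete. A minimal sketch is given below; the 0–10 scale, equal weighting, and the priority thresholds are common conventions assumed for illustration rather than mandated by the model:

```python
def dread_score(damage, reproducibility, exploitability,
                affected_users, discoverability):
    """Average the five DREAD ratings (each assumed to be on a 0-10 scale)."""
    ratings = [damage, reproducibility, exploitability,
               affected_users, discoverability]
    if not all(0 <= r <= 10 for r in ratings):
        raise ValueError("each DREAD rating must be between 0 and 10")
    return sum(ratings) / len(ratings)

def dread_priority(score):
    """Map a DREAD score to a coarse priority band (thresholds illustrative)."""
    if score >= 7:
        return "High"
    return "Medium" if score >= 4 else "Low"
```

For example, a threat rated 8, 8, 7, 9, 8 across the five categories averages to 8.0 and would land in the "High" band under these illustrative thresholds.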
Effective risk triage consists of two primary activities:

• Risk Assessment: Identifying the kind and scope of a vulnerability, such as the data that could be compromised, the asset it impacts, and whether the vulnerability is reachable from outside the system [23].

• Risk Prioritization: Determining a risk score or severity level to plan remediation activities. The Common Vulnerability Scoring System (CVSS) and the Exploit Prediction Scoring System (EPSS) are common frameworks that offer standardized scoring procedures based on impact and exploitability [24].

Risk triage ensures that security teams concentrate on the most serious concerns first rather than treating every issue with the same urgency, much like medical triage in an emergency department, where patients are treated according to urgency [25]. Triage helps prevent resource dilution by ensuring that teams concentrate on the vulnerabilities with the highest business and technical risk, even in the presence of thousands of security findings [26]. Improper triage raises the possibility that critical issues go unnoticed and also slows down the remediation process.

2.2.3 How inaccurate prioritization leads to wasted resources

Inaccurate risk prioritization is one of the most expensive inefficiencies in modern AppSec [27]. Teams that treat all vulnerabilities with equal importance often spend a lot of time fixing low-risk problems while leaving major ones unfixed.

The 2025 Veracode State of Software Security report states that 49.9% of organizations have unaddressed high-severity issues, and 74.2% of organizations have security flaws that have remained unremediated for more than a year [28]. The cause often traces back to triage bottlenecks and alert fatigue. This imbalance creates a false sense of security and reflects inadequate risk coverage.
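Score-based prioritization of the kind described above can be sketched in a few lines. The combined weighting of CVSS and EPSS below is purely an illustrative assumption, not a standardized formula:

```python
def triage_order(findings, cvss_weight=0.7, epss_weight=0.3):
    """Order findings by a combined CVSS/EPSS score, highest risk first.

    `findings` is a list of dicts, each with a 'cvss' base score (0-10) and
    an 'epss' exploit probability (0-1). The weights and the rescaling of
    EPSS onto a 0-10 range are illustrative choices, not a standard.
    """
    def risk(finding):
        return (cvss_weight * finding["cvss"]
                + epss_weight * (finding["epss"] * 10))
    return sorted(findings, key=risk, reverse=True)
```

Note how a finding with a lower CVSS base score but a very high exploit probability can outrank a "Critical" finding that is rarely exploited in practice, which is precisely the point of combining the two signals.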
Too many false positives can cause developers to ignore scanner results, reducing trust in security tools and potentially causing more harm than the vulnerabilities themselves.

2.3 Static Security Analysis Techniques

Static code analysis is the method of analyzing source code for issues without running the program. The goal is to identify potential vulnerabilities, code quality issues, or security policy violations that are not caught by the compiler, before the program is executed [29].

2.3.1 Fundamentals of static code analysis (SAST and SCA)

Two dominant approaches in static security analysis are:

1. SAST focuses on examining the actual codebase to find security issues including buffer overflows, injection flaws, and insecure usage of APIs [30].

2. SCA focuses on finding known vulnerabilities in open-source dependencies, libraries, and third-party components used in a project [30].

When combined with CI/CD pipelines, these tools offer several benefits, most importantly i) good scalability; ii) detection of known and documented vulnerabilities; and iii) reporting that pinpoints the location of the problem for an easier fix, thus cutting down on time-to-fix [30].

“Shift-Left” security strategies depend strongly on SAST and SCA, as they enable the identification and fixing of vulnerabilities during development. According to the GitHub Octoverse 2024: The State of Open Source Software report, over 39 million secret leaks were identified on GitHub using a static analysis technique called secret scanning [31]. Additionally, according to the Synopsys 2024 Open Source Security and Risk Analysis report, 96% of the applications examined contain open-source libraries [32]. These numbers highlight the growing importance of static security analysis.
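As a toy illustration of the pattern-matching idea underlying SAST, the sketch below flags findings in source text without ever executing it. Real tools such as Semgrep operate on far richer semantic rules; the two regex rules and their identifiers here are deliberately simplistic assumptions:

```python
import re

# Deliberately simplistic rules: one regular expression per finding type.
RULES = {
    "hardcoded-secret": re.compile(
        r"(?i)(password|api_key)\s*=\s*['\"][^'\"]+['\"]"),
    "sql-concatenation": re.compile(r"execute\([^)]*\+\s*\w+"),
}

def scan_source(source):
    """Return (line_number, rule_id) pairs for every rule match.

    The source is only read as text, never executed, which is the defining
    property of static analysis.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                hits.append((lineno, rule_id))
    return hits
```

Such naive text matching also illustrates why false positives arise: a commented-out line or a test fixture containing `password = "dummy"` would be flagged just as readily as a real leak.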
2.3.2 Importance of SBOM in open-source security

A Software Bill of Materials (SBOM) is an organized inventory of all the components used in an application, including third-party libraries and dependencies. SBOMs are a fundamental element of visibility, traceability, and vulnerability management in modern software development, where the reuse of open-source resources is common [33]. The components of an SBOM can be seen in Figure 2.4.

Figure 2.4: Components of SBOM

As supply chain attacks have become more prevalent, SBOMs and the related security practices have grown in importance. Maintaining a well-defined SBOM enables organizations to promptly determine whether they are affected by newly disclosed vulnerabilities such as Log4Shell and to take appropriate countermeasures [33]. SBOMs and SCA tools function together to automatically discover known vulnerabilities, license compliance issues, and outdated packages, promoting proactive, scalable open-source risk management.

2.3.3 Benefits of Static Security Analysis over Dynamic Testing

Compared to dynamic testing techniques, static security analysis has several benefits, particularly in the early phases of development. SAST examines source code, bytecode, or binaries without running the program, unlike Dynamic Application Security Testing (DAST), which examines applications during program execution. This makes it possible to find issues long before the application is released [30].

By detecting vulnerabilities early in the development process, SAST technologies allow developers to address problems before they become deeply rooted in the application. Since the codebase is directly analyzed, all execution pathways, including those that are difficult to reach dynamically (such as vulnerabilities in libraries), can be investigated. A faster feedback loop is another main advantage.
When integrated into CI/CD pipelines or IDEs, these static analysis tools provide real-time feedback, making them well-suited for modern Agile and DevOps environments [30].

2.3.4 Limitations of static approaches compared to dynamic methods

Notwithstanding their benefits, static analysis methods have significant drawbacks that limit their usefulness when used alone. Their inability to observe runtime behavior is one of the main issues: problems that only show up during or after execution, such as session management faults, authentication bypasses, and logic errors, can go undetected.

The high false positive rate of static analysis is another serious problem, as detected vulnerabilities might not be exploitable in practical situations [34]. This often leads to developer fatigue and wasted remediation efforts. Furthermore, code that depends significantly on runtime configuration, reflection, or dynamic language features is usually difficult for static tools to analyze, which can make control and data flows hard to observe during static testing [30].

Static analysis is complemented by dynamic testing, which simulates real-world attacks on running applications to establish actual exploitability. Therefore, while static analysis is beneficial for early-stage detection, its limitations highlight the importance of using it as part of a hybrid or layered security testing strategy.

2.4 Security Scanning Tools and Their Challenges

A variety of static security testing tools are available that can be employed in modern software development workflows, so that vulnerabilities can be found at much earlier phases of the SDLC. These tools can be broadly classified into SAST, SCA, and secret detection tools.
2.4.1 Overview of common and open-source tools

Open-source static analysis tools can be categorized by classification factors such as supported programming languages, integration with CI/CD pipelines, OWASP Top 10 mapping support, SARIF compatibility, and ease of setup [30]. Table 2.3 describes various open-source SAST tools and compares them using these key classification factors. SCA tools scan for vulnerabilities in third-party libraries; a comparison of such tools is presented in Table 2.4. Secret detection/scanning tools are good at identifying accidental exposure of sensitive data such as hardcoded API keys or credentials; they are compared in Table 2.5.

SAST Tools | Supported Languages | CI/CD Friendly | OWASP Top 10 | SARIF Compatibility | Setup Simplicity
Semgrep CE | Multi-language | Yes | Yes | Yes | Easy
SonarQube CE | Multi-language | Yes | Yes | No | Moderate
Bandit | Python | Yes | Yes | Yes | Easy
ESLint | JavaScript | Yes | Yes | Yes | Easy

Table 2.3: Open-source SAST Tools

SCA Tools | Supported Languages | CI/CD Friendly | OWASP Top 10 | SARIF Compatibility | Setup Simplicity
OWASP DC | Multi-language | Yes | Yes | Yes | Easy
Trivy CE | Multi-language | Yes | Yes | Yes | Easy
Syft+Grype | Multi-language | Yes | Yes | Yes | Moderate

Table 2.4: Open-source SCA Tools

Scanning Tools | Supported Languages | CI/CD Friendly | OWASP Top 10 | SARIF Compatibility | Setup Simplicity
GitLeaks | Language-agnostic | Yes | Yes | Yes | Easy
Detect-Secrets | Language-agnostic | Yes | Yes | No | Moderate
TruffleHog | Language-agnostic | Yes | Yes | No | Moderate
SecretLint | Language-agnostic | Yes | Yes | No | Easy

Table 2.5: Secret Scanning Tools

2.4.2 Use cases: SAST, SCA, and security scanner tools

In the software development lifecycle, security scanning tools have different but related functions. Various types of tools are available, each designed to identify particular classes of vulnerabilities.
• SAST Tools examine the source code of the program to find code-level problems such as improper data handling, hardcoded credentials, or SQL injection. They are useful for secure code enforcement and early vulnerability identification. Developers frequently use SAST during the coding stage, integrated into IDEs or pre-commit hooks [17].

• SCA Tools focus on third-party risks. Open-source packages are widely used, and SCA tools detect license issues, known vulnerabilities (CVEs), and outdated dependencies. They help ensure that reused components do not introduce systemic risk [30].

• Secrets Detection Tools identify sensitive information that might have been accidentally committed to version control, such as hardcoded tokens, API keys, and passwords. These tools are vital in preventing data exposure or unauthorized access caused by developer oversight [30].

Each of these tool categories addresses a different attack surface relevant to static analysis. Together, they enable secure development techniques throughout the SDLC and allow a layered defense strategy covering the security of code, components, and credentials. No single tool can identify every defect or security issue [35]; instead, a combination of methods is needed to cover the static vulnerability landscape [35].

2.4.3 Problem of fragmented outputs and tool interoperability

Security teams frequently combine SAST, SCA, secrets detection, and license compliance techniques in enterprise settings. These tools produce results in various formats, severities, and taxonomies, which complicates remediation operations and results in fragmented visibility over the security posture. To address interoperability challenges in static analysis tools, the Static Analysis Results Interchange Format (SARIF) was introduced as a standardized, JSON-based schema for representing static analysis tool outputs.
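To make the normalization idea concrete, the sketch below flattens SARIF output into one uniform record per finding. The field paths follow the SARIF 2.1.0 layout; the `normalize_sarif` helper and the shape of the flattened record are our own illustrative assumptions:

```python
import json

def normalize_sarif(sarif_text):
    """Flatten SARIF 2.1.0 output into a list of uniform finding dicts."""
    doc = json.loads(sarif_text)
    findings = []
    for run in doc.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # SARIF allows multiple locations; take the first if present.
            loc = (result.get("locations") or [{}])[0].get(
                "physicalLocation", {})
            findings.append({
                "tool": tool,
                "rule": result.get("ruleId"),
                "severity": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                "file": loc.get("artifactLocation", {}).get("uri"),
                "line": loc.get("region", {}).get("startLine"),
            })
    return findings
```

Once every scanner's output is reduced to records of this shape, results from SAST, SCA, and secret scanners can be merged, deduplicated, and triaged in one place.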
SARIF aims to unify reporting across tools by defining a consistent structure for rules, locations, severity levels, and remediation guidance. It is currently supported by various tools and recognized as an OASIS1 standard [36].

1OASIS stands for the Organization for the Advancement of Structured Information Standards, a non-profit multinational collaboration that develops and promotes open standards for the global information society.

However, not all tools provide native support, and SARIF compatibility varies across vendors. Even with SARIF, it is often necessary to use custom parsers or normalization layers when integrating results from legacy tools or combining different types of scanners (such as SCA with SAST). This lack of uniformity creates silos between the development and security teams and slows down triage.

2.5 False Positives in Static Analysis

The main causes of false positives in static analysis are static modeling, over-generalized detection rules, and lack of program context. These problems are particularly common in applications that use dynamic constructs or have large, modular codebases. According to the Orca 2022 Cloud Security Alert Fatigue Report, 43% of the surveyed organizations across various sectors that run their operations in cloud environments report a general false-positive rate of 40%. False alerts are far more than just a nuisance, as they have a significant impact on the overall economy of an AppSec program [37].

In SCA, the main cause of false positives is vulnerability reporting based purely on dependency graphs. These tools may flag a vulnerability in a library even when the vulnerable functionality is never called by the application, or when it is addressed by other means, such as secure configurations or runtime controls [30]. Additional key causes include:

• Lack of Runtime Context: Static tools examine code without executing the application, so they cannot evaluate conditions that are resolved at runtime, such as dynamic routing and user-specific logic [30].

• Incomplete Data Flow Modeling: Many SAST tools have a limited understanding of complex data and control flows, particularly across microservices environments, which often leads to assumptions that may not reflect actual behavior [38].

• Conservative Heuristics: Rule sets are often designed to identify as many potential vulnerabilities as possible (a broad detection range), even at the cost of precision and accuracy, which leads to overreporting [39].

• Misclassification of Third-Party Dependencies: Particularly where dependency trees are deeply nested or SBOMs are overly inclusive, SCA tools may flag vulnerabilities in unused or unreachable code paths of dependencies [30].

• Code Patterns That Are Secure by Design: Some secure coding techniques syntactically mirror insecure patterns. For example, sanitized input passed through an abstraction layer may still be marked as a potential injection point, generating a false alert.

• Siloed Tooling: Muthukrishnan et al. [40] further point out that cross-tool intelligence sharing is often prevented by the compartmentalized nature of DevSecOps tools, each of which uses its own AI/ML model. This increases the possibility of false positives, since SAST and SCA tools generate isolated predictions without understanding the larger CI/CD or runtime environment [40].

All these causes highlight the need for contextual reasoning and enhanced intelligence when filtering static analysis results, an area where agentic AI can provide significant improvements.
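The "secure by design" cause can be illustrated with a hedged example. The query below is parameterized and therefore safe from SQL injection, yet a purely syntactic scanner that flags any SQL string assembled near user input could still report it as a potential injection point:

```python
import sqlite3

def get_user(conn, username):
    # Parameterized query: the user-supplied value is passed as a bound
    # parameter ("?"), never concatenated into the SQL text, so injection
    # is not possible here. A keyword-matching scanner may still flag the
    # raw "SELECT ..." string as a candidate injection point.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchone()
```

Even an injection payload supplied as `username` is treated as an ordinary string value by the database driver, which is exactly the context a purely static rule cannot see.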
2.5.1 Impact on developer workflows and security posture

When false positives become common in security scans, they lead to alert fatigue among developers, causing frustration and eroding trust in security tooling. Developers often waste significant time triaging these non-issues, which disrupts sprint velocity and delays legitimate remediation efforts [40]. Notably, a high false positive rate was regarded as the top frustration by 42% of the 900 security practitioners surveyed in the Tines 2023 Voice of the SOC report [41].

Excessive noise from scanners can delay releases or cause teams to bypass security checks in continuous delivery environments where speed and reliability are critical [40]. This reduces the efficacy of AppSec programs, which degrades the security posture at the organizational level. It also raises the possibility of vulnerability debt: the accumulation of unresolved issues over time, which complicates long-term risk management [37].

This tolerance of false alerts has wider security implications. It increases the possibility of developers missing real vulnerabilities when they ignore or disable specific scans because false alerts occur too often [30]. Additionally, the lack of a unified security intelligence view affects the organization's ability to perform timely patching, maintain compliance, and carry out effective root cause analysis [40].

That production insights are rarely funneled back into earlier development stages further weakens the DevSecOps feedback loop [38]. This perpetuates the same scanning inefficiencies across several release cycles, as lessons learned from operational problems are not used to improve the vulnerability detection logic [40].
2.5.2 Existing approaches to reduce false positives

Over the years, both academic research and industry practice have proposed several approaches to reducing false positives in static analysis. These can be broadly summarized as follows:

• Most tools allow suppressing specific rules or adjusting sensitivity thresholds to stop specific types of output from being generated. By calibrating rules to the context of the application, many irrelevant alerts can be filtered out easily. However, this approach requires significant manual effort and expertise to draft and maintain such rules [42].

• Combining static analysis and LLMs is another option for reducing false positives. The integration of LLMs with static analyzers enables more context-aware and accurate code review generation. Methods such as Retrieval-Augmented Generation (RAG), data-augmented training, and result concatenation have shown potential in aligning LLM outputs with analyzer feedback, enhancing the precision and interpretability of findings. Directly integrating static analyzer results into the LLM's inference context (RAG in particular) helps effectively reduce false positives [43].

• Neuro-symbolic approaches such as Model Generated Code Queries (MoCQ) leverage the strengths of both LLMs and symbolic reasoning to reduce false positives. These systems use LLMs to automate the extraction of vulnerability queries and use feedback from static analysis to iteratively refine them. By grounding predictions in symbolic validation, these models significantly reduce the errors that are common in purely learning-based systems [44].

• Empirical analyses of SAST tool output show that some rule patterns and warning types tend to be more error-prone.
Targeting these high-noise categories with stricter heuristics or improved semantic analysis has been shown to reduce overall false positives. Inconsistent findings can also be filtered out by combining different tools and cross-validating the results using standardized formats like SARIF [45].

• Using SAST in combination with dynamic validation methods or Interactive Application Security Testing (IAST) is another useful approach to the false positive problem. By correlating static results with runtime information, tools can verify whether the flagged code is truly vulnerable in a live environment. This kind of hybrid approach reduces false positives that arise from static-only assumptions [30].

• A comprehensive approach to reducing false positives is provided by unified AI/ML frameworks trained on organization-specific DevSecOps data. To enable context-aware alert filtering, these models take into account historical SAST results, code history, test coverage, and operational incidents. Besides lowering false positives, such unified systems improve root cause analysis and alert relevance throughout the CI/CD pipeline [40].

In summary, current approaches to reducing false positives in static security analysis range from LLM-powered augmentation, neuro-symbolic refinement, dynamic validation, and developer-centric suppression handling to AI/ML-based contextual modeling. These strategies indicate a trend towards intelligent, integrated, and context-aware static analysis frameworks that align better with developer workflows and real-world application behavior.

2.6 AI and LLMs in AppSec

In cybersecurity, AI has become a transformative force that has completely changed how organizations detect, prevent, and respond to cyberthreats.
AI has played an important role in protecting digital infrastructure thanks to its ability to manage massive amounts of data: as cyberattacks become more prevalent and complex, it helps recognize patterns and adaptively learn from emerging threats.

2.6.1 Applications of AI in cybersecurity

The integration of AI into AppSec has initiated a paradigm shift from conventional, reactive testing approaches to more proactive, intelligent, and context-aware security mechanisms. By utilizing Machine Learning (ML), Natural Language Processing (NLP), and predictive analytics, AI enables systems to automatically identify, assess, and address vulnerabilities in software applications with minimal human intervention. These technologies serve as the foundation for Adaptive Application Security Testing (AAST), a recent approach that dynamically adjusts testing methods in real time based on user interactions, application behavior, and emerging threats [46]. The following are use cases of AI in the AppSec domain:

• AI-Driven Vulnerability Identification: Machine learning models are now central to modern vulnerability detection strategies. These models are trained using both supervised and unsupervised learning techniques to identify security flaws by analyzing patterns in code, system behavior, and historical vulnerability data. Supervised models use labeled datasets to classify code segments as secure or insecure, while unsupervised models detect anomalies that might point to vulnerabilities not yet identified. This predictive capability allows for the identification of zero-day vulnerabilities and logic flaws that static rule-based systems may overlook [46].

• NLP for Security Insights: NLP is essential for improving the contextual understanding of system logs and code.
NLP tools help identify subtle indications of security flaws or configuration errors by scanning developer comments, issue tickets, and system error logs. This enables security systems to generate more personalized vulnerability reports that improve precision and prioritization, and to recommend remedial actions better aligned with developers' workflows [47].

• Risk Assessment and Predictive Scoring: AI systems often go beyond simple detection and help assess the potential impact and exploitability of identified vulnerabilities. Predictive risk scoring mechanisms take a wide range of inputs into account, such as system architecture, user access levels, operational context, and historical exploit data, in order to rank vulnerabilities by the likelihood of an actual threat. By combining these models with threat intelligence feeds (like MITRE ATT&CK), vulnerabilities that are actively targeted or exploited can be identified with greater precision [46].

• AI-Enabled Remediation Suggestions: AI also helps with detailed remediation by providing developers with code-level recommendations based on best practices and previous patches. The time between detection and resolution can be greatly reduced by sophisticated techniques that can even generate patches automatically or suggest “drop-in” code fixes within IDEs. Through reinforcement learning, these systems improve over time by continuously adapting to developer feedback and resolution outcomes [46].

• Integration into CI/CD and DevSecOps: AI-enhanced security testing integrates seamlessly into CI/CD pipelines, in accordance with DevSecOps principles [40]. SAST, DAST, and IAST are all included in adaptive testing frameworks, which are driven by contextual signals such as feature deployments, code commits, and API usage patterns.
As a result, on-demand security testing becomes more intelligent, reducing false positives and test redundancy and accelerating the delivery of secure software [46].

According to a case study at a mid-size fintech company, the implementation of an AI-driven adaptive security testing framework resulted in a 32% improvement in vulnerability detection and a 45% decrease in false positives [46]. The implementation also encouraged cultural change in the development teams, establishing security as a fundamental part of the development process rather than a last-minute requirement. However, classic machine learning systems require labeled data and careful feature engineering and suffer from limited generalization, limitations that are currently being addressed by LLMs.

2.6.2 Introduction to LLMs and their potential in AppSec

LLMs have emerged as transformative tools across a variety of domains, particularly in software engineering and cybersecurity. LLMs are designed to understand and generate human-like language, but their capabilities extend far beyond NLP: they can demonstrate a deep understanding of code semantics, logical structures, and even complex software vulnerabilities.

The application of LLMs in security domains has increased considerably in recent years because of their superior contextual and semantic analysis capabilities compared to static analysis techniques. Although SAST tools are good at identifying well-known vulnerability patterns using rule-based techniques, they face limitations with complex, context-dependent, or novel threats. LLMs, on the other hand, are able to detect vulnerabilities that traditional tools can miss, since they can infer potential risks from deeper structural patterns [48].
Particularly in the area of AppSec, hybrid approaches such as combining LLMs with SAST results and incorporating RAG techniques produce much better outcomes [43]. These strategies dynamically integrate up-to-date vulnerability information into the LLM's reasoning process. For instance, by combining the outputs of traditional SAST tools with tailored vulnerability reports retrieved via semantic and structural similarity searches, LLMs can overcome both their privacy and recency limitations. In addition to enhancing LLMs' detection capabilities, this integration enables a more adaptable and privacy-preserving security architecture [48]. The integration paves the way for LLM-supported vulnerability scanning systems.
In summary, LLMs hold significant potential to enhance traditional security tools, provided their limitations are mitigated through careful design, such as localized deployments, knowledge retrieval integrations, and structured prompting informed by static analysis. As the field matures, LLMs have the potential to become a central component in the next generation of intelligent, context-aware cybersecurity solutions.

2.7 Agentic AI: Concepts and Use Cases

The term “Agentic AI” describes a class of AI systems that can perceive, reason, and act on their own in a given environment to achieve a specified goal without requiring explicit instructions at every step. Unlike passive models, which generate output in response to prompts, agentic systems behave in a goal-directed manner, usually by decomposing tasks, planning steps, invoking tools, and learning from feedback loops.
Agentic AI and AI agents may sound similar, but their capabilities differ considerably.
AI agents are task-specific, following predefined rules to handle repetitive tasks such as scheduling meetings or responding to customer queries. Agentic AI, on the other hand, demonstrates a higher degree of autonomy and is able to navigate complex environments and pursue long-term goals. Using iterative planning and adaptive reasoning, it can modify its strategies in real time and learn from feedback [49], [50]. While AI agents are more constrained and reactive in scope, agentic AI actively perceives its environment by integrating with various technologies and reacts proactively to changes. In general, AI agents are better at narrow, well-defined tasks, whereas agentic AI is made for dynamic, more autonomous tasks [51].
In software, Agentic AI gathers information from multiple sources, such as databases, APIs, and even online searches, to build perception and context for decision-making [52]. A reasoning engine (often powered by LLMs), memory components, tool integration modules, and a decision-making layer with action sequencing capabilities are the usual components of these systems. Because of their emergent behavior, which enables them to operate semi-autonomously, agentic systems are ideal for tasks involving iterative problem solving and dynamic decision-making.

2.7.1 Agentic AI use cases in software engineering, security, and automation

Agentic AI makes use of adaptive intelligence, which allows it to react dynamically to complex situations and learn from changing contexts, in contrast to traditional automation or rule-based systems. It has applications in a number of IT domains, including cybersecurity, software engineering, and IT ecosystem automation and orchestration.
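The perceive-reason-act cycle described above can be sketched as a minimal loop. Everything in the following snippet is an illustrative assumption: the function names are invented, and a stubbed rule stands in for the LLM-powered reasoning engine.

```python
# Minimal perceive-reason-act loop for an agentic system (illustrative only).
# A real system would back reason() with an LLM and act() with tool calls.

def perceive(environment: dict) -> dict:
    """Gather context from available sources (APIs, databases, files)."""
    return {"open_findings": environment["findings"]}

def reason(observation: dict, memory: list) -> str:
    """Decide the next action; an LLM would fill this role in practice."""
    if observation["open_findings"]:
        return "triage_next_finding"
    return "done"

def act(action: str, environment: dict) -> None:
    """Execute the chosen action against the environment."""
    if action == "triage_next_finding":
        finding = environment["findings"].pop(0)
        print("triaged:", finding)

environment = {"findings": ["hardcoded-secret", "sql-injection"]}
memory = []

while True:
    observation = perceive(environment)
    action = reason(observation, memory)
    if action == "done":
        break
    act(action, environment)
    memory.append(action)  # feedback loop: remember past decisions
```

The loop terminates on its own once the goal state (no open findings) is reached, which is the key difference from a prompt-response model that acts only when asked.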
Cybersecurity: Threat Detection & Response, and Identity Management

By improving real-time threat response and automating routine SOC tasks, agentic AI is significantly advancing cybersecurity operations. AI agents are helping organizations address both internal and external threats more accurately and quickly by streamlining processes such as anomaly detection, incident response, and alert triage [53], [54]. Renowned SOC tools such as CrowdStrike's Charlotte AI and ReliaQuest's GreyMatter platform have taken a step forward and are using agentic AI technologies to accelerate their threat detection capabilities with precision [49].
Agentic AI also strengthens identity and access management. Twine's digital employee, Alex, proactively identifies and mitigates vulnerabilities related to unauthorized access, reducing operational strain on IT and cybersecurity teams. As organizations adopt zero trust models, AI agents dynamically adjust access privileges based on behavior and real-time risk assessments [49].

Software Engineering: Proactive AppSec and Secure SDLC

Agentic AI is revolutionizing the field of software development and is a key element in the shift from reactive to proactive AppSec. Incorporating AI agents into the SDLC enables workflows to automatically search code repositories, detect vulnerabilities, and even suggest or implement context-aware fixes [49].
Agentic AI tailors its analysis to each application's specific architecture, in contrast to standard approaches, which assign vulnerabilities generic severity scores. This makes prioritizing and fixing security issues more precise [49]. Moreover, these AI systems reduce the time and human effort required for vulnerability management, minimizing the risks of both oversight and error in manual coding processes.
Perhaps the most significant aspect of agentic AI is its capacity for automated vulnerability remediation. Rather than relying only on developers to manually address each security issue, AI agents can generate fixes that are both safe and non-disruptive, substantially accelerating secure code deployment. Additionally, agentic AI systems are less susceptible to hallucinations because they have been trained for specific, goal-oriented security tasks and operate mostly in defined contexts, which also enhances reliability [55].

Automation and Orchestration in Security Ecosystems

In addition to automating particular sets of tasks, agentic AI is increasingly playing a key role in coordinating complex, multi-layered security ecosystems. For example, 11 specialized AI agents created by Microsoft and its partners are integrated into Microsoft's Security Copilot to assist with a range of tasks, such as vulnerability prioritization, breach reporting, and phishing triage. These agents increase accuracy for both junior and experienced professionals and improve mean time to respond (MTTR) by up to 30% [49].
Looking ahead, agentic AI is also set to contribute to international cybersecurity coordination. Platforms such as Microsoft's envisioned Cyber Eagle initiative seek to aggregate threat intelligence from international sources, including the dark web and Interpol, in order to simulate attack scenarios and cooperatively adjust defenses in real time [56]. These advancements highlight the potential of agentic AI as a framework for collaborative, cross-border cyber defense systems as well as a tool for organizational security.

2.7.2 Potential of combining LLMs with agentic architectures

The integration of LLMs with agentic AI architectures represents a promising trajectory in the development of highly autonomous and intelligent systems.
Although traditional LLMs are powerful at generative and static reasoning, they often fall short in contexts requiring memory retention, dynamic interaction, multi-step planning, and real-time tool integration. Agentic architectures address these limitations by embedding LLMs within modular systems that enable autonomous decision-making, reflective reasoning, and procedural adaptability.
Agentic frameworks like the Agent Laboratory, AutoGPT, and Agentic RAG use LLMs as their core reasoning engines and enhance them with features like memory persistence, tool invocation, and environmental perception. This combination enables more reliable task execution, especially in domains requiring sequential tool use and long-horizon planning. For example, the Agentic RAG paradigm introduces planning and reflection loops within retrieval-augmented systems, which enable LLMs to refine their outputs iteratively using external knowledge sources and intermediate feedback mechanisms [57].
Furthermore, agentic architectures mitigate key challenges of standalone LLMs, including hallucinations and static knowledge limitations, by incorporating real-time data retrieval, environmental feedback, and external execution engines. The outcome is an emerging class of systems that can both generate and implement solutions in dynamic, real-world contexts [57].
Future developments in areas like automated research, personalized digital assistants, and autonomous scientific discovery are projected to be inspired by the interaction between LLMs and agentic designs. These could include scalable agent collaboration strategies, adaptive reasoning pipelines trained via reinforcement learning, and robust safety layers that monitor tool use and behavioral trajectories.
The combination of LLMs and agentic architectures is therefore a significant step in the development of self-directed and general-purpose AI systems [57].

2.8 Security Standards and Risk Benchmarking

Security standards and scoring systems provide a common language for identifying, categorizing, and prioritizing software vulnerabilities. These systems are essential for proper risk communication, compliance, and automated triage. An organization can apply a single vulnerability management policy when it standardizes vulnerability scores across all of its hardware and software platforms. Such a policy may resemble a service level agreement (SLA) that specifies the timeframe for validating and resolving a specific vulnerability [58]. The most widely adopted standards include:
• Common Vulnerabilities and Exposures (CVE): Maintained by MITRE, CVE assigns distinct identifiers to publicly known vulnerabilities so that they can be consistently referenced across different tools and databases. Vulnerabilities are discovered, assigned, and published by global organizations that have partnered with the CVE Program; these partners publish the CVE Records themselves to provide consistent vulnerability descriptions. These CVE Records can then be used by cybersecurity and information technology professionals to make sure they are discussing the same issue and to coordinate their efforts to prioritize and fix vulnerabilities [59]. Every vulnerability entry comes with a CVE ID, which encodes the year of publication together with a unique identification number and follows the format “CVE-YYYY-NNNNN”. The CVE database is essentially an indexing tool rather than a source of comprehensive technical or exploit information.
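As a brief illustration, the “CVE-YYYY-NNNNN” naming scheme described above can be checked mechanically with a regular expression. The helper function below is a sketch of my own (its name is not from any standard); the pattern allows four or more sequence digits, which the CVE syntax has permitted since 2014.

```python
import re

# "CVE-" + four-digit year + sequence number of four or more digits.
CVE_ID_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")

def is_valid_cve_id(identifier: str) -> bool:
    """Return True if the string follows the CVE ID naming scheme."""
    return bool(CVE_ID_PATTERN.match(identifier))

print(is_valid_cve_id("CVE-2021-44228"))  # well-formed ID -> True
print(is_valid_cve_id("CVE-21-1"))        # malformed ID  -> False
```

A normalisation layer can apply such a check when ingesting tool output to reject malformed identifiers before any cross-database lookup.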
• European Union Vulnerability Database (EUVD): The EUVD is a recent initiative aimed at complementing global efforts in vulnerability disclosure, particularly in the context of European cybersecurity regulations. Established under the EU Cybersecurity Act and managed by ENISA, the EUVD aims to provide a trusted and sovereign platform for collecting and disseminating data on hardware and software vulnerabilities within the EU market. Unlike CVE, which caters to the global market, the EUVD places more emphasis on compliance with the EU's regulatory and industrial landscape. It also encourages vendors, researchers, and national authorities to responsibly report vulnerabilities by integrating with coordinated vulnerability disclosure (CVD) processes. Although the EUVD currently uses CVE identifiers for consistency, it also has the potential to introduce region-specific categories, adding further perspectives on the relevance and impact of vulnerabilities to the global ecosystem [60].
• Common Vulnerability Scoring System (CVSS): Developed by FIRST, CVSS provides a numerical score (0-10) along with qualitative severity ratings such as Low, Medium, High, and Critical that indicate the severity of a vulnerability based on factors like exploitability, impact, and scope. The most recent version of CVSS is v4.0. A collection of base metrics (such as attack vector, complexity, and privileges required), temporal metrics (such as exploit code maturity and remediation level), and environmental metrics (such as the potential impact within a specific organization) is used to determine the final scores. Automated vulnerability management tools and security advisories commonly use these scores to prioritize remediation efforts.
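The mapping from a numerical CVSS score to its qualitative rating can be sketched directly from the severity bands FIRST publishes for CVSS v3.x and v4.0 (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0); the function name itself is illustrative.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS base score (0.0-10.0) to its qualitative rating,
    following the severity bands published by FIRST."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # Critical
print(cvss_severity(5.3))  # Medium
```

An automated triage layer can use such a mapping to turn raw scores from advisories into the categorical labels developers and dashboards expect.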
Despite its usefulness, CVSS has been criticized for being static, meaning that it may not accurately reflect the real-world risk associated with a vulnerability as conditions evolve [61].
• Exploit Prediction Scoring System (EPSS): Also maintained by FIRST, EPSS estimates the probability that a particular vulnerability will be exploited in the wild within the next 30 days. By providing a dynamic, probabilistic method, it addresses the limitations of static scoring systems like CVSS. Exploit activity, vulnerability details (obtained from the CVE database), and social signals are among the real-world data that EPSS uses to train the machine learning models behind its exploitability predictions. The most recent version of EPSS is v4.0. For each vulnerability, EPSS produces a score, expressed as a percentage, that helps organizations prioritize patching efforts based on the actual threat landscape rather than theoretical risk alone [62]. Instead of addressing every vulnerability with equal urgency, this data-driven approach reflects a shift toward predictive vulnerability management, where organizations can focus on the most likely exploitation scenarios.
Together, CVE, EUVD, CVSS, and EPSS make a multi-layered ecosystem for vulnerability classification and scoring possible. As a central reference point, CVE offers a standardized identifier for vulnerabilities. By embedding classification within a regional legal framework, the EUVD supports national Computer Security Incident Response Teams (CSIRTs) and critical infrastructure operators in the European Union (EU), ultimately complementing CVE. On top of these classification systems, CVSS adds a technical severity assessment, offering a way to compare vulnerabilities based on their intrinsic characteristics.
Meanwhile, EPSS introduces a predictive, real-world dimension to vulnerability prioritization through its exploitability score, enabling security teams to allocate resources more effectively.
A more thorough and strategic approach to vulnerability management is supported by the integration of these systems, where identification (CVE/EUVD), technical severity (CVSS), and exploit likelihood (EPSS) work together to produce risk-informed remediation plans. These solutions enable security teams and tools to rank vulnerabilities according to precise, objective metrics instead of relying on personal opinion. For agentic AI, aligning with these standards provides anchor points for better decision logic, enabling more accurate and interpretable triage.

2.8.1 Secure-SDLC Frameworks and Standards

As discussed in section 2.1, static security analysis is an integral part of a secure SDLC. Likewise, a number of industry frameworks transform these high-level “secure by design” principles into concrete tasks that can be incorporated into each stage of the SDLC.
An in-depth look at each framework below highlights the practices most relevant to automated risk triage and false-positive reduction. An AI-first solution could employ the repeatable controls, metrics, and data feeds provided by these frameworks, which synthesize decades of secure engineering practice, to suppress false positives early and to elevate only risk-bearing findings.

A. NIST Secure Software Development Framework (SSDF, SP 800-218v1.1)

The NIST SSDF provides a foundational set of software security best practices. It organizes these practices into four categories: producing well-secured software (PW), protecting software (PS), responding to vulnerabilities (RV), and preparing the organization (PO).
Establishing clear security requirements helps in creating guidelines that can steer AI tools to focus on critical issues. Managing information about third-party components, such as libraries, also helps AI understand where risks come from [12], [63].
Key points for AI:
• Static analysis rules can be improved by using specified security requirements.
• Utilize information from the software bill of materials (SBOM) to understand third-party risks.
• SSDF is a fitting standard for assessing AI-driven triage capabilities because it explicitly calls for automation wherever possible.

B. Microsoft Security Development Lifecycle (SDL)

Microsoft SDL is a detailed process that involves security-focused steps such as secure design reviews, threat modeling, and testing at every stage of the development cycle. It works effectively with Microsoft tools and can supply information that AI agents can use to improve their assessment ability. Exporting threat models, for instance, could help AI understand potential attack paths throughout the code [12], [64].
Key points for AI:
• Utilize threat modeling data to improve AI's code analysis.
• Utilize findings from fuzz testing to adjust risk rankings according to how easily a vulnerability can be exploited.
• Since CI/CD “quality gates” are used to enforce security checkpoints, SDL is a model implementation of shift-left security at hyperscale cloud velocity.

C. OWASP Software Assurance Maturity Model (SAMM)

OWASP SAMM is a maturity model that aids organizations in assessing and enhancing their software security practices. It addresses topics including operations, design, implementation, governance, and verification. AI systems can use these maturity levels to modify how strictly they filter findings, and organizations can also track their progress [12], [65].
Key points for AI:
• Utilize maturity scores as input to enhance false-positive reduction.
• AI models can be trained on real issues using defect management data.

D. Building Security In Maturity Model (BSIMM)

The foundation of BSIMM is the observation of what many companies actually do in practice to improve overall software security. It lists common activities and how often they are performed across organisations. This can help AI give priority to controls that have been shown to work well in real-world scenarios. For example, if many companies automate static analysis checks, AI can focus on those first [12], [66].

E. ISO/IEC 27034 – Application Security Guidelines

ISO 27034 is a set of standards that promotes the integration of security practices into the SDLC while taking business and regulatory environments into account. It assists in tailoring security measures to the importance of the application, which helps AI assess vulnerabilities based on the software's criticality [12]. ISO 27034, which emphasizes auditability and process evidence, connects SDLC operations to the overall governance of ISO 27001.
The key points from each of the standards and frameworks discussed above are summarized in Table 2.6, which can be used in designing the solution.

SSDLC Standard/Framework: Key Takeaway for Designing an Agentic AI Solution
NIST SSDF: Clear tasks and supporting documentation, such as SBOMs and specific policies, can help AI filter out irrelevant static analysis alerts.
Microsoft SDL: Utilize testing data from development tools and threat models to enhance AI's ability to identify and prioritize real risks while reducing noise.
OWASP SAMM (v2.1): Utilize KPIs and maturity scores that enable AI to adjust how strictly it flags potential issues based on the organisation-specific security level.
BSIMM: Utilize empirical data on other organisations' security practices to help AI focus on the most crucial findings.
ISO/IEC 27034: Provides information on business impact so AI can assess the severity of vulnerabilities based on the actual value of the asset, and not simply on technical characteristics.
Table 2.6: Key Takeaways from Secure SDLC Standards and Frameworks

2.8.2 OWASP Top 10 and how static tools align with it

Maintained by the Open Worldwide Application Security Project (OWASP), the OWASP Top 10 is a widely recognized list of the most critical web application security risks and can be seen as the backbone of AppSec. It provides developers, security professionals, and organizations with a useful guide to the most common vulnerabilities and their prevention strategies. Early in the development lifecycle, static analysis tools can be very useful in identifying many of these risks, well before deployment or release. Table 2.7 lists the OWASP Top 10 risks as published in 2021 [67].
A01:2021 Broken Access Control: Failures in enforcing proper user permissions and roles.
A02:2021 Cryptographic Failures: Problems with weak or missing encryption.
A03:2021 Injection: SQL, command, and other input injection flaws.
A04:2021 Insecure Design: Architectural defects or unsafe default behaviors.
A05:2021 Security Misconfiguration: Unpatched flaws, unnecessary features, or insecure settings.
A06:2021 Vulnerable & Outdated Components: Using outdated libraries or libraries with known vulnerabilities.
A07:2021 Identification & Authentication Failures: Issues in login, session management, etc.
A08:2021 Software & Data Integrity Failures: Insecure code or data pipelines that lack validation or integrity checks.
A09:2021 Security Logging & Monitoring Failures: Missing or ineffective logging and alerting mechanisms.
A10:2021 Server-Side Request Forgery: Forcing the server to make unintended requests.
Table 2.7: Overview of OWASP Top 10 (2021 Edition)

Static security testing tools perform effectively when findings align with known vulnerability patterns and secure coding techniques. Table 2.8 shows how SAST and SCA tools typically perform across the OWASP Top 10 categories. The same table shows that SCA is crucial for managing risk in third-party components, whereas SAST is more suitable for examining code-level vulnerabilities. Combining both strategies, or augmenting them with AI agents, is advantageous for several high-impact OWASP categories, particularly A01, A04, A06, and A09.
A01 Broken Access Control (SAST: Partial; SCA: No): Difficult to detect without runtime or business logic context.
A02 Cryptographic Failures (SAST: Yes; SCA: No): SAST can detect weak crypto, hardcoded keys, insecure modes.
A03 Injection (SAST: Yes; SCA: No): SAST excels at finding SQL, command, LDAP, and other injections.
A04 Insecure Design (SAST: Partial; SCA: No): Mostly architectural; needs threat modeling or manual review.
A05 Security Misconfiguration (SAST: Partial; SCA: Partial): SAST can check code misconfigurations; SCA can detect insecure defaults.
A06 Vulnerable & Outdated Components (SAST: No; SCA: Yes): SCA is designed to flag known CVEs in libraries and dependencies.
A07 Identification & Authentication Failures (SAST: Partial; SCA: No): SAST detects common auth flaws but struggles with session flow issues.
A08 Software & Data Integrity Failures (SAST: Partial; SCA: Partial): Some tools catch unsafe deserialization or tampered packages.
A09 Security Logging & Monitoring Failures (SAST: No; SCA: No): Mostly a runtime issue; out of scope for SAST/SCA.
A10 Server-Side Request Forgery (SAST: Yes; SCA: No): SAST can detect unsafe URL calls from untrusted input.
Table 2.8: SAST and SCA support across OWASP categories

2.8.3 Importance of standardized scoring in prioritization

In order to prioritize vulnerabilities in a consistent and scalable manner, standardized scoring systems such as CVSS, EPSS, and the OWASP classifications are essential. In environments with hundreds or thousands of security findings, these benchmarks provide a common framework that goes beyond individual interpretation.
Key advantages include:
• Consistency Across Teams and Tools: By using standard definitions of severity, impact, and likelihood, security engineers, developers, and automation systems can make decisions more consistently.
• Automated Risk Triage: Programmatically ranking issues, setting remediation SLAs, and initiating alerts or workflows can all be done with numerical and categorical ratings. For instance, vulnerabilities with a CVSS score of 7.0 or higher are often marked for immediate attention.
• Risk Aggregation and Reporting: Standardized scores make it easier to aggregate data across projects and platforms, which enables dashboards and compliance reports to promptly highlight critical issues.
• AI Interpretability: Standardized scores act as ground-truth anchors for agentic AI systems, assisting in decision-making, output formatting, and aligning automated triage with human expectations.
• Interoperability: When applying standardized scoring across static security testing tools, SARIF should also be taken into account, as it resolves the interoperability issue between tool outputs.
Integrating these scoring systems into the Agentic AI's decision logic and reasoning functions ensures that triage outcomes are not only accurate but also transparent, defensible, and industry-aligned.
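As a minimal sketch of such score-driven triage, the snippet below ranks findings by exploit likelihood (EPSS) with technical severity (CVSS) as a tie-breaker and flags CVSS >= 7.0 for immediate attention, echoing the common practice mentioned above. The CVE identifiers are placeholders and the ordering rule is one assumed policy, not a prescribed standard.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str   # identification layer (CVE/EUVD); placeholder IDs below
    cvss: float   # technical severity (0.0-10.0)
    epss: float   # exploit likelihood as a probability (0.0-1.0)

def needs_immediate_attention(f: Finding) -> bool:
    """Mirror the common SLA practice of flagging CVSS >= 7.0."""
    return f.cvss >= 7.0

def triage_order(findings):
    """One possible risk-informed ordering: exploit likelihood first,
    then technical severity as the tie-breaker."""
    return sorted(findings, key=lambda f: (f.epss, f.cvss), reverse=True)

backlog = [
    Finding("CVE-2099-0001", cvss=9.1, epss=0.02),  # severe, rarely exploited
    Finding("CVE-2099-0002", cvss=7.5, epss=0.89),  # actively exploited
    Finding("CVE-2099-0003", cvss=5.0, epss=0.01),  # moderate, quiet
]
for f in triage_order(backlog):
    label = "immediate" if needs_immediate_attention(f) else "scheduled"
    print(f.cve_id, label)
```

Note how the actively exploited medium-high finding outranks the higher-severity but rarely exploited one, which is exactly the re-ordering EPSS contributes on top of CVSS.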
2.9 Research Gap Analysis

The limitations of current security technologies, the current state of secure software development methods, and the evolving role of AI in security activities have all been discussed in the preceding sections. Although each of these factors individually enhances the software security posture, several significant gaps remain, particularly regarding efficiently and accurately classifying static analysis outputs at scale.

2.9.1 Identified gaps

Significant issues with current static security analysis methods include fragmented outputs from separate tools without integrated reasoning for exploitability evaluation, and consistently high false-positive rates that require manual triage. Since the tools rely on static severity rather than dynamic contextual indicators, current prioritization techniques are frequently inadequate. Additionally, there is a significant gap between automated security decisions and well-established secure-SDLC frameworks, which makes compliance and traceability more difficult, and the potential of agentic AI with LLMs for autonomous, intelligent triage remains largely unrealized.

2.9.2 Justification for approach as proposed in this thesis

This thesis introduces an Agentic AI-based risk-triage system that provides a centralised, intelligent, and standards-compliant solution for automated security triage in an attempt to bridge the above gaps. The proposed system:
• Utilises LLMs with task-planning capability to automatically process and classify static analysis results.
• Eliminates redundancy and fragmentation through consolidation of results from multiple scanning tools (SAST, SCA, secrets detection) into a single triage layer.
• Employs reasoning frameworks that combine heuristics from secure development guidelines (e.g., SSDF, SAMM, BSIMM) and standardised scoring models (CVSS, EPSS).
• Provides a web-based dashboard that presents risk categorisations, filtered results, developer-specific insights, and actionable fix recommendations.
• Supports human-in-the-loop validation, which enables security and development teams to review, validate, and improve the system's decisions over time.
Alongside reducing the expense of false positives, this combined approach is expected to enhance the speed, accuracy, and transparency of security triage. The system surpasses traditional automation in that it enables contextual reasoning and is aligned with secure-SDLC principles, which helps develop a software security lifecycle that is more intelligent and robust.

3 Methodology

The hierarchical, multi-layered design of the proposed methodology is expected to address the identified gaps in static security analysis, including the large volume of false positives, long and time-consuming triage processes, and fragmented tool output. The system is built on an Agentic AI paradigm that inspects and ranks scan data independently by combining rule-based heuristics with LLM-based reasoning. The modular design of the architecture ensures scalability and maintainability, and its seamless integration with actual CI/CD pipelines makes the approach production-ready.

3.1 Proposed Solution Architecture

The system architecture comprises five primary layers: the Input Pipeline, the Agentic AI Core, the Memory and Feedback Mechanism, the Output Pipeline, and the Web-Based Dashboard. Together, these layers process raw, unstructured security findings into prioritised, actionable information, providing end users with clear, filtered findings and supporting remediation guidance.

3.1.1 Overview of Functional Layers

Figure 3.1 illustrates the complete system architecture, clearly outlining the interaction and connectivity among its layers and components.
Each layer within this architecture is responsible for distinct and critical functions:
• The Input Pipeline: This is the entry point of the system, where the security scanners, such as static analysis, dependency scanning, and secrets detection tools, are invoked; their results are then aggregated, and the resulting data is normalised into one internal representation. This normalisation step is necessary to enable the underlying agents to reason uniformly across data sources, as different tools produce different output formats.
• Agentic AI Core: The core building block of the system is the Agentic AI Core. It is composed of a number of AI agents, each designed to carry out a different task in the risk triage pipeline. Starting with false-positive detection, these agents operate in sequence and in collaboration to create a risk score, explain it, and generate remediation guidance. This enables the system to scale expert-level decision-making by combining lightweight heuristic reasoning with the flexible reasoning abilities delivered by LLMs. The accuracy and clarity of the final output are continuously improved as each agent contributes to a better understanding of each reported issue.
• Memory and Feedback: Contextual understanding is enabled by the system's memory and feedback component, which is utilised along the entire pipeline. Agents draw on and build upon intermediate results generated in earlier stages instead of treating every problem in isolation. This makes it possible to mimic iterative human judgement, increase overall consistency, and incorporate past reasoning into future decisions. Memory components store key metadata such as classification outcomes, risk levels, and partial LLM outputs, which help agents deliver accurate results.
• Output Pipeline: After the risk triage and explanatory operations, the resulting data is securely transmitted to a centralized backend for storage and visualisation. Cryptographic methods ensure the confidentiality of result data in transit, and the whole process is designed to maintain integrity and secrecy across network boundaries. The pipeline can be employed in production environments, as it integrates readily with databases and web-based services.

• Web Dashboard: The final layer of the system, which renders the results and serves them to the end user. It delivers both unfiltered and filtered results, prioritises the filtered ones, provides remediation suggestions for developers that follow security best practices, and supplies full contextual information for each issue. With the help of statistical insights and visualisations, users can easily understand the security posture of their codebase, analyse risk trends, and assess the impact of remediation efforts over time.

Figure 3.1: Proposed System Architecture

Extensibility is a primary factor in the design of the architecture. The architecture supports the addition of new scanning tools, additional agents, and evolving scoring models (e.g., modified CVSS/EPSS logic), which makes it flexible to use. Moreover, it offers a scalable foundation for modern secure software development pipelines through its use of autonomous agents and reliance on standardised frameworks.

3.2 Agent Architecture and Reasoning Logic

To minimise false positives, categorise threats, and support security decision-making, a coordinated set of AI agents makes up the core of the proposed system. The basis for this architecture is agentic AI, which allows independent components and agents to operate separately, yet toward a shared objective.
Each agent performs a specific, well-defined set of tasks while being a component of a broader, goal-driven system. Through this cooperative and modular approach, the system is able to address difficult reasoning problems while maintaining explainability, flexibility, and transparency. Figure 3.2 shows the Agentic AI architecture that the proposed solution follows.

Figure 3.2: Agentic AI Architecture

3.2.1 Decision Making Flow

Input normalisation is the first step: the reports from the tools are transformed by parsers, and the system normalises all received data into a uniform internal representation. Thanks to this normalisation process, all subsequent agents in the architecture, regardless of their role, can reason over a uniformly structured dataset.

Risk assessment begins with false positive identification after normalisation, where the system identifies and filters out results that are unlikely to be actual security threats. This stage combines advanced LLM-informed logic with rule-based heuristics. To reduce noise effectively while keeping the flexibility needed to process ambiguous or edge scenarios, the system uses probabilistic AI-based logic together with deterministic filters.

After the false positives are filtered out, the remaining true positives are handed over to the risk scoring agent to assess their risk. Employing widely accepted measures such as CVSS and EPSS, the risk scoring agent evaluates the severity and exploitability of each issue. When no standard scoring information is available, the agent uses rule-based reasoning that considers contextual factors such as the type of vulnerability, its path in the codebase, and the sensitivity of the impacted asset.
Unlike the preceding agents, this module makes only limited use of the LLM and relies primarily on logical and numerical computation, since risk scoring is founded on structured quantitative methods.

After the risk score is calculated, subsequent agents produce the explanation behind the score as well as a remediation strategy to resolve the issue. The remedial guidance comes with accurate reference links from sources such as the OWASP Prevention Cheat Sheets to enhance its effectiveness and readability.

Memory and feedback loops provide a contextual foundation for the agents, upon which they can build from previous evaluations and maintain consistency across decisions, supporting multi-stage reasoning and preserving accuracy. The architecture thereby simulates the decision-making process of human experts, who consult a knowledge corpus to inform their choices.

By combining rule-based methods with AI-driven reasoning and arranging them into a logical, memory-aware agent sequence, this architecture provides a scalable and intelligent method of automated security triage. It enables organisations to focus on the most critical risks.

3.3 Tool and Dataset Selection

A set of popular static security analysis tools was selected to effectively demonstrate and analyse the proposed system. This selection ensures comprehensive coverage of the three principal threat surfaces commonly encountered in modern systems: exposed secrets, insecure dependencies, and source code flaws. In addition, domain knowledge from secure software development practices is integrated into the created datasets and rulesets.

Three tools were chosen based on their popularity, open-source nature, reliability, and ease of integration within automated pipelines, as shown in Figure 3.3 and described in Tables 2.3, 2.4, and 2.5.
These tools are connected to the proposed Agentic AI (named Noesiz):

Figure 3.3: Security Tools for Pipelines

1. The primary SAST tool is Semgrep Community Edition (CE). It supports multiple languages, ships with 1,062 community-maintained rules, and also provides the ability to create custom rules.

2. SCA utilises OWASP Dependency-Check (DC) to scan project dependencies and match them to known vulnerabilities. The system is configured to fetch data directly from the National Vulnerability Database (NVD) via the NVD API, which covers more than 300,000 entries per scan.

3. GitLeaks is selected to discover hardcoded secrets in the codebase, such as cloud credentials, authentication tokens, and API keys. It supports both general-purpose and service-specific (e.g., AWS, Discord) secret detection.

3.3.1 Alignment with Industry Standards and Frameworks

In addition to using the data produced by the static analysis tools, the solution incorporates external data sources to improve vulnerability benchmarking. Metrics from CVSS and EPSS are specifically used to enrich findings linked to known CVEs. These ratings, which the triage agents receive via trusted APIs, enable the associated agent to sort vulnerabilities by severity and exploitability.

The well-established frameworks outlined in Section 2.8.1 act as a reference for creating custom rules and filters, ensuring consistency between the reasoning logic of the AI agents and standard secure development practices. These guides inform both the underlying logic employed to qualify severity based on vulnerability type and contextual code data, as well as the rule content used in static analysis.

3.4 Evaluation Metrics

This research employs a collection of quantitative and qualitative measures, aligned with the primary objectives of the system, to measure the performance of the proposed agentic AI system.
These metrics are grounded in both academic evaluation standards and practical, organisation-specific security engineering needs, ensuring that the results can be meaningfully interpreted by practitioners and researchers alike.

3.4.1 False Positive Reduction Rate (FPRR)

The False Positive Reduction Rate (FPRR) is a key assessment metric employed to measure how effectively the system filters the reported issues. FPRR quantifies the proportion of initially flagged results that the system correctly eliminates as non-alarming while preserving the integrity and importance of actual security threats. This measure provides a clear picture of how effectively the system can reduce noise without compromising detection accuracy. A high FPRR indicates that the system suppresses unnecessary alarms well, which saves triage time and boosts trust in the precision and dependability of the tool's output.

3.4.2 Triage Accuracy

Triage accuracy is another important evaluation metric; it checks how accurately the system labels reported issues as true positives or false positives. The evaluation is conducted by manually checking the system's AI-based labels against those generated by experienced security professionals.

3.4.3 Time Efficiency

Processing time is reported as an added measure of system effectiveness but is not benchmarked against a fixed performance target. Traditional triage approaches often involve an intensive manual review process that frequently takes hours or even days. In contrast, the proposed system is designed to operate autonomously and typically completes the overall triage process within a few minutes, which is roughly how long modern CI/CD pipelines take to run.
Nevertheless, it should be noted that processing time is not used for direct comparison against other systems, because several external influences, such as hardware configurations, CI/CD pipeline settings, LLM response time, and architectural variances, can substantially affect execution time.

4 Implementation

The entire prototype1 of the proposed architecture, as shown in Figure 3.1, is designed to operate inside a GitHub Actions (GHA) pipeline in order to replicate a near-production development environment. The end-to-end process involves integral components such as codebase scanning, AI-based triage, and result visualisation through an online dashboard accessible to the end user.

Moreover, the framework utilises production-level components such as cloud-hosted dashboards, API servers, secure key management, and a PostgreSQL database backend to enable scalability and maintainability. The solution is designed for cloud-native deployment instead of local run-time, which aligns with modern DevSecOps practices and allows easy integration into next-generation software delivery pipelines.

4.0.1 Technology Stack and Tooling

The system is built using a carefully selected stack of languages, tools, APIs, and platforms that ensure performance, maintainability, and security, as listed in Tables 4.1, 4.2, 4.3, and 4.4.

1All the source code and assets required for this project are stored on GitHub and available at: https://github.com/sumitUTU/Noesiz

Tool/Platform | Purpose/Description
GitHub | Used for version control and automation via GHA.
Render.com | Hosts the API server and frontend web dashboard.
Cloudflare | Handles domain management and security (e.g., HTTPS, DDoS protection).
Google Cloud Platform | Provides the LLM (Gemini API), Generative Language API, and external integrations.
Noesiz.tech | Registered domain pointing to the deployed application and dashboard.
Google Trust Services | Provides the SSL certificate for secure HTTPS communication.

Table 4.1: Infrastructure and Platforms used for Implementation

Tool/Platform | Purpose/Description
Python | Core backend logic, agent implementation, and API communication.
Flask | Web framework used for serving the dashboard and APIs.
JavaScript, HTML, Tailwind CSS | Frontend visualisation.
SQL (PostgreSQL) | Database for storing triaged results.

Table 4.2: Programming Languages and Frameworks used in the Architecture

Tool/Platform | Purpose/Description
Semgrep CE | Used as the SAST engine for detecting insecure coding patterns.
OWASP Dependency-Check | Analyses third-party libraries and flags known CVEs.
GitLeaks | Detects hardcoded secrets, API keys, and credentials in source code.

Table 4.3: Security Tools used in the Architecture

Tool/Platform | Purpose/Description
Gemini 2.5 Flash (via Gemini API) | The LLM used to power all agent reasoning and responses.
FIRST EPSS API | For dynamic exploit prediction scoring.
NIST CVSS API | To fetch severity scores and vulnerability metadata.

Table 4.4: APIs and External Sources used by the Agentic AI

These tools are used in combination with custom Python scripts and logic-driven heuristics to enable the autonomous functioning of the system, from ingesting raw scan results to returning filtered, actionable outputs.

4.1 Input Pipeline

The system's initial stage, the Input Pipeline, is tasked with gathering, categorising, and preparing raw data so that the agentic AI framework can process it further. It acts as the system's input layer, integrating the selected security scanning tools directly into the GHA CI/CD process. Each tool generates its report in JSON format; however, the structure and parameters of these reports vary across tools. To address this inconsistency, a custom parser is employed to transform and standardize the outputs into a uniform schema.
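The mapping a parser performs can be sketched as follows. The raw keys resemble those found in Gitleaks JSON reports, but both the raw record and the unified target schema here are illustrative assumptions, not the project's actual parser code:

```python
# Hypothetical sketch of the normalisation step for one scanner. The unified
# schema fields are illustrative; real Gitleaks reports do expose keys such as
# RuleID, File, StartLine, and Description, but the mapping shown is a sketch.
def normalize_gitleaks(raw: dict) -> dict:
    """Map one Gitleaks JSON finding onto a shared internal schema."""
    return {
        "tool": "gitleaks",
        "rule_id": raw.get("RuleID", "unknown"),
        "severity": "HIGH",              # secrets findings treated as high by default
        "file_path": raw.get("File", ""),
        "line": raw.get("StartLine", 0),
        "message": raw.get("Description", ""),
        "cve_id": None,                  # secret findings carry no CVE identifier
    }

sample = {"RuleID": "aws-access-key", "File": "src/config.py",
          "StartLine": 12, "Description": "AWS access key detected"}
print(normalize_gitleaks(sample)["rule_id"])  # -> aws-access-key
```

An analogous function per tool (Semgrep, OWASP DC) is all that is needed for the downstream agents to stay tool-agnostic.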
The pipeline, established to reflect real development processes, adheres to standard CI/CD triggers, such as pull requests or merges to main branches, where static analysis is typically performed automatically. Likewise, as soon as a valid commit is pushed to the repository, it automatically triggers the GHA workflow, and thus the Agentic AI. Figure 4.1 shows the GHA pipeline successfully triggered on a push; it runs all the jobs and stores the reports from the tools in its artifact storage.

Figure 4.1: GitHub Actions Pipeline (triggered)

4.1.1 Folder Structure and Scanning Setup

The application code resides within a particular project directory, for example src, where the system executes. Three security scanning tools are executed as part of the GHA pipeline workflow on each run:

• Semgrep CE: Employs a combination of default and custom rulesets to scan source code. The GHA configuration for Semgrep used in the pipeline can be seen in Figure 4.2.

Figure 4.2: Semgrep GHA Configuration

• OWASP Dependency-Check: Searches for known vulnerabilities listed under CVEs by scanning project dependencies. The GHA configuration for OWASP DC used in the pipeline can be seen in Figure 4.3.

Figure 4.3: OWASP DC GHA Configuration

• GitLeaks: Searches for hardcoded secrets within the codebase, including source files, environment variables, and config files. The GHA configuration for GitLeaks used in the pipeline can be seen in Figure 4.4.

Figure 4.4: Gitleaks GHA Configuration

Each tool generates a formatted output, typically in JSON, which is then stored as a GHA artifact, as can be seen in Figure 4.1. The downstream components of the AI triage system consume these output files for further processing once they are temporarily held in the GHA run-time environment.
4.1.2 Secrets and Configuration Management

GitHub Secrets are utilized to securely store sensitive credentials, such as the shared secret for the API server, the Gemini API key, and the NVD API key. These are injected into the pipeline during execution and are never written to disk or exposed in plaintext logs. Additionally, customized configurations are included for each scanner; for example, 121 custom rules were added on top of the 1,062 community rules in Semgrep SAST to check for and flag insecure practices. With the NVD API key, the pipeline fetches the latest vulnerability data from the NVD database during each execution, which ensures that OWASP Dependency-Check covers newly discovered threats. Also, to minimise noise and optimise efficiency, only relevant folders and files are targeted. Each tool is configured to output reports in JSON format for streamlined processing later.

4.1.3 Normalization and Data Preparation

Since the output of every scanning tool is generated in a unique format, the system includes a normalising layer to ensure consistency. Although the selected tools support SARIF and generate JSON reports, some special parsing is still required to align these outputs to one internal schema and avoid confusion during LLM-based analysis. Parser scripts perform these operations.

After each report from Semgrep, OWASP Dependency-Check, and GitLeaks is parsed, the most important information is reformatted into a uniform structure and stored in the memory and feedback loop. The Agentic AI Core accepts this normalized output, ensuring uniform downstream reasoning and triage.

4.1.4 Summary of Input Pipeline Workflow

In order to prepare and deliver the enhanced security information to the Agentic AI Core, the input pipeline performs a series of sequential actions. The following constitute this data flow:

1.
Code push: When a developer pushes code to the GitHub repository, the GHA pipeline is initiated.

2. Tool execution: The pipeline starts the Semgrep CE, GitLeaks, and OWASP Dependency-Check scanners, which scan the particular folders of the repository where the developer has committed changes.

3. Report: Each tool produces a JSON report after a successful scan.

4. Parsing and normalisation: The tool outputs are then processed by custom parser scripts (gitleaks_parser.py, owasp_dc_parser.py, and semgrep_parser.py) and transformed into a standardised, consistent structure.

5. Data handoff: The Agentic AI Core takes in the normalised results for automated triage and reasoning.

A visual representation of the data flow can be seen in Figure 4.5. This optimised pipeline effectively integrates and streamlines the outputs from the different security tools, thereby laying the groundwork for automated and smart risk analysis.

Figure 4.5: Data Flow Diagram of the System

4.2 Agentic AI Core

The central intelligent layer of the proposed system is the Agentic AI Core. It processes the normalized outputs from the static security scanners, applies layered reasoning to them, and outputs a filtered and prioritized list of risks together with the reasoning and explanation. Built on the principles of agent-based architecture, this core consists of multiple autonomous agents, each responsible for handling one part of the triage process while moving towards a common objective.

To improve the quality of decision-making, these agents work within a sequential, memory-aware pipeline in which context is built up at each step and read from and written to by all agents. The architecture mirrors the way human security engineers work, evaluating and ranking vulnerabilities through a combination of heuristics, rule-based elimination, and LLM-driven reasoning.
A master orchestrator governs the core, scheduling each agent's run, its related scripts, and its decision logic, as illustrated in Figure 4.6. The elements of the Agentic AI Core are: the False Positive Detection Agent, which detects and filters out results that are not applicable or not exploitable; the Risk Scoring Agent, which is focused on logical rules and deploys CVSS, EPSS, and mathematical logic to derive the risk score; the Risk Explanation Agent, which supports each outcome with human-readable context, evidence, and reasoning; and, lastly, the Remediation Agent, which offers actionable remediation recommendations guided by secure-coding best practices. The most important component serving all these agents is the Memory & Feedback Loop, which facilitates iterative improvement, remembers previous triage results, and preserves agent state.

Figure 4.6: Implemented Agentic AI Architecture

The goals, implementation rationale, and interaction patterns of each component of the Agentic AI Core are discussed in detail in the subsequent subsections.

4.2.1 Triage Orchestrator

Inside the Agentic AI Core, the Triage Orchestrator is the principal controlling module. It is responsible for orchestrating the sequential and successful execution of each agent component, ensuring that the triage process, from input normalisation to the final risk output, proceeds in a deterministic, traceable, and systematic manner. It also maintains and manages the logs, again following security best practices.

The orchestrator, implemented in main.py, is the primary entry point to the AI pipeline. When called during the CI/CD process with GHA, it sets up the runtime environment, manages the ingestion of scanner outputs (through parsers), and calls each agent sequentially in the correct run order.
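The orchestration pattern described above can be sketched as a sequential loop over agents that share one memory object. The agent class and method names below are illustrative stand-ins, not the project's actual main.py code:

```python
# Minimal sketch of a sequential, memory-aware orchestrator loop.
# Agent names and interfaces are hypothetical placeholders.
class EchoAgent:
    """Stand-in agent: records its run in shared memory, passes findings on."""
    def __init__(self, name: str):
        self.name = name

    def run(self, findings: list, memory: dict) -> list:
        memory.setdefault("trace", []).append(self.name)  # deterministic run order
        return findings                                    # pass findings through

def triage(findings: list):
    memory = {}  # shared context built up across agents
    pipeline = [EchoAgent("false_positive"), EchoAgent("risk_score"),
                EchoAgent("explain"), EchoAgent("remediate")]
    for agent in pipeline:          # each agent sees what earlier agents wrote
        findings = agent.run(findings, memory)
    return findings, memory

_, mem = triage([{"rule_id": "demo"}])
print(mem["trace"])  # -> ['false_positive', 'risk_score', 'explain', 'remediate']
```

The fixed ordering is what makes the triage traceable: the memory object doubles as an audit log of which agent acted when.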
4.2.2 LLM Communication Layer

A specialised Python module called llm_client_api.py ensures that communication with the Gemini API is secure, reusable, and dependable. By abstracting API complexities, this module provides an easy-to-use, unambiguous interface for communicating with the LLM. It takes care of error retries, response parsing, prompt building, and authentication using securely held API keys (via GitHub Secrets), providing fault-tolerant and consistent integration across the Agentic AI pipeline.

Additionally, by decoupling LLM communication from agent logic, this component is critical to maintaining modularity. The llm_client_api.py module consolidates the interaction with the Gemini model so that agents focus on what they want to ask, while the communication layer handles how to ask it.

4.2.3 False Positive Detection Agent

Implemented in false_positive_agent.py, this is a dedicated module created to address the problem of false positives in security scan results. Combining LLM-powered contextual analysis with tool-specific heuristics, it decides whether a given reported issue is a true positive (TP) or a false positive (FP). Its primary objective is to minimise noise without removing actual vulnerabilities, which enhances the accuracy and efficiency of the overall triage process.

Tool-Specific Heuristics

Before leveraging the LLM for context analysis, the False Positive Detection Agent first applies a sequence of deterministic heuristics derived from domain knowledge and the known properties of each scanning tool. The heuristics are customised per tool:

• Semgrep CE: Findings labeled with "dead code" in their message, or those assigned a severity level of INFO, are filtered, especially if they appear in non-runtime files. Code fragments flagged in unreachable branches or legacy test files are deprioritized.
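The retry-and-abstraction idea behind llm_client_api.py can be sketched as below. The transport function is a stub standing in for the Gemini API call; function and variable names are illustrative assumptions, not the module's real interface:

```python
import time

# Sketch of a fault-tolerant LLM call wrapper. The real module would call
# the Gemini API with a key injected from GitHub Secrets; here the transport
# is a pluggable stub so the retry logic can be shown in isolation.
def ask_llm(prompt: str, transport, retries: int = 3, backoff: float = 1.0) -> str:
    """Send a prompt, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return transport(prompt)
        except ConnectionError:
            if attempt == retries - 1:
                raise                               # give up after final attempt
            time.sleep(backoff * (2 ** attempt))    # e.g. 1s, 2s, 4s ...

calls = {"n": 0}
def flaky(prompt: str) -> str:
    """Stub transport that fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient")
    return "FALSE_POSITIVE"

print(ask_llm("Would this pose a threat in production?", flaky, backoff=0))
# -> FALSE_POSITIVE (succeeds on the second attempt)
```

Because the agents only see `ask_llm`, swapping the model or hardening the transport never touches agent logic, which is the modularity argument made above.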
• OWASP Dependency-Check: Dependency-related vulnerabilities that are only referenced in test scenarios are often marked as potential false positives, since they cannot be exploited in production.

• Gitleaks: Secrets appearing in test, sample, or documentation files with entropy scores below a cutoff (e.g., 3.5) are likely to be non-sensitive placeholders. For example, DUMMY_SECRET="123" in a test file such as env/.env.example would not be flagged.

The false_positive_agent.py script has these filtering rules integrated into it, providing efficient pre-screening before the LLM is applied to more advanced triage.

LLM-Based Contextual Judgment

Some results need further analysis even after the initial heuristics have been applied. For these, the agent employs contextual reasoning through the LLM. It forms a structured prompt using the data and metadata of the issue and calls the LLM through llm_client_api.py. The LLM is asked whether the problem would pose a security threat in a production environment. The agent interprets the outcome and derives a binary classification, either true positive or false positive, from the LLM response, along with the reasoning behind it.

Output and Memory Updates

After evaluation, the agent updates two files in the memory and feedback loop:

• true_positive_raw.json: Contains detailed reasoning (including LLM responses) and the original finding context. This acts as shared memory for other agents to learn from.

• result.json: Contains only the filtered, confirmed findings, which move forward for risk scoring and remediation.

The False Positive Detection Agent plays a foundational role in the AI triage system: it ensures that the rest of the pipeline focuses on actionable findings.

4.2.4 Risk Scoring Agent

The security findings that remain after false positive suppression need to be prioritised based on their risk and exploitability.
These findings are prioritised using a mix of industry-standard criteria (e.g., CVSS and EPSS) and custom rule-based logic for non-CVE issues. The Risk Scoring Agent, implemented in the risk_score_agent.py file, is responsible for this operation. After scoring, every issue receives a final risk rating from the agent: Critical, High, Medium, or Low.

Scoring for CVE-Based Issues

For vulnerabilities identified by tools like OWASP DC, which map directly to recognised CVEs, the agent appends metadata from CVSS and EPSS. These data are obtained programmatically through the NIST CVSS API for severity scores and the FIRST EPSS API for the likelihood of exploitation. The agent then applies a rule-based classification using combined thresholds, as shown in Table 4.5.

CVSS Score | EPSS Score | Assigned Risk Level
>=9.0 | >0.7 | Critical
>=7.0 | >=0.5 | High
>=4.0 | >=0.2 | Medium
Else | Else | Low

Table 4.5: Thresholds for CVE-based Risk Scoring

Scoring for Non-CVE Findings

Some scanners, such as Semgrep and GitLeaks, detect issues that do not carry a CVE identifier. For such cases, the Risk Scoring Agent uses custom classification rules that consider the type of vulnerability (e.g., hardcoded credentials, insecure cryptography, unsafe deserialization), the file context (e.g., production vs. test files), and the severity tags reported by the tool itself (e.g., HIGH, INFO).

4.2.5 Risk Explainer Agent

While quantitative risk scores and raw triage output are necessary for automated security processes, they often lack narrative explanations, especially for managers, developers, and product owners. Without explanation, security results can be misunderstood, assigned less weight, or even ignored. This is addressed by the Risk Explainer Agent, scripted in risk_explainer_agent.py, which generates comprehensible, human-readable explanations of all confirmed risks.
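Threshold classification of this kind reduces to a small pure function. The cut-off values below are illustrative and follow the spirit of the CVSS/EPSS scheme described, ordered from Critical down to Low:

```python
# Sketch of rule-based CVSS/EPSS threshold classification. The exact
# cut-off values are illustrative, not the system's authoritative ones.
def classify(cvss: float, epss: float) -> str:
    """Combine a CVSS severity score and an EPSS exploit probability
    into one of four risk levels, checked from most to least severe."""
    if cvss >= 9.0 and epss > 0.7:
        return "Critical"
    if cvss >= 7.0 and epss >= 0.5:
        return "High"
    if cvss >= 4.0 and epss >= 0.2:
        return "Medium"
    return "Low"

print(classify(9.8, 0.92))  # -> Critical
print(classify(5.3, 0.01))  # -> Low (severe enough, but unlikely to be exploited)
```

Ordering the checks from strictest to loosest is what makes the "Else / Else" fall-through row behave correctly: any finding that fails every combined threshold lands in Low.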
The agent reads all of the filtered and scored problems and formulates a structured prompt for the Gemini LLM through the llm_client_api.py module, based on the information it has gathered from the feedback loop for every problem. In response, the LLM provides a customised, plain-language explanation, which is then parsed by the agent and written back into both memory files. By answering the implicit "why should I care?" question that lies behind each reported issue, this agent is a significant contributor to developer enablement.

4.2.6 Remediation Agent

The final component of the agentic AI, the Remediation Agent, is scripted in remediation_agent.py and is responsible for context-sensitive recommendations on how to resolve each issue. While many security scanners provide broad recommendations (such as "delete hardcoded credentials"), they often lack actionable details appropriate to the specific scenario in which the vulnerability occurs. The Remediation Agent, in contrast, can deliver accurate remediation that is contextual.

4.2.7 Shared Memory and Feedback Loop

Two key files, true_positive_raw.json and result.json, are used to build the shared memory mechanism and feedback loop. The true_positive_raw.json file serves as a working-memory bridge between agents: every agent contributes its output to this file once it has finished its task. These files collectively form the system's feedback loop. By reading from and writing to true_positive_raw.json, agents effectively "learn" from one another's output, permitting a modular but coordinated reasoning process. In agentic AI applications, this feedback also enhances the overall quality and accuracy of AI-based decisions.
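The append-and-read-back mechanism can be sketched as follows. The file layout and entry fields are illustrative assumptions about how such a JSON-backed shared memory could work, not the project's actual file format:

```python
import json
import os
import tempfile

# Sketch of a JSON-file shared memory: each agent appends its contribution,
# and later agents read the accumulated context back. Entry fields are
# illustrative, not the real true_positive_raw.json schema.
def append_to_memory(path: str, entry: dict) -> None:
    """Append one agent's output to the shared memory file."""
    data = []
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    data.append(entry)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)

mem_path = os.path.join(tempfile.mkdtemp(), "true_positive_raw.json")
append_to_memory(mem_path, {"agent": "false_positive", "verdict": "TP"})
append_to_memory(mem_path, {"agent": "risk_score", "risk": "High"})

with open(mem_path) as f:
    print(len(json.load(f)))  # -> 2
```

Because each entry carries the name of the agent that wrote it, a later agent (or a human auditor) can reconstruct exactly how a verdict was built up, which supports the traceability goals stated earlier.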
4.3 Output Pipeline and Web Dashboard

After the Agentic AI Core has completed its analysis and triage, the scored and filtered results (result.json) are transmitted to the output pipeline for storage and front-end visualisation. This stage is required so that developers and security teams can access and use the results. Figure 4.7 shows the flow of result data returned by the Agentic AI for storing and rendering the output dashboard.

Figure 4.7: Output Pipeline Data Flow

The output pipeline serves three main purposes:

1. Deliver the CI/CD pipeline output securely to the dashboard server.

2. Store those results in an orderly, queryable fashion using a backend database.

3. Present the results on a web front end where users can receive remediation recommendations, review details, and see summary statistics.

Figure 4.8 shows the basic structure of the result returned by the Agentic AI, which the frontend server then serves to end users via the interactive dashboard.

Figure 4.8: Structure of the result_.json file

4.3.1 Secure Result Transmission

The final result.json file is transmitted to the external web service on Render.com by the GHA pipeline using the send_to_noesiz.py script. The same script generates a 16-character random hexadecimal identifier (UNIQUE_ID) and an HMAC signature that signs the payload using the SHA-256 algorithm and a shared secret key. This cryptographic signature ensures the authenticity of the request. To ensure that only authorized agents can post to the database, the server side verifies the signature of the incoming request using the same shared key and HMAC. A 200 OK response indicates successful storage after the POST request is validated at the API server. Following the 200 OK response from the server, GHA automatically serves the output link to the user, as can be seen in Figure 4.9.
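The signing scheme just described can be sketched with Python's standard library. The key, variable names, and payload below are illustrative stand-ins for the logic in send_to_noesiz.py:

```python
import hashlib
import hmac
import json
import secrets

# Sketch of HMAC-SHA256 payload signing with a random hex identifier.
# The shared key and payload are illustrative; in the real pipeline the
# key would come from GitHub Secrets, never from source code.
shared_key = b"demo-shared-secret"
unique_id = secrets.token_hex(8)        # 8 random bytes -> 16 hex characters
payload = json.dumps({"id": unique_id, "findings": []}).encode()

# Client side (CI pipeline): sign the payload before POSTing it.
signature = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()

# Server side: recompute with the same key and compare in constant time,
# so only holders of the shared secret can store results.
expected = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()
print(hmac.compare_digest(signature, expected))  # -> True
```

Using `hmac.compare_digest` rather than `==` avoids timing side channels during signature verification, which matters on the public server endpoint.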
Figure 4.9: Output Redirect in GHA Summary

4.3.2 API Server and Database Integration

The backend web server is hosted on Render.com and built with the Python Flask framework. Following authentication, the JSON payload is received, the relevant fields are parsed, and the data is stored in a PostgreSQL database managed by the same hosting provider. Using the unique ID that indexes the saved data, it is also possible to load the dashboard view (/GHA/show/unique_id) and fetch the raw JSON (/GHA/api/unique_id).

4.3.3 Dashboard Rendering and Visualization

HTML, Tailwind CSS, and Flask templates are utilized to build the front-end dashboard, as can be seen in Figure 4.10 and Figure 4.11. When a user accesses a specific report URL (e.g., /GHA/show/5e8f3a......), the system requests the saved result, extracts the relevant fields, and shows the following:

• Holistic issue perspective: Users can see a list of all issues generated by the security tools on the left, and the issues filtered by the agentic AI in the right pane. The details of each issue can be viewed by clicking "Details", and guidance on how to resolve it by clicking the "Fix" button.

• Data visuals: Pie charts of the raw issue distribution by tool, and a view of validated risks by risk level with a split of True Positives (TP) and False Positives (FP).

Figure 4.10: Frontend WebApp: Noesiz Dashboard Page-1

4.3.4 System Integration and Security

The dashboard resides within the noesiz.tech domain, and Cloudflare manages TLS certificates, DDoS protection, and basic firewall rules to protect against abuse and bot traffic.

Figure 4.11: Frontend WebApp: Noesiz Dashboard Page-2

The pipeline structure for the entire system is as follows:

• The pipeline is initiated and the Agentic AI Core is executed within the GHA.

• Once result.json is ready, it is securely transmitted to the API server through send_to_noesiz.py.
• The API server authenticates and stores the result in PostgreSQL. The frontend dashboard applies dynamic routes to fetch and display the triaged results live at https://www.noesiz.tech/GHA/show/UNIQUE_ID.

This ensures a seamless shift from code scanning to real-time, actionable, and observable results. Figure 4.12 shows how the different cloud services are configured to deliver the end result.

Figure 4.12: Output Pipeline Configuration

5 Results

This chapter presents the performance evaluation of the system. It compares manual and agentic triage and reports on false positive reduction and classification accuracy.

5.1 Testing Dataset and Criteria

The performance and stability of the proposed Agentic AI system for automated risk triage and false positive reduction were assessed using a set of publicly available, intentionally vulnerable software repositories. This offered a varied testing environment that reflected the security flaws and coding patterns found in the real world.

Custom scripts or code excerpts from a total of ten vulnerable open-source application codebases were chosen from OWASP's list of vulnerable applications and GitHub. Each of these codebases included known vulnerabilities, common misconfigurations, and instances designed to trigger the static analysis tools. These repositories provided a wide range of risk profiles, library usages, and code quality, which served as a thorough basis for evaluating the system's flexibility. The following metrics were recorded for each dataset:

• The total number of raw issues reported by all of the tools.
• The number of issues that were filtered out and labelled as false positives.
• The final true positives, grouped into three risk categories: high, medium, and low.
• The False Positive Reduction Rate (FPRR), computed as the share of raw issues filtered out as false positives: FPRR = False Positives / Raw Issues × 100.

5.2 Overview of the Performance Evaluation Process

Using GHA, the pipeline was triggered ten times, once for each vulnerable dataset, to replicate actual CI/CD scenarios. Every execution included scanning the target repository for security flaws, then running the Agentic AI triage process within the pipeline, which parsed the results, removed low-confidence and false-positive findings, and applied structured reasoning for scoring and explanation. The final step in the pipeline transferred the processed output to the Noesiz web dashboard and stored it. Lastly, a sample of the AI-filtered problems was manually reviewed to evaluate accuracy and alignment with the expected results. The results of the ten runs are summarized in Table 5.1.

Dataset Name            Raw Issues   True Positives   False Positives   FPRR     High   Medium   Low
DVWA                    112          74               38                33.9%    16     39       19
Juice Shop              126          112              14                11.1%    22     83       21
WebGoat                 138          92               46                33.33%   23     48       21
NodeGoat                106          70               36                33.96%   14     39       17
Vulnerable Flask App    117          77               40                34.19%   13     42       22
Security Shepherd       129          88               41                31.78%   21     44       23
PythonGoat              114          75               39                34.21%   18     39       18
RailsGoat               121          79               42                34.71%   16     43       20
DSVW                    104          70               34                32.69%   12     38       20
Custom Repo             137          91               46                33.58%   19     50       22
Average                 120.4        82.8             37.6              31.34%   17.4   46.5     20.3

Table 5.1: Summary of results

5.3 False Positive Reduction Results

A primary objective of this thesis was to reduce the number of false positives generated by static security scanners, which are known to produce noisy or irrelevant findings in complicated codebases. The evaluation showed that, by using contextual filtering and intelligent reasoning, the proposed Agentic AI system greatly increases the signal-to-noise ratio in static security analysis.

Key Observations:

• The system's ability to filter noise without excluding significant findings was validated by the average false positive reduction rate of roughly 31.34% across all datasets.
• On average, 82.8 alerts per run (roughly 68.7% of the raw findings) were classified as true positives. These were then further categorised into High, Medium, and Low using the scoring logic embedded in the Agentic AI.
• The risk distributions show a balanced triage: the majority of issues fall into Medium severity (46.5 avg), followed by Low (20.3 avg) and High (17.4 avg).
• The assessment shows that the system can manage large scanner output volumes while providing dependable, actionable security insights with a significant reduction in manual overhead.

5.4 Risk Statistics and Classification Summary

The Agentic AI system's final output includes a structured classification of all validated risks into three main categories (high, medium, and low) based on a combination of static heuristics, EPSS probability, and CVSS score thresholds. By prioritising the most important vulnerabilities first, developers and security experts can ensure that remediation efforts are in line with the actual risk impact. The distribution of risk severity for the retained issues is shown in Table 5.2.

Risk Level   Average Count   Percentage of Total TPs
Critical     0               NA
High         17.4            21.01%
Medium       46.5            56.15%
Low          20.3            24.51%
Total        84.2            100%

Table 5.2: Risk severity distribution of validated true positives

This distribution shows that most of the issues the system retained were medium-priority risks, which usually involved obsolete third-party dependencies, insecure APIs, or poor cryptography practices. Even if they cannot be exploited right away, these problems present serious long-term security risks if left unaddressed.

The three integrated tools (Semgrep, OWASP Dependency-Check, and GitLeaks) were also analysed to ascertain the origin of the true positives. A representation of the segmentation is provided in Table 5.3.

Tool                      Avg. TPs/Dataset   Percentage of Total TPs
Semgrep (SAST)            38.2               47.7%
OWASP DC (SCA)            21.4               26.7%
GitLeaks (Secret Scan)    20.5               25.6%

Table 5.3: Tool-wise Final True Positives and Risk Distribution

Semgrep accounted for the highest number of triaged issues, many of which were related to insecure code constructs and flawed logic. GitLeaks was particularly effective in identifying credential exposures, while OWASP Dependency-Check identified third-party libraries with known vulnerabilities and the associated metadata.

5.5 Evaluation of Risk Prioritization Accuracy

A manual validation exercise was conducted to assess the precision of the system's automated risk classification. A random subset of the filtered issues was reviewed to evaluate their relevance, severity, and contextual accuracy, based on established security standards and practical considerations.

The manual review confirmed that all the issues the system had labelled as High, such as SQL injections, were correctly prioritized. Furthermore, the findings in the middle risk category, such as weak cryptography or unsafe default configurations, were well ranked and accompanied by well-supported explanations.

The prioritized findings, combined with the benefits of faster processing and orderly explanation, showed a high level of consistency with manual expectation. The AI-based triage provided extensive explanations that aided understanding, especially in edge cases, and showed higher consistency across issues with similar structures.

6 Conclusion

This thesis addresses two important problems of modern AppSec: the sheer volume of false positives and the human effort required to sift through static analysis results.
The demand for advanced, autonomous triage systems is growing as software development pipelines increasingly rely on automated security tools in line with DevSecOps culture. To mitigate these key issues, the research designed and implemented a new system that integrates heuristics, planning logic, shared memory, and LLMs to automatically filter, classify, and explain security flaws discovered through static analysis.

6.1 Summary of Key Contributions

The system's design accommodates results from several security scanners, including Semgrep (SAST), OWASP Dependency-Check (SCA), and GitLeaks (secret scanning). The key contributions of this research are:

• The design and implementation of an autonomous AI pipeline with self-directed decision-making abilities, such as risk assessment, false positive filtering, and LLM-based contextual reasoning.
• A memory-augmented feedback loop that facilitates coordination among the different agents during the filtering, scoring, and explanation-generation stages.
• Risk assessment logic built on top of secure development best practices (SSDF, BSIMM, SAMM) and security industry standards (CVSS, EPSS, OWASP).
• A production-ready dashboard interface that provides detailed, graphical, and actionable insights to developers and security professionals and is linked directly from GHA.
• An empirical evaluation showing usability benefits over traditional scanner outputs, stable risk classification accuracy, and an approximately 31% false positive reduction rate.

The method is well suited to real-world CI/CD environments because it not only reduces the human effort required to triage security issues but also enhances visibility and prioritization and promotes a DevSecOps culture.
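The CVSS/EPSS-based risk assessment logic listed among these contributions can be illustrated with a minimal sketch. The concrete thresholds and the helper name classify_risk are illustrative assumptions for this example, not the exact cut-offs embedded in the implemented agents.

```python
def classify_risk(cvss: float, epss: float) -> str:
    """Map a finding's CVSS base score and EPSS exploitation probability to a
    risk level. Thresholds are illustrative, not the thesis's exact values."""
    if cvss >= 9.0 and epss >= 0.7:   # severe impact AND likely exploitation
        return "Critical"
    if cvss >= 7.0 or epss >= 0.5:    # severe impact OR likely exploitation
        return "High"
    if cvss >= 4.0 or epss >= 0.1:    # moderate impact or non-trivial likelihood
        return "Medium"
    return "Low"
```

Combining CVSS (impact) with EPSS (exploitation likelihood) is what allows the triage to demote a formally severe but practically unexploitable finding, and to promote a modest-severity finding that is actively exploited.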
6.2 Revisiting Research Questions

The research conducted in this thesis was guided by four key questions, each addressing a critical aspect of improving static security analysis through LLM-driven intelligent automation. The answers to these questions follow from the system's design, implementation, and empirical evaluation.

6.2.1 Research Question 1 (RQ1)

How can reasoning agents and LLMs be used to differentiate between true and false positives in static analysis results?

To assess the findings from static analysis tools, the research developed an agentic AI framework that combined rule-based and LLM-driven reasoning. Each agent performed particular operations, such as cross-referencing metadata (e.g., file paths, entropy, test directories), querying LLMs for contextual judgment, and heuristically filtering issues. This combination led to a reliable categorization of issues, demonstrating that LLMs can reliably assist in distinguishing true security findings from false alarms when used alongside guardrails and well-defined context.

6.2.2 Research Question 2 (RQ2)

How can results from multiple security tools be combined and standardized to present a unified and consistent view of risks?

The solution introduced a normalized ingestion pipeline that gathered raw outputs from tools such as GitLeaks, OWASP Dependency-Check, and Semgrep into a common internal format using parsers. The solution then relied heavily on the memory and feedback loop of the Agentic AI: as the agentic system processed the data consistently through shared memory structures (result.json and true_positives_raw.json), risk scores, explanations, and corrective measures were added for each entry.

6.2.3 Research Question 3 (RQ3)

How can a web-based interface be designed to efficiently support developer workflows, offering clear visualization of triaged risks and actionable remediation insights?
The thesis included the development of a custom dashboard accessible via a secure web interface, and directly from the pipelines. This interface provided detailed summaries of the filtered issues, interactive severity breakdowns, tool-wise comparisons, and one-click access to remediation instructions. The structure and presentation of the data were intentionally designed to support clarity, traceability, and relevance in a development workflow.

6.2.4 Research Question 4 (RQ4)

To what extent can such a system reduce the overall triage time, minimize the manual effort, and improve the developer trust and engagement with security tools?

The solution reduced the number of issues requiring manual inspection by over 31% through automated reasoning, contextual filtering, and risk classification. Routine manual triage effort was minimised, as manual assessments confirmed that the AI outputs were consistent with expert expectations. Comprehensive contextual explanations and organized remediation guidance improved developer workflows, while dashboard visibility and integration with CI/CD pipelines improved usability and adoption.

6.3 Limitations of the Study

While the proposed system demonstrates promising results in risk triage automation and false positive reduction, several limitations need to be considered. The evaluation of the system was limited by a small dataset, which might affect its wider applicability. Although the LLM-based reasoning is effective, it still has problems such as overgeneralisation, and its accuracy currently depends on the external scanning tools and their inherent limitations. Additionally, since the solution depends on cloud infrastructure and external APIs, it is vulnerable to failures of external services. Lastly, in the absence of end-user validation, the usability and trust findings are derived from design hypotheses rather than firsthand user input.
6.4 Future Work

The research presented in this thesis introduces a practical, production-grade implementation of Agentic AI for static security analysis. While the system demonstrates effective results in reducing false positives and automating triage, there are still several directions in which this work could be taken, both technically and academically.

Domain-Specific Fine-Tuning of Language Models

Currently, the system generates remediations by reasoning about scan outputs with general-purpose LLMs. Subsequent versions might employ LLMs fine-tuned on domain-specific security knowledge, such as models trained specifically for AppSec reasoning.

Performance Optimization Using Local LLMs and Rate-Limit Handling

The existing system employs remote APIs to interface with LLMs hosted on other cloud platforms, which introduces higher latency and is rate-limited to some extent. By deploying local or on-premise LLMs, organizations could achieve reduced inference times, improved control over data privacy, and freedom from third-party usage constraints. This would also enable higher-frequency agentic execution cycles, especially in CI/CD scenarios.

Optimized Handling of Vulnerability Databases (e.g. NVD)

Even with the official API key, the OWASP Dependency-Check tool needs to download approximately 300,000+ items per pipeline execution in order to obtain vulnerability metadata directly from the NVD. This increases processing time significantly. A more scalable approach would be to establish a local NVD mirror server that the pipeline queries in real time and that synchronizes on a regular basis. This would improve speed, reduce bandwidth usage, and enhance offline reliability.

Broader Application of Agentic AI in Cybersecurity Domains

While risk triage and static security analysis were the primary focus of this thesis, the basic agentic architecture is extendable to other areas of cybersecurity.
Agent-based reasoning frameworks can be applied to areas such as AI-assisted threat modeling, incident response, cloud misconfiguration detection (along with self-repair options), and vulnerability chaining.

Advancing Explainability, Trust, and User Feedback Integration

Further work is encouraged in building interpretable layers on top of the LLM decisions, especially for compliance-heavy environments. In addition, enabling real-time developer feedback (e.g., accepting or rejecting decisions) could be looped into the memory system to support continuous learning and adaptation of the agent.

References

[1] A. Petrosyan. “Internet and social media users in the world 2025”, Accessed: Apr. 1, 2025. [Online]. Available: https://www.statista.com/statistics/617136/digital-population-worldwide.
[2] McKinsey. “What is digital transformation?”, Accessed: Aug. 7, 2024.
[3] E. C. Eurostat, Digitalisation in Europe (Digitalisation in Europe ...). LU: Publications Office, 2025. doi: 10.2785/3102705. [Online]. Available: https://data.europa.eu/doi/10.2785/3102705.
[4] M. Souppaya, K. Scarfone, and D. Dodson, Secure Software Development Framework (SSDF) version 1.1: recommendations for mitigating the risk of software vulnerabilities. Feb. 2022. doi: 10.6028/nist.sp.800-218. [Online]. Available: http://dx.doi.org/10.6028/NIST.SP.800-218.
[5] M. Paul, The 7 qualities of highly secure software. CRC Press, 2012.
[6] C. Dimastrogiovanni and N. Laranjeiro, “Towards understanding the value of false positives in static code analysis”, in 2016 Seventh Latin-American Symposium on Dependable Computing (LADC), IEEE, 2016, pp. 119–122.
[7] M. Nadeem, B. J. Williams, and E. B. Allen, “High false positive detection of security vulnerabilities: A case study”, in Proceedings of the 50th Annual ACM Southeast Conference, 2012, pp. 359–360.
[8] Z.
Guo et al., “Mitigating false positive static analysis warnings: Progress, challenges, and opportunities”, IEEE Transactions on Software Engineering, vol. 49, no. 12, pp. 5154–5188, 2023.
[9] Z. D. Wadhams, C. Izurieta, and A. M. Reinhold, “Barriers to using static application security testing (SAST) tools: A literature review”, in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops, 2024, pp. 161–166.
[10] M. Nachtigall, M. Schlichtig, and E. Bodden, “A large-scale study of usability criteria addressed by static analysis tools”, New York, NY, USA: Association for Computing Machinery, 2022, isbn: 9781450393799. doi: 10.1145/3533767.3534374. [Online]. Available: https://doi.org/10.1145/3533767.3534374.
[11] T. Tiensuu, “DevSecOps adoption: Improving visibility in application security”, 2022.
[12] S. Singh, “Secure software development life cycle: Implementation challenges in small and medium enterprises (SMEs)”, Apr. 2025. doi: 10.22541/au.174585836.63395541/v1. [Online]. Available: http://dx.doi.org/10.22541/au.174585836.63395541/v1.
[13] S. Norberg, “Secure software development lifecycle (SSDLC)”, in Advanced ASP.NET Core 8 Security: Move Beyond ASP.NET Documentation and Learn Real Security. Berkeley, CA: Apress, 2024, pp. 417–443. doi: 10.1007/979-8-8688-0494-6_13. [Online]. Available: https://doi.org/10.1007/979-8-8688-0494-6_13.
[14] M. Shaikh, P. H. Ali Qureshi, M. Shaikh, Q. A. Arain, A. Zubedi, and P. Shaikh, “Security paradigms in SDLC requirement phase — a comparative analysis approach”, in 2021 International Conference on Engineering and Emerging Technologies (ICEET), 2021, pp. 1–6. doi: 10.1109/ICEET53442.2021.9659614.
[15] A. H. A. Kamal, C. C. Y. Yen, G. J. Hui, P. S. Ling, and Fatima-tuz-Zahra, Risk assessment, threat modeling and security testing in SDLC, 2020. arXiv: 2012.07226 [cs.SE]. [Online]. Available: https://arxiv.org/abs/2012.07226.
[16] M. Dawson, D.
Burrell, E. Rahim, and S. Brewster, “Integrating software assurance into the software development life cycle (SDLC)”, Journal of Information Systems Technology and Planning, vol. 3, pp. 49–53, Jan. 2010.
[17] B. Loriot, F. Madeiral, and M. Monperrus, “Styler: Learning formatting conventions to repair Checkstyle violations”, Empirical Software Engineering, vol. 27, no. 6, Aug. 2022, issn: 1573-7616. doi: 10.1007/s10664-021-10107-0. [Online]. Available: http://dx.doi.org/10.1007/s10664-021-10107-0.
[18] M. Learn, Threats - Microsoft Threat Modeling Tool - Azure, https://learn.microsoft.com/en-us/azure/security/develop/threat-modeling-tool-threats, 25-08-2022.
[19] L. Conklin, Threat Modeling Process | OWASP Foundation, https://owasp.org/www-community/Threat_Modeling_Process.
[20] EC-Council, DREAD Threat Modeling: An Introduction to Qualitative Risk Analysis, https://www.eccouncil.org/cybersecurity-exchange/threat-intelligence/dread-threat-modeling-intro/, 2022.
[21] T. UcedaVelez and M. M. Morana, “Intro to PASTA”, in Risk Centric Threat Modeling: Process for Attack Simulation and Threat Analysis. 2015, pp. 317–342. doi: 10.1002/9781118988374.ch6.
[22] Triage | GitLab Docs, https://docs.gitlab.com/user/application_security/triage/.
[23] J. Nagarajan, How To Perform a Cybersecurity Risk Assessment | CrowdStrike, https://www.crowdstrike.com/en-us/cybersecurity-101/advisory-services/cybersecurity-risk-assessment/, 2024.
[24] P. Vishwakarma, Combining CVSS and EPSS to prioritize vulnerability | SecOps® Solution, https://www.secopsolution.com/blog/combining-cvss-and-epss-to-prioritize-vulnerability, 2023.
[25] A. Miles, Triage Your Cloud Security: Risk Prioritization Methods, https://www.cyberark.com/resources/blog/triage-your-cloud-security-risk-prioritization-methods, 2024.
[26] P. M. Mell, T. Bergeron, and D.
Henning, Creating a patch and vulnerability management program, 2005. doi: https://doi.org/10.6028/NIST.SP.800-40ver2.
[27] M. Parkin, The Hidden Costs of Poor Risk Prioritization, https://www.balbix.com/blog/drowning-in-vulnerabilities-the-hidden-costs-of-poor-risk-prioritization/, 2023.
[28] State of Software Security 2025: A New View of Maturity | Veracode, https://www.veracode.com/resources/analyst-reports/state-of-software-security-2025/, 2025.
[29] A. G. Bardas et al., “Static code analysis”, Journal of Information Systems & Operations Management, vol. 4, no. 2, pp. 99–107, 2010.
[30] D. B. Cruz, J. R. Almeida, and J. L. Oliveira, “Open source solutions for vulnerability assessment: A comparative analysis”, IEEE Access, vol. 11, pp. 100234–100255, 2023. doi: 10.1109/ACCESS.2023.3315595.
[31] G. Staff, Octoverse: AI leads Python to top language as the number of global developers surges, https://github.blog/news-insights/octoverse/octoverse-2024/, 2024.
[32] Synopsys, Synopsys 2024 Open Source Security and Risk Analysis Report, https://static.carahsoft.com/concrete/files/1617/1597/8665/2024_Open_Source_Security_and_Risk_Analysis_Report_WRAPPED.pdf, 2024.
[33] P. Kemppainen, “Managing 3rd party software components with software bill of materials”, 2023.
[34] A. Volkova, “Modern methods of automated software security analysis: From static analysis to comprehensive approach”, European Journal of Natural History, p. 10, 2024.
[35] D. Stefanovic, D. Nikolic, D. Dakic, I. Spasojevic, and S. Ristic, “Static code analysis tools: A systematic literature review”, in Proceedings of the 31st International DAAAM Symposium 2020. DAAAM International Vienna, 2020, pp. 0565–0573. doi: 10.2507/31st.daaam.proceedings.078. [Online]. Available: http://dx.doi.org/10.2507/31ST.DAAAM.PROCEEDINGS.078.
[36] SARIF Home, https://sarifweb.azurewebsites.net/.
[37] O.
Security, 2022 Cloud Security Alert Fatigue Report, https://orca.security/wp-content/uploads/2022/03/Orca-2022-Cloud-Security-Alert-Fatigue-Report.pdf, 2022.
[38] A. H. Jerónimo, P. M. Moreno, J. A. V. Camacho, and G. C. Vega, “Techniques of SAST tools in the early stages of secure software development: A systematic literature review”, in 2024 IEEE International Conference on Engineering Veracruz (ICEV), 2024, pp. 1–8. doi: 10.1109/ICEV63254.2024.10766004.
[39] H. J. Choi, H. Lee, and J.-Y. Choi, “Is a false positive really false positive?”, in 2022 24th International Conference on Advanced Communication Technology (ICACT), 2022, pp. 145–149. doi: 10.23919/ICACT53585.2022.9728948.
[40] H. Muthukrishnan, V. Viradia, and D. Yadav, “Unified AI and ML framework in DevSecOps practices, solving real-world problems”, in SoutheastCon 2025, IEEE, Mar. 2025, pp. 1250–1257. doi: 10.1109/southeastcon56624.2025.10971458. [Online]. Available: http://dx.doi.org/10.1109/SoutheastCon56624.2025.10971458.
[41] R. Kullberg, Identifying and Mitigating False Positive Alerts - Panther | The Security Monitoring Platform for the Cloud, https://panther.com/blog/identifying-and-mitigating-false-positive-alerts, 11-04-2024.
[42] G. Liargkovas, E. Panourgia, and D. Spinellis, Quieting the static: A study of static analysis alert suppressions, 2023. arXiv: 2311.07482 [cs.SE]. [Online]. Available: https://arxiv.org/abs/2311.07482.
[43] I. Jaoua, O. B. Sghaier, and H. Sahraoui, Combining large language models with static analyzers for code review generation, 2025. arXiv: 2502.06633 [cs.SE]. [Online]. Available: https://arxiv.org/abs/2502.06633.
[44] P. Li et al., Automated static vulnerability detection via a holistic neuro-symbolic approach, 2025. arXiv: 2504.16057 [cs.CR]. [Online]. Available: https://arxiv.org/abs/2504.16057.
[45] B. Aloraini, M. Nagappan, D. M. German, S. Hayashi, and Y.
Higo, “An empirical study of security warnings from static application security testing tools”, Journal of Systems and Software, vol. 158, p. 110427, Dec. 2019, issn: 0164-1212. doi: 10.1016/j.jss.2019.110427. [Online]. Available: http://dx.doi.org/10.1016/j.jss.2019.110427.
[46] P. Pavan, “Adaptive application security testing with AI automation”, International Journal of AI, BigData, Computational and Management Studies, vol. 4, no. 1, pp. 55–63, 2023.
[47] M. F. Ansari, B. Dash, P. Sharma, and N. Yathiraju, “The impact and limitations of artificial intelligence in cybersecurity: A literature review”, IJARCCE, vol. 11, no. 9, 2022, issn: 2278-1021. doi: 10.17148/ijarcce.2022.11912. [Online]. Available: http://dx.doi.org/10.17148/IJARCCE.2022.11912.
[48] M. Keltek, R. Hu, M. F. Sani, and Z. Li, Boosting cybersecurity vulnerability scanning based on LLM-supported static application security testing, 2024. arXiv: 2409.15735 [cs.CR]. [Online]. Available: https://arxiv.org/abs/2409.15735.
[49] N. Kshetri, “Transforming cybersecurity with agentic AI to combat emerging cyber threats”, Telecommunications Policy, vol. 49, no. 6, p. 102976, 2025, issn: 0308-5961. doi: https://doi.org/10.1016/j.telpol.2025.102976. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0308596125000734.
[50] Agentic AI (and AI Agents), https://www.cyberark.com/what-is/agentic-ai-and-ai-agents/.
[51] S. Nawaz, Agentic AI vs AI Agents: 9 Key Differences, https://www.ampcome.com/post/agentic-ai-vs-ai-agents-a-detailed-comparison, 12-12-2024.
[52] RedHat, What is agentic AI?, https://www.redhat.com/en/topics/ai/what-is-agentic-ai, 6-11-2024.
[53] L. Columbus, Cybersecurity at AI speed: How agentic AI is supercharging SOC teams in 2025, https://venturebeat.com/ai/cybersecurity-at-ai-speed-agentic-ai-supercharging-soc/, 24-02-2025.
[54] E.
Lisowski, AI Agents vs Agentic AI: What’s the Difference and Why Does It Matter?, https://medium.com/@elisowski/ai-agents-vs-agentic-ai-whats-the-difference-and-why-does-it-matter-03159ee8c2b4, 29-12-2024.
[55] M. Chiodi, Part 1 - Agentic AI in Cybersecurity: Closing the Last Mile in Identity Security, https://www.cerby.com/resources/blog/agentic-ai-in-cybersecurity, 5-02-2025.
[56] C. T. Brayda, Agentic AI Vs. AI Agents: Shaping The Future Of Cybersecurity, https://www.forbes.com/councils/forbestechcouncil/2025/04/14/agentic-ai-vs-ai-agents-shaping-the-future-of-cybersecurity/, 14-04-2025.
[57] M. A. Ferrag, N. Tihanyi, and M. Debbah, From LLM reasoning to autonomous AI agents: A comprehensive review, 2025. arXiv: 2504.19678 [cs.AI]. [Online]. Available: https://arxiv.org/abs/2504.19678.
[58] P. Mell, K. Scarfone, S. Romanosky, et al., “A complete guide to the Common Vulnerability Scoring System version 2.0”, in Published by FIRST-forum of incident response and security teams, vol. 1, 2007, p. 23.
[59] cve.org, About the CVE program, https://www.cve.org/About/Overview.
[60] EUVD, About European Union Vulnerability Database (EUVD), https://euvd.enisa.europa.eu/about.
[61] M. Albab, Severity vs risk: The limitations of CVSS, Jan. 2025. [Online]. Available: http://essay.utwente.nl/106247/.
[62] first.org, Exploit Prediction Scoring System (EPSS), https://www.first.org/epss/.
[63] M. Souppaya, K. Scarfone, and D. Dodson, Secure Software Development Framework (SSDF) version 1.1 - recommendations for mitigating the risk of software vulnerabilities. Feb. 2022. doi: 10.6028/nist.sp.800-218. [Online]. Available: http://dx.doi.org/10.6028/NIST.SP.800-218.
[64] M. Learn, Microsoft Security Development Lifecycle (SDL) - Microsoft Service Assurance, https://learn.microsoft.com/en-us/compliance/assurance/assurance-microsoft-security-development-lifecycle, 23-05-2024.
[65] OWASP, OWASP Software Assurance Maturity Model (SAMM), https://owaspsamm.org/model/.
[66] BlackDuck, What Is the BSIMM and How Does It Work?, https://www.blackduck.com/glossary/what-is-bsimm.html.
[67] owasp.org, OWASP Top 10:2021, https://owasp.org/Top10/, 2021.