Evaluating LLM-Based Cypher Query Generation from Natural Language over a CPQ Data Knowledge Graph

dc.contributor.author: Nuutila, Akseli
dc.contributor.department: fi=Tietotekniikan laitos|en=Department of Computing|
dc.contributor.faculty: fi=Teknillinen tiedekunta|en=Faculty of Technology|
dc.contributor.studysubject: fi=Tietotekniikka|en=Information and Communication Technology|
dc.date.accessioned: 2025-12-16T22:02:56Z
dc.date.available: 2025-12-16T22:02:56Z
dc.date.issued: 2025-12-12
dc.description.abstract: The structured configuration data used in Configure-Price-Quote (CPQ) systems is often difficult for users to access without substantial knowledge of formal query languages. This creates barriers to exploration, even for domain experts. Recent advances in large language models (LLMs) raise the question of whether natural language interfaces can support accurate querying of such structured data. This thesis evaluates the feasibility of generating Cypher queries from natural language questions for a large-scale CPQ knowledge graph. A Neo4j knowledge graph was constructed from real CPQ data, and an evaluation pipeline was implemented to test multiple LLM configurations. Two query sets were used for the evaluation: one requiring only an understanding of the knowledge graph schema, and another requiring additional domain-specific knowledge, supplied either as a large static text file or through a retrieval-based (RAG) context construction approach. In the controlled evaluation presented in this thesis, GPT-5-mini was able to generate correct Cypher queries for nearly all schema-based test cases. For domain-context-augmented tasks, the evaluated configurations produced widely varying results. The best-performing combinations of few-shot prompting and retrieval-based context achieved high accuracy, reduced prompt size, and enabled a more maintainable prompting strategy. These findings demonstrate that LLM-based NL-to-Cypher generation is viable for complex CPQ data when appropriate context and prompting methods are employed. However, erroneous outputs still occurred occasionally, highlighting the need for validation mechanisms before such systems can be reliably deployed.
dc.format.extent: 117
dc.identifier.olddbid: 211680
dc.identifier.oldhandle: 10024/194699
dc.identifier.uri: https://www.utupub.fi/handle/11111/16983
dc.identifier.urn: URN:NBN:fi-fe20251216120610
dc.language.iso: eng
dc.rights: fi=Julkaisu on tekijänoikeussäännösten alainen. Teosta voi lukea ja tulostaa henkilökohtaista käyttöä varten. Käyttö kaupallisiin tarkoituksiin on kielletty.|en=This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.|
dc.rights.accessrights: avoin
dc.source.identifier: https://www.utupub.fi/handle/10024/194699
dc.subject: Large Language Models, Natural Language Querying, Knowledge Base Question Answering, Cypher Query Generation, Knowledge Graphs, Retrieval-Augmented Generation, CPQ Systems
dc.title: Evaluating LLM-Based Cypher Query Generation from Natural Language over a CPQ Data Knowledge Graph
dc.type.ontasot: fi=Diplomityö|en=Master's thesis|

Files

Name: Nuutila_Akseli_opinnayte.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format