Evaluating LLM-Based Cypher Query Generation from Natural Language over a CPQ Data Knowledge Graph

Nuutila, Akseli

dc.contributor.author	Nuutila, Akseli
dc.contributor.department	Department of Computing
dc.contributor.faculty	Faculty of Technology
dc.contributor.studysubject	Information and Communication Technology
dc.date.accessioned	2025-12-16T22:02:56Z
dc.date.available	2025-12-16T22:02:56Z
dc.date.issued	2025-12-12
dc.description.abstract	The structured configuration data used in Configure-Price-Quote (CPQ) systems is often difficult for users to access without substantial knowledge of formal query languages. This creates barriers to exploration, even for domain experts. Recent advances in large language models (LLMs) raise the question of whether natural language interfaces can support accurate querying of such structured data. This thesis evaluates the feasibility of generating Cypher queries from natural language questions for a large-scale CPQ knowledge graph. A Neo4j knowledge graph was constructed from real CPQ data, and an evaluation pipeline was implemented to test multiple LLM configurations. Two query sets were used for the evaluation: one requiring only an understanding of the knowledge graph schema, and another requiring additional domain-specific knowledge, supplied either as a large static text file or through a retrieval-based (RAG) context construction approach. In the controlled evaluation presented in this thesis, GPT-5-mini was able to generate correct Cypher queries for nearly all schema-based test cases. For domain-context-augmented tasks, the evaluated configurations produced widely varying results. The best-performing combinations of few-shot prompting and retrieval-based context achieved high accuracy, reduced prompt size, and enabled a more maintainable prompting strategy. These findings demonstrate that LLM-based NL-to-Cypher generation is viable for complex CPQ data when appropriate context and prompting methods are employed. However, erroneous outputs still occurred occasionally, highlighting the need for validation mechanisms before such systems can be reliably deployed.
dc.format.extent	117
dc.identifier.olddbid	211680
dc.identifier.oldhandle	10024/194699
dc.identifier.uri	https://www.utupub.fi/handle/11111/16983
dc.identifier.urn	URN:NBN:fi-fe20251216120610
dc.language.iso	eng
dc.rights	This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
dc.rights.accessrights	open
dc.source.identifier	https://www.utupub.fi/handle/10024/194699
dc.subject	Large Language Models, Natural Language Querying, Knowledge Base Question Answering, Cypher Query Generation, Knowledge Graphs, Retrieval-Augmented Generation, CPQ Systems
dc.title	Evaluating LLM-Based Cypher Query Generation from Natural Language over a CPQ Data Knowledge Graph
dc.type.ontasot	Master's thesis

Files

Name: Nuutila_Akseli_opinnayte.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

Collections

Master's theses (pro gradu and diploma theses) and advanced studies theses (full texts)