A Resource-Efficient Codebook-Driven Semantic Structuring Pipeline for Human-AI Dialogue in Ambient Intelligent Systems
Abstract
Human–AI dialogue in ambient intelligent systems increasingly relies on large language models (LLMs). When questions are generated dynamically to enable personalized, context-aware interactions, their phrasing and topical focus vary between conversations. Without structured organization, which is often extremely resource-intensive to produce, conversational data remains fragmented and cannot be reliably used for systematic analysis or reporting. This study proposes a semantic structuring pipeline that maps LLM-generated questions to shared codes, sub-themes, and themes using a predefined codebook. The multi-stage pipeline applies semantic screening, factor-based scoring, mathematical aggregation, and validation checks, supported by locally deployed LLMs and manual confirmation. It was evaluated on 6,030 question–response pairs collected from dynamic interviews across three research objectives. The pipeline achieved an overall mapping accuracy of 97% and, through layered validation, reduced hallucinated semantic matches to 1.2%. These results indicate that the pipeline improves mapping accuracy and limits hallucinated matches while remaining computationally efficient for private local deployment.
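The four stages named in the abstract (screening, factor scoring, aggregation, validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the scoring factors, the aggregation formula, or the thresholds. Here, plain token overlap stands in for LLM-based semantic screening, and every name (`CodebookEntry`, `map_question`, `CODEBOOK`), weight, and threshold is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodebookEntry:
    # Hypothetical codebook row: code -> sub-theme -> theme, plus cue words.
    code: str
    sub_theme: str
    theme: str
    keywords: frozenset


def tokens(text: str) -> set:
    """Lowercase the text and return its words without edge punctuation."""
    return {w.strip(".,?!;:").lower() for w in text.split()} - {""}


def jaccard(a: set, b: frozenset) -> float:
    """Set overlap in [0, 1]; a crude stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(question: set, keywords: frozenset) -> float:
    """Fraction of the entry's keywords that appear in the question."""
    return len(question & keywords) / len(keywords) if keywords else 0.0


def map_question(question: str, codebook, screen_min: float = 0.05,
                 accept_min: float = 0.30,
                 margin: float = 0.10) -> Optional[CodebookEntry]:
    """Return the matched entry, or None to defer to manual confirmation."""
    q = tokens(question)
    # Stage 1 (screening): drop entries with almost no overlap.
    candidates = [e for e in codebook if jaccard(q, e.keywords) >= screen_min]
    # Stages 2-3 (factor scoring, aggregation): equal-weight mean of two
    # assumed factors; the real factors and weights are not given.
    scored = sorted(
        ((0.5 * jaccard(q, e.keywords) + 0.5 * coverage(q, e.keywords), e)
         for e in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Stage 4 (validation): reject weak or ambiguous matches.
    if not scored or scored[0][0] < accept_min:
        return None
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None
    return scored[0][1]


# Invented example data; not taken from the study.
CODEBOOK = [
    CodebookEntry("SLEEP_QUALITY", "Sleep", "Wellbeing",
                  frozenset({"sleep", "night", "rest", "tired"})),
    CodebookEntry("DEVICE_USABILITY", "Usability", "Technology",
                  frozenset({"device", "app", "use", "easy"})),
]

match = map_question("How well did you sleep last night?", CODEBOOK)
print(match.code if match else "needs manual review")  # prints SLEEP_QUALITY
```

Returning `None` instead of the best available candidate reflects the role the abstract gives to validation checks and manual confirmation: a question with no sufficiently strong or unambiguous match is left for review and not forced onto a code.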