Grammatical Error Correction Using Large Language Models: A Case Study on Universal Dependencies Treebanks

Open access
This publication is subject to copyright regulations. The work may be read and printed for personal use. Commercial use is prohibited.

Abstract

This thesis addresses Grammatical Error Correction (GEC) in two phases. The first phase investigates how Universal Dependencies (UD), a cross-linguistically consistent framework for syntactic annotation, and in particular its Typo=Yes feature, can support error analysis in GEC. Tokens marked with Typo=Yes were extracted from three UD treebanks (UD English EWT, UD English GUM, and UD Finnish TDT) and manually annotated according to the criteria of the ERRANT framework, which is designed to classify grammatical errors consistently. This enabled detailed cross-dataset and cross-linguistic error analysis. The second phase evaluates the ability of a Large Language Model (LLM) to classify grammatical errors using structured prompts based on the ERRANT framework. Both zero-shot and few-shot prompting were applied, and the LLM's performance was compared against the manually annotated gold standard developed in the first phase. This work aims to bridge linguistic annotation frameworks and neural language models to advance GEC systems.
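The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the treebanks are read in the standard CoNLL-U format, where each token line has ten tab-separated columns, morphological features such as Typo=Yes sit in the sixth column (FEATS), and an optional CorrectForm attribute may appear in the tenth column (MISC). The names `TypoToken`, `parse_pairs`, `iter_typo_tokens`, and the `SAMPLE` sentence are hypothetical and invented for illustration; this is not the thesis's own tooling, and the abstract does not say how the extraction was actually implemented.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional


class TypoToken(NamedTuple):
    """One token flagged with Typo=Yes (hypothetical container for this sketch)."""
    sent_id: Optional[str]
    token_id: str
    form: str
    upos: str
    correct_form: Optional[str]


def parse_pairs(field: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS or MISC field ('A=B|C=D') into a dict; '_' means empty."""
    if not field or field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|") if "=" in item)


def iter_typo_tokens(lines: Iterable[str]) -> Iterator[TypoToken]:
    """Yield every token whose FEATS column contains Typo=Yes."""
    sent_id: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            sent_id = None
            continue
        if line.startswith("#"):
            # Sentence-level comment; remember the sentence id if present.
            if line.startswith("# sent_id") and "=" in line:
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed token line; skip it.
            continue
        feats = parse_pairs(cols[5])
        if feats.get("Typo") == "Yes":
            misc = parse_pairs(cols[9])
            yield TypoToken(sent_id, cols[0], cols[1], cols[3], misc.get("CorrectForm"))


# Invented two-token example in CoNLL-U layout (not taken from any treebank).
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = I recieved\n"
    "1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_\n"
    "2\trecieved\treceive\tVERB\tVBD\tMood=Ind|Tense=Past|Typo=Yes|VerbForm=Fin"
    "\t0\troot\t_\tCorrectForm=received\n"
    "\n"
)

if __name__ == "__main__":
    for tok in iter_typo_tokens(SAMPLE.splitlines()):
        print(tok)
```

To run this over a real treebank, the lines of a `.conllu` file opened with UTF-8 encoding could be passed to `iter_typo_tokens` in place of the sample. The extracted tokens would then still need the manual ERRANT-based annotation the abstract describes, which this sketch does not attempt.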
