Grammatical Error Correction Using Large Language Models: A Case Study on Universal Dependencies Treebanks
Jalali, Arvin (2025-06-23)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025062674959
Abstract
This thesis addresses Grammatical Error Correction (GEC) in two phases. The first phase investigates how Universal Dependencies (UD), a cross-linguistically consistent framework for syntactic annotation, can support error analysis in GEC, focusing on the Typo=Yes feature. Tokens marked with Typo=Yes were extracted from three UD treebanks (UD English EWT, UD English GUM, and UD Finnish TDT) and manually annotated according to the criteria of the ERRANT framework, which is designed to classify grammatical errors consistently. This enabled detailed cross-dataset and cross-linguistic error analysis.
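The extraction step described above can be illustrated with a minimal sketch. It assumes the treebanks are read in the standard ten-column CoNLL-U format, where Typo=Yes appears in the FEATS column and, when annotated, the intended form appears as CorrectForm in the MISC column. The function names and the sample sentence are hypothetical and are not taken from the thesis.

```python
def parse_kv(field):
    """Parse a CoNLL-U FEATS or MISC field such as 'A=1|B=2' into a dict."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|") if "=" in item)


def extract_typo_tokens(conllu_text):
    """Collect tokens whose FEATS column contains Typo=Yes."""
    results = []
    sent_id = None
    for line in conllu_text.splitlines():
        if not line.strip():
            sent_id = None  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip malformed lines
        if parse_kv(cols[5]).get("Typo") == "Yes":
            results.append({
                "sent_id": sent_id,
                "form": cols[1],
                # CorrectForm is optional, so this may be None
                "correct_form": parse_kv(cols[9]).get("CorrectForm"),
            })
    return results


# Hypothetical sample sentence, built with real tab separators.
rows = [
    ["1", "I", "I", "PRON", "PRP", "Case=Nom", "2", "nsubj", "_", "_"],
    ["2", "recieved", "receive", "VERB", "VBD", "Tense=Past|Typo=Yes",
     "0", "root", "_", "CorrectForm=received"],
    ["3", "it", "it", "PRON", "PRP", "Case=Acc", "2", "obj", "_", "SpaceAfter=No"],
    ["4", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"],
]
SAMPLE = "# sent_id = demo-1\n" + "\n".join("\t".join(r) for r in rows) + "\n"

if __name__ == "__main__":
    for token in extract_typo_tokens(SAMPLE):
        print(token)
```

A dependency-free parser is used here only to keep the sketch self-contained; a dedicated CoNLL-U library would serve equally well.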
The second phase evaluates the ability of a Large Language Model (LLM) to classify grammatical errors using structured prompts based on the ERRANT framework. Both zero-shot and few-shot prompting techniques were applied, and the LLM's performance was compared against manually annotated gold standards developed during the first phase. This work aims to bridge linguistic annotation frameworks and neural language models to advance GEC systems.
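The difference between zero-shot and few-shot prompting can be sketched as follows. The prompt wording, the reduced label list, and the accuracy metric are assumptions made for illustration. The labels are real ERRANT categories, but the thesis's actual prompts, label inventory, and evaluation measure are not given in this abstract.

```python
# A small subset of ERRANT error types, chosen only for illustration.
ERRANT_LABELS = ["R:SPELL", "R:ORTH", "R:VERB:TENSE", "M:DET", "U:PUNCT", "R:OTHER"]


def build_prompt(original, corrected, examples=()):
    """Build a classification prompt; no examples means zero-shot."""
    lines = [
        "Classify the grammatical error using exactly one ERRANT label.",
        "Allowed labels: " + ", ".join(ERRANT_LABELS),
        "",
    ]
    # Few-shot: prepend worked examples in the same layout as the query.
    for ex_original, ex_corrected, ex_label in examples:
        lines += [
            f"Original: {ex_original}",
            f"Corrected: {ex_corrected}",
            f"Label: {ex_label}",
            "",
        ]
    lines += [f"Original: {original}", f"Corrected: {corrected}", "Label:"]
    return "\n".join(lines)


def accuracy(predicted, gold):
    """Share of predictions that match the manually annotated gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    zero_shot = build_prompt("teh", "the")
    few_shot = build_prompt("teh", "the", [("recieved", "received", "R:SPELL")])
    print(zero_shot)
    print("---")
    print(few_shot)
```

The model call itself is deliberately left out, since the abstract does not name the LLM or the API that was used.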