Grammatical Error Correction Using Large Language Models: A Case Study on Universal Dependencies Treebanks

dc.contributor.author: Jalali, Arvin
dc.contributor.department: Department of Computing
dc.contributor.faculty: Faculty of Technology
dc.contributor.studysubject: Information and Communication Technology
dc.date.accessioned: 2025-06-26T21:06:26Z
dc.date.available: 2025-06-26T21:06:26Z
dc.date.issued: 2025-06-23
dc.description.abstract: This thesis addresses Grammatical Error Correction (GEC) in two phases. The first phase investigates how Universal Dependencies (UD), a cross-linguistically consistent framework for syntactic annotation, and in particular its Typo=Yes feature, can support error analysis in GEC. Tokens marked with Typo=Yes were extracted from three UD treebanks (UD English EWT, UD English GUM, and UD Finnish TDT) and manually annotated according to the criteria of the ERRANT framework, which is designed to classify grammatical errors consistently. This enabled detailed cross-dataset and cross-linguistic error analysis. The second phase evaluates the ability of a Large Language Model (LLM) to classify grammatical errors using structured prompts based on the ERRANT framework. Both zero-shot and few-shot prompting techniques were applied, and the LLM's performance was compared against the manually annotated gold standards developed during the first phase. This work aims to bridge linguistic annotation frameworks and neural language models to advance GEC systems.
dc.format.extent: 111
dc.identifier.olddbid: 199459
dc.identifier.oldhandle: 10024/182490
dc.identifier.uri: https://www.utupub.fi/handle/11111/20102
dc.identifier.urn: URN:NBN:fi-fe2025062674959
dc.language.iso: eng
dc.rights: This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
dc.rights.accessrights: open
dc.source.identifier: https://www.utupub.fi/handle/10024/182490
dc.subject: Grammatical Error Correction, Universal Dependencies, ERRANT Framework, Large Language Models, Prompt Engineering
dc.title: Grammatical Error Correction Using Large Language Models: A Case Study on Universal Dependencies Treebanks
dc.type.ontasot: Master's thesis
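
The abstract describes extracting tokens marked with Typo=Yes from UD treebanks. The record does not include the thesis's actual extraction code, so the following is only a minimal sketch of how such an extraction could look. It assumes the treebanks are in the standard CoNLL-U format (ten tab-separated columns, with morphological features in the sixth column). The names `TypoToken`, `iter_typo_tokens`, and the `SAMPLE` sentence are hypothetical and invented for illustration; they are not taken from the thesis or from any treebank.

```python
from typing import Iterator, NamedTuple


class TypoToken(NamedTuple):
    """One token flagged with Typo=Yes (hypothetical container for this sketch)."""
    sent_id: str
    token_id: str
    form: str
    upos: str


def iter_typo_tokens(conllu_text: str) -> Iterator[TypoToken]:
    """Yield every token whose FEATS column contains Typo=Yes."""
    sent_id = ""
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            sent_id = ""
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# sent_id = demo-1".
            key, sep, value = line[1:].partition("=")
            if sep and key.strip() == "sent_id":
                sent_id = value.strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed CoNLL-U token line; skip it.
            continue
        feats = cols[5]  # FEATS: "_" or "Key=Value|Key=Value|..."
        if feats != "_" and "Typo=Yes" in feats.split("|"):
            yield TypoToken(sent_id, cols[0], cols[1], cols[3])


# Invented example sentence, not drawn from EWT, GUM, or TDT.
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = I recieved it\n"
    "1\tI\tI\tPRON\tPRP\tCase=Nom|Number=Sing|Person=1|PronType=Prs\t2\tnsubj\t_\t_\n"
    "2\trecieved\treceive\tVERB\tVBD\tMood=Ind|Tense=Past|Typo=Yes|VerbForm=Fin\t0\troot\t_\t_\n"
    "3\tit\tit\tPRON\tPRP\tCase=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs\t2\tobj\t_\t_\n"
)

if __name__ == "__main__":
    for token in iter_typo_tokens(SAMPLE):
        print(token)
```

In practice the text of a treebank file would be read from disk and passed to `iter_typo_tokens`; the extracted tokens could then be handed to annotators for ERRANT-style labelling, as the abstract outlines.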

Files

Name: Jalali_Arvin_Thesis.pdf
Size: 975.41 KB
Format: Adobe Portable Document Format