How Far Can a Few Shots Take? Exploring Few-Shot Learning in Finnish Text Classification Through Sentence Transformer Fine-Tuning
Salmela, Anna (2025-05-21)
This publication is subject to copyright. The work may be read and printed for personal use. Commercial use is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025053056451
Abstract
With natural language processing solutions on the rise, language models keep growing, with parameter counts measured in billions and ever larger training datasets. In addition, both training a text classification model and later using it for inference can require significant computational resources. Fine-tuning has long been an effective way of adapting language models to specific domains, but it usually requires significant amounts of labelled data to succeed. In this thesis, I examine the capabilities of few-shot learning by fine-tuning sentence embedding models for text classification on artificially restricted datasets created from benchmarked Finnish data, to see how well considerably lighter models trained on less data perform compared to state-of-the-art solutions.
As its main method, this thesis explores few-shot learning by using the SetFit library to fine-tune sentence embedding models for text classification. SetFit enables training on extremely small datasets, and dataset sizes of 8, 16, 32 and 64 samples per label are tested. The analysis compares the results of several fine-tuned models, both monolingual and multilingual sentence embedding models, across four tasks: multilabel register (genre) classification, multilabel toxicity detection, multiclass news category classification and multiclass discussion forum topic classification.
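To make the workflow concrete, the following is a minimal sketch of SetFit few-shot fine-tuning using the library's documented API (setfit 1.x). The model checkpoint and the English SST-2 dataset are illustrative placeholders only, not the Finnish models or datasets used in the thesis.

# Minimal few-shot fine-tuning sketch with the SetFit library (setfit >= 1.0).
# Checkpoint and dataset are illustrative placeholders, not those of the thesis.
from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

# Load a pretrained sentence embedding model as the classifier backbone.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

# Simulate the few-shot setting: draw 8 labelled examples per class.
dataset = load_dataset("sst2")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)

args = TrainingArguments(batch_size=16, num_epochs=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=dataset["validation"],
    metric="accuracy",
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()            # contrastive fine-tuning + classification head
print(trainer.evaluate())  # e.g. {'accuracy': ...}

# Inference on new texts:
preds = model.predict(["A tightly woven, gripping story."])

For the multilabel tasks, SetFitModel.from_pretrained additionally accepts a multi_target_strategy argument (e.g. "one-vs-rest"); otherwise the training loop is the same.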
Even though fine-tuning sentence embedding models does not reach state-of-the-art results, SetFit shows promise, especially in the multiclass prediction tasks. While the benchmark results are higher, SetFit achieves decent model performance with much smaller datasets. In some cases, 32 or even 16 examples per label appear to be enough to get the most out of the method. Of the sentence embedding models tested, the 125M-parameter monolingual Finnish model fares best in all tasks when fine-tuned with SetFit.
The results of this thesis are promising for use cases where data and computational resources are limited. To my knowledge, this is the first time SetFit has been studied with Finnish data. Previously, Finnish few-shot classification has been tested with the aid of large language models, which require significant computational resources. Compared to these methods, SetFit is very lightweight and could lower the threshold for experimenting with text classification tasks.