Question Answering models for information extraction from perovskite materials science literature
Sipilä, Matilda; Mehryary, Farrokh; Pyysalo, Sampo; Ginter, Filip; Todorović, Milica
https://urn.fi/URN:NBN:fi-fe202601217125
Abstract
Scientific text is a promising source of data in materials science, and research into using textual data for materials discovery is ongoing. In this study, we developed and tested a Question Answering (QA) approach to extract material-property relationships from scientific publications. We evaluated QA performance on extracting perovskite bandgaps in response to a human query. Results varied considerably across five different large language models fine-tuned for the QA task. The QA MatSciBERT model achieved the best extraction accuracy, and its F1-scores improved on the current state of the art. QA also outperformed the three latest generative large language models tested on the information extraction task, with the exception of GPT-4. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity and versatility of the QA approach both point to its considerable potential for text-driven discoveries in materials research.
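The abstract reports F1-scores for extracted bandgap answers but does not say how they were computed. As a rough illustration only, the sketch below uses the token-overlap F1 that is common in extractive QA evaluation; this choice of metric is an assumption, not the authors' confirmed procedure. The function name `token_f1` and the question, context, and answer strings are hypothetical examples written for this sketch.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumption: lowercase, whitespace tokenisation. The paper's exact
    scoring procedure is not given in the abstract.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical QA example: a human query, a passage, the span a QA model
# might return, and the annotated answer. None of this is from the paper.
question = "What is the bandgap of MAPbI3?"
context = "The MAPbI3 film shows a bandgap of 1.55 eV at room temperature."
predicted = "1.55 eV at"
gold = "1.55 eV"

score = token_f1(predicted, gold)
print(f"{question} -> {predicted!r} (F1 = {score:.2f})")
```

A predicted span that includes one extra token still earns partial credit under this metric, which is one reason overlap-based F1 is often reported alongside exact-match accuracy.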
