Question Answering models for information extraction from perovskite materials science literature
| dc.contributor.author | Sipilä, Matilda | |
| dc.contributor.author | Mehryary, Farrokh | |
| dc.contributor.author | Pyysalo, Sampo | |
| dc.contributor.author | Ginter, Filip | |
| dc.contributor.author | Todorović, Milica | |
| dc.contributor.organization | fi=data-analytiikka|en=Data-analytiikka| | |
| dc.contributor.organization | fi=materiaalitekniikka|en=Materials Engineering| | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.68940835793 | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.80931480620 | |
| dc.converis.publication-id | 505920997 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/505920997 | |
| dc.date.accessioned | 2026-01-21T14:55:06Z | |
| dc.date.available | 2026-01-21T14:55:06Z | |
| dc.description.abstract | Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a Question Answering (QA) approach to extract material-property relationships from scientific publications. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results with five different large language models fine-tuned for the QA task. Best extraction accuracy was achieved with the QA MatSciBERT and F1-scores improved on the current state-of-the-art. QA also outperformed three latest generative large language models on the information extraction task, except the GPT-4 model. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity and versatility of the QA approach all point to its considerable potential for text-driven discoveries in materials research. | |
| dc.identifier.eissn | 2662-4443 | |
| dc.identifier.olddbid | 213874 | |
| dc.identifier.oldhandle | 10024/196892 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/56045 | |
| dc.identifier.url | https://doi.org/10.1038/s43246-025-00979-w | |
| dc.identifier.urn | URN:NBN:fi-fe202601217125 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Sipilä, Matilda | |
| dc.okm.affiliatedauthor | Mehryary, Farrokh | |
| dc.okm.affiliatedauthor | Pyysalo, Sampo | |
| dc.okm.affiliatedauthor | Ginter, Filip | |
| dc.okm.affiliatedauthor | Todorović, Milica | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 216 Materials engineering | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.discipline | 216 Materiaalitekniikka | fi_FI |
| dc.okm.internationalcopublication | not an international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A1 ScientificArticle | |
| dc.publisher | Springer Science and Business Media LLC | |
| dc.publisher.country | United Kingdom | en_GB |
| dc.publisher.country | Britannia | fi_FI |
| dc.publisher.country-code | GB | |
| dc.relation.articlenumber | 260 | |
| dc.relation.doi | 10.1038/s43246-025-00979-w | |
| dc.relation.ispartofjournal | Communications materials | |
| dc.relation.volume | 6 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/196892 | |
| dc.title | Question Answering models for information extraction from perovskite materials science literature | |
| dc.year.issued | 2025 |