Question Answering models for information extraction from perovskite materials science literature

dc.contributor.author: Sipilä, Matilda
dc.contributor.author: Mehryary, Farrokh
dc.contributor.author: Pyysalo, Sampo
dc.contributor.author: Ginter, Filip
dc.contributor.author: Todorović, Milica
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=materiaalitekniikka|en=Materials Engineering|
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.contributor.organization-code: 1.2.246.10.2458963.20.80931480620
dc.converis.publication-id: 505920997
dc.converis.url: https://research.utu.fi/converis/portal/Publication/505920997
dc.date.accessioned: 2026-01-21T14:55:06Z
dc.date.available: 2026-01-21T14:55:06Z
dc.description.abstract: Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a Question Answering (QA) approach to extract material-property relationships from scientific publications. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results with five different large language models fine-tuned for the QA task. Best extraction accuracy was achieved with the QA MatSciBERT and F1-scores improved on the current state-of-the-art. QA also outperformed three latest generative large language models on the information extraction task, except the GPT-4 model. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity and versatility of the QA approach all point to its considerable potential for text-driven discoveries in materials research.
dc.identifier.eissn: 2662-4443
dc.identifier.olddbid: 213874
dc.identifier.oldhandle: 10024/196892
dc.identifier.uri: https://www.utupub.fi/handle/11111/56045
dc.identifier.url: https://doi.org/10.1038/s43246-025-00979-w
dc.identifier.urn: URN:NBN:fi-fe202601217125
dc.language.iso: en
dc.okm.affiliatedauthor: Sipilä, Matilda
dc.okm.affiliatedauthor: Mehryary, Farrokh
dc.okm.affiliatedauthor: Pyysalo, Sampo
dc.okm.affiliatedauthor: Ginter, Filip
dc.okm.affiliatedauthor: Todorovic, Milica
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 216 Materials engineering [en_GB]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.discipline: 216 Materiaalitekniikka [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Springer Science and Business Media LLC
dc.publisher.country: United Kingdom [en_GB]
dc.publisher.country: Britannia [fi_FI]
dc.publisher.country-code: GB
dc.relation.articlenumber: 260
dc.relation.doi: 10.1038/s43246-025-00979-w
dc.relation.ispartofjournal: Communications materials
dc.relation.volume: 6
dc.source.identifier: https://www.utupub.fi/handle/10024/196892
dc.title: Question Answering models for information extraction from perovskite materials science literature
dc.year.issued: 2025

Files

Name: s43246-025-00979-w.pdf
Size: 1.85 MB
Format: Adobe Portable Document Format