RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature
| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.author | Nastou, Katerina | |
| dc.contributor.author | Mehryary, Farrokh | |
| dc.contributor.author | Ohta, Tomoko | |
| dc.contributor.author | Luoma, Jouni | |
| dc.contributor.author | Pyysalo, Sampo | |
| dc.contributor.author | Jensen, Lars Juhl | |
| dc.contributor.organization | fi=data-analytiikka|en=Data-analytiikka| | |
| dc.contributor.organization | fi=tietotekniikan laitos|en=Department of Computing| | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.68940835793 | |
| dc.contributor.organization-code | 1.2.246.10.2458963.20.85312822902 | |
| dc.converis.publication-id | 458222413 | |
| dc.converis.url | https://research.utu.fi/converis/portal/Publication/458222413 | |
| dc.date.accessioned | 2025-08-28T02:55:43Z | |
| dc.date.available | 2025-08-28T02:55:43Z | |
| dc.description.abstract | In the field of biomedical text mining, the ability to extract relations from the literature is crucial for advancing both theoretical research and practical applications. There is a notable shortage of corpora designed to enhance the extraction of multiple types of relations, particularly focusing on proteins and protein-containing entities such as complexes and families, as well as chemicals. In this work, we present RegulaTome, a corpus that overcomes the limitations of several existing biomedical relation extraction (RE) corpora, many of which concentrate on single-type relations at the sentence level. RegulaTome stands out by offering 16 961 relations annotated in >2500 documents, making it the most extensive dataset of its kind to date. This corpus is specifically designed to cover a broader spectrum of >40 relation types beyond those traditionally explored, setting a new benchmark in the complexity and depth of biomedical RE tasks. Our corpus both broadens the scope of detected relations and allows for achieving noteworthy accuracy in RE. A transformer-based model trained on this corpus has demonstrated a promising F1-score (66.6%) for a task of this complexity, underscoring the effectiveness of our approach in accurately identifying and categorizing a wide array of biological relations. This achievement highlights RegulaTome's potential to significantly contribute to the development of more sophisticated, efficient, and accurate RE systems to tackle biomedical tasks. Finally, a run of the trained RE system on all PubMed abstracts and PMC Open Access full-text documents resulted in >18 million relations, extracted from the entire biomedical literature. | |
| dc.identifier.jour-issn | 1758-0463 | |
| dc.identifier.olddbid | 209937 | |
| dc.identifier.oldhandle | 10024/192964 | |
| dc.identifier.uri | https://www.utupub.fi/handle/11111/50001 | |
| dc.identifier.url | https://doi.org/10.1093/database/baae095 | |
| dc.identifier.urn | URN:NBN:fi-fe2025082792547 | |
| dc.language.iso | en | |
| dc.okm.affiliatedauthor | Mehryary, Farrokh | |
| dc.okm.affiliatedauthor | Luoma, Jouni | |
| dc.okm.affiliatedauthor | Pyysalo, Sampo | |
| dc.okm.discipline | 113 Computer and information sciences | en_GB |
| dc.okm.discipline | 3111 Biomedicine | en_GB |
| dc.okm.discipline | 113 Tietojenkäsittely ja informaatiotieteet | fi_FI |
| dc.okm.discipline | 3111 Biolääketieteet | fi_FI |
| dc.okm.internationalcopublication | international co-publication | |
| dc.okm.internationality | International publication | |
| dc.okm.type | A1 ScientificArticle | |
| dc.publisher | OXFORD UNIV PRESS | |
| dc.publisher.country | United Kingdom | en_GB |
| dc.publisher.country | Britannia | fi_FI |
| dc.publisher.country-code | GB | |
| dc.publisher.place | OXFORD | |
| dc.relation.articlenumber | baae095 | |
| dc.relation.doi | 10.1093/database/baae095 | |
| dc.relation.ispartofjournal | Database: The Journal of Biological Databases and Curation | |
| dc.relation.volume | 2024 | |
| dc.source.identifier | https://www.utupub.fi/handle/10024/192964 | |
| dc.title | RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature | |
| dc.year.issued | 2024 | |