Hyppää sisältöön
    • Suomeksi
    • In English
  • Suomeksi
  • In English
  • Kirjaudu
Näytä aineisto 
  •   Etusivu
  • 3. UTUCris-artikkelit
  • Rinnakkaistallenteet
  • Näytä aineisto
  •   Etusivu
  • 3. UTUCris-artikkelit
  • Rinnakkaistallenteet
  • Näytä aineisto
JavaScript is disabled for your browser. Some features of this site may not work without it.

Lifestyle factors in the biomedical literature: An ontology and comprehensive resources for named entity recognition

Nourani, Esmaeil; Koutrouli, Mikaela; Xie, Yijia; Vagiaki, Danai; Pyysalo, Sampo; Nastou, Katerina; Brunak, Søren; Jensen, Lars Juhl

Lifestyle factors in the biomedical literature: An ontology and comprehensive resources for named entity recognition

Nourani, Esmaeil
Koutrouli, Mikaela
Xie, Yijia
Vagiaki, Danai
Pyysalo, Sampo
Nastou, Katerina
Brunak, Søren
Jensen, Lars Juhl
Katso/Avaa
btae613.pdf (2.233Mb)
Lataukset: 

Oxford University Press (OUP)
doi:10.1093/bioinformatics/btae613
URI
https://doi.org/10.1093/bioinformatics/btae613
Näytä kaikki kuvailutiedot
Julkaisun pysyvä osoite on:
https://urn.fi/URN:NBN:fi-fe2025082789998
Tiivistelmä

Motivation

Despite lifestyle factors (LSFs) being increasingly acknowledged in shaping individual health trajectories, particularly in chronic diseases, they have still not been systematically described in the biomedical literature. This is in part because no named entity recognition (NER) system exists, which can comprehensively detect all types of LSFs in text. The task is challenging due to their inherent diversity, lack of a comprehensive LSF classification for dictionary-based NER, and lack of a corpus for deep learning-based NER.

Results

We present a novel lifestyle factor ontology (LSFO), which we used to develop a dictionary-based system for recognition and normalization of LSFs. Additionally, we introduce a manually annotated corpus for LSFs (LSF200) suitable for training and evaluation of NER systems, and use it to train a transformer-based system. Evaluating the performance of both NER systems on the corpus revealed an F-score of 64% for the dictionary-based system and 76% for the transformer-based system. Large-scale application of these systems on PubMed abstracts and PMC Open Access articles identified over 300 million mentions of LSF in the biomedical literature.

Availability and implementation

LSFO, the annotated LSF200 corpus, and the detected LSFs in PubMed and PMC-OA articles using both NER systems, are available under open licenses via the following GitHub repository: https://github.com/EsmaeilNourani/LSFO-expansion. This repository contains links to two associated GitHub repositories and a Zenodo project related to the study. LSFO is also available at BioPortal: https://bioportal.bioontology.org/ontologies/LSFO.

Kokoelmat
  • Rinnakkaistallenteet [29337]

Turun yliopiston kirjasto | Turun yliopisto
julkaisut@utu.fi | Tietosuoja | Saavutettavuusseloste
 

 

Tämä kokoelma

JulkaisuajatTekijätNimekkeetAsiasanatTiedekuntaLaitosOppiaineYhteisöt ja kokoelmat

Omat tiedot

Kirjaudu sisäänRekisteröidy

Turun yliopiston kirjasto | Turun yliopisto
julkaisut@utu.fi | Tietosuoja | Saavutettavuusseloste