All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

Salakoski T; Pahikkala T; Pyysalo S; Ginter F; Airola A; Bjorne J

BIOMED CENTRAL LTD

doi:10.1186/1471-2105-9-S11-S2

The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2021042714749

Abstract

Background

Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph-kernel-based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel can make use of full, general dependency graphs representing the sentence structure.
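The abstract does not spell out the kernel's formulation, so the following is only a minimal sketch of one way an all-paths comparison between labeled graphs could be computed. It assumes a weighted adjacency matrix per sentence graph, sums products of edge weights over paths up to a fixed length, groups the sums by the labels of the start and end nodes, and takes a dot product of the resulting features. The function names, the `max_len` cutoff, the labels, and the edge weights are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def path_weight_sums(adj, max_len):
    """Sum the products of edge weights over all walks of length 1..max_len.

    adj[i][j] is the weight of the edge i -> j (0.0 if there is no edge).
    Matrix powers count walks; these coincide with paths when the graph
    is acyclic, as in the example below.
    """
    n = len(adj)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj raised to the first power
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        # Multiply by adj once more to move to the next walk length.
        power = [[sum(power[i][k] * adj[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return total


def label_pair_features(adj, labels, max_len=10):
    """Map each (start label, end label) pair to its summed path weight."""
    weights = path_weight_sums(adj, max_len)
    feats = defaultdict(float)
    for i, start in enumerate(labels):
        for j, end in enumerate(labels):
            if weights[i][j]:
                feats[(start, end)] += weights[i][j]
    return feats


def all_paths_kernel(graph_a, graph_b, max_len=10):
    """Dot product of the label-pair features of two (adj, labels) graphs."""
    fa = label_pair_features(graph_a[0], graph_a[1], max_len)
    fb = label_pair_features(graph_b[0], graph_b[1], max_len)
    return sum(value * fb[key] for key, value in fa.items() if key in fb)


# Hypothetical three-node chain: PROT -> verb -> PROT, each edge weighted 0.5.
chain = ([[0.0, 0.5, 0.0],
          [0.0, 0.0, 0.5],
          [0.0, 0.0, 0.0]],
         ["PROT", "verb", "PROT"])
similarity = all_paths_kernel(chain, chain)
```

Because every connected node pair contributes, a representation of this kind is not restricted to the single shortest path between two candidate proteins.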

Results

We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation performed for a machine-learning-based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with an F-score of 56.4 and an AUC of 84.8 on the AImed corpus.

Conclusion

We show that the graph kernel approach performs at a state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.
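The abstract does not say which cross-validation strategies are incorrect. One issue often raised in this setting is splitting candidate protein pairs into folds individually, so that pairs from the same document land in both training and test data. The sketch below assumes that is the concern and shows a document-level split instead; the `(doc_id, pair)` instance layout, the function name, and the fold count are hypothetical.

```python
import random
from collections import defaultdict


def document_level_folds(instances, n_folds=10, seed=0):
    """Assign candidate pairs to folds so that no document spans two folds.

    `instances` is a list of (doc_id, pair) tuples (a hypothetical layout).
    """
    by_doc = defaultdict(list)
    for doc_id, pair in instances:
        by_doc[doc_id].append((doc_id, pair))

    # Shuffle whole documents, not individual pairs, with a fixed seed.
    doc_ids = sorted(by_doc)
    random.Random(seed).shuffle(doc_ids)

    folds = [[] for _ in range(n_folds)]
    for index, doc_id in enumerate(doc_ids):
        folds[index % n_folds].extend(by_doc[doc_id])
    return folds


# Hypothetical data: five documents with three candidate pairs each.
instances = [(f"doc{d}", f"pair{d}_{p}") for d in range(5) for p in range(3)]
folds = document_level_folds(instances, n_folds=3)
```

Each fold can then serve as the test set in turn, with the remaining folds used for training, so that text shared between pairs of one document never appears on both sides of the split.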
