Preventing Proteomics Data Tombs Through Collective Responsibility and Community Engagement

dc.contributor.author: Vadadokhau, Uladzislau
dc.contributor.author: Soliman, Mai
dc.contributor.author: Castillon, Leticia
dc.contributor.author: Pastor Muñoz, Paula
dc.contributor.author: Id, Linda
dc.contributor.author: Natraj Gayathri, Swethaa
dc.contributor.author: Srivastava, Ankita
dc.contributor.author: Runeberg, Tyko
dc.contributor.author: González-Armijos, Tamara
dc.contributor.author: Šapovalovaitė, Karina
dc.contributor.author: Sakalauskaite, Milda
dc.contributor.author: Adhikari, Sadiksha
dc.contributor.author: Abe, Oluwatosin
dc.contributor.author: Tohmola, Tiialotta
dc.contributor.author: Li, Hao
dc.contributor.author: Sundaresan, Srividhya
dc.contributor.author: Vesikukka, Hanna
dc.contributor.author: Roininen, Jannica
dc.contributor.author: Zangene, Ehsan
dc.contributor.author: Soliymani, Rabah
dc.contributor.author: Tuomivaara, Sami T.
dc.contributor.author: Schwämmle, Veit
dc.contributor.author: Saei, Amir A.
dc.contributor.author: Varjosalo, Markku
dc.contributor.author: Jafari, Mohieddin
dc.contributor.organization: Turku Bioscience Centre (fi: Turun biotiedekeskus)
dc.contributor.organization-code: 1.2.246.10.2458963.20.18586209670
dc.converis.publication-id: 508924988
dc.converis.url: https://research.utu.fi/converis/portal/Publication/508924988
dc.date.accessioned: 2026-04-24T20:20:49Z
dc.description.abstract: Public proteomics repositories now host vast amounts of mass spectrometry data, yet much of it remains difficult to reuse. This creates a risk of “data tombs”: datasets that are openly accessible but not practically re-analyzable. In spring 2025, a graduate-level course at the University of Helsinki tasked six student teams with reanalyzing six label-free quantification projects from the PRoteomics IDEntifications (PRIDE) database. All teams used the same R-based workflow (rpx, mzR, QFeatures, and the DEP, MSqRob2, limma, and OmicsQ packages). The teams reproduced identification, optionally quantification, and the normalization, imputation, and differential expression analyses, and then compared their outcomes with those reported in the original studies. As expected, systemic barriers recurred across cases:
(i) none of the projects provided proteomics metadata in the Sample and Data Relationship Format (SDRF);
(ii) details of the decoy sets used for false discovery rate (FDR) estimation were missing;
(iii) proprietary-only outputs or software (e.g., Thermo .msf files, Progenesis) impeded open reanalysis in interoperable, community-standard formats;
(iv) data-independent acquisition (DIA) spectral libraries or protein sequence database (FASTA) files were missing;
(v) normalization, imputation, and statistical parameters were absent or vague;
(vi) file naming was inconsistent; and
(vii) at least one project had insufficient biological or technical replication.
These shortcomings produced large discrepancies between the original and reanalyzed results (e.g., 13,068 vs. 4,923 proteins; 108 vs. 11 differentially expressed proteins). In one instance, a highlighted protein lacked robust support in the deposited identifications. Our findings indicate that reproducibility in mass spectrometry-based proteomics hinges less on instruments than on transparent metadata, open formats, and executable analysis provenance.
We propose that data creators provide a minimum re-analysis package. This package should include raw data and open-format files, adherence to community standards, basic quality control summaries, and DIA spectral libraries. It should also include complete parameter and code sets with pinned software versions or containers. We further recommend repository-level nudges toward making such packages mandatory. This educational exercise both trains students and stress-tests community data practices, helping to prevent proteomics “data tombs”.
dc.identifier.eissn: 2052-4463
dc.identifier.uri: https://www.utupub.fi/handle/11111/59515
dc.identifier.url: https://doi.org/10.1038/s41597-026-06614-8
dc.identifier.urn: URN:NBN:fi-fe2026042333261
dc.language.iso: en
dc.okm.affiliatedauthor: Abe, Oluwatosin
dc.okm.discipline: 1182 Biochemistry, cell and molecular biology (en_GB)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Springer Nature
dc.publisher.country: United Kingdom (en_GB)
dc.publisher.country-code: GB
dc.relation.doi: 10.1038/s41597-026-06614-8
dc.relation.ispartofjournal: Scientific Data
dc.relation.issue: 1
dc.relation.volume: 13
dc.title: Preventing Proteomics Data Tombs Through Collective Responsibility and Community Engagement
dc.year.issued: 2026

Files

Name: s41597-026-06614-8.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format