Data analysis with limited data availability: prostate cancer prediction and characterization as a case study

dc.contributor.author: Montoya Perez, Ileana
dc.contributor.department: fi=Tietotekniikan laitos|en=Department of Computing|
dc.contributor.faculty: fi=Matemaattis-luonnontieteellinen tiedekunta|en=Faculty of Science|
dc.contributor.studysubject: fi=Tietojenkäsittelytiede|en=Computer Science|
dc.date.accessioned: 2024-04-18T05:46:03Z
dc.date.available: 2024-04-18T05:46:03Z
dc.date.issued: 2024-05-17
dc.description.abstract: Research studies conducted on limited datasets (i.e., data from tens to at most hundreds of observations) may be the only practical option for many research areas, as data collection might be costly, complex, or both. Data analysis on these datasets is challenging as it can lead to inaccurate results. In this thesis, we addressed this challenge in the context of prostate cancer research by empirically assessing the predictive and characterization capabilities of attributes with the following objectives: to evaluate the predictive power of features extracted from prostate magnetic resonance imaging (MRI) using cross-validation techniques, to develop and evaluate a cross-validation method for small sample sizes that allows receiver operating characteristic (ROC) analysis, and to identify and compare relevant predictors among MRI features, clinical variables, gene expressions, and kallikreins for prostate cancer detection and stratification. To achieve these objectives, we used data from approved studies and registered clinical trials at Turku University Hospital, involving a strong collaboration between university departments and hospitals. This collaboration enabled the collection of diverse, high-quality features to enhance prostate cancer diagnosis and prognosis research. The results of this thesis can be summarized as follows. First, when evaluating radiomic features from various MRI modalities, our findings demonstrate the potential of these features for stratifying prostate tumors into low- and high-risk groups. Second, in terms of model evaluation using ROC analysis and cross-validation, our research highlights a significant negative bias in the area under the ROC curve when estimated by leave-one-out cross-validation (LOOCV) and introduces a novel cross-validation method called tournament leave-pair-out cross-validation (TLPOCV) as a more reliable method for ROC analysis than LOOCV.
Finally, our results provide empirical evidence of the predictive potential that quantitative and qualitative features from MRI, clinical variables, gene expressions, and kallikreins, individually and in combination, have in detecting and stratifying prostate cancer. The findings in this research are of interest not only to medical professionals and healthcare providers engaged in prostate cancer research but also to those involved in analyzing and learning from size-constrained datasets while achieving clinically meaningful evaluation outcomes.
dc.description.accessibilityfeature: no information on accessibility
dc.format.content: fulltext
dc.identifier.olddbid: 193886
dc.identifier.oldhandle: 10024/176943
dc.identifier.uri: https://www.utupub.fi/handle/11111/28643
dc.identifier.urn: URN:ISBN:978-951-29-9646-9
dc.language.iso: eng
dc.publisher: fi=Turun yliopisto|en=University of Turku|
dc.relation.ispartofseries: Turun yliopiston julkaisuja - Annales Universitatis Turkuensis, Ser F: Technica - Informatica
dc.relation.issn: 2736-9684
dc.relation.numberinseries: 36
dc.source.identifier: https://www.utupub.fi/handle/10024/176943
dc.title: Data analysis with limited data availability: prostate cancer prediction and characterization as a case study
dc.type.ontasot: fi=Artikkeliväitöskirja|en=Doctoral dissertation (article-based)|

Files

Name:
AnnalesF36PerezDISS.pdf
Size:
1.61 MB
Format:
Adobe Portable Document Format