An efficient incremental algorithm for clustering large datasets

dc.contributor.authorLampainen, Jenni
dc.contributor.authorJoki, Kaisa
dc.contributor.authorKarmitsa, Napsu
dc.contributor.authorMäkelä, Marko M.
dc.contributor.organizationfi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organizationfi=sovellettu matematiikka|en=Applied mathematics|
dc.contributor.organization-code1.2.246.10.2458963.20.48078768388
dc.contributor.organization-code1.2.246.10.2458963.20.68940835793
dc.converis.publication-id523214179
dc.converis.urlhttps://research.utu.fi/converis/portal/Publication/523214179
dc.date.accessioned2026-05-07T20:11:31Z
dc.description.abstractClustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on a novel incremental approach and a nonsmooth formulation of the minimum sum-of-squares clustering problem. In particular, the clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively in very large datasets, the limited memory bundle method (Haarala et al. in Optim Methods Softw 19(6):673–692, 2004) is applied as an underlying solver in Clust-Splitter. We test and evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental results demonstrate the efficiency of the proposed method for clustering very large datasets, as well as the high quality of its solutions, which are on par with those of the best existing methods.
dc.identifier.eissn1862-5355
dc.identifier.jour-issn1862-5347
dc.identifier.urihttps://www.utupub.fi/handle/11111/60433
dc.identifier.urlhttps://doi.org/10.1007/s11634-025-00661-6
dc.identifier.urnURN:NBN:fi-fe2026050740933
dc.language.isoen
dc.okm.affiliatedauthorLampainen, Jenni
dc.okm.affiliatedauthorJoki, Kaisa
dc.okm.affiliatedauthorKarmitsa, Napsu
dc.okm.affiliatedauthorMäkelä, Marko
dc.okm.discipline111 Mathematicsen_GB
dc.okm.discipline111 Matematiikkafi_FI
dc.okm.internationalcopublicationnot an international co-publication
dc.okm.internationalityInternational publication
dc.okm.typeA1 ScientificArticle
dc.publisherSpringer Science and Business Media LLC
dc.publisher.countryGermanyen_GB
dc.publisher.countrySaksafi_FI
dc.publisher.country-codeDE
dc.relation.doi10.1007/s11634-025-00661-6
dc.relation.ispartofjournalAdvances in Data Analysis and Classification
dc.titleAn efficient incremental algorithm for clustering large datasets
dc.year.issued2026

Files

Name:
s11634-025-00661-6.pdf
Size:
8.15 MB
Format:
Adobe Portable Document Format