An efficient incremental algorithm for clustering large datasets

Lampainen, Jenni; Joki, Kaisa; Karmitsa, Napsu; Mäkelä, Marko M.

dc.contributor.author	Lampainen, Jenni
dc.contributor.author	Joki, Kaisa
dc.contributor.author	Karmitsa, Napsu
dc.contributor.author	Mäkelä, Marko M.
dc.contributor.organization	fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization	fi=sovellettu matematiikka|en=Applied mathematics|
dc.contributor.organization-code	1.2.246.10.2458963.20.48078768388
dc.contributor.organization-code	1.2.246.10.2458963.20.68940835793
dc.converis.publication-id	523214179
dc.converis.url	https://research.utu.fi/converis/portal/Publication/523214179
dc.date.accessioned	2026-05-07T20:11:31Z
dc.description.abstract	Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on a novel incremental approach and a nonsmooth formulation of the minimum sum-of-squares clustering problem. Specifically, the clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively for very large datasets, the limited memory bundle method (Haarala et al. in Optim Methods Softw 19(6):673–692, 2004) is applied as the underlying solver in Clust-Splitter. We test and evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points, and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental results demonstrate the efficiency of the proposed method for clustering very large datasets, as well as the high quality of its solutions, which are on par with those of the best existing methods.
dc.identifier.eissn	1862-5355
dc.identifier.jour-issn	1862-5347
dc.identifier.uri	https://www.utupub.fi/handle/11111/60433
dc.identifier.url	https://doi.org/10.1007/s11634-025-00661-6
dc.identifier.urn	URN:NBN:fi-fe2026050740933
dc.language.iso	en
dc.okm.affiliatedauthor	Lampainen, Jenni
dc.okm.affiliatedauthor	Joki, Kaisa
dc.okm.affiliatedauthor	Karmitsa, Napsu
dc.okm.affiliatedauthor	Mäkelä, Marko
dc.okm.discipline	111 Mathematics	en_GB
dc.okm.discipline	111 Matematiikka	fi_FI
dc.okm.internationalcopublication	not an international co-publication
dc.okm.internationality	International publication
dc.okm.type	A1 ScientificArticle
dc.publisher	Springer Science and Business Media LLC
dc.publisher.country	Germany	en_GB
dc.publisher.country	Saksa	fi_FI
dc.publisher.country-code	DE
dc.relation.doi	10.1007/s11634-025-00661-6
dc.relation.ispartofjournal	Advances in Data Analysis and Classification
dc.title	An efficient incremental algorithm for clustering large datasets
dc.year.issued	2026

Files

Name: s11634-025-00661-6.pdf
Size: 8.15 MB
Format: Adobe Portable Document Format

Collections

Self-archived publications