Stochastic limited memory bundle algorithm for clustering in big data

dc.contributor.author: Karmitsa, Napsu
dc.contributor.author: Eronen, Ville-Pekka
dc.contributor.author: Mäkelä, Marko M.
dc.contributor.author: Pahikkala, Tapio
dc.contributor.author: Airola, Antti
dc.contributor.organization: fi=data-analytiikka|en=Data-analytiikka|
dc.contributor.organization: fi=sovellettu matematiikka|en=Applied mathematics|
dc.contributor.organization: fi=terveysteknologia|en=Health Technology|
dc.contributor.organization-code: 1.2.246.10.2458963.20.28696315432
dc.contributor.organization-code: 1.2.246.10.2458963.20.48078768388
dc.contributor.organization-code: 1.2.246.10.2458963.20.68940835793
dc.converis.publication-id: 491806564
dc.converis.url: https://research.utu.fi/converis/portal/Publication/491806564
dc.date.accessioned: 2025-08-27T23:32:00Z
dc.date.available: 2025-08-27T23:32:00Z
dc.description.abstract: Clustering is a crucial task in data mining and machine learning. In this paper, we propose an efficient algorithm, BIG-CLuST, for solving minimum sum-of-squares clustering problems in large and big datasets. We first develop a novel stochastic limited memory bundle algorithm (SLMBA) for large-scale nonsmooth finite-sum optimization problems and then formulate the clustering problem accordingly. The BIG-CLuST algorithm - a stochastic adaptation of the incremental clustering methodology - aims to find the global or a high-quality local solution for the clustering problem. It detects good starting points, i.e., initial cluster centers, for the SLMBA, applied as an underlying solver. We evaluate BIG-CLuST on several real-world datasets with numerous data points and features, comparing its performance with other clustering algorithms designed for large and big data. Numerical results demonstrate the efficiency of the proposed algorithm and the high quality of the found solutions on par with the best existing methods.
dc.identifier.eissn: 1873-5142
dc.identifier.jour-issn: 0031-3203
dc.identifier.olddbid: 204139
dc.identifier.oldhandle: 10024/187166
dc.identifier.uri: https://www.utupub.fi/handle/11111/52287
dc.identifier.url: https://doi.org/10.1016/j.patcog.2025.111654
dc.identifier.urn: URN:NBN:fi-fe2025082786330
dc.language.iso: en
dc.okm.affiliatedauthor: Karmitsa, Napsu
dc.okm.affiliatedauthor: Eronen, Ville-Pekka
dc.okm.affiliatedauthor: Mäkelä, Marko
dc.okm.affiliatedauthor: Pahikkala, Tapio
dc.okm.affiliatedauthor: Airola, Antti
dc.okm.discipline: 111 Mathematics [en_GB]
dc.okm.discipline: 113 Computer and information sciences [en_GB]
dc.okm.discipline: 111 Matematiikka [fi_FI]
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet [fi_FI]
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Elsevier BV
dc.publisher.country: United States [en_GB]
dc.publisher.country: Yhdysvallat (USA) [fi_FI]
dc.publisher.country-code: US
dc.publisher.place: London
dc.relation.articlenumber: 111654
dc.relation.doi: 10.1016/j.patcog.2025.111654
dc.relation.ispartofjournal: Pattern Recognition
dc.relation.volume: 165
dc.source.identifier: https://www.utupub.fi/handle/10024/187166
dc.title: Stochastic limited memory bundle algorithm for clustering in big data
dc.year.issued: 2025
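
The abstract in this record refers to minimum sum-of-squares clustering posed as a nonsmooth finite-sum problem. The record itself gives no formula, so the following is only a minimal sketch under the assumption that the usual nonsmooth formulation is meant: the mean, over all data points, of the squared Euclidean distance to the nearest cluster center. The function names (`mssc_objective`, `minibatch_estimate`) and the toy data are hypothetical, and the mini-batch estimate only illustrates the general idea of sampling a finite sum; it is not the SLMBA or BIG-CLuST method from the paper.

```python
import random


def mssc_objective(points, centers):
    """Mean squared distance from each point to its nearest center.

    Assumed standard nonsmooth MSSC objective; the inner min over
    centers is what makes the function nonsmooth.
    """
    total = 0.0
    for point in points:
        total += min(
            sum((c - v) ** 2 for c, v in zip(center, point))
            for center in centers
        )
    return total / len(points)


def minibatch_estimate(points, centers, batch_size, rng):
    """Estimate the objective from a random sample of the data.

    Illustrative only: evaluates the same objective on a subset
    drawn without replacement.
    """
    batch = rng.sample(points, batch_size)
    return mssc_objective(batch, centers)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]

full_value = mssc_objective(points, centers)
sampled_value = minibatch_estimate(points, centers, 2, random.Random(0))
print(full_value, sampled_value)
```

In this toy case every point lies at distance 1 from its nearest center, so the full objective and any mini-batch estimate coincide; on real data the estimate would vary with the sample.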

Files

Name: 1-s2.0-S0031320325003140-main.pdf
Size: 2.53 MB
Format: Adobe Portable Document Format