Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions

dc.contributor.author: Andrade, Stefan B.
dc.contributor.author: Fasang, Anette Eva
dc.contributor.author: Helske, Satu
dc.contributor.author: Karhula, Aleksi
dc.contributor.organization: fi=sosiologia|en=Sociology|
dc.contributor.organization-code: 2603303
dc.converis.publication-id: 459134509
dc.converis.url: https://research.utu.fi/converis/portal/Publication/459134509
dc.date.accessioned: 2025-08-28T02:21:24Z
dc.date.available: 2025-08-28T02:21:24Z
dc.description.abstract: Sequence analysis in the social sciences heavily relies on cluster techniques to identify typologies. Clustering techniques and statistical cluster cut-off criteria for selecting the optimal number of clusters have greatly improved. In contrast, we lack a systematic assessment of how data features, such as the sequence sample size, the number of time points in the sequences, and the number of distinct states in the sequence alphabet might systematically impact the identification of sequence typologies. Drawing on both simulated data from mixture Markov models and real data from the German Family Panel survey, we provide best-practice guidelines for applied researchers to gauge whether their data is sufficient for extracting robust sequence typologies, if they empirically exist. Sequence typologies are most robust for samples with at least 500 sequences, sequence lengths greater than 10 time points, and state alphabets that have at least as many states as the “true” number of clusters.
dc.identifier.olddbid: 208977
dc.identifier.oldhandle: 10024/192004
dc.identifier.uri: https://www.utupub.fi/handle/11111/36595
dc.identifier.url: http://doi.org/10.31235/osf.io/kj8d5
dc.identifier.urn: URN:NBN:fi-fe2025082792202
dc.language.iso: en
dc.okm.affiliatedauthor: Helske, Satu
dc.okm.discipline: 5141 Sociology (en_GB)
dc.okm.discipline: 5141 Sosiologia (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: D4 Scientific Report
dc.publisher: Center for Open Science
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.doi: 10.31235/osf.io/kj8d5
dc.source.identifier: https://www.utupub.fi/handle/10024/192004
dc.title: Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions
dc.year.issued: 2023

Files

Name: Typologies in Sequence Analysis.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format