Realizing multioperations and multiprefixes in Thick Control Flow processors

dc.contributor.author: Forsell Martti
dc.contributor.author: Roivainen Jussi
dc.contributor.author: Leppänen Ville
dc.contributor.author: Träff Jesper L.
dc.contributor.organization: fi=ohjelmistotekniikka|en=Software Engineering|
dc.contributor.organization-code: 1.2.246.10.2458963.20.71310837563
dc.converis.publication-id: 179035044
dc.converis.url: https://research.utu.fi/converis/portal/Publication/179035044
dc.date.accessioned: 2025-08-28T02:30:31Z
dc.date.available: 2025-08-28T02:30:31Z
dc.description.abstract: Multioperations are primitives of parallel computation by which threads perform reductions, e.g., additions, on values provided by multiple threads into a single value in a constant number of steps. Multiprefixes resemble multioperations, but return to each participating thread a cumulative ordered reduction of all preceding values. Algorithmically, multioperations and multiprefixes can speed up parallel programs by a logarithmic factor over their single-operation counterparts. In this paper, we introduce architectural techniques for realizing multioperations and multiprefixes in so-called Thick Control Flow (TCF) processors. A thick control flow is a computational construct that bundles homogeneous threads following the same control path into a data-parallel entity. Our proposed processors, optimized for executing TCFs, feature a frontend-backend structure with low-latency processing of TCF-common computations and high-throughput execution of data-parallel parts. Our solution relies on step caches and equally sized multioperation scratchpads, while on the memory side, we make use of active memory modules. The idea is to compute partial results in backend units to reduce the traffic to the referenced shared memory location. The final multioperation result is then computed in the active memory unit of the target memory module. Multiprefixes use an additional phase where the final results are computed with the help of backend-wise prefixes. According to our evaluation, the proposed techniques indeed speed up certain algorithms on N data elements by a log N factor with reasonable hardware costs.
dc.identifier.jour-issn: 0141-9331
dc.identifier.olddbid: 209213
dc.identifier.oldhandle: 10024/192240
dc.identifier.uri: https://www.utupub.fi/handle/11111/40730
dc.identifier.url: https://doi.org/10.1016/j.micpro.2023.104807
dc.identifier.urn: URN:NBN:fi-fe2023032933675
dc.language.iso: en
dc.okm.affiliatedauthor: Leppänen, Ville
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 213 Electronic, automation and communications engineering, electronics (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 213 Sähkö-, automaatio- ja tietoliikennetekniikka, elektroniikka (fi_FI)
dc.okm.internationalcopublication: international co-publication
dc.okm.internationality: International publication
dc.okm.type: A1 ScientificArticle
dc.publisher: Elsevier B.V.
dc.publisher.country: Netherlands (en_GB)
dc.publisher.country: Alankomaat (fi_FI)
dc.publisher.country-code: NL
dc.relation.articlenumber: 104807
dc.relation.doi: 10.1016/j.micpro.2023.104807
dc.relation.ispartofjournal: Microprocessors and Microsystems
dc.relation.volume: 98
dc.source.identifier: https://www.utupub.fi/handle/10024/192240
dc.title: Realizing multioperations and multiprefixes in Thick Control Flow processors
dc.year.issued: 2023
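
The abstract describes multioperations and multiprefixes only in words. The following is a minimal sequential sketch of their semantics in Python, assuming an associative operation (addition by default), an exclusive prefix ("all preceding values"), and a reduction that starts from the value already stored at the target location. The function names `multioperation` and `multiprefix` and the list-of-lists layout (one inner list per backend unit) are hypothetical and chosen for illustration; the sketch models the two-level idea of backend partial results combined in an active memory unit, not the hardware described in the paper.

```python
from functools import reduce
from operator import add


def multioperation(memory_value, contributions_per_backend, op=add):
    """Model a multioperation: fold every thread's value into one location."""
    # Level 1: each backend unit reduces its own threads' values locally,
    # so at most one partial result per backend is sent towards memory.
    partials = [reduce(op, vals) for vals in contributions_per_backend if vals]
    # Level 2: the modelled active memory unit combines the partials
    # with the value already stored at the target location.
    return reduce(op, partials, memory_value)


def multiprefix(memory_value, contributions_per_backend, op=add):
    """Model a multiprefix: also return each thread's ordered prefix."""
    # Phase 1: per-backend partial totals (None when a backend has no threads).
    partials = [reduce(op, vals) if vals else None
                for vals in contributions_per_backend]
    # Phase 2: the modelled active memory unit derives a base value for each
    # backend (everything that precedes it) and the final memory value.
    bases = []
    running = memory_value
    for partial in partials:
        bases.append(running)
        if partial is not None:
            running = op(running, partial)
    # Phase 3: each backend combines its base with its backend-wise prefix
    # to obtain the final per-thread results.
    results = []
    for base, vals in zip(bases, contributions_per_backend):
        acc = base
        per_thread = []
        for v in vals:
            per_thread.append(acc)  # reduction of all preceding values
            acc = op(acc, v)
        results.append(per_thread)
    return running, results


if __name__ == "__main__":
    contributions = [[1, 2, 3], [4, 5], [], [6]]  # four backends, six threads
    print(multioperation(10, contributions))
    print(multiprefix(10, contributions))
```

In this model only one partial value per backend reaches the memory side, which mirrors the abstract's stated goal of reducing traffic to the shared memory location. The constant-step behaviour claimed in the abstract comes from the hardware and is not reproduced by this sequential code.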

Files

Showing 1 - 1 of 1
Name:
1-s2.0-S0141933123000534-main.pdf
Size:
2.03 MB
Format:
Adobe Portable Document Format