Transition from Batch to Real-Time Analytics: Implications for Analytics Data Modeling
This publication is subject to copyright. The work may be read and printed for personal use. Commercial use is prohibited.
Abstract
This thesis examines how analytics-layer data modeling changes when batch-oriented processing is reduced or removed in favor of continuous or near-real-time analytics. Batch-oriented architectures have historically supported analytical interpretation by providing temporally stable snapshots through scheduled extraction, transformation, and loading (ETL) processes that reconcile data before exposure. As real-time and streaming-oriented architectures become more common, these conditions weaken, making it necessary to reconsider how analytical truth, closure, revision, and semantic consistency are maintained when the analytical state is continuously evolving.
The study addresses this problem through two complementary methods: a systematic literature review conducted in accordance with PRISMA 2020 guidelines and a comparative simulation study. The literature review identifies and analyzes prior research on analytics-layer modeling under real-time or continuous processing constraints, while the simulation study evaluates how alternative architectural patterns behave under a shared set of business requirements. To support this analysis, the thesis proposes a six-dimension framework consisting of analytical truth, temporal closure, history mutability, semantic stabilization, analytical abstraction, and semantic consistency.
The literature review identifies several distinct architectural patterns that preserve different aspects of interpretability while relaxing others, including closed snapshot warehouses, open evolving streams, window-bounded streams, log-consistent HTAP architectures, and virtual semantic snapshots. The simulation study further demonstrates that, although these architectures differ in their native state, they can be adapted to satisfy similar business requirements, including requirements commonly associated with batch-native behavior. What differs is not primarily the user-facing analytical contract that can be achieved, but the architectural means by which it is achieved, including the maintenance burden, the closure mechanism, and the need for supporting intermediary structures.
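The contrast between batch-native and window-bounded closure mechanisms can be sketched in miniature. The following toy example does not appear in the thesis; the data, function names, and 24-hour window are hypothetical, chosen only to illustrate that a scheduled batch snapshot and a tumbling-window stream aggregation can deliver the same user-facing figure for a closed period through different closure mechanisms:

```python
from collections import defaultdict

# Hypothetical event stream: (event_time_hour, amount) pairs.
events = [(0, 10.0), (1, 5.0), (2, 7.5), (25, 3.0), (26, 4.0)]

def batch_snapshot_total(events, day):
    """Batch-style closure: aggregate a day only once its data is complete."""
    return sum(a for t, a in events if day * 24 <= t < (day + 1) * 24)

def windowed_totals(events, window_hours=24):
    """Stream-style closure: tumbling windows accumulate per-window totals."""
    totals = defaultdict(float)
    for t, a in events:
        totals[t // window_hours] += a
    return dict(totals)

# The same analytical contract (a closed daily total) is met either way;
# what differs is the closure mechanism: scheduled reconciliation versus
# a window boundary maintained continuously in the stream.
assert batch_snapshot_total(events, 0) == windowed_totals(events)[0]
```

In practice the window boundary would also need a lateness policy (e.g. watermarks) to decide when a window may be treated as closed, which is one place where the stabilization burden relocates rather than disappears.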
The thesis offers three related conceptual contributions. First, it proposes the six-dimension framework described above, which makes the temporal and semantic assumptions of different analytics architectures explicit and comparable across paradigms. Second, it applies this framework to argue that the transition to real-time analytics represents not only a change in processing speed but a change in the interpretability conditions of analytics-layer models: the central question shifts from whether familiar analytical requirements can still be met to how truth, closure, revision, and semantic consistency must be explicitly maintained when the analytical state can no longer be assumed closed. Third, the simulation study demonstrates that while architectures with different native semantics can be adapted to satisfy similar analytical contracts, doing so may relocate rather than eliminate the need for stabilization, a finding with direct implications for how real-time architecture decisions are evaluated in practice.