Br J Educ Psychol. 2025;00:1–26.   | 1wileyonlinelibrary.com/journal/bjep
Received: 27 June 2025 | Accepted: 16 October 2025
DOI: 10.1111/bjep.70043  
ARTICLE
Beyond performance: Emotions before and after semi-high-stakes mathematics testing among school-aged students
Reetta Kyynäräinen1,2  |   Santeri Holopainen1  |   Jari Metsämuuronen1  | 
Umar Bin Qushem1  |   Mikko-Jussi Laakso1  |   Katarina Alanko1
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction 
in any medium, provided the original work is properly cited.
© 2025 The Author(s). British Journal of Educational Psychology published by John Wiley & Sons Ltd on behalf of British Psychological Society.
1Turku Research Institute for Learning Analytics, 
University of Turku, Turku, Finland
2Department of Chemistry, University of Turku, 
Turku, Finland
Correspondence
Reetta Kyynäräinen, Department of Chemistry, 
University of Turku, Turku, Finland.
Email: reetta.kyynarainen@utu.fi
Funding information
Opetus- ja Kulttuuriministeriö (Finnish Ministry of Education and Culture); Academy of Finland
Abstract
Background: Previous research has shown that testing differs significantly from other classroom activities and is associated with heightened negative emotions and lower levels of positive emotions. However, relatively little is known about students' emotions surrounding testing, particularly in higher-stakes assessment settings.
Aims: This study examines how students' levels of four emotions (i.e., happiness, relaxation, anxiety and boredom) develop from pre- to post-test, and it investigates how individual factors (i.e., gender, grade level, perceived mathematical competence and test performance) affect students' emotional states and moderate their emotional trajectories.
Sample: The sample (N = 2179) consists of 692 third-grade, 605 sixth-grade, 413 eighth-grade and 469 ninth-grade students from various schools across Finland, who participated in a digital, semi-high-stakes, end-of-year mathematics assessment.
Methods: An in-situ approach was used to assess students' emotions immediately before and after testing. Analyses were conducted using linear mixed-effects modelling to account for the repeated-measures structure.
Results and Conclusions: Students generally reported lower positive emotions after the assessment. The measured individual factors significantly predicted both students' emotional states and their development during the assessment. Boys reported higher levels of positive emotions and lower anxiety, while younger students remained more positive during the assessment. Students who perceived themselves as competent experienced higher levels of positive emotions and lower levels of negative emotions, whereas students who performed poorly showed a decline in positive emotions during the assessment. Future research could examine whether support for emotional regulation affects student performance in test situations.
KEYWORDS
assessment, competence belief, emotions, gender differences, semi-high-stakes testing, test performance
20448279, 0, Downloaded from https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjep.70043 by University of Turku, Wiley Online Library on [02/11/2025]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
INTRODUCTION
Emotions actively influence cognitive processes, such as attention and memory, as well as affective processes, such as motivation, particularly in evaluative settings (Pekrun, 2006; Pekrun et al., 2002). There is an abundance of previous research on students' emotions related to assessment and testing in mathematics, particularly anxiety (Halme et al., 2022; Putwain, 2008; Putwain et al., 2021; Schutz & Davis, 2000; Vogl & Pekrun, 2016). However, research on emotional trajectories surrounding testing situations remains limited (see Goetz et al., 2007; Kleine et al., 2005), especially concerning how emotions develop and which factors contribute to these shifts (Schmid et al., 2025; Vogl & Pekrun, 2016). Moreover, previous research has predominantly focused on negative emotions, overlooking positive affect, especially following assessment (Pekrun et al., 2002; Vogl & Pekrun, 2016). Understanding these emotional trajectories is essential, as they may reflect shifts not only in students' performance but also in broader affective constructs such as engagement or persistence in mathematics.
This study explores how elementary and middle school students' emotions shift from before to after a large-scale, semi-high-stakes mathematics assessment. The assessment provides teachers with a suggested final grade for each student based on their test performance. Furthermore, we focus on the roles of gender, grade, test scores and perceived mathematics competence. By examining emotional changes from pre- to post-assessment, the research aims to understand why some students remain calm or excel under pressure, helping to identify those who may need targeted emotional support to promote both academic and emotional development.
Emotions
Emotions fall under the affective learning domain, which also contains concepts such as moods, beliefs, attitudes, motivation and interest. In contrast to a mood, an emotion is shorter in duration, more fluid in nature and more specific in focus. Typically, emotions are understood as transient states triggered by a specific stimulus, but they can also reflect individuals' dispositions to experience such states in comparable contexts. The momentary, transient states are referred to as state emotions, whereas the dispositions are known as trait emotions (Pekrun et al., 2018).
In addition to stability, emotions are often categorized based on valence and arousal (Pekrun et al., 2018). Valence refers to the intrinsic positivity or negativity of an emotional experience, that is, how pleasant or unpleasant the emotion feels, and it can be used to categorize emotions as positive or negative. Arousal, on the other hand, distinguishes emotions based on the shifts in physiological activity that accompany them. Activating emotions prompt quick reactions, while deactivating emotions promote rest (Ketonen et al., 2023). These two dimensions can be used to separate emotions into four categories in a two-dimensional emotional map: positive activating (e.g., happy, excited), positive deactivating (e.g., relaxed, relieved), negative activating (e.g., anxious, frustrated) and negative deactivating (e.g., bored, hopeless) (Pekrun et al., 2018). We chose to examine four emotions (happy, relaxed, anxious and bored) as representatives of the four quadrants of this two-dimensional circumplex model.
Third, Pekrun et al. (2018) identify object focus, that is, the trigger or antecedent of the emotion, as a key factor in addressing emotions at school. In instructional settings, emotions triggered by ongoing academic tasks can be defined as academic emotions. Emotions related to assessment, however, may be classified as achievement emotions, tied to students' expectations and experiences of success or failure (Pekrun, 2006). Nonetheless, emotions are holistic by nature, and without further information from the learner, object-based distinctions should be made with caution.
Emotions arise in response to external triggers, but internal factors can also influence individuals' emotional regulation (Harley et al., 2019). According to previous research, students' achievement, self-perceptions, age and gender, for instance, can predict their emotional states or emotional responses to certain triggers. In mathematics, high-performing students are less anxious (Erturan & Jansen, 2015; Pekrun et al., 2018), students who feel competent experience more positive emotions (Pekrun et al., 2018; Vogl & Pekrun, 2016), girls experience more test anxiety, particularly of the trait-like kind (Erturan & Jansen, 2015; Frenzel et al., 2007; Goetz et al., 2013), and younger students express higher levels of positive emotions and lower levels of negative emotions (Mata et al., 2022). However, prior studies suggest that these individual characteristics might not affect dynamic state emotions as strongly as they affect trait emotions (see Goetz et al., 2013; Tulis & Ainley, 2011). One explanation is that some individual factors are associated with stereotypes that influence how likely students are to overestimate, for example, their habitual anxiety (cf. Goetz et al., 2013). In addition, in contrast to more stable trait emotions, adolescents' situational state emotions show substantial within-person variability across settings and companions. Situational factors and the proximal environment account for large fluctuations in state affect, as repeatedly demonstrated by, for example, experience-sampling studies (Dirk & Nett, 2022; Mölsä et al., 2025; Vilhunen et al., 2022).
Emotions during learning and assessment
Although primarily affective by nature, emotions also encompass cognitive, physiological, motivational and behavioural components (Brun et al., 2008; Pekrun et al., 2017). As such, they can influence students' thinking, motivation and actions, promoting or hindering learning and test performance (Jarrell & Lajoie, 2017; Pekrun et al., 2002). They shape key cognitive processes such as concentration, attention, memory and decision-making (Pierson et al., 2023; Shuman & Scherer, 2013; Vilhunen et al., 2023; Vogl & Pekrun, 2016).
Generally, positive emotions relate positively to fundamental metacognitive strategies, such as elaboration and critical thinking (Pekrun, 2006; Pekrun et al., 2002), which are vital in a test setting. The connection between negative emotions and learning is less consistent: negative activating emotions (e.g., anxiety) can foster efficient metacognitive strategies (Pekrun et al., 2002; Tulis & Fulmer, 2013), whereas negative deactivating emotions (e.g., boredom) can be detrimental (D'Mello & Graesser, 2011; Ketonen et al., 2023; Pekrun et al., 2010). Learning could therefore be promoted by keeping negative emotions at adequate levels. However, students might need additional pedagogical scaffolding to regulate the intensity and duration of negative emotions, support that is rarely available during assessment (D'Mello et al., 2014).
Pekrun's control-value theory of achievement emotions (Pekrun, 2006) has been widely used as a theoretical lens for studying affect related to assessment in mathematics, because it specifies relationships between motivational factors, emotions and learning outcomes in students. Students' achievement emotions in mathematics (enjoyment, boredom and anxiety) are predicted by intrinsic values (e.g., interest) and sense of control, which are in turn linked to their achievement outcomes in testing (Putwain et al., 2021). These emotions are also connected to students' self-concept, including their perceptions of their competence in mathematics, which is positively associated with mathematics achievement (Frenzel et al., 2007; Van Der Beek et al., 2017; Zhang et al., 2023). Thus, experiencing academic success or failure in an assessment could trigger a reappraisal of one's competence, shaping students' self-concept and emotions (cf. Frenzel et al., 2007; Pekrun et al., 2018; Putwain et al., 2021). In general, mathematics self-concept has been found to weaken over the years of elementary and secondary education, accompanied by an increase in trait-like math anxiety (Kaur et al., 2022; Nagy et al., 2010).
Previous research has shown that testing situations differ significantly from other types of classroom activities (Beymer et al., 2021; Schutz & Davis, 2000; Vogl & Pekrun, 2016). Generally, students attach high importance to tests, quizzes and their outcomes (Beymer et al., 2021; Schutz & Davis, 2000). Test settings might promote a lower sense of control over the outcome, perhaps because students have limited opportunities to choose how to behave or where to direct their attention (Beymer et al., 2021). Additionally, students associate assessment with heightened negative emotions and lower levels of positive emotions (Beymer et al., 2021; Schutz & Davis, 2000). However, much of this knowledge is cross-sectional, and little research has explored how emotions evolve across the testing situation.
Nevertheless, a few studies have investigated students' emotions before, during and after testing, suggesting that students experience heightened levels of negative emotions before testing (see Goetz et al., 2007; Kleine et al., 2005). Goetz et al. (2007) suggest that low math ability predicts higher anxiety during testing, and they report a general decline in enjoyment and increases in anger and boredom from pre- to post-testing. Moreover, positive deactivating emotions are typically experienced after achievement tasks (Goetz et al., 2007; Junça-Silva et al., 2018; Pekrun et al., 2002).
Within this field, it is relevant to distinguish between high-stakes and low-stakes assessments, because the stakes can play a key role in predicting students' affective experiences and self-regulation (Ball, 1995; Putwain, 2008; Schutz & Davis, 2000; Vogl & Pekrun, 2016). If a test significantly affects outcomes such as final grades, students may attach greater importance to it, triggering different or more intense emotions. However, much prior research, including studies on emotional development during testing, has been conducted in low-stakes contexts (see Putwain et al., 2021), which likely lowers student motivation and influences their emotional responses.
Some students experience trait-like anxiety or hopelessness in test settings. These emotions can significantly hinder their test performance, even threatening the validity of comparisons of mathematics test scores across students with different levels of test anxiety (Vogl & Pekrun, 2016). Previous research suggests that the intensity of test anxiety could be related to the stakes of the assessment setting, but even more so to how familiar the test setting is to students (Putwain, 2008). Furthermore, trait-like test anxiety can moderate the levels of students' emotions associated with feeling challenged during a test: lower test anxiety is associated with more positive emotions and less self-blame when faced with challenges (Schutz & Davis, 2000).
Mathematics education and assessment in Finland
In Finland, children typically enter school at the age of seven, after having completed 2 years of pre-primary education without an academic burden. Unlike in many other countries, children are not typically taught reading, writing or mathematics in preschool; rather, the academic program begins in school (EDUFI, 2014, 2020, 2022). Accordingly, children enter first grade with a wide range of skills (see Metsämuuronen & Ukkola, 2019, 2022; Ukkola & Metsämuuronen, 2019; Ukkola et al., 2020).
Unlike in most European countries, no high-stakes, exam-type evaluation is administered during basic education. Instead, student assessment and evaluation are based on teachers' observations as well as diagnostic and teacher-made tests (Harju et al., 2025). In general, teachers may use three types of tests: teacher-made, diagnostic and summative tests. First, teachers typically administer several teacher-made tests throughout the year, before and after math courses. Second, diagnostic tests include assessments such as the FUNA-DB (Functional Numeracy Assessment Dyscalculia Battery; Hellstrand et al., 2024; Räsänen et al., 2021) for numeracy fluency and the MUREA (Multilingual Reading Assessment; Bertram et al., 2025; Salmela et al., 2021) for language fluency. Although not all schools use these tests, tens of thousands of students take the FUNA-DB alone every year. Third, summative tests are available for some grades from the Finnish Association for Teachers of Mathematics, Physics, Chemistry and Informatics. The DigiEva test used in this study is one of these summative tests.
The challenge with teacher-made tests is that they are not comparable across schools and municipalities. The normative National Core Curriculum for Basic Education addresses this issue by providing guidelines for grading (EDUFI, 2020). However, national assessments (Metsämuuronen, 2023) show that ‘assessment cultures’ differ radically between schools and providers. Students with the same level of achievement in mathematics may receive different grades from teachers in different classes, schools and municipalities. The DigiEva assessment aims to tackle this challenge by providing comparable suggestions of the end-of-year grade for each student across schools and municipalities.
The present study
This study investigates how students' emotions develop from pre- to post-assessment in a semi-high-stakes mathematics test. Specifically, we focus on four key emotions (happiness, relaxation, anxiety and boredom) and examine how their trajectories are influenced by individual characteristics (gender, grade level, perceived mathematical competence and actual test performance). The study broadens the range of emotions studied by including relaxation, an often-overlooked positive deactivating emotion.
The specific research questions are:
1. Is there a significant change in the levels of students' emotions from before to after the assessment tasks?
2. Is the change in emotional levels moderated by gender, grade level, competence beliefs and test performance? If so, to what extent?
3. Do gender, grade, competence beliefs and test performance affect students' overall emotional levels, regardless of time? If so, to what extent?
The study was guided by the following hypotheses:
H1. Before the assessment, students experience higher levels of activating emotions, as they may attach high importance to the situation and perceive higher cognitive demands in it, whereas after the assessment we expect to see tension-reducing deactivating emotions.
H2. Individual characteristics and test performance moderate the changes; the effects vary across variables and groups. We further explore how individual characteristics (gender, grade, competence belief, performance) moderate the changes in emotions from before to after the test situation.
H3. Individual characteristics predict the general level of emotions, so that (a) girls report more anxiety than boys, (b) younger students report more positive emotions overall than older students, (c) students with low competence beliefs report more negative emotions and (d) better test performance is associated with more positive emotions both before and after testing.
METHOD
Participants
The participants were 2179 students from the 3rd (692, 31.8%), 6th (605, 27.8%), 8th (413, 19.0%) and 9th (469, 21.5%) grades from various schools around Finland. The schools were recruited through an open call for participation in the DigiEva pilot, constituting a semi-representative sample of primary and lower secondary schools in Finland. The students were typically 9, 12, 14 and 15 years old, respectively. Of the participants, 1052 were boys (48.3%), 1028 were girls (47.2%) and 99 chose not to report their gender (4.5%). In total, 2007 students completed the assessments in Finnish (92.1%) and 172 in Swedish (7.9%), depending on the official language of instruction in the school. For 1746 participants (80.1%), their home language was the same as the assessment language, whereas for 158 participants (7.3%), their home language was different. In addition, 178 came from mixed-language homes (8.2%; Finnish and Swedish or some other language), and 97 did not report their home language (4.5%). Before analysis, students with missing data on gender or competence belief were excluded (130, 6.0%), leaving 2049 students. Students with missing data on an emotion variable were then excluded from the analysis of that emotion (happy: 236, 11.5%; relaxed: 240, 11.7%; anxious: 249, 12.2%; bored: 246, 12.0%; percentages relative to the 2049 remaining students).
The study was conducted in accordance with Finnish law and the ethical guidelines of the Finnish National Board on Research Integrity (TENK). Research permission was obtained from the relevant municipalities. Students participated anonymously and voluntarily during regular school hours, and their guardians were informed about the assessments in advance. The principle of informed dissent was upheld, ensuring that students could decline participation without any consequences.
Data collection
The questionnaire data on students' emotional experiences were collected before and after an end-of-the-study-year mathematics assessment conducted on the digital ViLLE learning platform (see Laakso et al., 2018). This study assesses emotions before and immediately after the test. In the in-situ questionnaires, students were asked ‘How are you feeling right now…’ and rated each of the four emotions (happy, relaxed, anxious and bored) on a 5-point Likert scale with response categories ranging from ‘1 = Not at all’ to ‘5 = Very much’ (cf. Pekrun et al., 2002). In the pre-assessment questionnaire, students also reported their gender and their perceived mathematics competence with a single-item measure, ‘I am good at mathematics’ (cf. Fennema & Sherman, 1976; Gogol et al., 2014; Metsämuuronen, 2012; Parker et al., 2012), again on a 5-point scale. This item was originally adopted from the Fennema and Sherman Mathematics Attitudes Scale (1976) and edited for the Finnish context (Metsämuuronen, 2012). The edited item, ‘I think I am good at mathematics’, showed good predictive validity, having the highest loading on the factor reflecting students' self-concept in mathematics (Metsämuuronen, 2012). Thus, a similar, simplified item was chosen for this study as the single-item measure of perceived competence in mathematics. At the beginning of the background questionnaire, participants were informed about its items, as well as about the items asking about their current emotions during the test. All items were in Finnish or Swedish. The background questionnaire was kept short, with a very limited set of questions (between 22 and 34 items, depending on the grade). For the emotion items, each emotion word appeared on the left-hand side and the Likert scale next to it on the right-hand side. All emotions appeared on the same page.
The test itself, referred to as DigiEva, provides conventional assessment data. It was a pilot for a national assessment that provides teachers with a suggested end-of-year final grade for each student. Teachers were encouraged to inform students that their end-of-year evaluation would be affected by the math test; however, this was not monitored. The test was composed of multiple sections, each evaluating a different mathematical domain. These six domains represented the core of the Finnish mathematics curriculum: (1) mental calculations and computational thinking; (2) numbers and numerical operations; (3) algebra; (4) functions (not included in the third- and sixth-grade tests); (5) geometry and measurement and (6) statistics and probability. From this perspective, the curricular coverage supports the construct validity of the tests. The test was administered to students in grades 3, 6, 8 and 9 in April 2025. The entire DigiEva battery, including background questions and 15-minute research-oriented test batteries, took one 45-minute lesson for third and sixth graders and two 75–90-min lessons for eighth and ninth graders. During the pilot phase, time on task was studied, and the number of items was set so that all students would have enough time to answer all questions. An additional 15 min (the lesson break) was allotted to students who needed more time to finish the test. Each grade had three linked test versions, and the scores were equated so that the average 9th grader received a standard score of 0. Experienced mathematics teachers selected the tasks from the item banks so that the content would be relevant and would emphasize application-type tasks over memorization. The wording of the test items was controlled to avoid ambiguity, excess and complexity, with students with low language skills and reading ability in the lower grades in mind.
Finland's national evaluation and assessment system related to learning outcomes is based on nationally representative samples and low-stakes tests (Metsämuuronen, 2009; Metsämuuronen & Ukkola, 2019). The tests are low stakes because their results do not affect student selection processes or prospects. The DigiEva study in 2025, by contrast, is partly low-stakes and partly semi-high-stakes testing. Its low-stakes character stems from the same fact as in the national assessment: the test score itself is not used in any selection process. Its semi-high-stakes character, however, stems from the fact that teachers were encouraged to use the test results when finalizing students' end-of-year summative evaluation. This was intended to motivate students to perform their best. To this end, teachers were provided with a grading tool that suggested a mark (on a scale of 5–10) for each student based on their test performance. The suggested mark was based on grade-wise quantiles of the score distributions and the corresponding quantiles of the teachers' grades in the national register. For example, the top 13.3% of third-grade students received a suggested mark of 10 (‘excellent’), whereas the top 11.0% of ninth-grade students received a mark of 10.
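The grading tool is described above only as matching quantiles of the score distribution to quantiles of teachers' grades. A minimal sketch of that idea follows, assuming a hypothetical helper `suggest_marks` and invented mark proportions (not DigiEva's register values): students of one grade are ranked by score, and marks are assigned so that the share of students receiving each mark equals that mark's share in a reference distribution.

```python
from typing import Dict, List


def suggest_marks(scores: List[float], mark_shares: Dict[int, float]) -> List[int]:
    """Hypothetical quantile-matching grader (illustration only).

    scores      -- test scores of the students in one grade
    mark_shares -- proportion of students holding each mark in a reference
                   distribution, e.g. {8: 0.4, 9: 0.4, 10: 0.2}; must sum to 1
    Returns one suggested mark per student, in input order.
    """
    if abs(sum(mark_shares.values()) - 1.0) > 1e-9:
        raise ValueError("mark shares must sum to 1")
    n = len(scores)
    # Student indices ordered from the lowest to the highest score
    # (ties keep their input order, a simplification).
    order = sorted(range(n), key=lambda i: scores[i])
    # Students left over by rounding default to the highest mark.
    marks = [max(mark_shares)] * n
    cumulative, start = 0.0, 0
    for mark in sorted(mark_shares):
        cumulative += mark_shares[mark]
        end = min(n, round(cumulative * n))
        # Ranks between the previous and the current cumulative cut-off get this mark.
        for rank in range(start, end):
            marks[order[rank]] = mark
        start = max(start, end)
    return marks
```

For example, `suggest_marks([10, 50, 30, 90, 70], {8: 0.4, 9: 0.4, 10: 0.2})` returns `[8, 9, 8, 10, 9]`. In the described tool, the reference shares would come from the grade-specific distribution of teachers' grades in the national register rather than from invented values.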
Analytical approach
To assess the representativeness of the final samples included in the analysis, we compared excluded and included participants in terms of grade, gender, mathematics competence belief and test score, separately for each emotion, as the amount of missing data varied between emotion variables. Pearson's chi-squared tests of independence were performed for the categorical variables (grade, gender and competence belief), and analyses of variance were performed for the test score.
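As an illustration of the chi-squared comparison, the sketch below computes Pearson's statistic for an r × c table (for example, grade levels as rows and excluded versus included as columns) by comparing each observed count with the count expected under independence. The function names are hypothetical, no continuity correction is applied, and the p-value helper uses a closed form that holds only for 1 degree of freedom. In practice, R's `chisq.test()` or SciPy's `scipy.stats.chi2_contingency()` would be used; both apply Yates' continuity correction to 2 × 2 tables by default.

```python
import math
from typing import List, Tuple


def chi_squared_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with exactly 1 df.

    A chi-squared(1) variable is the square of a standard normal variable,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))
```

With invented counts, `chi_squared_independence([[10, 20], [20, 10]])` gives a statistic of 100/15 (about 6.67) with 1 degree of freedom, whereas a table whose rows are proportional, such as `[[10, 20], [20, 40]]`, gives a statistic of 0.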
To account for the repeated-measures structure of the data (two responses nested within each student), we employed linear mixed-effects modelling (LMM) with a random intercept for participants, using the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages in R 4.5.0 (R Core Team, 2023). We fitted separate models for each emotion. The emotion levels (on the Likert scale) were the dependent variables, and time (‘before’ or ‘after’ the assessment), grade, gender, standardized test score (M = 0, SD = 1), competence belief in mathematics and their possible interactions were used as predictors. We treated time, grade, gender and competence belief as categorical variables and test score as a continuous variable. In the full models (Tables A1–A4), the categorical variables were effect coded, meaning that each level of a variable was compared to the grand mean of that variable. To obtain interpretable estimates, emotion levels were treated as continuous variables, although this may violate the normality assumption of LMMs. However, especially since our sample was large, we relied on studies suggesting that parametric statistics can be safely used even when assumptions are violated (Norman, 2010; Sullivan & Artino, 2013). The appropriateness of an LMM approach was supported by the estimated variance attributable to individual participants and by the intraclass correlation coefficients for each emotion: .50 (happiness), .47 (relaxation), .51 (anxiety) and .41 (boredom), indicating substantial clustering of responses within persons. Table 1 contains the descriptive statistics and correlations for the emotion variables.
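Two modelling choices described above can be made concrete. Under a random-intercept model, the response of student i at time t is y_ti = x_ti'β + u_i + ε_ti, with u_i ~ N(0, σ²_u) and ε_ti ~ N(0, σ²_ε), and the intraclass correlation is ICC = σ²_u / (σ²_u + σ²_ε), the share of total variance due to stable differences between students. The hypothetical helpers below build effect-coded (sum-to-zero) predictor columns, the coding that R's `contr.sum` produces, and compute the ICC from variance components such as those reported by lme4's `VarCorr()`. They are a sketch, not the authors' code, and the variance values in the usage note are invented.

```python
from typing import List, Sequence


def effect_code(values: Sequence[str], levels: Sequence[str]) -> List[List[int]]:
    """Hypothetical helper: effect-code a categorical variable.

    Each of the first k - 1 levels gets its own column (1 for that level,
    0 otherwise); the last level is coded -1 in every column. Each resulting
    coefficient contrasts one level with the unweighted mean of all levels.
    """
    level_list = list(levels)
    k = len(level_list)
    rows = []
    for value in values:
        idx = level_list.index(value)
        if idx == k - 1:
            # The last level carries -1 in every column (sum-to-zero contrast).
            rows.append([-1] * (k - 1))
        else:
            rows.append([1 if col == idx else 0 for col in range(k - 1)])
    return rows


def intraclass_correlation(var_between: float, var_within: float) -> float:
    """Hypothetical helper: ICC for a random-intercept model, i.e. the
    between-student variance divided by the total variance."""
    return var_between / (var_between + var_within)
```

For example, with grade levels `["3", "6", "8", "9"]`, the call `effect_code(["3", "6", "9"], ["3", "6", "8", "9"])` returns `[[1, 0, 0], [0, 1, 0], [-1, -1, -1]]`, and `intraclass_correlation(0.8, 0.8)` returns `0.5`, the size of ICC reported for happiness.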
We used a forward-selection approach to choose the final model. For each emotion, we started with a model containing only the main effects of all predictors and the interactions between time and the other predictors, because we were specifically interested in how the other predictors moderated the effect of time. We then iteratively added three-way interactions that always included the time variable, one at a time. If a three-way interaction was not significant but the corresponding additional two-way interaction that did not include time was statistically significant, we retained both in the final model. However, if neither the three-way interaction nor the additional two-way interaction was significant, we excluded both. To maintain simplicity, three-way interactions that did not include time were not tested.
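The retention rule above reduces to a single condition per candidate: keep the three-way interaction with time and its accompanying two-way interaction if either one is significant, and drop both otherwise. The sketch below encodes only this decision step. The `Candidate` record, the term names and the p-values are hypothetical; in the study, the p-values would come from refitting the mixed model in R after each addition.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical record of one forward-selection step."""
    predictors: Tuple[str, str]  # e.g. ("grade", "gender")
    p_three_way: float           # p-value of the time:A:B interaction
    p_two_way: float             # p-value of the accompanying A:B interaction


def retained_terms(candidates: List[Candidate], alpha: float = 0.05) -> List[str]:
    """Apply the retention rule to a list of tested candidates (illustration only)."""
    kept: List[str] = []
    for c in candidates:
        two_way = ":".join(c.predictors)
        # Keep both terms if either is significant; otherwise drop both.
        if c.p_three_way < alpha or c.p_two_way < alpha:
            kept.extend([two_way, "time:" + two_way])
    return kept
```

With invented p-values, a grade-by-gender candidate with p = .20 for the three-way term and p = .01 for the two-way term is retained, whereas a candidate with p = .30 and p = .40 is dropped.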
We assessed the significance of each model term with F-tests using the Satterthwaite approximation for the denominator degrees of freedom. We performed post-hoc analyses to assess the effect of each categorical variable and their interactions, adjusting p-values for the false discovery rate (Benjamini & Hochberg, 1995). Notably, because of the way variance is partitioned in mixed models (Rights & Sterba, 2019), there is no agreed-upon approach to calculating standardized effect sizes for individual model terms. Hence, we report unstandardized effect sizes, in line with general recommendations (Pek & Flora, 2018).
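The false-discovery-rate adjustment of Benjamini and Hochberg (1995) can be sketched in a few lines: sort the m p-values, multiply the k-th smallest by m/k, and enforce monotonicity by taking running minima from the largest p-value downwards. This mirrors what R's `p.adjust(p, method = "BH")` returns; the function name and the example p-values are illustrative only.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # so that adjusted values never decrease with the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, the raw p-values `[0.01, 0.04, 0.03, 0.20]` are adjusted to approximately `[0.04, 0.053, 0.053, 0.20]`; the two middle values share one adjusted value because of the monotonicity step.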
RESULTS
Missing data analysis
Grade, gender and competence belief were independent of membership in the excluded or included group for each emotion. For example, when participants excluded because of missing data on the happiness variable were compared with those who were not excluded, the chi-squared test results were as follows: χ²(3) = 1.27, p = .735 (grade); χ²(1) = .01, p = .943 (gender); and χ²(4) = 5.33, p = .255 (competence belief). However, for each emotion, the test score was significantly lower in the excluded group than in the included group: t(2,028) = 7.75, p < .001 (happiness); t(2,028) = 7.35, p < .001 (relaxation); t(2,028) = 7.69, p < .001 (anxiety); and t(2,028) = 7.34, p < .001 (boredom). These results suggest that the samples included in the analysis may not fully represent the original sample. This limitation should be kept in mind when interpreting the results of the mixed-effects models below.
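The missing-data check above compares excluded and included participants with Pearson chi-squared tests of independence. The sketch below computes the statistic. It assumes a contingency table with one row per category (e.g., grade) and one column per group (excluded, included). The counts in the example are invented for illustration and are not the study's data. The degrees of freedom, (r − 1)(c − 1), match those reported: 3 for the four grades, 1 for gender and 4 for the five competence-belief levels.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Expected counts under independence are row_total * column_total / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


if __name__ == "__main__":
    # Invented counts: rows are grades 3, 6, 8, 9; columns are (excluded, included).
    counts = [[60, 500], [55, 450], [40, 320], [45, 360]]
    stat, df = chi_squared_independence(counts)
    print(f"chi2({df}) = {stat:.2f}")
```

To obtain a p-value, `scipy.stats.chi2.sf(stat, df)` gives the upper-tail probability. `scipy.stats.chi2_contingency` runs the whole test, but by default it applies Yates' continuity correction to tables with one degree of freedom unless `correction=False` is passed.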
Emotion: Happy
For happiness, the final model included only the main effects of the predictors and the interactions between time and the other predictors (see Table A1). First, the interaction between time and grade (F(3; 1,803) = 19.87, p < .001) was statistically significant. Post-hoc analysis revealed that, compared with pre-assessment levels, happiness was significantly lower post-assessment in grade levels 6, 8 and 9, but not in grade level 3 (see Table 2). The interaction between time and grade on happiness levels is visualized in Figure 1. The main effects of time (F(1; 1,803) = 206.34, p < .001) and grade (F(3; 1,803) = 80.46, p < .001) were significant. However, they were not meaningful in the presence of the interaction, because the effect of time differed across grade levels and the effect of grade differed across time points.
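To see why the main effect of time is not interpretable on its own, it helps to compute the descriptive pre-to-post change within each grade from the means in Table 2. This is plain arithmetic on the reported descriptive means, not a re-analysis. The t-tests in Table 2 come from the mixed model and account for the other predictors, so these raw differences will not reproduce them exactly.

```python
# Descriptive happiness means (before, after) by grade level, copied from Table 2.
HAPPINESS_BY_GRADE = {
    3: (3.69, 3.49),
    6: (2.95, 2.72),
    8: (2.39, 1.77),
    9: (2.55, 1.85),
}


def pre_post_change(means):
    """Return the descriptive change (after minus before) for each group."""
    return {group: after - before for group, (before, after) in means.items()}


if __name__ == "__main__":
    changes = pre_post_change(HAPPINESS_BY_GRADE)
    for grade, change in changes.items():
        print(f"grade {grade}: {change:+.2f}")
    # The unweighted average change pools very different within-grade changes.
    print(f"average: {sum(changes.values()) / len(changes):+.2f}")
```

The changes range from about −0.20 in grade 3 to about −0.70 in grade 9. A single averaged "effect of time" therefore describes none of the grades well.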
TABLE 1 Descriptive statistics and correlations for the emotion variables.
Variable N M SD 1 2 3 4
1. Happy 3626 2.78 1.33 –
2. Relaxed 3618 2.67 1.32 .64 –
3. Anxious 3600 2.01 1.25 −.22 −.24 –
4. Bored 3606 3.01 1.45 −.39 −.27 .23 –
Note: All correlations were statistically significant at the .001 level.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
TABLE 2 Results of pairwise comparisons (t tests) of students' happiness levels before and after the assessments, accounting for gender, grade level and competence belief.
Group N M (before) SD (before) M (after) SD (after) t(1,803) p
Gender
Boy 913 3.08 1.21 2.74 1.45 9.06 <.001
Girl 900 2.87 1.21 2.41 1.36 12.63 <.001
Grade level
3 555 3.69 1.12 3.49 1.34 1.66 .105
6 495 2.95 1.09 2.72 1.29 4.42 <.001
8 355 2.39 1.10 1.77 1.07 10.39 <.001
9 408 2.55 1.10 1.85 1.11 12.37 <.001
Competence belief (‘I am good at mathematics’)
0 134 2.09 1.28 1.58 1.11 3.63 <.001
1 231 2.40 1.05 1.77 1.07 6.69 <.001
2 386 2.66 1.08 2.23 1.23 7.21 <.001
3 629 3.11 1.12 2.73 1.37 9.84 <.001
4 433 3.64 1.12 3.39 1.38 7.15 <.001
Total 1813 2.98 1.22 2.58 1.42 14.36 <.001
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
FIGURE 1 Statistically significant interactions of time and grade level on the levels of the different emotions (happy, relaxed, anxious, bored).
The interaction between time and gender (F(1; 1,803) = 5.02, p = .025) was also significant. Happiness decreased from pre- to post-assessment for both boys and girls, but girls showed a larger decrease (see Table 2). The significant main effect of gender (F(1; 1,803) = 11.47, p = .001) was not meaningful. Before the assessments, the difference between boys and girls was not significant (t(2,891) = 1.81, p = .071), but after the assessments, girls reported lower levels of happiness (t(2,891) = 4.05, p < .001). Additionally, the interaction between time and competence belief (F(4; 1,803) = .44, p = .781) was non-significant, unlike the main effect of competence belief (F(4; 1,803) = 40.25, p < .001), which is visualized in Figure 2. The more strongly students agreed that they were good at mathematics, the happier they were, regardless of when happiness was reported.
Finally, the interaction between time and test score (F(1; 1,803) = 6.41, p = .011) was significant, but the main effect of test score (F(1; 1,803) = .002, p = .962) was not. Before the assessments, the slope of happiness on test score was negative, whereas after the assessments, it was positive. As seen in Figure 3, low-performing students (e.g., scaled test score < −2) were happier before than after the assessments. In contrast, high-performing students (e.g., scaled test score > 2) reported similar levels of happiness before and after the assessments.
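A non-significant main effect of test score alongside a significant time × test score interaction can be made concrete with a simplified model. The sketch below is an illustrative parameterisation only. It assumes that time is centred (−1/2 before, +1/2 after) and that each student i has a random intercept u_i, and it omits the other predictors. The coding actually used in the study is not stated in this section.

```latex
\begin{aligned}
\text{Happy}_{it} &= \beta_0 + \beta_1 T_t + \beta_2 S_i + \beta_3 \, T_t S_i + u_i + \varepsilon_{it},
  \qquad T_t \in \{-\tfrac{1}{2}, +\tfrac{1}{2}\},\\
\text{slope before} &= \beta_2 - \tfrac{1}{2}\beta_3, \qquad
\text{slope after} = \beta_2 + \tfrac{1}{2}\beta_3,\\
\beta_2 &= \tfrac{1}{2}\,(\text{slope before} + \text{slope after}), \qquad
\beta_3 = \text{slope after} - \text{slope before}.
\end{aligned}
```

Here S_i is the scaled test score. Under this coding, β2 ≈ 0 with β3 > 0 corresponds to the reported pattern: a negative slope before the assessment and a positive slope after it, which average to roughly zero.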
Emotion: Relaxed
For relaxation, the final model (see Table A2) included the main effects and the interactions between time and the other predictors. It also included a non-significant interaction between time, grade and gender (F(3; 1,796) = 2.49, p = .058). First, as with happiness, the interaction between time and grade (F(3; 1,796) = 14.29, p < .001) was statistically significant (see Figure 1). Post-hoc analysis indicated that relaxation was significantly lower after the assessments in grade levels 6, 8 and 9, but not in grade level 3 (see Table 3). The non-significant interaction between time, grade and gender suggested that this time-by-grade pattern did not depend on gender. In addition, as with happiness, the significant main effects of time (F(1; 1,796) = 121.23, p < .001) and grade (F(3; 1,796) = 27.72, p < .001) were not meaningful because of the interaction.
FIGURE 2 Effect of competence belief (‘I am good at mathematics’; 0 = ‘completely disagree’, 4 = ‘completely agree’) on the levels of positive emotions (happy, relaxed), independent of time.
The interaction between time and gender (F(1; 1,796) = 1.02, p = .313) was non-significant, unlike the interaction between grade and gender (F(3; 1,796) = 4.90, p = .002). Girls reported lower levels of relaxation than boys in grades 6, 8 and 9, but there was no such difference in grade 3 (see Table 4). Because the three-way interaction between time, grade and gender was non-significant, these gender differences held independently of time. The significant main effect of gender (F(1; 1,796) = 34.63, p < .001) was not meaningful because of the interaction with grade.
The interaction between time and competence belief in mathematics (F(4; 1,796) = 2.30, p = .057) was not significant, but the main effect of competence belief (F(4; 1,796) = 25.75, p < .001) was. The more strongly students agreed that they were good at mathematics, the more relaxed they were, regardless of time (see Figure 2). Finally, the main effect of test score (F(1; 1,796) = 2.83, p = .093) was non-significant, unlike the interaction between time and test score (F(1; 1,796) = 10.91, p < .001). The slope of relaxation on test score was negative before the assessments and positive after them (see Figure 3). Low-performing students were more relaxed before than after the assessments, whereas high-performing students reported similar levels of relaxation before and after.
Emotion: Anxious
For anxiety, the final model (see Table A3) included the main effects and the interactions between time and the other predictors. It also included non-significant interactions between time, gender and test score (F(1; 1,785) = .20, p = .653) and between time, gender and competence belief (F(4; 1,785) = 1.43, p = .222). The interactions between time and grade (F(3; 1,785) = 4.51, p = .004) (see Figure 1) and between time and gender
FIGURE 3 Interaction of time and test score on the levels of the different emotions (happy, relaxed, anxious, bored). Note that the interaction of time and test score on students' anxiety levels was not significant.
(F(1; 1,785) = 5.74, p = .017) were significant. Post-hoc analyses showed that anxiety was significantly lower among 3rd and 6th graders after the assessments, whereas anxiety among 8th and 9th graders remained unchanged (see Table 5). The main effects of time (F(1; 1,785) = 11.83, p = .001) and grade (F(3; 1,785) = 9.59, p < .001) were significant but not meaningful because of their interaction. Girls' anxiety decreased significantly from pre- to post-assessment, whereas boys' anxiety remained the same (see Table 5). However, regardless of time, girls reported higher levels of anxiety than boys (before: t(2,840) = −7.28, p < .001; after: t(2,840) = −4.90, p < .001). The three-way interactions between time, gender and competence belief and between time, gender and test score were non-significant. This suggests that the time-by-gender interaction did not depend on competence belief or test score.
The interactions between gender and competence belief (F(4; 1,785) = 3.54, p = .007) and between gender and test score (F(1; 1,785) = 7.63, p = .006) were significant. Because the three-way interactions that also included time were non-significant, these two-way interactions
TABLE 3 Results of pairwise comparisons (t tests) of students' relaxation levels before and after the assessments, accounting for gender, grade level and competence belief.
Group N M (before) SD (before) M (after) SD (after) t(1,796) p
Gender
Boy 911 3.06 1.23 2.65 1.40 8.65 <.001
Girl 898 2.64 1.23 2.31 1.30 7.80 <.001
Grade level
3 557 3.27 1.22 3.03 1.40 .83 .430
6 492 2.80 1.23 2.59 1.33 3.28 .002
8 353 2.43 1.17 1.91 1.15 7.85 <.001
9 407 2.72 1.22 2.10 1.21 10.23 <.001
Competence belief (‘I am good at mathematics’)
0 132 1.95 1.14 1.67 1.14 1.35 .190
1 230 2.46 1.11 1.91 1.11 5.18 <.001
2 385 2.59 1.21 2.25 1.25 4.91 <.001
3 628 2.94 1.17 2.57 1.34 8.89 <.001
4 434 3.44 1.21 3.10 1.39 7.32 <.001
Total 1809 2.86 1.25 2.48 1.37 11.01 <.001
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
TABLE 4 Results of pairwise comparisons (t tests) of students' relaxation levels, averaged over time, between boys and girls, accounting for grade level.
Grade level N (boys) M (boys) SD (boys) N (girls) M (girls) SD (girls) t(1,796) p
3 550 3.22 1.31 564 3.08 1.33 .23 .849
6 510 2.99 1.32 474 2.38 1.17 5.16 <.001
8 344 2.41 1.25 362 1.94 1.07 3.60 .001
9 418 2.58 1.31 396 2.22 1.17 2.72 .010
Total 1822 2.86 1.34 1796 2.47 1.28 5.89 <.001
Abbreviations: M, mean; N, sample size; SD, standard deviation.
were independent of time (see Figures 4 and 5, respectively). Based on post-hoc analysis, the difference between boys' and girls' anxiety levels averaged over time appeared to shrink as agreement with being good at mathematics increased (see Table 6). Among students with low perceived math competence, girls reported higher levels of anxiety than boys, whereas among students with high perceived
TABLE 5 Results of pairwise comparisons (t tests) of students' anxiety levels before and after the assessments, accounting for gender, grade level and competence belief.
Group N M (before) SD (before) M (after) SD (after) t(1,785) p
Gender
Boy 905 1.87 1.14 1.77 1.20 .69 .490
Girl 895 2.31 1.26 2.10 1.31 4.48 <.001
Grade level
3 551 1.85 1.14 1.59 1.05 3.59 .001
6 491 2.05 1.18 1.84 1.17 3.40 .002
8 350 2.23 1.29 2.28 1.42 −1.14 .308
9 408 2.34 1.26 2.23 1.37 1.10 .315
Competence belief (‘I am good at mathematics’)
0 133 2.66 1.51 2.51 1.60 1.58 .165
1 229 2.28 1.25 2.29 1.40 −.18 .941
2 384 2.18 1.17 2.06 1.30 1.90 .092
3 622 2.02 1.16 1.85 1.16 2.79 .014
4 432 1.83 1.16 1.58 1.07 2.36 .034
Total 1800 2.09 1.22 1.94 1.27 3.44 .001
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
FIGURE 4 Interaction of gender and competence belief (‘I am good at mathematics’; 0 = ‘completely disagree’, 4 = ‘completely agree’) on the levels of the negative emotions (anxious, bored), independent of time.
math competence, girls and boys reported similar levels of anxiety. Because of this interaction, the significant main effects of gender (F(1; 1,785) = 49.23, p < .001) and competence belief (F(4; 1,785) = 7.26, p < .001) were not meaningful. Finally, among girls, the slope of anxiety on test score was positive, whereas among boys, it was negative. Regardless of time, high-performing girls were more anxious than high-performing boys, whereas low-performing boys and girls reported similar levels of anxiety. The main effect of test score (F(1; 1,785) = .16, p = .693) was not significant.
Emotion: Bored
For boredom, the final model (see Table A4) included the main effects and the interactions between time and the other predictors. It also included non-significant interactions between time, grade and test score (F(3; 1,786) = .92, p = .431) and between time, gender and competence belief (F(4; 1,786) = 2.36, p = .051). The interactions between time and grade (F(3; 1,786) = 5.56, p < .001) (Figure 1), time and test score (F(1; 1,786) = 4.06, p = .044) and grade and test score (F(3; 1,786) = 3.58, p = .013) were all statistically significant. The three-way interaction of these three variables was not, suggesting that each two-way interaction was independent of the third variable. Based on post-hoc analysis, boredom levels
FIGURE 5 Interaction of gender and test score on students' general level of anxiety, and interaction of grade and test score on students' general level of boredom, both of which were independent of time.
TABLE 6 Results of pairwise comparisons (t tests) of students' anxiety levels, averaged over time, between boys and girls, accounting for competence belief (‘I am good at mathematics’).
Competence belief N (boys) M (boys) SD (boys) N (girls) M (girls) SD (girls) t(1,785) p
0 94 2.18 1.56 172 2.81 1.51 −3.77 .001
1 176 2.00 1.26 282 2.46 1.34 −3.57 .001
2 356 1.90 1.12 412 2.31 1.30 −4.48 <.001
3 642 1.79 1.13 602 2.08 1.18 −3.39 .002
4 542 1.68 1.12 322 1.76 1.13 −.63 .675
Total 1810 1.82 1.17 1790 2.20 1.29 −7.02 <.001
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
of 8th and 9th graders increased significantly from pre- to post-assessment, whereas boredom levels of 3rd and 6th graders remained stable (see Table 7). The main effect of time (F(1; 1,786) = 3.48, p = .062) was not significant. The main effect of grade (F(3; 1,786) = 27.26, p < .001) was significant, but it was not meaningful in the presence of the interaction.
The interaction between time and test score (see Figure 3) reveals a pattern opposite to that for happiness and relaxation. Before the assessments, the slope of boredom on test score was positive, whereas after the assessments, it was negative. In other words, low-performing students were more bored after the assessments, whereas high-performing students reported similar levels of boredom before and after. Moreover, the interaction between grade and test score (see Figure 5) reflects a negative slope of boredom on test score for 3rd graders and positive or near-zero slopes for the other grades. That is, among 3rd graders, a higher test score was associated with lower boredom in general, regardless of when boredom was reported. The main effect of test score (F(1; 1,786) = .27, p = .607) was not significant.
Finally, the interaction between gender and competence belief (F(4; 1,786) = 2.56, p = .037) was significant and independent of time (see Figure 4). Among students who did not believe they were good at mathematics (‘completely disagree’, ‘disagree’), there was no significant difference between boys' and girls' levels of boredom (see Table 8). Conversely, girls who fully agreed that they were good at mathematics (‘completely agree’) reported significantly lower levels of boredom than boys with similar beliefs. Because of this interaction, the significant main effect of competence belief (F(4; 1,786) = 13.03, p < .001) was not meaningful. The main effect of gender (F(1; 1,786) = 2.94, p = .087) was not significant.
DISCUSSION
Our results suggest that students' emotions associated with math testing are dynamic and affected by several student characteristics. First, we observed significant differences in emotional states from before
TABLE 7 Results of pairwise comparisons (t tests) of students' boredom levels before and after the assessments, accounting for gender, grade level and competence belief.
Group N M (before) SD (before) M (after) SD (after) t(1,786) p
Gender
Boy 907 3.01 1.38 3.07 1.58 −.71 .573
Girl 896 2.92 1.33 3.06 1.49 −2.17 .092
Grade level
3 556 2.51 1.36 2.51 1.48 1.43 .177
6 491 3.00 1.33 2.98 1.43 .72 .470
8 350 3.36 1.27 3.64 1.47 −3.50 .001
9 406 3.20 1.29 3.43 1.51 −2.57 .015
Competence belief (‘I am good at mathematics’)
0 130 3.52 1.48 3.57 1.60 .81 .486
1 230 3.41 1.28 3.55 1.48 −1.12 .331
2 384 3.12 1.27 3.37 1.45 −2.74 .011
3 626 2.84 1.30 2.96 1.48 −2.19 .045
4 433 2.60 1.40 2.55 1.52 −.71 .539
Total 1803 2.96 1.36 3.07 1.53 −1.86 .063
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
to after the assessment. For all students except 3rd graders, levels of positive emotions (happiness, relaxation) generally decreased from pre- to post-assessment. This contrasts with previous findings and with our hypothesis H1 of post-assessment relaxation (Goetz et al., 2007; Pekrun et al., 2002). Anxiety, on the other hand, followed a similar decreasing trend among elementary school students, whereas middle school students reported similar levels of anxiety after the test. Conversely, middle school students' boredom increased from before to after the assessment, whereas elementary school students' boredom remained the same. This partially supports H1, as happiness and anxiety, the activating emotions, were higher before the test for some groups of students. Moreover, the results align well with previous findings of decreasing enjoyment (a positive activating emotion) and increasing boredom (a negative deactivating emotion) from pre- to post-testing (Goetz et al., 2007).
Nonetheless, particularly among older students, we observed a decrease in positive emotions accompanied by an increase in negative deactivating emotions after the assessment. Younger students' emotional states, however, appeared more stable, although their anxiety decreased after the assessment. This suggests that their pre-assessment anxiety may differ from that of older students, possibly being more state-like, dynamic and sensitive to environmental changes (cf. Halme et al., 2022; Pekrun et al., 2018). Overall, these findings support H2, indicating that individual factors influence emotional shifts.
Second, gender was a significant predictor of emotional states. In line with previous findings, boys generally experienced higher levels of positive emotions and lower levels of anxiety (Erturan & Jansen, 2015; Frenzel et al., 2007; Justicia-Galiano et al., 2023). Although there were no gender differences in happiness before the assessment, boys were happier than girls afterwards. Boys in grades 6, 8 and 9 also reported higher levels of relaxation than girls, both before and after the assessment, whereas girls reported higher levels of anxiety than boys. In line with the findings of Erturan and Jansen (2015), we observed that gender differences in test anxiety evened out among students with strong math competence beliefs. This might indicate higher trait-like test anxiety among girls, as previously proposed by Goetz et al. (2013).
Third, in line with prior research (Haciomeroglu, 2019; Liu et al., 2018; Van Der Beek et al., 2017), students' math competence beliefs were associated with differences in the general levels of their emotions. Students who perceived themselves as more competent in math experienced higher levels of positive emotions. Our findings also align with previous research showing a significant decline in students' mathematics competence beliefs from the early years of elementary school to secondary education (Nagy et al., 2010), as younger students broadly experienced more positive emotions.
Additionally, we observed an effect of math competence belief on students' negative emotions in interaction with other student characteristics. High competence beliefs were associated with lower anxiety and boredom, yet girls with low competence beliefs were highly anxious, and girls with high competence
TABLE 8 Results of pairwise comparisons (t tests) of students' boredom levels, averaged over time, between boys and girls, accounting for competence belief (‘I am good at mathematics’).
Competence belief N (boys) M (boys) SD (boys) N (girls) M (girls) SD (girls) t(1,786) p
0 92 3.58 1.65 168 3.52 1.49 .20 .882
1 178 3.59 1.46 282 3.41 1.33 .99 .417
2 358 3.20 1.39 410 3.29 1.34 −1.42 .221
3 644 3.00 1.43 608 2.79 1.34 2.05 .067
4 542 2.71 1.50 324 2.35 1.35 2.95 .008
Total 1814 3.04 1.48 1792 2.99 1.42 1.71 .087
Note: For competence belief, 0 corresponds to ‘completely disagree’ and 4 corresponds to ‘completely agree’.
Abbreviations: M, mean; N, sample size; SD, standard deviation.
beliefs were even less bored than boys with high competence beliefs. These findings support our hypothesis H3. They indicate that individual factors affect students' emotional states during assessment in much the same way as they predict math-related emotions overall. The results suggest that girls with low self-perceived math competence in particular could benefit from additional affective scaffolding that supports their regulation of negative emotions during mathematics learning. However, further research is needed to test this hypothesis. Competence belief was also significantly associated with performance. We ran a post-hoc linear regression with standardized test score as the dependent variable and gender (F(1; 2,020) = 2.24, p = .135), competence belief (F(4; 2,020) = 11.97, p < .001) and their interaction (F(4; 2,020) = .34, p = .850) as predictors. It revealed that students with higher competence beliefs performed better on the test.
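The degrees of freedom reported for this post-hoc regression can be checked by counting parameters. The sketch assumes ordinary least squares with gender (2 levels), competence belief (5 levels), their interaction and an intercept, with no empty cells. Under these assumptions, the full model has 2 × 5 = 10 parameters, so a residual df of 2,020 implies about 2,030 students with test scores. This is also consistent with the t(2,028) comparisons in the missing-data analysis (n1 + n2 − 2 = 2,028). The figure is an inference from the reported statistics, not a number stated by the authors.

```python
def two_way_model_df(n_obs, levels_a, levels_b):
    """Numerator and residual degrees of freedom for an OLS model with two
    categorical predictors, their interaction and an intercept."""
    df_a = levels_a - 1          # main effect of factor A
    df_b = levels_b - 1          # main effect of factor B
    df_ab = df_a * df_b          # A x B interaction
    n_params = 1 + df_a + df_b + df_ab  # equals levels_a * levels_b
    return df_a, df_b, df_ab, n_obs - n_params


if __name__ == "__main__":
    # Hypothetical n_obs chosen to match the reported residual df of 2,020.
    print(two_way_model_df(2030, levels_a=2, levels_b=5))
```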
Based on our findings, students who performed poorly showed a decline in positive emotions after the assessment, suggesting that low performance is emotionally discouraging. This finding also supports our hypotheses H2 and H3, identifying test performance as a significant predictor of (changes in) students' emotional states. Additionally, high-performing girls experienced more anxiety than high-performing boys, whereas low-performing girls and boys experienced similar levels of anxiety. This may indicate heightened anxiety among girls (cf. Frenzel et al., 2007). Furthermore, poorly performing students experienced an increase in boredom, a negative deactivating emotion. We suggest that this may reflect a maladaptive coping strategy among over-challenged students, which might be linked to a sense of low control over the situation (D'Mello et al., 2014; Kleine et al., 2005; Pekrun et al., 2010; Schwartze et al., 2020). However, among 3rd graders, high-performing students experienced less boredom in general, indicating that the assessment may have been more engaging for them (cf. Bekker et al.). High performers among older students may experience greater pressure and stress than their younger peers (Parviainen et al., 2021; Salmela-Aro et al., 2018). This could foster a more cynical attitude towards learning, a tendency reflected in their higher levels of boredom (Bekker et al., 2023; Ramos-Vera et al., 2025).
Although the participants in this study did not receive any formal feedback during the test, they may have evaluated their own performance. Previous studies suggest that self-reflection and self-assessment can also function as feedback (Vogl & Pekrun, 2016) and influence students' emotions (Tulis & Ainley, 2011). In general, success feedback tends to elicit positive emotions, whereas failure feedback can trigger negative affect (Tulis & Ainley, 2011). Previous research has also found that boys tend to react more adaptively to their errors than girls (Soncini et al., 2022). Our findings contrast with this result, as we did not observe any gender differences in the pre- to post-assessment shifts among lower-performing students. Encouraging feedback in digital learning environments has been found to mitigate students' negative emotional responses to failure (Narciss & Alemdag, 2025; Soncini et al., 2025; Tulis & Dresel, 2025), and such feedback could also be used in digital assessments. This could support students' adaptive reactions and promote continued learning after the assessment. Educators should, however, bear in mind that some negative emotions, experienced at a moderate level, can be linked to higher agency. Such emotions can also increase students' performance by activating more efficient metacognitive strategies (Pekrun et al., 2002; Tulis & Fulmer, 2013).
Consequently, instead of avoiding negative emotions, educators should invest in supporting students' meta-affective learning (Radoff et al., 2019; Sharabi & Roth, 2025). This means promoting a behavioural shift in students towards seeing frustration and challenges as valuable learning opportunities, thereby transforming negative affective experiences into positive meta-affect. As a result, facing challenges or disappointment in one's performance can drive better performance in the future. In particular, students who are attentive to their negative emotions and demonstrate adaptive coping practices are likely to learn from academic failure (Sharabi & Roth, 2025). Given the maladaptive reactions to academic failure observed among low performers, we suggest that they would particularly benefit from interventions supporting meta-affective learning (cf. Burleson, 2006).
Finally, much of the previous research has been conducted in low-stakes testing settings, yet our findings from a semi-high-stakes setting align with it. This suggests that findings on students' emotions associated with mathematics testing may generalize across various settings and environments.
Limitations and future research
One limitation is that this was the first pilot of the digital assessment, so the setting was unfamiliar to many students, which likely affected their experiences (cf. Putwain, 2008). The length and content of the test may also have affected students' emotions, although both were designed to match what students are used to at each grade level. Furthermore, there are limitations related to the generalizability of the results. Originally, 3387 students from 12 municipalities participated, representing three of the six geographical areas and two of the three types of municipalities. Thus, the dataset represents Southern and Southwestern Finland and Lapland, but the results cannot be generalized to the eastern, western, inland and northern areas of Finland. Likewise, the results can be generalized to urban municipalities and municipalities with high population density, but not to rural municipalities. Moreover, the final analytic sample may not fully represent the original sample. Participants who were excluded because of missing data in the emotion variables achieved significantly lower test scores than those who were not excluded.
Additionally, students' background math competence beliefs were measured at the same time as their pre-assessment emotions. Accordingly, these pre-assessment emotions could have affected students' competence beliefs. This limits how the direction of the observed relationships can be interpreted. In future studies, students' background information should therefore be measured before testing. This could clarify, for example, the role of internalized, more general math competence in predicting students' pre-assessment emotions. Furthermore, students' competence beliefs and emotions were each measured with a single item, so measurement error could not be modelled (see Gogol et al., 2014). Nevertheless, previous research indicates that single-item indicators offer a psychometrically sound alternative to longer scales for measuring motivational-affective constructs such as emotions and competence beliefs (see Allen et al., 2022; Goetz et al., 2016; Gogol et al., 2014; Hoeppner et al., 2011). For example, previous studies have shown that the single item used in this study, ‘I am good at mathematics’, has good predictive validity as an indicator of students' mathematics self-concept (cf. Metsämuuronen, 2012; Fennema & Sherman, 1976).
Additionally, state anxiety and trait-like anxiety might operate differently and have different effects on students' performance (cf. Halme et al., 2022; Tulis & Ainley, 2011). In this study, we could not determine the stability of the emotions or explicitly distinguish, for example, trait-like test or math anxiety from state anxiety. Future studies focusing on the dynamics of emotional trajectories should take this into account. Finally, the emotion items appeared several times during the test, which could have affected students' emotional landscapes. However, this study examined only the emotion levels before and after the assessment tasks. Future studies should consider emotional trajectories during testing, which would provide a deeper understanding of students' emotional responses to specific tasks.
CONCLUSIONS
This study examined how students' levels of four emotions (i.e., happiness, relaxation, anxiety and boredom) developed from before to after an assessment. It also investigated the role of individual factors (i.e., gender, grade level, perceived mathematical competence and test performance) in predicting students' emotional states and moderating their emotional trajectories. The sample (N = 2179) consisted of students from various schools across Finland who participated in a digital, semi-high-stakes, end-of-year mathematics assessment. Students' emotions were measured immediately before and after the assessment using an in situ approach, and all analyses were conducted using linear mixed-effects modelling. The results show that students reported lower levels of positive emotions after the assessment. However, the individual factors measured significantly predicted both students' emotional states and how these states developed during the assessment. Younger students remained more positive during the assessment and even reported lower anxiety afterwards. In general, boys reported higher levels of positive emotions and
lower anxiety, and students who perceived themselves as competent experienced higher levels of positive emotions and lower levels of negative emotions. Finally, students who performed poorly showed a greater decline in positive emotions during the assessment. We suggest that, after completing a higher-stakes assessment, students with heightened negative emotions would benefit from additional emotional support that fosters meta-affective learning.
AUTHOR CONTRIBUTIONS
Reetta Kyynäräinen: Conceptualization; writing – original draft; writing – review and editing; visualization. Santeri Holopainen: Formal analysis; methodology; visualization; writing – original draft; writing – review and editing. Jari Metsämuuronen: Conceptualization; investigation; project administration; writing – review and editing. Umar Bin Qushem: Conceptualization; writing – review and editing. Mikko-Jussi Laakso: Funding acquisition; project administration. Katarina Alanko: Conceptualization; investigation; project administration; writing – review and editing.
ACKNOWLEDGEMENTS
We would like to express our gratitude to Teemu Holopainen for helping us with the initial analysis. This study was funded by the Research Council of Finland (EDUCA Flagship #358924, #358947) and the Ministry of Education and Culture (Doctoral school pilot #VN/3137/2024-OKM-4). Open access publishing facilitated by Turun yliopisto, as part of the Wiley-FinELib agreement.
CONFLICT OF INTEREST STATEMENT
No potential conflict of interest was reported by the authors.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. 
The data are not publicly available due to privacy or ethical restrictions.
ORCID
Reetta Kyynäräinen  https://orcid.org/0009-0000-1592-7384 
Santeri Holopainen  https://orcid.org/0000-0002-6777-6247 
Jari Metsämuuronen  https://orcid.org/0000-0001-6027-0799 
Umar Bin Qushem  https://orcid.org/0000-0003-0845-3285 
Mikko-Jussi Laakso  https://orcid.org/0000-0001-9163-2676 
Katarina Alanko  https://orcid.org/0000-0003-0513-377X 
REFERENCES
Allen, M. S., Iliescu, D., & Greiff, S. (2022). Single item measures in psychological science: A call to action. European Journal of Psychological Assessment, 38(1), 1–5. https://doi.org/10.1027/1015-5759/a000699
Ball, S. (1995). Anxiety and test performance. In C. D. Spielberger & P. R. Vagg (Eds.), Test anxiety: Theory, assessment and treatment (pp. 107–113). Taylor & Francis.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bekker, C. I., Rothmann, S., & Kloppers, M. M. (2023). The happy learner: Effects of academic boredom, burnout, and engagement. Frontiers in Psychology, 13, 974486. https://doi.org/10.3389/fpsyg.2022.974486
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B: Methodological, 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Bertram, R., Rautaoja, T., Holopainen, S., Häikiö, T., Enges, P., Hyönä, J., Lehtonen, M., Pugh, K. R., Rueckl, J. G., Salmela, R., Siegelman, N., & Räsänen, P. (2025). Assessing vocabulary skills of school children aged 9 to 15 in Finland: Tracking the gender and home language gap [preprint]. Research Square. https://doi.org/10.21203/rs.3.rs-6448049/v1
Beymer, P. N., Robinson, K. A., & Schmidt, J. A. (2021). Classroom activities as predictors of control, value, and state emotions in science. The Journal of Educational Research, 114(6), 550–561. https://doi.org/10.1080/00220671.2021.1997882
Brun, G., Doğuoğlu, U., & Kuenzle, D. (2008). Epistemology and emotions. Ashgate.
Burleson, W. (2006). Affective learning companions: Strategies for empathetic agents with real-time multimodal affective sensing to foster meta-cognitive and meta-affective approaches to learning, motivation, and perseverance [Doctoral dissertation]. Massachusetts Institute of Technology. https://dspace.mit.edu/handle/1721.1/37404
Dirk, J., & Nett, U. E. (2022). Uncovering the situational impact in educational settings: Studies on motivational and emotional experiences. Learning and Instruction, 81, 101661. https://doi.org/10.1016/j.learninstruc.2022.101661
D'Mello, S., & Graesser, A. (2011). The half-life of cognitive-affective states during complex learning. Cognition and Emotion, 25(7), 1299–1308. https://doi.org/10.1080/02699931.2011.613668
D'Mello, S., Lehman, B., Pekrun, R., & Graesser, A. (2014). Confusion can be beneficial for learning. Learning and Instruction, 29, 153–170. https://doi.org/10.1016/j.learninstruc.2012.05.003
EDUFI. (2014). National core curriculum 2014. Määräykset ja ohjeet 2014:96. Finnish National Agency for Education [in Finnish]. https://www.oph.fi/sites/default/files/documents/perusopetuksen_opetussuunnitelman_perusteet_2014.pdf
EDUFI. (2020). Assessment of pupils' learning and competence in basic education. Amendments to the national core curriculum for basic education 2014. Finnish National Agency for Education 10.2.2020 [in Finnish]. https://www.oph.fi/sites/default/files/documents/perusopetuksen-arviointiluku-10-2-2020_2.pdf
EDUFI. (2022). National core curriculum for preprimary education 2022. Määräykset ja ohjeet 2022:2a. Finnish National Agency for Education [in Finnish]. https://www.oph.fi/fi/tilastot-ja-julkaisut/julkaisut/varhaiskasvatussuunnitelman-perusteet-2022
Erturan, S., & Jansen, B. (2015). An investigation of boys' and girls' emotional experience of math, their math performance, and the relation between these variables. European Journal of Psychology of Education, 30(4), 421–435. https://doi.org/10.1007/s10212-015-0248-7
Fennema, E., & Sherman, J. A. (1976). Fennema-Sherman mathematics attitudes scales: Instruments designed to measure attitudes toward the learning of mathematics by females and males. Journal for Research in Mathematics Education, 7(5), 324–326. https://doi.org/10.2307/748467
Frenzel, A. C., Pekrun, R., & Goetz, T. (2007). Girls and mathematics – A “hopeless” issue? A control-value approach to gender differences in emotions towards mathematics. European Journal of Psychology of Education, 22(4), 497–514.
Goetz, T., Bieg, M., Lüdtke, O., Pekrun, R., & Hall, N. C. (2013). Do girls really experience more anxiety in mathematics? Psychological Science, 24(10), 2079–2087. https://doi.org/10.1177/0956797613486989
Goetz, T., Preckel, F., Pekrun, R., & Hall, N. C. (2007). Emotional experiences during test taking: Does cognitive ability make a difference? Learning and Individual Differences, 17(1), 3–16. https://doi.org/10.1016/j.lindif.2006.12.002
Goetz, T., Sticca, F., Pekrun, R., Murayama, K., & Elliot, A. J. (2016). Intraindividual relations between achievement goals and discrete achievement emotions: An experience sampling approach. Learning and Instruction, 41, 115–125. https://doi.org/10.1016/j.learninstruc.2015.10.007
Gogol, K., Brunner, M., Goetz, T., Martin, R., Ugen, S., Keller, U., Fischbach, A., & Preckel, F. (2014). “My questionnaire is too long!” the assessments of motivational-affective constructs with three-item and single-item measures. Contemporary Educational Psychology, 39(3), 188–205. https://doi.org/10.1016/j.cedpsych.2014.04.002
Haciomeroglu, G. (2019). The relationship between elementary students' achievement emotions and sources of mathematics self-efficacy. International Journal of Research in Education and Science, 5(2), 548–559.
Harley, J. M., Pekrun, R., Taxer, J. L., & Gross, J. J. (2019). Emotion regulation in achievement situations: An integrated model. Educational Psychologist, 54(2), 106–126. https://doi.org/10.1080/00461520.2019.1587297
Halme, H., Trezise, K., Hannula-Sormunen, M., & McMullen, J. (2022). Mathematics anxiety and performance across adaptive and routine tasks. Journal of Numerical Cognition, 8(3), 414–429. https://doi.org/10.5964/jnc.7675
Harju, K., Sandberg, R., Lerkkanen, M.-K., & Laakso, M.-J. (2025). Oppilasarvioinnin palaute ja sen hyödyntäminen: kaksivuotisen esiopetuksen kokeilun ensimmäisen luokan oppilasarviointien palaute opettajille ja rehtoreille.
Hellstrand, H., Holopainen, S., Korhonen, J., Räsänen, P., Hakkarainen, A., Laakso, M.-J., Laine, A., & Aunio, P. (2024). Arithmetic fluency and number processing skills in identifying students with mathematical learning disabilities. Research in Developmental Disabilities, 151, 104795. https://doi.org/10.1016/j.ridd.2024.104795
Hoeppner, B. B., Kelly, J. F., Urbanoski, K. A., & Slaymaker, V. (2011). Comparative utility of a single-item versus multiple-item measure of self-efficacy in predicting relapse among young adults. Journal of Substance Abuse Treatment, 41(3), 305–312. https://doi.org/10.1016/j.jsat.2011.04.005
Jarrell, A., & Lajoie, S. P. (2017). The regulation of achievements emotions: Implications for research and practice. Canadian Psychology/Psychologie Canadienne, 58(3), 276–287. https://doi.org/10.1037/cap0000119
Junça-Silva, A., Caetano, A., & Rueff Lopes, R. (2018). Activated or deactivated? Understanding how cognitive appraisals can drive emotional activation in the aftermath of daily work events. European Review of Applied Psychology, 68(4–5), 189–198. https://doi.org/10.1016/j.erap.2018.10.001
Justicia-Galiano, M. J., Martín-Puga, M. E., Linares, R., & Pelegrina, S. (2023). Gender stereotypes about math anxiety: Ability and emotional components. Learning and Individual Differences, 105, 102316. https://doi.org/10.1016/j.lindif.2023.102316
Kaur, T., McLoughlin, E., & Grimes, P. (2022). Mathematics and science across the transition from primary to secondary school: A systematic literature review. International Journal of STEM Education, 9, 13. https://doi.org/10.1186/s40594-022-00328-0
Ketonen, E. E., Salonen, V., Lonka, K., & Salmela-Aro, K. (2023). Can you feel the excitement? Physiological correlates of students' self-reported emotions. British Journal of Educational Psychology, 93(S1), 113–129. https://doi.org/10.1111/bjep.12534
Kleine, M., Goetz, T., Pekrun, R., & Hall, N. (2005). The structure of students' emotions experienced during a mathematical achievement test. Zentralblatt für Didaktik der Mathematik, 37(3), 221–225. https://doi.org/10.1007/s11858-005-0012-6
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. https://doi.org/10.18637/jss.v082.i13
Laakso, M.-J., Kaila, E., & Rajala, T. (2018). ViLLE – collaborative education tool: Designing and utilizing an exercise-based learning environment. Education and Information Technologies, 23(4), 1655–1676. https://doi.org/10.1007/s10639-017-9659-1
Liu, R.-D., Zhen, R., Ding, Y., Liu, Y., Wang, J., Jiang, R., & Xu, L. (2018). Teacher support and math engagement: Roles of academic self-efficacy and positive emotions. Educational Psychology, 38(1), 3–16. https://doi.org/10.1080/01443410.2017.1359238
Mata, L., Monteiro, V., Peixoto, F., Santos, N. N., Sanches, C., & Gomes, M. (2022). Emotional profiles regarding maths among primary school children – A two-year longitudinal study. European Journal of Psychology of Education, 37(2), 391–415. https://doi.org/10.1007/s10212-020-00527-9
Metsämuuronen, J. (2009). Metodit arvioinnin apuna [Methods assisting the assessment]. Oppimistulosten arviointi 1/2009. Opetushallitus [in Finnish].
Metsämuuronen, J. (2012). Challenges of the Fennema-Sherman test in the international comparisons. International Journal of Psychological Studies, 4(3), 1. https://doi.org/10.5539/ijps.v4n3p1
Metsämuuronen, J. (2023). Matematiikkaa COVID-19-pandemian varjossa III. Syventäviä analyyseja matematiikan 9. Luokan arvioinnista keväällä 2021. Julkaisut 31:2023. Kansallinen Koulutuksen Arviointikeskus.
Metsämuuronen, J., & Ukkola, A. (2019). Alkumittauksen menetelmällisiä ratkaisuja. Julkaisut 18:2019. Kansallinen Koulutuksen Arviointikeskus [in Finnish]. https://www.karvi.fi/sites/default/files/sites/default/files/documents/KARVI_1819.pdf
Metsämuuronen, J., & Ukkola, A. (2022). Rudimentary stages of the mathematical thinking and proficiency. Mathematical skills of low-performing pupils at the beginning of the first grade. LUMAT: International Journal on Math, Science and Technology Education, 10(2), 56–83. https://doi.org/10.31129/LUMAT.10.2.1632
Mölsä, M. E., Forsman, A. K., & Söderberg, P. (2025). The role of peers, teachers, and family on daily positive affect among school students: A 10-day experience sampling study. Educational Psychology, 45(1), 87–104. https://doi.org/10.1080/01443410.2025.2449975
Nagy, G., Watt, H., Eccles, J., Trautwein, U., Lüdtke, O., & Baumert, J. (2010). The development of students' mathematics self-concept in relation to gender: Different countries, different trajectories? Journal of Research on Adolescence, 20(3), 482–506. https://doi.org/10.1111/j.1532-7795.2010.00644.x
Narciss, S., & Alemdag, E. (2025). Learning from errors and failure in educational contexts: New insights and future directions for research and practice. British Journal of Educational Psychology, 95(1), 197–218. https://doi.org/10.1111/bjep.12716
Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625–632. https://doi.org/10.1007/s10459-010-9222-y
Parker, P. D., Schoon, I., Tsai, Y.-M., Nagy, G., Trautwein, U., & Eccles, J. S. (2012). Achievement, agency, gender, and socioeconomic background as predictors of postschool choices: A multicontext study. Developmental Psychology, 48(6), 1629–1642. https://doi.org/10.1037/a0029167
Parviainen, M., Aunola, K., Torppa, M., Lerkkanen, M. K., Poikkeus, A. M., & Vasalampi, K. (2021). Early antecedents of school burnout in upper secondary education: A five-year longitudinal study. Journal of Youth and Adolescence, 50, 231–245. https://doi.org/10.1007/s10964-020-01331-w
Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208–225. https://doi.org/10.1037/met0000126
Pekrun, R. (2006). The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. Educational Psychology Review, 18(4), 315–341. https://doi.org/10.1007/s10648-006-9029-9
Pekrun, R., Frenzel, A. C., Götz, T., & Muis, K. R. (2018). Emotions at school. Routledge. https://doi.org/10.4324/9781315187822
Pekrun, R., Goetz, T., Daniels, L. M., Stupnisky, R. H., & Perry, R. P. (2010). Boredom in achievement settings: Exploring control–value antecedents and performance outcomes of a neglected emotion. Journal of Educational Psychology, 102(3), 531–549. https://doi.org/10.1037/a0019243
Pekrun, R., Goetz, T., Titz, W., & Perry, R. P. (2002). Academic emotions in students' self-regulated learning and achievement: A program of qualitative and quantitative research. Educational Psychologist, 37(2), 91–105. https://doi.org/10.1207/S15326985EP3702_4
Pekrun, R., Vogl, E., Muis, K. R., & Sinatra, G. M. (2017). Measuring emotions during epistemic activities: The epistemically-related emotion scales. Cognition and Emotion, 31(6), 1268–1276. https://doi.org/10.1080/02699931.2016.1204989
Pierson, A. E., Brady, C. E., & Lee, S. J. (2023). Emotional configurations in STEM classrooms: Braiding feelings, sensemaking, and practices in extended investigations. Science Education, 107(5), 1126–1162. https://doi.org/10.1002/sce.21799
Putwain, D. (2008). Do examinations stakes moderate the test anxiety-examination performance relationship? Educational Psychology, 28(2), 109–118. https://doi.org/10.1080/01443410701452264
Putwain, D. W., Schmitz, E. A., Wood, P., & Pekrun, R. (2021). The role of achievement emotions in primary school mathematics: Control-value antecedents and achievement outcomes. British Journal of Educational Psychology, 91(1), 347–367. https://doi.org/10.1111/bjep.12367
R Core Team. (2023). R: A language and environment for statistical computing (version 4.3.0) [computer software]. R Foundation for Statistical Computing.
Radoff, J., Jaber, L. Z., & Hammer, D. (2019). “It's scary but It's also exciting”: Evidence of meta-affective learning in science. Cognition and Instruction, 37(1), 73–92. https://doi.org/10.1080/07370008.2018.1539737
Ramos-Vera, C., Basauri-Delgado, M., Calizaya-Milla, Y. E., & Saintila, J. (2025). Exploring the mediation of stress and emotional exhaustion on academic ineffectiveness and cynicism among university students. Psychiatry Investigation, 22(4), 365–374. https://doi.org/10.30773/pi.2024.0111
Räsänen, P., Aunio, P., Laine, A., Hakkarainen, A., Väisänen, E., Finell, J., Rajala, T., Laakso, M.-J., & Korhonen, J. (2021). Effects of gender on basic numerical and arithmetic skills: Pilot data from third to ninth grade for a large-scale online dyscalculia screener. Frontiers in Education, 6, 683672. https://doi.org/10.3389/feduc.2021.683672
Rights, J. D., & Sterba, S. K. (2019). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods, 24(3), 309–338. https://doi.org/10.1037/met0000184
Salmela, R., Lehtonen, M., Garusi, S., & Bertram, R. (2021). Lexize: A test to quickly assess vocabulary knowledge in Finnish. Scandinavian Journal of Psychology, 62(6), 806–819. https://doi.org/10.1111/sjop.12768
Salmela-Aro, K., Read, S., Minkkinen, J., Kinnunen, J. M., & Rimpelä, A. (2018). Immigrant status, gender, and school burnout in Finnish lower secondary school students: A longitudinal study. International Journal of Behavioral Development, 42(2), 225–236. https://doi.org/10.1177/0165025417690264
Schmid, R., Smit, R., Robin, N., & Strahl, A. (2025). The role of momentary emotions in promoting error learning orientation among lower secondary school students: An intervention study embedded in a short visual programming course. British Journal of Educational Psychology, 95(1), 107–123. https://doi.org/10.1111/bjep.12681
Schutz, P. A., & Davis, H. A. (2000). Emotions and self-regulation during test taking. Educational Psychologist, 35(4), 243–256. https://doi.org/10.1207/S15326985EP3504_03
Schwartze, M. M., Frenzel, A. C., Goetz, T., Marx, A. K. G., Reck, C., Pekrun, R., & Fiedler, D. (2020). Excessive boredom among adolescents: A comparison between low and high achievers. PLoS One, 15(11), e0241671. https://doi.org/10.1371/journal.pone.0241671
Sharabi, Y., & Roth, G. (2025). Emotion regulation styles and the tendency to learn from academic failures. British Journal of Educational Psychology, 95(1), 162–179. https://doi.org/10.1111/bjep.12696
Shuman, V., & Scherer, K. R. (2013). Concepts and structures of emotions. In International handbook of emotions in education. Routledge. https://doi.org/10.4324/9780203148211.ch2
Soncini, A., Matteucci, M. C., Tomasetto, C., & Butera, F. (2025). Supportive error feedback fosters students' adaptive reactions towards errors: Evidence from a targeted online intervention with Italian middle school students. British Journal of Educational Psychology, 95(1), 92–106. https://doi.org/10.1111/bjep.12679
Soncini, A., Visintin, E. P., Matteucci, M. C., Tomasetto, C., & Butera, F. (2022). Positive error climate promotes learning outcomes through students' adaptive reactions towards errors. Learning and Instruction, 80, 101627. https://doi.org/10.1016/j.learninstruc.2022.101627
Sullivan, G. M., & Artino, A. R. (2013). Analyzing and interpreting data from Likert-type scales. Journal of Graduate Medical Education, 5(4), 541–542. https://doi.org/10.4300/JGME-5-4-18
Tulis, M., & Ainley, M. (2011). Interest, enjoyment and pride after failure experiences? Predictors of students' state-emotions after success and failure during learning in mathematics. Educational Psychology, 31(7), 779–807. https://doi.org/10.1080/01443410.2011.608524
Tulis, M., & Dresel, M. (2025). Effects on and consequences of responses to errors: Results from two experimental studies. British Journal of Educational Psychology, 95(1), 143–161. https://doi.org/10.1111/bjep.12686
Tulis, M., & Fulmer, S. M. (2013). Students' motivational and emotional experiences and their relationship to persistence during academic challenge in mathematics and reading. Learning and Individual Differences, 27, 35–46. https://doi.org/10.1016/j.lindif.2013.06.003
Ukkola, A., & Metsämuuronen, J. (2019). Alkumittaus – Matematiikan ja äidinkielen ja kirjallisuuden osaaminen ensimmäisen luokan alussa. Julkaisut 17:2019. Kansallinen Koulutuksen Arviointikeskus.
Ukkola, A., Metsämuuronen, J., & Paananen, M. (2020). Alkumittauksen syventäviä kysymyksiä. Julkaisut 10:2020. Kansallinen Koulutuksen Arviointikeskus. https://www.karvi.fi/sites/default/files/sites/default/files/documents/KARVI_Alkumittaus.pdf
Van Der Beek, J. P. J., Van Der Ven, S. H. G., Kroesbergen, E. H., & Leseman, P. P. M. (2017). Self-concept mediates the relation between achievement and emotions in mathematics. British Journal of Educational Psychology, 87(3), 478–495. https://doi.org/10.1111/bjep.12160
Vilhunen, E., Chiu, M.-H., Salmela-Aro, K., Lavonen, J., & Juuti, K. (2023). Epistemic emotions and observations are intertwined in scientific sensemaking: A study among upper secondary physics students. International Journal of Science and Mathematics Education, 21(5), 1545–1566. https://doi.org/10.1007/s10763-022-10310-5
Vilhunen, E., Turkkila, M., Lavonen, J., Salmela-Aro, K., & Juuti, K. (2022). Clarifying the relation between epistemic emotions and learning by using experience sampling method and pre-posttest design. Frontiers in Education, 7, 826852. https://doi.org/10.3389/feduc.2022.826852
Vogl, E., & Pekrun, R. (2016). Emotions that matter to achievement: Student feelings about assessment. In Handbook of human and social conditions in assessment (pp. 111–128). Routledge.
Zhang, J., Chiu, M. M., & Lei, H. (2023). Achievement, self-concept and anxiety in mathematics and English: A three-wave cross-lagged panel study. British Journal of Educational Psychology, 93(1), 56–72. https://doi.org/10.1111/bjep.12539
How to cite this article: Kyynäräinen, R., Holopainen, S., Metsämuuronen, J., Bin Qushem, U., Laakso, M.-J., & Alanko, K. (2025). Beyond performance: Emotions before and after semi-high-stakes mathematics testing among school-aged students. British Journal of Educational Psychology, 00, 1–26. https://doi.org/10.1111/bjep.70043
APPENDIX
Full results of the final mixed-effects models
TABLE A1 Full results of the final mixed-effects model for happiness.
Fixed effect Beta SE 95% CI t(1,803) p
Intercept (grand mean) 2.55 .03 2.49 to 2.60 97.18 <.001
Time (before) .22 .02 .19 to .25 14.36 <.001
Grade (3) .72 .05 .62 to .83 13.83 <.001
Grade (6) .15 .04 .07 to .22 3.9 <.001
Grade (8) −.5 .04 −.59 to −.41 −11.21 <.001
Gender (boy) .08 .02 .03 to .12 3.39 .001
CB (0) −.54 .07 −.68 to −.40 −7.43 <.001
CB (1) −.26 .06 −.38 to −.15 −4.57 <.001
CB (2) −.08 .05 −.17 to .01 −1.67 .096
CB (3) .26 .04 .18 to .34 6.34 <.001
Test score 0 .03 −.06 to .06 .05 .962
Time (before) × Grade (3) −.16 .03 −.22 to −.10 −5.3 <.001
Time (before) × Grade (6) −.1 .02 −.14 to −.06 −4.53 <.001
Time (before) × Grade (8) .1 .03 .05 to .15 3.9 <.001
Time (before) × Gender (boy) −.03 .01 −.06 to −.00 −2.24 .025
Time (before) × CB (0) −.04 .04 −.12 to .05 −.88 .379
Time (before) × CB (1) .04 .03 −.03 to .10 1.07 .284
Time (before) × CB (2) −.01 .03 −.06 to .04 −.4 .688
Time (before) × CB (3) .01 .02 −.04 to .05 .33 .745
Time (before) × Test score −.05 .02 −.08 to −.01 −2.53 .011
Note: Test score is scaled (M = 0, SD = 1). Variance (SD) of the random intercept for participants was .61 (.78). Marginal and conditional R-squared values were .31 and .65, respectively. p-values for fixed effects were calculated using Satterthwaite approximations. Model equation: Happiness level ~ (1 | Participant) + Time + Grade + Gender + Competence belief + Test score + Time × (Grade + Gender + Competence belief + Test score).
Abbreviation: CB, competence belief.
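The label "Intercept (grand mean)" suggests that the categorical predictors in Table A1 were sum-to-zero (effect) coded. Under that coding, the omitted levels (Grade 9 and the after-test time point) are implied by the coefficients that are listed. The sketch below is a minimal illustration under two assumptions: sum-to-zero coding, and Time coded +1 for before and −1 for after. It recombines the rounded Table A1 coefficients into model-implied happiness means for each grade at each time point. Gender and competence belief are held at their zero (unweighted-average) contribution. The helper names are hypothetical, and the outputs are illustrative arithmetic, not results reported by the authors.

```python
# Coefficients transcribed from Table A1 (happiness). Grade 9 is the omitted level.
# ASSUMPTION: sum-to-zero coding, with Time coded before = +1 and after = -1.
INTERCEPT = 2.55
TIME_BEFORE = 0.22
SCORE = 0.00            # Test score main effect (per SD)
TIME_X_SCORE = -0.05    # Time (before) x Test score
GRADE = {3: 0.72, 6: 0.15, 8: -0.50}
TIME_X_GRADE = {3: -0.16, 6: -0.10, 8: 0.10}


def complete_sum_coded(effects: dict, omitted) -> dict:
    """Under sum-to-zero coding the level effects sum to zero, so the omitted
    level's effect is minus the sum of the listed effects (hypothetical helper)."""
    full = dict(effects)
    full[omitted] = -sum(effects.values())
    return full


def predicted_happiness(grade: int, time: str, score_z: float = 0.0) -> float:
    """Model-implied happiness for one grade at one time point. Gender and
    competence-belief terms contribute zero (their unweighted average)."""
    if time not in ("before", "after"):
        raise ValueError("time must be 'before' or 'after'")
    x = 1 if time == "before" else -1
    grade_eff = complete_sum_coded(GRADE, 9)[grade]
    time_grade_eff = complete_sum_coded(TIME_X_GRADE, 9)[grade]
    return (INTERCEPT + x * TIME_BEFORE + grade_eff + x * time_grade_eff
            + SCORE * score_z + x * TIME_X_SCORE * score_z)


for g in (3, 6, 8, 9):
    before = predicted_happiness(g, "before")
    after = predicted_happiness(g, "after")
    print(f"grade {g}: before {before:.2f}, after {after:.2f}, drop {before - after:.2f}")
```

Under these assumptions, the before-to-after drop at the mean test score is 2 × (Time + Time × Grade). This is about .12 for Grade 3 and about .76 for Grade 9, which is consistent with the conclusion that younger students stayed more positive. The negative Time × Test score term adds about .10 to the drop for each SD below the mean test score.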
TABLE A2 Full results of the final mixed-effects model for relaxation.
Fixed effect Beta SE 95% CI t(1,796) p
Intercept (grand mean) 2.49 .03 2.43 to 2.54 88.52 <.001
Time (before) .19 .02 .15 to .22 11.01 <.001
Grade (3) .44 .06 .33 to .55 7.77 <.001
Grade (6) .08 .04 −.01 to .16 1.83 .067
Grade (8) −.36 .05 −.46 to −.27 −7.56 <.001
Gender (boy) .15 .02 .10 to .20 5.89 <.001
CB (0) −.52 .08 −.68 to −.37 −6.73 <.001
CB (1) −.16 .06 −.28 to −.03 −2.52 .012
CB (2) −.04 .05 −.14 to .05 −.87 .386
CB (3) .18 .04 .09 to .27 4.11 <.001
Test score .06 .03 −.01 to .12 1.68 .093
Time (before) × Grade (3) −.16 .03 −.22 to −.09 −4.58 <.001
Time (before) × Grade (6) −.09 .02 −.14 to −.04 −3.61 <.001
Time (before) × Grade (8) .08 .03 .03 to .14 2.86 .004
Time (before) × Gender (boy) .02 .02 −.01 to .04 1.01 .313
Grade (3) × Gender (boy) −.14 .04 −.21 to −.06 −3.47 .001
Grade (6) × Gender (boy) .09 .04 .01 to .17 2.29 .022
Grade (8) × Gender (boy) .05 .05 −.04 to .14 1.11 .268
Time (before) × CB (0) −.11 .05 −.20 to −.02 −2.37 .018
Time (before) × CB (1) .03 .04 −.04 to .11 .87 .383
Time (before) × CB (2) −.03 .03 −.09 to .03 −.97 .334
Time (before) × CB (3) .04 .03 −.01 to .09 1.54 .124
Time (before) × Test score −.07 .02 −.11 to −.03 −3.3 .001
Time (before) × Grade (3) × Gender (boy) .01 .02 −.04 to .05 .29 .77
Time (before) × Grade (6) × Gender (boy) −.05 .02 −.09 to .00 −1.82 .069
Time (before) × Grade (8) × Gender (boy) −.03 .03 −.08 to .03 −.92 .36
Note: Test score is scaled (M = 0, SD = 1). Variance (SD) of the random intercept for participants was .67 (.82). Marginal and conditional R-squared values were .18 and .56, respectively. p-values for fixed effects were calculated using Satterthwaite approximations. Model equation: Relaxation level ~ (1 | Participant) + Time + Grade + Gender + Competence belief + Test score + Grade × Gender + Time × (Grade + Gender + Competence belief + Test score + Grade × Gender).
Abbreviation: CB, competence belief.
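Table A2 labels the intercept as the grand mean and lists every level of each categorical predictor except one: Grade (9) and one competence-belief category are absent, and Gender and Time appear only as "boy" and "before". That pattern suggests sum-to-zero (effects) coding, although the table does not state it. Under that assumption the coefficients of all levels of a factor sum to zero, so the unlisted level equals the negative sum of the listed ones. The Python sketch below applies this to Table A2; the variable names, and the assumption that the missing competence-belief category is the highest one, are illustrative only.

```python
# Minimal sketch: recover the unlisted level of each factor in Table A2
# (relaxation). ASSUMPTION: sum-to-zero (effects) coding, so the coefficients
# of all levels of a factor, including the unlisted one, add up to zero.

# Coefficients copied from Table A2 (listed levels only).
grade = {"3": 0.44, "6": 0.08, "8": -0.36}
cb = {"0": -0.52, "1": -0.16, "2": -0.04, "3": 0.18}
time_x_grade = {"3": -0.16, "6": -0.09, "8": 0.08}
gender_boy = 0.15


def omitted_level(listed):
    """Return the implied coefficient of the unlisted level (negative sum)."""
    return round(-sum(listed.values()), 2)


grade_9 = omitted_level(grade)                    # Grade (9), not listed in the table
cb_top = omitted_level(cb)                        # highest CB category (assumed)
time_x_grade_9 = omitted_level(time_x_grade)      # Time (before) x Grade (9)
gender_girl = omitted_level({"boy": gender_boy})  # two-level factor: sign flip

print(grade_9, cb_top, time_x_grade_9, gender_girl)
```

Read under the same assumption, and averaging over the other predictors, ninth graders would sit about .16 below the grand mean in relaxation and girls about .15 below it. If the tables were produced with a different contrast scheme, these derived values do not apply.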
TABLE A3 Full results of the final mixed-effects model for anxiety.

| Fixed effect | Beta | SE | 95% CI | t(1,785) | p |
|---|---|---|---|---|---|
| Intercept (grand mean) | 2.1 | .03 | 2.05 to 2.16 | 72 | <.001 |
| Time (before) | .06 | .02 | .02 to .09 | 3.44 | .001 |
| Gender (boy) | −.2 | .03 | −.26 to −.15 | −7.02 | <.001 |
| Test score | .01 | .03 | −.05 to .08 | .39 | .693 |
| CB (0) | .34 | .08 | .18 to .50 | 4.16 | <.001 |
| CB (1) | .05 | .06 | −.07 to .18 | .86 | .392 |
| CB (2) | 0 | .05 | −.10 to .10 | .01 | .991 |
| CB (3) | −.12 | .05 | −.21 to −.04 | −2.74 | .006 |
| Grade (3) | −.25 | .06 | −.36 to −.14 | −4.37 | <.001 |
| Grade (6) | −.1 | .04 | −.18 to −.01 | −2.31 | .021 |
| Grade (8) | .15 | .05 | .06 to .25 | 3.18 | .002 |
| Time (before) × Gender (boy) | −.04 | .02 | −.07 to −.01 | −2.4 | .017 |
| Time (before) × Test score | .03 | .02 | −.01 to .06 | 1.29 | .197 |
| Gender (boy) × Test score | −.07 | .02 | −.12 to −.02 | −2.76 | .006 |
| Time (before) × CB (0) | .03 | .05 | −.06 to .12 | .65 | .514 |
| Time (before) × CB (1) | −.06 | .04 | −.14 to .01 | −1.77 | .076 |
| Time (before) × CB (2) | 0 | .03 | −.06 to .06 | .04 | .967 |
| Time (before) × CB (3) | .01 | .03 | −.04 to .06 | .43 | .668 |
| Gender (boy) × CB (0) | −.15 | .08 | −.31 to .00 | −1.93 | .054 |
| Gender (boy) × CB (1) | −.05 | .06 | −.17 to .07 | −.77 | .439 |
| Gender (boy) × CB (2) | −.03 | .05 | −.13 to .06 | −.69 | .491 |
| Gender (boy) × CB (3) | .06 | .04 | −.02 to .15 | 1.44 | .151 |
| Time (before) × Grade (3) | .08 | .03 | .01 to .14 | 2.34 | .02 |
| Time (before) × Grade (6) | .04 | .02 | −.01 to .09 | 1.7 | .089 |
| Time (before) × Grade (8) | −.1 | .03 | −.15 to −.04 | −3.42 | .001 |
| Time (before) × Gender (boy) × Test score | −.01 | .01 | −.03 to .02 | −.45 | .653 |
| Time (before) × Gender (boy) × CB (0) | −.01 | .05 | −.10 to .08 | −.19 | .847 |
| Time (before) × Gender (boy) × CB (1) | −.07 | .04 | −.14 to .00 | −1.86 | .064 |
| Time (before) × Gender (boy) × CB (2) | .03 | .03 | −.02 to .09 | 1.21 | .228 |
| Time (before) × Gender (boy) × CB (3) | 0 | .02 | −.05 to .05 | −.09 | .925 |

Note: Test score is scaled (M = 0, SD = 1). Variance (SD) of the random intercept for participants was .73 (.85). Marginal and conditional R-squared values were .09 and .55, respectively. p-values for fixed effects were calculated using Satterthwaite approximations. Model equation: Anxiety level ~ (1 | Participant) + Time + Grade + Gender + Competence belief + Test score + Gender × (Test score + Competence belief) + Time × [Grade + Gender + Competence belief + Test score + Gender × (Test score + Competence belief)].
Abbreviation: CB, competence belief.
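Each table note reports the random-intercept variance together with marginal and conditional R-squared. Assuming these follow the widely used Nakagawa and Schielzeth definitions for random-intercept models (the notes do not name the method), marginal R-squared is the fixed-effect variance divided by the total variance, conditional R-squared adds the random-intercept variance to the numerator, and their difference therefore equals the random-intercept variance divided by the total. That identity lets the remaining variance components be backed out from the three numbers in the Table A3 note. The Python sketch below does so; because the inputs are rounded to two decimals, the outputs are only approximate.

```python
# Back out variance components from the Table A3 (anxiety) note.
# ASSUMPTION: R-squared follows Nakagawa and Schielzeth's definitions for a
# random-intercept model:
#   marginal    = var_fixed / total
#   conditional = (var_fixed + var_random) / total
#   total       = var_fixed + var_random + var_residual
# Inputs are rounded in the table, so the outputs are rough approximations.
import math


def variance_components(var_random, r2_marginal, r2_conditional):
    """Return (var_fixed, var_residual, total) implied by the reported values."""
    total = var_random / (r2_conditional - r2_marginal)
    var_fixed = r2_marginal * total
    var_residual = (1.0 - r2_conditional) * total
    return var_fixed, var_residual, total


var_random = 0.73  # random-intercept variance for participants (Table A3 note)
var_fixed, var_residual, total = variance_components(var_random, 0.09, 0.55)

# Share of the variance not explained by the fixed effects that reflects
# stable between-student differences (an "adjusted" intraclass correlation).
icc_adjusted = var_random / (var_random + var_residual)

print(f"SD of random intercept: {math.sqrt(var_random):.2f}")  # table reports .85
print(f"fixed: {var_fixed:.2f}, residual: {var_residual:.2f}, total: {total:.2f}")
print(f"adjusted ICC: {icc_adjusted:.2f}")
```

With the Table A3 inputs this gives a total variance of roughly 1.59 and an adjusted intraclass correlation of about .51, that is, about half of the variance left unexplained by the fixed effects reflects stable differences between students. The same function can be applied to the values in the Table A2 and Table A4 notes.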
TABLE A4 Full results of the final mixed-effects model for boredom.

| Fixed effect | Beta | SE | 95% CI | t(1,786) | p |
|---|---|---|---|---|---|
| Intercept (grand mean) | 3.11 | .04 | 3.04 to 3.19 | 85.43 | <.001 |
| Time (before) | −.04 | .02 | −.09 to .00 | −1.86 | .062 |
| Grade (3) | −.55 | .07 | −.69 to −.42 | −8.07 | <.001 |
| Grade (6) | −.05 | .05 | −.15 to .04 | −1.07 | .285 |
| Grade (8) | .39 | .06 | .27 to .50 | 6.6 | <.001 |
| Test score | −.02 | .04 | −.09 to .05 | −.51 | .607 |
| Gender (boy) | .06 | .03 | −.01 to .12 | 1.71 | .087 |
| CB (0) | .3 | .09 | .12 to .48 | 3.3 | .001 |
| CB (1) | .24 | .07 | .10 to .38 | 3.4 | .001 |
| CB (2) | .07 | .06 | −.04 to .18 | 1.21 | .225 |
| CB (3) | −.2 | .05 | −.30 to −.10 | −3.91 | <.001 |
| Time (before) × Grade (3) | .12 | .04 | .03 to .21 | 2.72 | .007 |
| Time (before) × Grade (6) | .07 | .03 | .01 to .13 | 2.16 | .031 |
| Time (before) × Grade (8) | −.11 | .04 | −.18 to −.04 | −2.88 | .004 |
| Time (before) × Test score | .05 | .02 | .00 to .10 | 2.01 | .044 |
| Grade (3) × Test score | −.18 | .06 | −.29 to −.07 | −3.12 | .002 |
| Grade (6) × Test score | .09 | .06 | −.02 to .20 | 1.6 | .11 |
| Grade (8) × Test score | .09 | .06 | −.03 to .20 | 1.39 | .163 |
| Time (before) × Gender (boy) | .02 | .02 | −.02 to .06 | .96 | .337 |
| Time (before) × CB (0) | .1 | .06 | −.02 to .22 | 1.71 | .088 |
| Time (before) × CB (1) | −.02 | .05 | −.11 to .08 | −.33 | .742 |
| Time (before) × CB (2) | −.07 | .04 | −.14 to .00 | −1.84 | .065 |
| Time (before) × CB (3) | −.03 | .03 | −.10 to .03 | −.93 | .351 |
| Gender (boy) × CB (0) | −.03 | .09 | −.21 to .14 | −.39 | .695 |
| Gender (boy) × CB (1) | .02 | .07 | −.11 to .16 | .31 | .755 |
| Gender (boy) × CB (2) | −.14 | .06 | −.25 to −.03 | −2.49 | .013 |
| Gender (boy) × CB (3) | .04 | .05 | −.06 to .13 | .81 | .416 |
| Time (before) × Grade (3) × Test score | .03 | .04 | −.04 to .11 | .87 | .385 |
| Time (before) × Grade (6) × Test score | −.05 | .04 | −.12 to .02 | −1.38 | .169 |
| Time (before) × Grade (8) × Test score | .04 | .04 | −.04 to .11 | .89 | .372 |
| Time (before) × Gender (boy) × CB (0) | .08 | .06 | −.03 to .19 | 1.41 | .158 |
| Time (before) × Gender (boy) × CB (1) | −.13 | .04 | −.22 to −.05 | −2.97 | .003 |
| Time (before) × Gender (boy) × CB (2) | .04 | .04 | −.03 to .11 | 1.15 | .252 |
| Time (before) × Gender (boy) × CB (3) | .01 | .03 | −.05 to .07 | .3 | .766 |

Note: Test score is scaled (M = 0, SD = 1). Variance (SD) of the random intercept for participants was .77 (.88). Marginal and conditional R-squared values were .11 and .47, respectively. p-values for fixed effects were calculated using Satterthwaite approximations. Model equation: Boredom level ~ (1 | Participant) + Time + Grade + Gender + Competence belief + Test score + Grade × Test score + Gender × Competence belief + Time × (Grade + Gender + Competence belief + Test score + Grade × Test score + Gender × Competence belief).
Abbreviation: CB, competence belief.
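The model equations in the table notes determine which rows each table contains. Assuming, as the table rows indicate, that Time, Gender and Test score each contribute one coefficient column, Grade three (grades 3, 6 and 8) and Competence belief four (CB (0) to CB (3)), and that the outer "Time ×" term crosses Time with every term inside its brackets, the Python sketch below expands the three model equations and counts the fixed effects, intercept included. The term tuples are a hypothetical shorthand, not the term labels R would print.

```python
# Count the fixed-effect coefficients implied by each model equation.
# ASSUMPTION: per-predictor column counts are read off the table rows
# (one level of each categorical factor is not listed).
from math import prod

COLUMNS = {"Time": 1, "Grade": 3, "Gender": 1, "CB": 4, "Test": 1}

# Terms that each model includes on their own and also crosses with Time,
# transcribed from the model equations in the notes to Tables A2 to A4.
MODELS = {
    "relaxation": [("Grade",), ("Gender",), ("CB",), ("Test",),
                   ("Grade", "Gender")],
    "anxiety": [("Grade",), ("Gender",), ("CB",), ("Test",),
                ("Gender", "Test"), ("Gender", "CB")],
    "boredom": [("Grade",), ("Gender",), ("CB",), ("Test",),
                ("Grade", "Test"), ("Gender", "CB")],
}


def n_columns(term):
    """An interaction has as many columns as the product of its factors' columns.

    The empty tuple () stands for the intercept: prod of nothing is 1.
    """
    return prod(COLUMNS[factor] for factor in term)


def count_fixed_effects(base_terms):
    """Intercept + Time + base terms + Time crossed with every base term."""
    terms = [(), ("Time",)] + base_terms + [("Time",) + t for t in base_terms]
    return sum(n_columns(t) for t in terms)


counts = {name: count_fixed_effects(base) for name, base in MODELS.items()}
print(counts)  # {'relaxation': 26, 'anxiety': 30, 'boredom': 34}
```

The counts, 26, 30 and 34, equal the numbers of rows in Tables A2, A3 and A4, which is consistent with each table reporting every coefficient implied by its model equation.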