Applying Supervised Machine Learning for
Radiation Dose Accumulation
University of Turku
Department of Computing
Master of Science (Tech) Thesis
Health Technology
February 2025
Valtteri Puumalainen
Supervisors:
Jari Björne, PhD. (University of Turku)
Jussi Nieminen, M.Sc. (Teollisuuden Voima Oyj)
Eemeli Härmälä, M.Sc. (Teollisuuden Voima Oyj)
The originality of this thesis has been checked in accordance with the University of Turku quality assurance system
using the Turnitin OriginalityCheck service.
UNIVERSITY OF TURKU
Department of Computing
Valtteri Puumalainen: Applying Supervised Machine Learning for Radiation
Dose Accumulation
Master of Science (Tech) Thesis, 77 p.
Health Technology
February 2025
Predicting radiation doses in nuclear power plants is a challenging problem for
maintaining the radiation safety of workers whilst ensuring that high exposures do not
occur. These radiation doses can be predicted using different sensors, measurements
or by manually reviewing the dose history of personnel. However, in this study, a
previously unexplored visit-based machine learning approach for predicting radiation
doses was developed. This approach utilises time relational data on personnel visits
to the controlled area of OL1 and OL2 (Olkiluoto Unit 1 and 2) nuclear power plants,
including radiation doses measured during these visits. This allows us to predict
visits for different interval classes depending on the radiation dose received.
To provide a comprehensive foundation for machine learning modeling, we also
examined the regulations governing current activities and analysed the nature of
radiation exposure in nuclear power plant environments, including the origins and
effects of radiation. Finally, we evaluated the prerequisites and considerations for
deploying a comparable application in a production environment.
Through a combination of literature and experimental analysis, a basis for machine
learning analysis was established, adopting five different models: 1) Random Forest,
2) Balanced Random Forest, 3) XGBoost, 4) LightGBM and 5) Easy Ensemble with
AdaBoost. Among the models tested, LightGBM achieved the most promising results;
however, its performance fell short of expectations due to the inherent imbalance and
lack of descriptiveness in the dataset. While the models demonstrated an ability to
learn from the data, this learning was insufficient to effectively distinguish between
all class intervals. These limitations emphasise the value of integrating additional
contextual information, such as the specific work tasks completed during visits, to
enhance the dataset’s descriptiveness and improve the model’s performance.
By addressing these limitations, this study highlights the broader potential for
data-driven modelling and further research. Specifically, we demonstrate that the
descriptiveness and contextual relevance of data are as important as, if not more
important than, its quantity, as the mere existence or abundance of data does not
guarantee its applicability to similar data-driven methods.
Keywords: machine learning, ml, radiation exposure, occupational exposure,
radiation dose, radiation, as low as reasonably achievable (ALARA), nuclear power
plant, npp, nuclear energy, crisp-dm
Acknowledgements
I would like to begin by thanking my supervisors, Jari Björne, Jussi Nieminen and
Eemeli Härmälä, for their good and constructive approach to my work. I felt that
I was contributing to something meaningful, both for my own career and for the
company that made my research possible, Teollisuuden Voima Oyj. In addition, I
would also like to express my gratitude to everyone else who has helped me with my
work, both with the databases and with the data, tools and systems. Your support
has been invaluable and has greatly accelerated my progress over the past year.
Furthermore, I am grateful to my previous employers, and my current employer,
Teollisuuden Voima Oyj, for enabling me to be here today. The opportunity to
learn and grow, which is of great significance at the start of one’s career, has been
truly invaluable. I also extend my sincere thanks to everyone at Turun Yliopisto
(University of Turku) for teaching and supporting me throughout my studies. This
is an excellent place to continue!
Valtteri Puumalainen
January, 2025
Contents
1 Introduction
2 Research process
3 Radiation and machine learning
3.1 Legislative approach to dose regulation
3.2 Nuclear power plant setting
3.2.1 Site visit formation
3.2.2 Radiation phenomena and dosage
3.2.3 Biological effects of radiation
3.3 Related studies and machine learning
3.3.1 Present-day applications
3.3.2 Overview of machine learning
4 Machine learning approach
4.1 Data
4.2 Applying machine learning
4.3 Evaluation
5 Results
5.1 Evaluation analysis
5.2 Operational use
6 Conclusions
References
List of acronyms
H*(d) Ambient Dose Equivalent
Hp(d) Personal Dose Equivalent
wR Radiation Weighting Factor
wT Tissue Weighting Factor
A Mass Number
AdaBoost Adaptive Boosting
ALARA As Low As Reasonably Achievable
ANN Artificial Neural Network
AUC Area Under the Curve
Bq Becquerel
BRF Balanced Random Forest
BWR Boiling Water Reactor
CRISP-DM CRoss-Industry Standard Process for Data Mining
D-Tree Decision Tree
EE Easy Ensemble
EFB Exclusive Feature Bundling
GBDT Gradient Boosting Decision Trees
GBN Gaussian Bayesian Network
GDPR General Data Protection Regulation
GOSS Gradient-Based One-Side Sampling
GP Gaussian Process
Gy Gray
ICRP International Commission on Radiological Protection
IQR Interquartile Range
IXRPC International X-Ray and Radium Protection Committee or Commission
LDR Low-dose radiation
LightGBM Light Gradient Boosting Machine
LLN Law of Large Numbers
LNT Linear No-Threshold
LWR Light Water Reactor
MAE Mean Absolute Error
man-Sv man-Sievert
MAPE Mean Absolute Percentage Error
ML Machine Learning
NLL Negative Log-Likelihood
NPP Nuclear Power Plant
OOB Out-of-bag
OvR One-vs-the-Rest
PR Precision-Recall
PTSD Post-traumatic stress disorder
RF Random Forest
RMSE Root Mean Squared Error
ROC AUC Area Under the Receiver Operating Characteristic Curve
ROC Receiver Operating Characteristic
RWP Radiation Work Permit
STUK Radiation and Nuclear Safety Authority
Sv Sievert
TLD Thermoluminescent Dosimeter
TVO Teollisuuden Voima Oyj
WDC Work Dose Code
XGBoost Extreme Gradient Boosting
Z Atomic Number
1 Introduction
Radiation is a phenomenon that has always occurred naturally on Earth in two
distinguishable forms: ionising and non-ionising. The defining characteristic of
ionising radiation is its ability to remove electrons from atoms or molecules, thereby
producing ionisation, whereas non-ionising radiation does not have enough energy to
achieve this [1]–[3]. These types of radiation occur in nature as so-called background
radiation due to cosmic rays and terrestrial radiation [1]. Although radiation is
generally perceived as harmful, it is often employed in
healthcare settings for a variety of diagnostic and therapeutic procedures, as well as
in other applications, such as nuclear power production [4]. This is referred to as
man-made radiation [1].
The focus of this study is Teollisuuden Voima Oyj (TVO), the operator of three
nuclear power plants (NPPs) in Olkiluoto, Finland. These NPPs account for one
third of Finland’s electricity production, the first of which has been in operation
since 1979 and the latest since 2023 [5]. To operate these NPPs, TVO must comply
with the laws, decrees, decisions and regulations laid down in Finnish legislation,
which are affected by the decisions of the Council of Europe [6]. The Radiation and
Nuclear Safety Authority (STUK) oversees nuclear safety and radiation monitoring
in Finland, and therefore sets regulations that need to be followed [6], [7].
This study focuses on man-made ionising radiation from fission reactions, resulting
in fission products, to which personnel are exposed in the nuclear power plant
CHAPTER 1. INTRODUCTION 2
environments. This exposure may be referred to as occupational exposure [8]. TVO
has a legal obligation to maintain and manage radiation dose data from their NPPs
in order to monitor personnel’s exposure [6]. The objective of this study is to utilise
this data to predict the radiation dose of personnel during site visits using machine
learning. At present, this prediction is carried out manually based on past experience
and work history. This study therefore seeks to assess whether such prediction can
be achieved through the application of machine learning, which can be defined as
the teaching of an algorithm with historical data, thereby enabling it to adapt and
make accurate predictions based on past experiences [9], [10]. More precisely, the
problem is approached as classification: although the dose is a continuous value, it
is divided into interval classes, so that the radiation dose is predicted as one of a
finite number of possible outcomes [11].
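The division of a continuous dose into interval classes can be sketched as follows. This is a minimal illustration only: the boundary values in `BIN_EDGES_MSV` and the function name `dose_to_class` are hypothetical and do not represent the intervals used in this study.

```python
from bisect import bisect_right

# Hypothetical class boundaries in mSv (assumption, not the thesis intervals).
# They define the classes [0, 0.01), [0.01, 0.1), [0.1, 1.0) and [1.0, inf).
BIN_EDGES_MSV = [0.01, 0.1, 1.0]


def dose_to_class(dose_msv, edges=BIN_EDGES_MSV):
    """Map a continuous visit dose (mSv) to the index of its interval class."""
    if dose_msv < 0:
        raise ValueError("dose must be non-negative")
    # bisect_right counts how many boundaries are <= dose_msv,
    # which is exactly the index of the interval the dose falls into.
    return bisect_right(edges, dose_msv)


# Hypothetical visit doses in mSv.
visit_doses = [0.0, 0.005, 0.05, 0.4, 2.3]
classes = [dose_to_class(d) for d in visit_doses]
```

A classifier trained on such labels predicts the class index rather than the dose itself, so the choice of boundaries directly affects class balance.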
The current manual method for predicting radiation doses achieves a Mean
Absolute Error (MAE) of 111.9 man-mSv and a Mean Absolute Percentage Error
(MAPE) of 18.76% for annual maintenance cycles completed between 2012 and 2023.
This means that the average difference between the predicted and actual radiation
doses is 111.9 man-mSv and the percentage error relative to the actual dose is 18.76%
[10]. In the light of these metrics, the term man-Sievert (man-Sv) is used to describe
the collective dose received by a group of people. It represents the sum of the
individual doses within some population.
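As a minimal sketch of how these benchmark metrics are formed, the following computes the collective dose, MAE and MAPE for a few annual maintenance cycles. All numerical values and function names are hypothetical and serve only to illustrate the definitions; they are not the TVO figures.

```python
def collective_dose(individual_doses_msv):
    """Collective dose in man-mSv: the sum of individual doses given in mSv."""
    return sum(individual_doses_msv)


def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, relative to the actual values, in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical collective doses (man-mSv) for three annual maintenance cycles.
actual_man_msv = [500.0, 800.0, 400.0]
predicted_man_msv = [450.0, 900.0, 400.0]

benchmark_mae = mae(actual_man_msv, predicted_man_msv)    # 50.0 man-mSv
benchmark_mape = mape(actual_man_msv, predicted_man_msv)  # approximately 7.5 %
```

Because MAPE divides by the actual dose, it is undefined for cycles with a zero actual dose; this sketch assumes all actual values are positive.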
The data used in this study for the machine learning modelling was collected
from the same period as the initial (MAE and MAPE) benchmarks, spanning from
2012 to 2023. The MAE and MAPE values are calculated from the total accumulated
dose and dose predictions for both OL1 and OL2 at the annual maintenance level.
In this sense, the machine learning model is expected to achieve at most the same
MAE and MAPE values when predicting the total dose. This study therefore sets as a
success criterion that the value obtained by machine learning prediction should be
at least as accurate at the annual cycle level as that obtained by manual prediction.
The annual maintenance may consist either of a nuclear fuel change alone or of
broader maintenance that includes a nuclear fuel change. These take place alternately
at OL1 and OL2. The maintenance also includes any necessary repair of malfunctions,
possible modifications and, if necessary, preparation for the next year’s maintenance.
Annual maintenance represents a significant aspect of the operational life
cycle of the plant units [5].
The following research questions are formed with the intention of modelling the
effects of ionising radiation on humans in nuclear power plant environments. This will
provide a basis for the processing of radiation dose data. The processing of the data
must take into account the legal regulations and the operating environment. Finally,
the most suitable machine learning model for the given problem will be identified,
and the potential integration of such a model into a network information system
solution will be explored. The study makes use of the CRISP-DM (CRoss-Industry
Standard Process for Data Mining) process model.
• Research question 1: What is the impact of radiation on individuals in
nuclear power plant environments?
• Research question 2: Which machine learning model is the most suitable
for predicting occupational radiation exposure during a site visit?
• Research question 3: How can the machine learning model implemented in
this research be integrated into real-world nuclear power plant operations?
The succeeding paragraphs present the general information on the subject of the
study, which is essential for comprehending the overall context. This will enable the
reader to understand the fundamental principles of nuclear power plant operation,
radiation sources, and radiation monitoring activities that underpin nuclear and
radiation safety with the focus on individual monitoring, meaning the making and
interpretation of measurements related to occupational exposure [8]. This will provide
an overview that will guide the reader towards a more in-depth understanding of
Chapter 3.
As previously stated, TVO operates three nuclear power plants, OL1, OL2 and
OL3 (Olkiluoto Unit 1, 2 and 3). The first two plants are identical and each has
a net electricity output of 890 MW (megawatt) [5]. In contrast, the third nuclear
power plant, OL3, operates on a different principle and has a net electricity
output of 1600 MW [12]. This
study will focus on OL1 and OL2. This choice is based on the fact that these sites
have been in production use for a considerable length of time compared to OL3 and
thus have well established work procedures and significantly more data available.
In principle, nuclear energy counts as thermal energy, and the OL1 and OL2
plants operate on this principle, using Boiling Water Reactors (BWRs), or more
generally, Light Water Reactors (LWRs) [5]. BWRs operate by circulating water
between the fuel rods within the reactor core, which results in the heating and
vaporisation of the water [5]. In the case of TVO, the resulting steam is then directed
through four main steam pipes to the high-pressure turbine, after which the steam
is reheated in an intermediate heater before finally reaching the four low-pressure
turbines that work in parallel [5]. These turbines drive the generator. The power
of the reactor is controlled by control rods and main circulation pumps [5]. The
reactors use uranium dioxide (UO2) pellets as fuel, with the primary process being
the neutron-induced fission of uranium-235 isotopes [3], [5]. This fission reaction,
among others, releases ionising radiation, such as neutrons and gamma rays, and the
necessary non-ionising radiation, heat [3], [5]. The BWR system of the OL1 and OL2
reactors is illustrated in Figure 1.1. A detailed overview of the operational aspects
of the reactor is not within the scope of this study.
Figure 1.1: OL1 and OL2 BWR system overview [3], [13]
Of the various radiation sources, the most significant is ionising
radiation emitted from the reactor. However, this has been considered in the design
of the NPPs, and the fuel pellets utilised in the reactor already serve as the initial
barrier against the dissemination of radiation [5]. The second layer of protection
is the metallic shell of the fuel rods, which contain the fuel pellets [5]. The third
layer of protection is the reactor pressure vessel, which is protected by a containment
building that acts as the fourth layer [5]. The final layer of protection is the reactor
building [5]. These constitute a series of nested protective zones.
Despite the above-mentioned protective measures, radiation doses to personnel
still occur, and the highest doses occur during the mandatory annual maintenance
of plants, as this is when a lot of work is carried out near radiating objects. It
should be noted that the plants emit more radiation during power operation, but in
this case there is not as much work being carried out on the radiating systems as
during maintenance, if any, so there is no increase in the dose of the personnel [5].
In this instance, the source of radiation is the contamination of water that comes
into contact with the reactor. This water contains radioactive impurities due to the
activation of elements like oxygen that transform into nitrogen-16 (16N), which emits
gamma radiation. These impurities accumulate in various systems, including on the
surfaces of pipes. However, during a reactor shutdown when the maintenance is done,
the radioactivity of the water decreases as short-lived isotopes like nitrogen-16 decay,
reducing the overall radiation levels.
The radiation in question is produced as a consequence of radioactivity, whereby
unstable atomic nuclei undergo a spontaneous change in state, emitting particles or
electromagnetic radiation, in order to achieve stability [1], [2]. This transition
towards stability gives rise to ionising radiation, which can be classified into five
distinct categories: 1) α (alpha) particles, 2) β (beta) particles, 3) γ (gamma)
rays, 4) X-rays and 5) n (neutrons) [1]. Of these categories, gamma and
X-ray radiation are discussed in detail in Chapter 3.2.2, as these are measured by
the electronic dosimeters and form the basis of the dose in the available data. It
should be noted that TVO is also capable of measuring other types of radiation, but
gamma radiation (and similarly, X-rays, due to their similar electromagnetic nature)
is always measured because of its penetration capability and, thus, its significance.
The basis of radioactivity lies in the atomic nucleus, which is composed of protons
and neutrons, with protons exhibiting a charge magnitude comparable to that of
electrons but opposite in sign, and neutrons remaining neutral as they bind these
protons [2]. Radionuclides with an unfavourable neutron-proton ratio undergo decay,
thereby losing energy and becoming other nuclides or isotopes with different atomic
numbers (Z) or mass numbers (A) [2]. The atomic number Z represents the number
of protons present in an atom, while the mass number A is the sum of protons
and neutrons [2]. A nucleus is considered stable when its neutron-to-proton ratio is
approximately equal to one [2]. In general, nuclei with an atomic number greater
than 83 are unstable, such as uranium (Z=92), which is often used as fuel in NPPs
[2], [5].
The process by which atomic nuclei seek equilibrium, resulting in the release of
radiation, is invisible to the human eye and cannot be directly observed by humans
[2]. However, radiation can be measured using dosimeters or with radiometers, which
provide a quantifiable reading of the radiation levels. Before doing so, however, it
must be understood that the amount of radioactivity is measured by the activity,
which is the number of disintegrations of a radionuclide per unit time. Activity is
measured in Becquerel (Bq) or reciprocal seconds (s⁻¹), where 1 Bq = 1 s⁻¹ [3].
The reciprocal second refers to the number of events or decays per second in a
radioactive material [3]. This can be further discussed in terms of absorbed dose,
which is the amount of energy transferred by ionising radiation to a substance per
unit of mass [2], [3]. This is measured in Gray (Gy), where the unit is the joule
per kilogram (J·kg⁻¹), and 1 Gy = 1 J·kg⁻¹ represents the absorption of 1 joule of
radiation energy per kilogram of matter [3]. Unlike the Gray, the Sievert (Sv) takes
into account the biological effects of radiation on different human tissues [2]. This is
achieved by assigning different types of radiation a radiation weighting factor (wR)
[2]. The weighted values are then summed to calculate the equivalent dose, where
the unit is J·kg⁻¹ [2]. Table 1.1 shows these weighting factors and their respective
continuous functions for neutron radiation as a function of energy.
Radiation type              Radiation weighting factor (wR)
Photons (γ and X-rays)      1
Electrons (β)               1
Alpha particles (α)         20
Neutrons (n)                A continuous function of neutron energy En:

\[
w_R =
\begin{cases}
2.5 + 18.2\, e^{-[\ln(E_n)]^2/6}, & E_n < 1\ \text{MeV} \\
5.0 + 17.0\, e^{-[\ln(2E_n)]^2/6}, & 1\ \text{MeV} \le E_n \le 50\ \text{MeV} \\
2.5 + 3.25\, e^{-[\ln(0.04E_n)]^2/6}, & E_n > 50\ \text{MeV}
\end{cases}
\]
Table 1.1: Radiation types and weight factors defined by ICRP (International
Commission on Radiological Protection) in 2007 [2]
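The weighting factors of Table 1.1 can be sketched in code as follows, assuming the neutron energy is given in MeV. The function names and the example values are illustrative assumptions; only the coefficients come from the table.

```python
import math

# Constant radiation weighting factors from Table 1.1.
W_R_CONSTANT = {"photon": 1.0, "electron": 1.0, "alpha": 20.0}


def neutron_weighting_factor(energy_mev):
    """Radiation weighting factor for neutrons as a function of energy (MeV)."""
    if energy_mev <= 0:
        raise ValueError("neutron energy must be positive")
    if energy_mev < 1.0:
        return 2.5 + 18.2 * math.exp(-(math.log(energy_mev) ** 2) / 6)
    if energy_mev <= 50.0:
        return 5.0 + 17.0 * math.exp(-(math.log(2 * energy_mev) ** 2) / 6)
    return 2.5 + 3.25 * math.exp(-(math.log(0.04 * energy_mev) ** 2) / 6)


def equivalent_dose_sv(absorbed_doses_gy):
    """Sum of w_R-weighted absorbed doses (Gy) for one tissue, giving Sv.

    absorbed_doses_gy is a list of (w_R, absorbed dose in Gy) pairs.
    """
    return sum(w_r * dose for w_r, dose in absorbed_doses_gy)


# Hypothetical mixed field: 1 mGy of photons and 0.1 mGy of 1 MeV neutrons.
h_t = equivalent_dose_sv([
    (W_R_CONSTANT["photon"], 1.0e-3),
    (neutron_weighting_factor(1.0), 0.1e-3),
])
```

The neutron factor peaks at roughly 20 around 1 MeV and falls towards 2.5 at very low and very high energies, which is why the same absorbed dose from neutrons contributes far more to the equivalent dose than photons do.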
This brings us to the effective dose, which takes into account the radiation
weighting factor as well as the varying sensitivity of different tissues and is again
expressed in Sieverts [2], [3]. This is calculated by weighting the equivalent dose
for each tissue or organ by the tissue weighting factor (wT ), whereby the sum of all
tissue weighting factors for the body would be one, and then summing the result.
Table 1.2 shows these weighting factors. Equivalent and effective doses cannot be
calculated directly but are assessed using radiation dose quantities and modeled with
computational phantoms, which are computer models of human anatomy composed
of numerous voxels [2]. Each voxel is assigned a specific tissue type and organ identity
based on gender [2]. These phantoms simulate how radiation distributes in the body
and are created using computational geometry or 3D imaging [2].
Tissue or organ                                                      wT     ΣwT
Bone marrow, colon, lung, stomach, breast, and remainder tissues     0.12   0.72
Gonads                                                               0.08   0.08
Bladder, esophagus, liver and thyroid                                0.04   0.16
Bone surface, brain, salivary glands, skin                           0.01   0.04
Table 1.2: Tissue types and weight factors defined by ICRP in 2007. Remainder
tissues include adrenals, nasal and oral passages, pharynx, larynx, gall bladder, heart,
kidneys, lymphatic nodes, muscle, oral mucosa, pancreas, prostate, small intestine,
spleen, thymus, uterus and cervix [2]
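The weighting described above can be sketched as a weighted sum over tissues, using the factors of Table 1.2. The dictionary keys, the function name and the example doses are illustrative assumptions.

```python
# Tissue weighting factors from Table 1.2; "remainder" stands for the
# remainder tissues treated as a single group.
TISSUE_WEIGHTS = {
    "bone_marrow": 0.12, "colon": 0.12, "lung": 0.12,
    "stomach": 0.12, "breast": 0.12, "remainder": 0.12,
    "gonads": 0.08,
    "bladder": 0.04, "esophagus": 0.04, "liver": 0.04, "thyroid": 0.04,
    "bone_surface": 0.01, "brain": 0.01, "salivary_glands": 0.01, "skin": 0.01,
}


def effective_dose_msv(equivalent_doses_msv):
    """Effective dose as the w_T-weighted sum of tissue equivalent doses (mSv).

    Tissues missing from the input are assumed to have received no dose.
    """
    return sum(
        TISSUE_WEIGHTS[tissue] * dose
        for tissue, dose in equivalent_doses_msv.items()
    )


# Hypothetical uniform whole-body exposure of 2 mSv to every tissue.
uniform = {tissue: 2.0 for tissue in TISSUE_WEIGHTS}
e_uniform = effective_dose_msv(uniform)

# Hypothetical exposure of the thyroid only.
e_thyroid = effective_dose_msv({"thyroid": 10.0})
```

Since the tissue weighting factors sum to one, a uniform whole-body exposure yields an effective dose equal to the equivalent dose, whereas a dose to a single organ is scaled down by that organ's factor.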
For operational calculations, there are definitions of internal and external exposure,
of which this study only considers external exposure, as the data collected for this
study only covers the external exposure of personnel. Internal exposure occurs when
radioactive substances enter the body by ingestion, inhalation or absorption, for
example, through wounds or breathing [2], [3]. External exposure, on the other hand,
occurs when the body is exposed to external radiation from particles emitted by a
radioactive source as we have previously discussed [2].
External exposure can be measured in ambient dose equivalent (H*(d)) for area
monitoring and personal dose equivalent (Hp(d)) for individual monitoring [2]. This
study focuses on the personal dose equivalent as it is central to personal monitoring
of occupational exposure. Hp(d) is measured at a depth d of 10 mm for the effective
dose or 0.07 mm for the skin dose, i.e. Hp(10) is suitable for measuring the deeper
penetrating dose (which is usually derived from personal dosimeters) and Hp(0.07) for
measuring the shallower penetrating dose [2]. Thus, Hp(10) is sufficient to represent,
for example, the neutron or gamma dose as these penetrate deeper into the tissues,
while Hp(0.07) is more suitable for measuring, for example, the beta dose as it does
not have the same penetration capability [2]. For this reason, TVO uses Hp(10),
as radiation dose measurements with electronic dosimeters are intended for gamma
radiation in particular. It should be mentioned that, in addition to these reasons,
the beta dose is also more difficult to measure due to its limited, local penetration
capability, so that, also in the case of TVO, a chest dosimeter may not detect a
radiation source located, for example, at the level of the foot.
As a result of radiation in the power plant environments and the subsequent need
for the monitoring and control of radiation doses for both the environment and the
personnel in that environment, the power plant sites are divided into three areas: 1)
controlled area, 2) supervised area, and 3) unclassified area [14]. The classification
of these areas is determined by estimating radiation exposure and the potential risk
to personnel [15], [16]. This estimation relies on measuring dose rates and evaluating
radionuclide concentrations in the air, as well as surface contamination levels (activity
coverage) [6]. In particular, the classification of controlled and supervised areas must
consider the nature of the work being carried out in the area and the magnitude of
the radiation risk that is inherent to that work [16]. This can be accomplished, for
example, with different measurements.
A controlled area is defined as any area where either the dose rate exceeds 3 µSv/h,
or where spending 40 hours per week in the area could result in a dose greater than 1
mSv per year [6], [14]. In this instance, the work necessitates measures to safeguard
against ionising radiation, due to the potential risks of radiation or contamination
[16]. A supervised area is defined as an area where a worker’s effective dose may
exceed 1 mSv per year, or where the equivalent dose to the lens of the eye could
exceed 15 mSv per year [16]. Additionally, the equivalent dose to the skin, hands,
arms, feet, or ankles may exceed 50 mSv per year in such an area [6], [16]. The
area situated beyond the aforementioned zones is unclassified and, as a
consequence, is not considered significant in terms of radiation protection [6].
Subsequently, the controlled area should be subdivided into at least three distinct
zones, employing the same estimations utilized for the initial zoning [6]. In the case of
TVO, the aforementioned zones are designated as green, orange, and red, respectively
[5]. The corresponding dose rates for these zones are illustrated in Figure 1.2. This
study focuses exclusively on the work carried out in the controlled area.
Figure 1.2: OL1 and OL2 controlled area divided into distinct zones [5]
Figure 1.2 shows the controlled area of the plants during power operation,
when the plant is supplying electricity to the national grid. However, this changes
as follows when the power plant is in shutdown: areas 1 turn green and areas 2
turn orange. It is essential to state that the data covers work performed during the
maintenance periods when these rules apply.
When the work is conducted within a green area, no restrictions are
in place and the work may be carried out in accordance with the traditional working
hours of 40 hours per week [17]. However, when working in the orange zone, the
work must be planned in advance and work areas need to be secured or supervised
[17]. Similarly, work in the red zone must be planned and, in addition, carried out
in a short-term manner [17]. It also requires specific dose assessments and detailed
planning [17]. Additionally, the work area must be secured or controlled [17].
In the controlled area, the dose and dose rate of individuals are continuously
monitored using thermoluminescent dosimeters (TLD) and electronic dosimeters
[17], [18]. Electronic dosimeters collect data on a visit-by-visit basis, whereas the
data from TL dosimeters are cumulative over a period of time [17], [18]. This means
that electronic dosimeters measure the dose for the whole visit and TL dosimeters
measure the cumulative dose per month. This latter figure is also reported to STUK
as required by law. This study focuses on electronic dosimeters and the data available
from them, as this is a more applicable representation of the data in the context of
this study.
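The relationship between visit-level and monthly cumulative data can be sketched by summing visit doses per worker and calendar month. The record layout, identifiers and dose values below are hypothetical and are not drawn from the TVO data; in practice, readings from the two dosimeter types are separate measurements and need not agree exactly.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (worker id, visit date, visit dose in mSv).
visits = [
    ("W1", date(2023, 5, 2), 0.020),
    ("W1", date(2023, 5, 17), 0.035),
    ("W1", date(2023, 6, 1), 0.010),
    ("W2", date(2023, 5, 9), 0.100),
]


def monthly_doses(records):
    """Sum visit doses per (worker, year, month)."""
    totals = defaultdict(float)
    for worker, day, dose_msv in records:
        totals[(worker, day.year, day.month)] += dose_msv
    return dict(totals)


monthly = monthly_doses(visits)
```

The visit-level records retain when and how often each dose was received, which is the information a visit-based model relies on and which a monthly total no longer contains.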
TVO uses a variety of instrumentation for the quantification of radiation exposure.
As was stated, the focus of this study is the electronic dosimeters in use for personal
radiation exposure monitoring. In this process, TVO uses Mirion Technology products,
the DMC2000S and DMC3000 models [19], [20]. These dosimeters must operate
in accordance with STUK regulations, which are based on the provisions required
by the Radiation Act (859/2018) [21]. These regulations state that for personal
dose monitoring, the unit of measurement is the personal dose equivalent, which we
have already discussed, with a maximum measurement uncertainty of 42% [21]. The
minimum response range for TLD and electronic dosimeters, specifically for photon
radiation (including gamma rays and X-rays), is shown in Table 1.3.
Radiation type                                     Response range R

Photon radiation (γ and X-rays), Eph > 10 keV:
\[
0.71\left[1 - \frac{2\,H_0/1.33}{H_0/1.33 + H_{\mathrm{ref}}}\right] \le R
\]

Photon radiation (γ and X-rays), Eph ≤ 10 keV:
\[
0.5\left[1 - \frac{2\,H_0/1.5}{H_0/1.5 + H_{\mathrm{ref}}}\right] \le R \le 2
\]
Table 1.3: Response range for photon radiation with energy greater than 10 keV
(kiloelectronvolts), and those with energy less than or equal to 10 keV [21]
In this context, the response R of the dosimeter is expressed as R = G/Href, where
G is the dose determined by the dosimeter, and Href is the true dose. The parameter
H0 refers to the registration threshold, which represents the minimum dose that
the dosimeter records [21]. The symbol Eph in the table refers, in this case, to
the mean energy of photon radiation, typically denoting gamma or X-ray radiation.
Kiloelectronvolt is a measure of energy, where 1 keV equals 1000 electronvolts. This
illustrates the dose response and behavior of dosimeters.
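A minimal sketch of the response and the lower response limit of Table 1.3 is given below. The function names, parameter names and example doses are assumptions made for illustration; the coefficients (0.71 and 1.33, or 0.5 and 1.5) are those shown in the table.

```python
def response(measured_dose, reference_dose):
    """Dosimeter response R = G / Href."""
    return measured_dose / reference_dose


def lower_response_limit(h0, href, scale, divisor):
    """Lower limit scale * [1 - 2 * (H0/divisor) / (H0/divisor + Href)].

    h0 is the registration threshold and href the true dose, in the same unit.
    """
    h = h0 / divisor
    return scale * (1.0 - 2.0 * h / (h + href))


# Hypothetical example: registration threshold 0.1 mSv, true dose 1.0 mSv,
# dosimeter reading 0.9 mSv, using the first row of Table 1.3.
r = response(0.9, 1.0)
limit = lower_response_limit(h0=0.1, href=1.0, scale=0.71, divisor=1.33)
within_lower_limit = r >= limit
```

The lower limit approaches the leading coefficient as the true dose grows large relative to the registration threshold, and it relaxes towards zero as the true dose approaches the threshold, where relative uncertainty is largest.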
In terms of the electronic dosimeters used, the DMC2000S detects gamma and
X-ray radiation fields between 60 keV and 6 MeV and operates in the ranges 1 µSv to
10 Sv (dose) and 10 µSv/h to 10 Sv/h (dose rate), while the DMC3000 is more accurate
and detects radiation fields between 15 keV and 7 MeV and operates in the ranges
1 µSv to 10 Sv (dose) and 0.1 µSv/h to 20 Sv/h (dose rate) [22]–[25]. However, it is
also possible to go outside the measuring ranges up to a certain point, although
accuracy deteriorates.
Both of these dosimeters measure the personal dose equivalent Hp(10), as mentioned
previously and required by the regulations [21]–[23]. In addition, the dosimeters
measure the corresponding dose rate [22], [23]. The measurement uncertainty of
these dosimeters is also within the limits required by the regulations, ensured by
calibration, testing and keeping the internally defined limits below the limits required
by the regulations, which in the case of TVO is ± 15% [24], [25]. Furthermore,
modules for measuring beta and neutron doses are available for the dosimeters,
although beta modules are not in use at TVO. Neutron modules are only used when
required by the work [23].
These dosimeters also have personal or work dose and dose rate limits that can
be used to detect abnormal dose trends at an early stage [26], [27]. These are set via
the radiation management system to the lowest visit dose limit from a range of work,
daily, monthly and annual dose limits, based on radiation personnel class, gender
and work dose code. If the daily, monthly or annual dose limit is exceeded, the
person is prevented from entering the controlled area [26], [27]. Alarm limits are
discussed in more detail in Chapter 3.2.1.
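The limit selection described above can be sketched as follows. This is an illustration only: the limit values, dictionary keys and function names are hypothetical, and the actual logic of the radiation management system is not reproduced here.

```python
def visit_alarm_limit(limits_msv):
    """Select the lowest of the applicable dose limits as the visit alarm limit."""
    return min(limits_msv.values())


def entry_allowed(accumulated_msv, limits_msv):
    """Deny entry if the daily, monthly or annual dose limit has been exceeded."""
    return all(
        accumulated_msv[period] <= limits_msv[period]
        for period in ("daily", "monthly", "annual")
    )


# Hypothetical limits (mSv) for one worker and work dose code.
limits = {"work": 0.5, "daily": 1.0, "monthly": 5.0, "annual": 20.0}

# Hypothetical accumulated doses (mSv) before the next visit.
accumulated_ok = {"daily": 0.2, "monthly": 3.1, "annual": 8.0}
accumulated_over = {"daily": 0.2, "monthly": 5.3, "annual": 8.0}

alarm_limit = visit_alarm_limit(limits)                 # 0.5 mSv
can_enter = entry_allowed(accumulated_ok, limits)       # True
cannot_enter = entry_allowed(accumulated_over, limits)  # False
```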
Dosimeters are therefore a way of controlling the work and the resulting radiation
dose, and also a way of warning the personnel of the dose or dose rate. This is based
on legal requirements and also on economic objectives, since the radiation dose
received can be measured in monetary terms in addition to the health risk. For
example, the organisation does not need to hire and train additional staff if the
radiation dose can be kept as low as reasonably achievable. In the 1990s, it was
estimated in the
USA that avoiding one millisievert of radiation exposure could save between $20 and
$2600, depending on the work task [28]. In a 2002 survey in Finland, the value of
one man-Sv was estimated to be 77.21 euros, based on the recommended value from
1991 [2].
The aforementioned policies and protection measures are founded upon the
premise that work involving occupational exposure should be covered by a system
of protection for practices [8]. Consequently, STUK has also defined the previous
specifications as requirements in its YVL C.2 regulation, which is in compliance with
Finnish legislation [6]. The following principles should apply to such work: 1)
justification of practices, 2) optimization of protection, and 3) use of dose constraints
[1], [8]. The justification ensures that activities involving radiation are only performed
if the benefits outweigh the risks [1]. Optimization of protection balances protection
levels with economic and social factors, aiming to keep doses As Low As Reasonably
Achievable (ALARA), ensuring personnel exposure remains minimal during justified
work [1], [2], [8]. Finally, dose constraints serve as reference points, typically a
fraction of the total dose limit, to prevent excessive exposure. At TVO, this is
applied through the ALARA program with set reference levels [1], [27]. Discussion
of these methods at TVO is provided in Chapter 3.1.
On the basis of these guidelines, therefore, it must be assumed that radiation
doses will accumulate for personnel in any case, because the work and maintenance
would be too expensive if no doses were to accumulate for personnel at all. Therefore,
planning, design and work should always be based on the ALARA principle of keeping
the doses as low as reasonably achievable [3].
The ALARA principle, like the other policies on which TVO bases its operations,
is not unique to the power plants operated by TVO. Many, if not all, operators in
the nuclear industry base their policies on the same scientifically proven principles,
such as the ALARA policy created in 1977 by the ICRP (formerly IXRPC), which
builds on the policies created by the IXRPC (International X-Ray and Radium
Protection Committee or Commission) in 1928 [29], [30]. For example, the 1994
US regulations on the application of ALARA guide the ALARA regulations to the
corporate policy level, which is supported in practice by the Radiation Protection
Manual [28]. In addition, the use of RWP (Radiation Work Permit) has long been
recognised as a way of controlling procedures during work, such as working time
and protective equipment [28], [31]. ALARA is also used to assign responsibilities,
whereby the role of the radiation protection manager in an organisation is to enable
the practical application of ALARA principles by the engineering organisation [28].
These same policies are also in use at TVO, although their application is at a different
level.
In addition to TVO, STUK monitors radiation doses based on TVO’s data. Figure
1.3 shows the collective radiation doses received by personnel at Olkiluoto since plant
operations began, as reported by STUK [32]. These doses are measured using TL
dosimeters and have significantly declined since the early 2000s, thanks to improved
technical and radiation protection measures. TVO also monitors environmental
radiation and emissions around the plant, though this falls outside the scope of this
study [33].
Figure 1.3: Annual collective radiation doses at TVO NPPs [32]
In the following Chapters, this study will focus on external exposure to ionising
electromagnetic radiation (γ and X-rays) within the controlled area. This is because
the data available for analysis exclusively concern dose data from these types of
radiation. However, the biological effects of radiation are treated as a whole, so other
types of radiation are discussed here in terms of these effects.
This study builds on the foundations laid in this Chapter, examining the effects of
radiation discussed in Chapter 3, and finally reviewing the current radiation-related
machine learning applications used in the nuclear power industry. This review forms
the basis of the experimental section in Chapter 4, the results of which are discussed
in more detail in Chapter 5, which also includes a discussion of the implementation
of an appropriate model through to production as part of a network information
system solution. The study concludes with Chapter 6, which provides a summary of
the research findings. Chapter 3 addresses research question one, while Chapter 4
addresses research question two. Finally, Chapter 5 answers research question three.
2 Research process
The material for this study was obtained during the final two quarters of 2024
through the utilisation of the Volter database, belonging to the University of Turku, in
conjunction with the PubMed and Web of Science databases. Furthermore, legislative
publications were sourced from the Finlex and Stuklex websites. In addition, this
study has also utilised internal sources of TVO, which are not publicly accessible.
These sources are marked in the bibliography with the annotation "unpublished".
In the case of non-refined manual searches, the titles, keywords and abstracts
of the materials were targeted, which included books, conference proceedings and
articles. The methodology used for the search was comprehensive rather than narrow
in order to reach as many relevant publications as possible related to the research
topic. The publications were limited to those in English and Finnish and were not
restricted by the publisher’s Impact Factor (IF) or by publication date, as this would
not have had the desired effect due to the historical dependence of the topic.
Most publications were retrieved from the Web of Science database using either
an advanced keyword search or a manual match search. A total of 80 publications
(N=80) were found for use in this study, most of which were found using search terms
related to radiation, its effects, machine learning and regulations. These publications
were assessed for their usefulness, but the study did not take a position on whether
the publications were peer-reviewed. The high-level search process, inclusion and
exclusion criteria are shown in Figure 2.1.
Figure 2.1: Search results, including a breakdown of sources (manual search
results can be from any source)
This research uses existing knowledge, secondary data, to establish the necessary
context for experimentation, as no prior implementations of this specific machine
learning problem were found in the literature. Consequently, a survey of similar
methods applied to machine learning and radiation prediction was conducted, as
discussed in Chapter 3.3. The experimental data used in this study is proprietary to
TVO and will not be publicly shared.
This study aims to follow the CRISP-DM methodology already mentioned. This
process model consists of six steps, the first of which aims to achieve a project
understanding, in which it is understood what is being done and with what objective
[11]. We have already defined this in Chapter 1. It should be noted that this study
is based on a more technical approach than that of a study with an economic or
social objective. The steps of CRISP-DM are shown in Figure 2.2.
Figure 2.2: CRISP-DM process flow overview [11]
If the flow of this study is compared to the introduced flowchart of CRISP-DM,
Chapter 1 corresponds to understanding the project, Chapter 3 to understanding
the data, and Chapter 4 to data preparation, modelling and evaluation. Chapters 5 and
6 continue with evaluation and introduce deployment as a new aspect.
The CRISP-DM process model was selected because it is the industry-independent
de facto standard, despite being published over 20 years ago [34]. Overall, this process
model is iterative and flexible, which also means that CRISP-DM follows a natural
flow rather than forcing the flow of projects [35]. However, CRISP-DM
is not directly intended for machine learning, as it fails to consider the fact that
machine learning models degrade over time [35], [36]. Consequently, post-deployment
operations must also be taken into account, as discussed in more detail in Chapter
5.2 [36].
3 Radiation and machine learning
This research adopts an exploratory approach to understand the data and the issues
associated with it. The meaning and origins of the data are discussed from the
perspective of what policies and regulations they are based on, i.e. why something is
done as it is done. In addition to creating understanding, we establish the foundation
for these policies and regulations, one of which is to minimise the health risks
associated with radiation. The analysis of these health risks represents a significant
aspect of this study, which will also facilitate the reader’s comprehension of the
sensitivity of the data and the subject matter.
The following Chapters will address the data, subsequently examining the nuclear
power plant environment and the processes through which data is formed. We then
proceed to analyse the effects of radiation doses, before concluding with a discussion
of the current applications of machine learning in a similar context, along with the
general understanding of machine learning and the models utilised in this study.
3.1 Legislative approach to dose regulation
As previously noted, TVO’s operations are subject to numerous radiation-related
regulations and directives, which are overseen by STUK [6]. Similarly, in radiation
protection and radiation-related work, the principles are set forth in the Radiation
Act (859/1987), the Nuclear Energy Decree (161/1988), the Nuclear Energy Act
(990/1987), the Government Decree on Ionising Radiation (1034/2018), and the
Ministry of Social Affairs and Health Decree on Ionising Radiation (1044/2018) [15],
[16], [37]–[39]. In terms of this study and in accordance with YVL C.2 regulation,
STUK oversees the practical safety measures on radiation protection of personnel
and monitoring of radiation exposure in nuclear facilities [6]. It is important to
note that while STUK now enforces radiation safety protocols, the original design
and operation of nuclear power plants in Finland were based on US requirements
from the 1970s [40]. Over time, these requirements evolved through national and
international cooperation via laws and regulations [40]. For instance, early safety
requirements did not mandate preparedness for severe accidents [40].
Under the Radiation Act (859/1987), radiation protection is based on the principles
of justification, optimisation and protection of the individual [6]. These correspond
to the system of protection for practices discussed earlier for work that may involve
ionising radiation [27]. TVO applies these policies through an ALARA programme, of
which this study focuses on the parts concerning OL1 and OL2 and dose limits [27].
For occupational
doses, TVO has set an internal limit of 10 mSv per year, which acts as a maximum
annual dose instead of the 20 mSv limit stipulated by law, for effective doses received
by personnel [27].
TVO has identified in its ALARA programme a number of ways to reduce
doses, such as preventing and controlling fuel leaks, source term minimisation,
decontamination, work planning and development of work methods, daily dose limits
and work codes in the work dosimetry system, a risk-based rate control programme,
contamination control and annual maintenance [27]. Of these, this study addresses
the data-driven quantifiable attributes which are the daily dose limits and work codes
of the occupational dosimetry system. Other methods are directly reflected in the
resulting doses through the implemented practices.
The legislation and the resulting STUK regulations limit radiation doses so that
the annual effective dose does not exceed 20 millisieverts (mSv) per person
[6], [7], [38]. In practice, it is not possible to get anywhere near such a
dose, in part because of the lower dose limits in force. The applicable dose limit
also depends on the radiation classification of the exposed worker, i.e.
class A or B [6], [7], [41]. A worker is classified as class A if the effective dose may
exceed six millisieverts per year (15 mSv for the eye and 150 mSv for the hands and
feet) [6]. Otherwise, the worker falls into class B [41]. Typically, class
A workers are employed in roles such as radiation protection technicians or reactor
operators [41]. For example, the threshold for considering a change in
classification from class B to class A is 3.5 mSv, among other requirements [6]. In addition
to these dose limits, TVO has defined its own period and work-specific limits, as
previously referenced. These limits are overseen by the radiation management system,
with the work dose code (WDC) serving as a central component of this management
mechanism. These limits will be discussed in Chapter 3.2.1 and represent a
significant component of the data set analysed.
3.2 Nuclear power plant setting
From the point of view of this study, data collection starts as soon as the required
work has been defined for a given annual maintenance. At the specified
time, the worker enters the controlled area of either the OL1 or OL2 plant
(site visit) to carry out the work. Before doing so, the worker logs in with their
electronic dosimeter using the work dose code assigned to the work. The worker
then carries out the work on the defined system. As soon as the work for the visit
has ended, whether or not the task itself was completed in one visit, the worker
leaves the controlled area and signs their electronic dosimeter out. This constitutes a
single site visit, which always has one dose as measured by the electronic dosimeter.
A single site visit may therefore include several different work tasks under the WDC.
The data used in this study span annual maintenances from April 4, 2012
to May 22, 2023. The starting date of 4 May 2012 was chosen because, in the two
years prior, the outputs of the OL1 and OL2 plants were each increased by 20 MW.
Additionally, the plants were upgraded with new low-pressure turbines, generator
cooling systems, seawater pumps, and internal isolation valves for the main steam
pipes [5]. The final data sample was obtained on 31 May, 2023 as the year 2024
was left out for testing purposes. Overall, between 2012 and 2023, there have been
hundreds of thousands of site visits during the annual maintenance periods, with
each visit representing a single data point from a raw data perspective. The use of
automated and access-management-dependent data collection processes has resulted
in a nearly complete data set, with the exception of a few observations, which are
addressed in Chapter 3.2.1.
3.2.1 Site visit formation
As previously stated, the dose received by personnel is subject to constraints that
depend on different time intervals. For example, a male class A worker may receive
a dose of 1.5 mSv per day during the annual maintenance period, assuming the
annual internal limit of 10 mSv is not exceeded [26]. This figure does not take into
account any additional constraints, such as those specified by the WDC, which
imposes a per-visit dose limit. A WDC is required to enter the controlled area of the
plants and is a key element of data and radiation dose monitoring. Personnel use
this code to log in to the controlled area to carry out their work.
WDC is made up of three parts: 1) plant ID, 2) system ID and 3) project ID.
WDC is therefore given in the form XYYYZZ, referring to the previous structure [42].
For example, in the case of the Olkiluoto 1 plant, system 100 and project 01, the
code would be 110001. Using this system, the dose limit can be set on a work basis,
thereby allowing for better dose control [42]. This is further complemented by dose
rate limits, which can also be used to alert the personnel if the predetermined
limit is exceeded. If no maximum dose or dose rate limit is set for the WDC, the
default values are used [26], [42].
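The XYYYZZ structure lends itself to simple positional parsing. The sketch below is illustrative only and assumes that every WDC is exactly six ASCII digits, as in the example above; the names `WorkDoseCode` and `parse_wdc` are hypothetical and do not come from TVO's systems.

```python
from typing import NamedTuple


class WorkDoseCode(NamedTuple):
    plant_id: str    # X: one digit, e.g. "1" for Olkiluoto 1
    system_id: str   # YYY: three digits identifying the system
    project_id: str  # ZZ: two digits identifying the project


def parse_wdc(code: str) -> WorkDoseCode:
    """Split a work dose code of the form XYYYZZ into its three parts."""
    # Assumption: codes are purely numeric and always six characters long.
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected six digits, got {code!r}")
    return WorkDoseCode(code[0], code[1:4], code[4:6])
```

Under these assumptions, `parse_wdc("110001")` yields plant 1, system 100 and project 01. The parts are kept as strings so that leading zeros are preserved.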
In this thesis, the associated dose and alarm limits will be treated as features, with
WDCs being parsed in order to identify the specific plant and system being worked
on, as well as the corresponding project. In addition to visit-related data, this study
utilises information regarding the personnel involved. This includes the personnel’s
radiation class, the completion status of mandatory radiation work training (e.g.
entry-level and advanced training when required), and whether the personnel is
employed as a subcontractor or not. Each site visit to a controlled area is linked
to a specific time, thus enabling the dataset to be observed as a time series. The
time spent on each visit can be derived from these time-stamped records. The data
and its processing for the machine learning task are discussed further in
Chapter 4.1.
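As an illustration of how the visit duration could be derived from the time-stamped records, the following sketch computes the time spent in the controlled area from a login and a logout timestamp. The ISO 8601 string format and the function name are assumptions made here; the actual storage format of the records is not described in this section.

```python
from datetime import datetime


def visit_duration_minutes(login: str, logout: str) -> float:
    """Time spent in the controlled area, in minutes.

    Assumption: both timestamps are ISO 8601 strings in the same time zone.
    """
    start = datetime.fromisoformat(login)
    end = datetime.fromisoformat(logout)
    if end < start:
        raise ValueError("logout precedes login")
    return (end - start).total_seconds() / 60.0
```

A visit logged in at 08:00 and out at 09:30 on the same day would, for example, have a duration of 90 minutes.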
3.2.2 Radiation phenomena and dosage
As briefly touched upon previously, electromagnetic radiation is a collective term for
a group of radiation types, which includes γ (gamma) and X-ray radiation.
Unlike other forms of radiation, these are measured on a per-visit basis when entering
the controlled area at NPPs using electronic dosimeters and are therefore examined
in this study. This Chapter will provide a more detailed discussion of these types of
radiation, which will then be used as a point of reference in the next Chapter when
exploring the biological effects of radiation and the reasons behind its detrimental
effects on the human body.
Ionising radiation can be classified into two main categories: charged particles and
neutral radiations [3]. The latter group includes gamma rays, X-rays and neutrons, of
which neutrons are not considered in this study because they are not measured in
the data used. Unlike charged particles, neutral radiations do not directly ionise
matter through Coulombic interactions, that is, the attractive or repulsive forces
between charged particles, which depend on their charges and the distance between
them, as neutral radiations lack the needed electric charge (the energies in question
are limited to a few MeV) [3]. Instead, they interact with matter primarily through
indirect processes,
including: 1) Compton effect, 2) photoelectric effect and 3) pair production [2], [3].
These result in indirect ionising radiation [2].
The Compton effect, also referred to as photon-electron scattering, is a
process whereby a photon of energy $E = h\nu$ interacts with an electron of rest
mass $m_e$, where $E$ is the energy of the photon, $h$ is Planck’s constant and
$\nu$ is the frequency of the photon [2], [3]. Although the electrons in an atom are
bound to the nucleus, their binding energy is considerably less than the energy of
typical gamma rays. As illustrated in Figure 3.1, the photon is deflected and
loses energy in the collision, becoming a photon of new energy $E' = h\nu'$, while the
electron gains energy and moves away from the atom, leaving it ionised [2], [3]. The
outcome of the collision depends on the photon scattering angle $\theta$ and follows
from the conservation of energy and momentum [3]. The greatest energy loss by the
photon occurs when it is scattered backwards (at an angle of 180°) [3]. If $E$
is considerably greater than the rest energy of the electron, the energy of the
backscattered photon, $E'$, approaches half of the electron rest energy
(approximately 0.26 MeV) [3]. The Compton effect dominates in the energy range of
0.1 MeV to 10 MeV [2].
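The dependence on the scattering angle can be made concrete with the standard Compton relation $E' = E / (1 + (E / m_e c^2)(1 - \cos\theta))$. The following Python sketch is illustrative only; the function name and the rounded electron rest energy of 0.511 MeV are choices made here and are not taken from the cited sources.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511  # m_e c^2, rounded


def scattered_photon_energy(e_mev: float, theta_deg: float) -> float:
    """Photon energy E' in MeV after Compton scattering through theta."""
    theta = math.radians(theta_deg)
    ratio = e_mev / ELECTRON_REST_ENERGY_MEV
    return e_mev / (1.0 + ratio * (1.0 - math.cos(theta)))


# Energy transferred to the electron for a 1 MeV photon scattered backwards
transferred = 1.0 - scattered_photon_energy(1.0, 180.0)
```

For a scattering angle of 0° the photon keeps all of its energy, while at 180° the energy transferred to the electron is at its maximum.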
Figure 3.1: The Compton effect: photon scattering results in energy transfer to an
electron, with the photon losing energy and being deflected. The atom is left ionised
[2], [3]
The photoelectric effect (photon–electron) is an interaction whereby a photon transfers
all of its energy to an electron, which is then ejected from the atom and loses its
energy in the medium, ionising it and leaving behind a positively charged ion [2], [3].
In this instance, the photon is absorbed and the ejected electron is
known as a photoelectron (Figure 3.2) [2], [3]. The photoelectron is ejected with
a kinetic energy of $E^{e}_{K} = E_{\gamma} - I$, where $E_{\gamma}$ is the photon energy and $I$ is the
ionisation potential of the electron in the atom [3]. The photoelectric effect is
dominant at lower photon energies, typically in the range of tens to hundreds
of kiloelectronvolts (keV) [2]. Both the photoelectric and Compton effect processes
may provide electrons with sufficient energy to ionise other atoms [3]. Additionally,
following the ejection of an electron, light emission or X-ray production may occur
[3].
Figure 3.2: Photoelectric effect: energy is transferred from a photon to an electron.
This results in the ejection of the electron from the atom as a photoelectron, and
the creation of a positively charged ion [2], [3]
In pair production and annihilation the photon energy is converted into mass,
typically in the vicinity of a nucleus due to the presence of a strong electromagnetic
field [3]. This only occurs when the photon energy exceeds 1.022 MeV [2]. In pair
production, a photon interacts with the Coulombic field of a nucleus, resulting in
the creation of an electron (negatron) and a positron (Figure 3.3) [3]. The energy of
the photon is transformed into mass, with the total mass equalling the combined
mass-energy of the electron and positron, which is 1.022 MeV [3]. Any photon energy
in excess of 1.022 MeV is distributed as kinetic energy to the electron and positron,
in accordance with the equation $E^{\mathrm{pair}}_{K} = E^{e^+}_{K} + E^{e^-}_{K} = E_{\gamma} - 2E^{e}_{0}$, where $E_{\gamma}$ is the
photon energy, $E^{e^+}_{K}$ and $E^{e^-}_{K}$ are the kinetic energies of the positron and electron,
and $E^{e}_{0}$ is the rest mass energy of the electron [3].
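As a worked illustration of this energy balance, consider a hypothetical photon with an energy of 2 MeV (a value chosen here only for the arithmetic) and an electron rest mass energy of 0.511 MeV:

```latex
E^{\mathrm{pair}}_{K} = E_{\gamma} - 2E^{e}_{0}
  = 2.000\,\mathrm{MeV} - 2 \times 0.511\,\mathrm{MeV}
  = 0.978\,\mathrm{MeV}
```

The electron and the positron would therefore share 0.978 MeV of kinetic energy between them.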
Once the positron has lost its kinetic energy, it rapidly combines with a nearby
electron in a process known as annihilation [3]. This process results in the complete
annihilation of both particles, accompanied by the emission of two gamma photons,
each with an energy of 0.511 MeV [3]. These photons are emitted in
opposite directions, which conserves momentum [3].
In both pair production and annihilation, high-energy particles, such as electrons
and positrons, possess sufficient energy to ionise atoms by ejecting electrons from
their outer shells as they traverse matter [2]. This process results in the production
of ionising radiation, which can subsequently cause further ionisation events as
the particles lose energy through collisions [2]. The gamma rays emitted during
annihilation also contribute to the overall level of ionising radiation, as they can
interact with matter through indirect processes, as previously outlined, which further
displace electrons from atoms and lead to ionisation [2].
Figure 3.3: Pair production and annihilation: a photon interacts with a nucleus,
producing an electron and positron. The positron eventually annihilates with an
electron, releasing two 511 keV gamma photons in opposite directions to conserve
momentum [3]
Due to these indirect processes, as well as the ability of other types of ionising
radiation to disturb molecular bonds, the following bonds are affected, in increasing
order: 1) metallic bond (least), 2) ionic bond and 3) covalent bond (most) [3]. Of
these bonds, covalent bonds are the most abundant in biological tissues and are
therefore the most vulnerable to ionising radiation [3]. This can lead, among other
things, to direct damage to cellular molecules, such as DNA, and indirectly through
chemical reactions that can further damage biological structures [3]. Electronic
dosimeters therefore measure these effects and model the quantifiable radiation dose
to the human body. In the data, the resulting radiation dose, the predicted value in
classified form, is expressed in microsieverts (µSv).
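Expressing a measured dose in classified form amounts to mapping it onto a set of dose intervals. The sketch below shows one way this could be done; the class boundaries are hypothetical placeholders, since the actual intervals are not given in this Chapter, and the function name is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical class boundaries in microsieverts; placeholders only.
CLASS_EDGES_USV = [10.0, 100.0, 1000.0]


def dose_class(dose_usv: float, edges=CLASS_EDGES_USV) -> int:
    """Map a visit dose in µSv to an ordinal class index (0 = lowest).

    A dose equal to a boundary is assigned to the higher class.
    """
    if dose_usv < 0:
        raise ValueError("dose cannot be negative")
    return bisect_right(edges, dose_usv)
```

With these placeholder boundaries, doses below 10 µSv fall into class 0 and doses of 1000 µSv or more fall into class 3.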
3.2.3 Biological effects of radiation
Radiation can have a wide range of biological effects, which are typically classified
into two main categories: 1) deterministic and 2) stochastic effects [8], [43]. In
addition to these, physiological effects can be classified as 1) somatic, 2) genetic
and 3) teratogenic [3]. Deterministic effects become apparent rapidly as cell death
or malfunction with clinical manifestations, while stochastic effects are irregular or
statistical in nature. Somatic effects concern the body or its condition, genetic
effects concern the genes that enable inheritance, and teratogenic effects concern
fetal and embryonic development [2], [3], [8], [43]. From
the perspective of this study, we are examining the genetic effects from a stochastic
standpoint, as these are the most relevant in the context of a nuclear power plant
environment. These stochastic effects are expressed in the ways shown in Figure
3.4, either through carcinogenic cell division or erroneous repair, possibly affecting
offspring through chromosomal mutations [8]. While the effects of radiation can also
be deterministic, this is not within the scope of our focus, as it is not a probable
occurrence, despite the theoretical possibility. However, as a reference value, a sudden
dose of 0.2 Sv (200 mSv) does not show a somatic clinical effect, but a dose over 4 Sv,
for example, is likely to result in death without treatment [3]. For example, at the
time of the Chernobyl nuclear accident, approximately 134 workers and firefighters
received doses of 0.7 Sv (700 mSv) to 13.4 Sv (13400 mSv), resulting in a reported
28 deaths from direct exposure [44].
Figure 3.4: Overview of the stochastic effects of ionising radiation (not including
deterministic effects) [2], [43]. Icons emptycell-3d-2 and emptycell-3d-3 by Servier
are licensed under CC-BY 3.0 Unported
The literature has identified a correlation between radiation dose and an increased
risk of chromosomal and chromatid abnormalities [45]. These abnormalities act as
early markers of stochastic genetic effects, such as cancer [45]. It has been evidenced
that DNA damage responses in individual cells are significant for the development of
cancer cells, in addition to gene and chromosomal mutations, even at low doses [43].
In particular, it has been identified that high dose and dose rate also correlate
positively with non-cancer mortality [46]. Chromosome analyses have demonstrated that
chromosomal defects, particularly in individuals exposed to long-term low-dose
radiation (LDR), are associated with increased genomic instability [45], [47]. Studies
have also shown that occupational exposure to LDR (a few hundred mSv) can result
in chromosomal aberrations, especially in the shorter term [45]. The LDR in question
represents, in some cases, the typical dose received by personnel in their lifetime in
the context of nuclear power plants (NPPs) [4], [45]. Nonetheless, it has been found
that even when the effective doses do not exceed the permissible limit of 20 mSv
per year, personnel in nuclear power plants exposed to ionising radiation exhibit
significantly higher levels of chromatid and chromosome aberrations compared to
control groups [45].
Additionally, a statistically significant correlation has been identified between
radiation dose and non-cancer mortality, particularly in relation to cardiovascular
disease [46]. There have also been reports of a link between radiation exposure and
general circulatory diseases [46]. This indicates that cancer is not the sole risk
factor, although it is frequently associated with the stochastic effects of radiation.
Furthermore, elevated levels of stress, anxiety, and post-traumatic stress disorder
(PTSD) have been observed among personnel and civilians following nuclear accidents
[48]. Such psychological stress may indirectly increase susceptibility to stochastic
effects.
Furthermore, it has been found that family triads in which the
father was occupationally exposed to radiation (mean lifetime gonadal dose from
gamma radiation 1.65 ± 0.08 Sv) showed a heightened mutation frequency, evidenced
by an increased frequency of chromosomal abnormalities [47]. It is important
to acknowledge that these effects cannot be fully assessed in isolation, as lifestyle
factors also contribute to the overall picture. However, the existing literature has
tried to account for these variables.
As previously stated, LDR remains a concern for plant personnel with regard
to genetic and epigenetic alterations, even at relatively low doses. In South Korea
between 2012 and 2021, despite average individual doses remaining well below
regulatory limits, approximately 0.39 mSv per year, a fraction of personnel received
higher doses in specific work tasks [49]. It is additionally noteworthy that typically
the majority of personnel receive a minimal radiation dose, if any, but a small number
receive significantly higher doses in certain occupations [49], [50]. This highlights the
importance of the issue in certain work settings and the data imbalance that should
be considered when processing the data for this study [49]. However, the results of
the studies are not all directly comparable because, for example, South Korea had an
annual effective dose limit of 50 mSv in 2023 compared to the Finnish limit of 20 mSv
per year [6], [49].
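The data imbalance noted above, where most visits yield little or no dose and a few yield considerably more, is commonly handled by weighting classes inversely to their frequency. The following sketch shows this generic heuristic; it is given as one possible option and is not necessarily the approach taken in this study, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Example with hypothetical labels: eight low-dose and two high-dose visits
weights = balanced_class_weights([0] * 8 + [1] * 2)
```

In this example the rare class receives a weight four times that of the common class, so that errors on high-dose visits count more during training.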
At this time, taking into account the literature, it must be stated that there is an
insufficient amount of data available to conduct a comprehensive statistical analysis
of the effects of radiation, although we can identify chromosomal abnormalities [2], [3],
[47]. This is due to the ethical concerns associated with deliberately exposing humans
to harmful radiation levels for experimental purposes. The majority of available
data are derived from incidents and medical or occupational exposures, which are
insufficient for comprehensive statistical analysis [3]. A substantial proportion of
the historical data is drawn from the aftermath of the nuclear bombings in Japan in
1945 [2], [3].
However, as stated in the 2007 ICRP publication, even with exceptions, the
cellular processes and dose-response data support the premise that within low-dose
ranges (below approximately 100 mSv), the likelihood of cancer or heritable effects
increases in direct proportion to the equivalent dose in relevant organs and tissues,
even though there is an absence of direct evidence that radiation exposure leads to
heritable diseases in offspring [43]. Nevertheless, the ICRP has reached the conclusion
that there exists compelling evidence to suggest that ionising radiation can result in
heritable effects in experimental animals [43].
In the context of radiation effect models, the Linear No-Threshold (LNT) model
is regarded as the most conservative approach. The LNT model assumes that the risk of
cancer or other harmful effects increases in a linear fashion with dose, even at very
low exposure levels, extrapolating to zero exposure [2], [3]. This model is also in use
at TVO. However, this model has been the subject of considerable criticism on the
grounds that it lacks direct empirical evidence, particularly in relation to low-dose
exposures [3]. Alternative models, such as hormesis, suggest that there may be a
certain level of exposure below which radiation might not cause harm [3]. However,
these models are also controversial and lack scientific consensus. Therefore, it is
generally assumed that no specific threshold exists below which radiation can be
considered entirely harmless, although there are some estimates that doses under
500 mSv do not cause genomic instability over generations [2], [3].
Figure 3.5: Radiation effect to dose models. Available data does not conclusively
support effects in the low-dose range [3]
3.3 Related studies and machine learning
This Chapter discusses the machine learning implementations in nuclear power plant
environments that can be found in the public literature because, as mentioned, no
implementation of radiation monitoring using machine learning similar to the one in
this study could be found. The purpose of this Chapter is to
establish an understanding of what implementations exist, what models have been
used and what methods have been used to evaluate them. This provides the context
for the following Subchapter, 3.3.2, which outlines the definition of machine learning,
the steps involved and the machine learning techniques used in this study.
First, however, it is important to note that machine learning differs significantly
from the current, more traditional way of predicting received radiation doses. Current
manual prediction requires expert knowledge and manual work and is based on trends
and dose results from previous work, which can be difficult to process due to the
large amount of data. Machine learning, on the other hand, is more efficient at
handling large amounts of data and can detect non-linear relationships between
variables, meaning it can uncover complex patterns that traditional methods may miss,
leading to more realistic predictions. This is also reflected in the fact that the
previously mentioned MAPE value has been 18.76% since the last power increase in 2012.
A sufficiently accurate, data-driven model can therefore reduce the effect of the human
tendency to make mistakes.
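For reference, the mean absolute percentage error (MAPE) referred to above can be computed as in the following minimal sketch. The function name and the example numbers are illustrative assumptions and are not TVO data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; zero actual values are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical realised and predicted doses (arbitrary units).
realised = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(realised, predicted), 2))
```

Because MAPE divides by the actual value, it is undefined for zero-dose observations, which is why the sketch skips them.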
3.3.1 Present-day applications
This study will review seven existing studies that have used machine learning to
predict radiation levels, although the implementations of these studies are not directly
comparable to the approach taken in this study. These seven studies focus primarily
on nuclear power, as they were selected based on their relevance to this research.
However, there are also examples of radiation prediction using machine learning in
the hospital and pharmaceutical industries.
The studies in question address either the reactor itself or the reactor building,
in addition to environmental radiation monitoring utilising machine learning. The
most closely aligned approach to the subject of this study was the estimation of the
personal dose equivalent using photon energies measured by TL dosimeters, which
was also successfully accomplished [51]. Other applications of machine learning have
been utilised, for instance, in the aftermath of the Fukushima disaster to predict
the ambient dose rate and in Germany for anomaly detection using the gamma
dose rate, taking into account environmental conditions [52], [53]. Additionally, in
the field of reactor engineering, stress intensity in reactor pressure vessels and the
impact of radioisotopes in primary coolant loops have been modelled to examine
the influence of corrosion products, with the objective of predicting and minimising
radioactive corrosion levels and, consequently, reducing radiation dose levels at the
reactor shutdown [54]–[56]. An alternative approach was the implementation of a
monitoring system utilising machine learning to monitor radioactive materials in
nuclear facilities. This system employed a sensor network to model the tracking
of radioactive sources and identify nuclide types. This system could be used, for
instance, in storage facilities for the protection of waste in the event of theft [57].
It is worth noting that existing machine learning solutions do not inherently
utilise data that is linked to personnel working in NPPs. Therefore, the methodology
proposed in this study, which entails modelling radiation doses in a site-visit-oriented
manner based on the nature of the visit, represents a distinctive approach to predicting
the radiation doses associated with occupational activities. In exploring suitable
machine learning models for this methodology based on the reviewed studies, we
observed that Decision Trees (D-Tree), Random Forests (RF) and Artificial Neural
Networks (ANN) were the most common models used [51]–[53], [57]. In addition to
these, modelling was also done with Gaussian Bayesian Networks (GBN), Gaussian
Processes (GP) and Light Gradient Boosting Machines (LightGBM) [54]–[56]. A
detailed explanation of the models used in this study can be found in Chapter 3.3.2,
and therefore, they will not be discussed in this Chapter.
Approximately half of the studies discussed applied some form of cross-validation
for evaluation purposes [54], [56], [57]. In particular, regression-type problems were
evaluated using Root Mean Squared Error (RMSE) or Negative Log-Likelihood
(NLL) in several cases [52], [55], [56]. In other instances, accuracy and a type of
confusion matrix implementation were common [51], [53], [54], [57]. Additionally,
hyperparameter optimisation was utilised for the purpose of optimising the model’s
performance, for example, through the use of grid search [54].
In conclusion, the available literature demonstrates a number of different applications
of machine learning in the context of nuclear power, specifically in the prediction
of radiation levels and doses. However, there is a notable absence of methods that
incorporate data directly linked to personnel activities in NPPs. This study addresses
this gap by introducing a distinctive site visit based modelling strategy to predict
radiation doses associated with site visits.
3.3.2 Overview of machine learning
In addition to the supervised machine learning models already mentioned, it should
be noted that supervised learning is one of the main approaches of machine learning,
along with unsupervised learning and reinforcement learning [9]. This study specifically
focuses on supervised machine learning (hereafter referred to as machine learning).
It differs from the other approaches in that the training data contains the correct
answers (targets) for the input data, given as pairs (x_i, t_j), where x_i are the
inputs and t_j are the targets [9]. The input index i runs from 1 to the number of
input dimensions m, and the target index j runs from 1 to the number of output
dimensions n [9]. The output y (y_j, where j runs from 1 to the number of output
dimensions) is the prediction the model produces from the input data, which is then
compared with the target values to assess the model’s accuracy [9]. This leads to the
main purpose of machine learning, which is to generalise, i.e. to produce sensible
outputs for inputs that have not been observed before. In the case of classification,
we consider input vectors and decide to which of N discrete classes each vector
belongs [9].
The data preparation, modelling and evaluation phases of CRISP-DM presented in
Chapter 2 can be described in more detail from the perspective of the machine
learning classification system shown in Figure 3.6. It is
important to note that these stages are not independent, but rather interrelated, and
that some methods combine these stages, for example, feature selection and classifier
design can occur together [58]. In the following sections, we will provide a detailed
discussion of these steps and explain what typically happens in practice, helping the
reader gain a clearer understanding of the overall process.
Figure 3.6: Design stages of a machine learning classification system [58]
In this study, the term data refers to our time-series bound features (input and
targets) discussed in Chapters 3.2.1 and 4.1. Once the data has been collected, we
want to further refine it by modifying it into new features that are more compact and
informative than the original features, often through mathematical transformations,
domain-specific knowledge or combining features [58]. This is called feature generation.
This will usually lead to a reduction in dimensionality and increase the overall quality
of the data, even though no feature selection has been made [58]. Overall, feature
generation helps machine learning algorithms identify relevant patterns more
effectively and reduces computational complexity [58].
The objective of the next phase, feature selection, is to select the most important
of these generated features in order to reduce the dimensionality of the input, while
ensuring that the input retains as much of its class discriminatory information
as possible [58]. The aim is to obtain features that exhibit a large between-class
distance and a small within-class variance [58]. This implies that the between-class
values should be as distant as possible, while the within-class values should be as
close to each other as possible. This objective, along with the most informative
features, is achieved through the data preparation techniques and model feature
importances outlined in Chapters 4.1 and 4.3 respectively. Explicit feature selection
is not a prerequisite for the tree-based models used in this study and is thus
not implemented, given that these models are inherently capable of selecting the most
informative features. Nevertheless, the most informative features will be assessed during
the evaluation process.
The next step is choosing and designing the classification algorithm (classifier),
which depends on the available data [11], [58]. In this study, we use multi-class
classifiers because the target features have more than two possible outcomes.
Optimising the model’s hyperparameters, which are specific to each model, is also
important [11]. For example, a hyperparameter could be the maximum depth of a
decision tree. These hyperparameters affect how the model learns from the data and
how it behaves. In Chapter 4.2, we will apply these machine learning techniques
to the collected and preprocessed data.
Finally, one of the most important steps is model evaluation, which determines
whether the model has actually learned from the data. This can be done using one
of three different methods: 1) the resubstitution method,
2) the holdout method, and 3) the leave-one-out method [58].
In this study, the holdout method is used. This approach was chosen because the
amount of data applicable in this study would be too computationally expensive to
cross-validate. The holdout method was used to divide the data into three parts:
1) the training data, 2) the validation data and 3) the test data, in a
60/20/20% split [59]. In contrast, the leave-one-out method would have allowed us
to use the entire dataset for both training and testing, but this would not have been
feasible in terms of the time required to train and test the models [58]. The
holdout method has two limitations. Firstly, it restricts the amount
of training and test data that can be utilised, as these cannot be mixed, which can
affect the model’s capacity to generalise [59]. Secondly, it is challenging to determine
the optimal split between the training and test sets. Allocating a higher proportion
of the data to the training set may enhance model accuracy by reducing excess mean
error and variance associated with finite datasets [58], [59]. However, this can result
in a reduced test set, which can compromise the reliability of evaluation metrics due
to increased variability in error estimates [58]. Alternatively, a larger test set may
provide more accurate performance estimates, but at the expense of reduced training
data, which can potentially increase the classifier’s overall error probability [58], [59].
In the following discussion, the models utilised in this study will be outlined,
along with the reasoning behind their selection. These models are presented in Table
3.1. A subsequent general overview will be provided of the theoretical basis of each
model, together with an outline of their principles, features, and their relevance in
addressing the challenges of this study. These include the already mentioned data
skewness, resulting in a low number of data points in high dose rates, and the fact
that the problem definition is a multiclass task.
Model                     Type          Algorithmic approach
Random Forest             Classifier    Ensemble of Decision Trees
Balanced Random Forest    Classifier    Ensemble of Decision Trees
LightGBM                  Classifier    Gradient Boosted Trees
XGBoost                   Classifier    Gradient Boosted Trees
Easy Ensemble             Classifier    AdaBoost Ensemble
Table 3.1: Overview of models used in this study, classification types and algorithmic
approaches [60]–[64]
Random Forest is an ensemble learning method that builds multiple conventional
Decision Trees and combines their outcomes through majority voting into a single result
[62]. These trees are built by randomly sampling the data through bootstrapping
and considering only a random subset of features at each split, which allows the data
left out of each bootstrap sample to be used for Out-of-bag (OOB) model estimation [62].
The model also allows the importance of features to be examined by measuring how much
a given feature reduces Gini impurity, which quantifies how mixed the classes are
within a given tree node, across all trees [62]. The Random Forest
model was selected as the primary model for this study due to its ease of use and
the fact that, by the Law of Large Numbers (LLN), the model consistently
converges, so that adding trees does not lead to overfitting issues [62]. Consequently,
the predictions of the Random Forest model are known to stabilise as the number of
trees increases, thereby ensuring consistent results [62]. This characteristic,
combined with its capacity to manage large datasets, makes Random Forest a suitable
option for our initial modelling. Random Forest’s high-level logic is presented in
Figure 3.7.
Figure 3.7: Random Forest simplified. Majority vote of decision trees decides the
class for the data instance [62]
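The two building blocks named above, Gini impurity of a node and majority voting over trees, can be illustrated with a minimal sketch. The function names and the toy labels are assumptions made for illustration only and do not reproduce the implementation used in this study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of the class labels inside one tree node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority_vote(tree_predictions):
    """Class predicted by most trees; on a tie the first class seen wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


print(gini_impurity([0, 0, 1, 1]))     # evenly mixed two-class node
print(gini_impurity([0, 0, 0, 0]))     # pure node
print(majority_vote([0, 2, 0, 1, 0]))  # votes of five hypothetical trees
```

A pure node has impurity 0, and a split is considered useful when it lowers the impurity of the child nodes relative to the parent.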
The Random Forest model was further explored through the application of the
Balanced Random Forest (BRF) model. While BRF follows the principles of
Random Forest, it builds each tree from a balanced sample: a bootstrap sample is
drawn from the minority class, and the same number of instances is then sampled
with replacement from the majority class [63]. In vanilla Random Forest
bootstrapping, there is no special consideration for
class balance [62]. This approach was taken to address our data imbalance.
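The balanced bootstrap idea can be sketched as follows. This is a simplified illustration under stated assumptions: the helper name is hypothetical, and the sketch generalises the two-class description to several classes by drawing from every class as many instances as the smallest class contains.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, rng):
    """Return row indices of one balanced bootstrap sample.

    Every class is sampled with replacement and contributes as many
    instances as the smallest class contains.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    n_min = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_min))
    return sample


# Toy labels: 50 low-dose visits (class 0) and 5 high-dose visits (class 3).
toy_labels = [0] * 50 + [3] * 5
sample = balanced_bootstrap(toy_labels, random.Random(0))
print(Counter(toy_labels[i] for i in sample))
```

Each tree of the forest would be trained on a different sample of this kind, so the majority class no longer dominates any single tree.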
Extreme Gradient Boosting (XGBoost) is used in this study because it is an
ensemble learning method that builds a strong classifier from an ensemble of
weaker classifiers, which aligns well with the imbalance and size of our dataset [60].
XGBoost builds decision trees sequentially, with each tree correcting the errors made
by the previous ones, and uses a gradient descent approach to minimise the loss
function, leveraging both the first and second derivatives of the loss for efficient
optimisation, which makes it well suited to our data imbalance challenges [60]. During
the model’s training, XGBoost computes a weighted score for each tree split based on a
gain metric and selects the best splits. It uses a level-wise tree growth strategy,
where trees are expanded level by level to maintain balance and improve efficiency
[60]. To prevent overfitting, regularisation is used to smooth out the latest learned
weights, shrinkage is used to scale new weights, and subsampling, which is also used
in the more traditional Random Forest, is applied [60]. This makes XGBoost a very
sophisticated method for such a machine learning problem, but at the same time it
requires more computing power. XGBoost’s level-wise growth strategy is presented in
Figure 3.8.
Figure 3.8: XGBoost’s level-wise growth strategy visualised. Black indicates terminal
nodes, leaves, while grey indicates the node selected to grow next
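As an illustration of how first and second derivatives enter the split score, the following sketch computes the gain of one candidate split from the summed gradients (g) and Hessians (h) of the instances falling into the left and right child. The function is a hypothetical helper, not part of the XGBoost library; `reg_lambda` and `gamma` stand for the L2 regularisation weight and the minimum loss reduction required for a split, and the example values are made up.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split from summed gradients and Hessians.

    gain = 1/2 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
                  - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma
    """
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


# A split that separates negative from positive gradients is rewarded ...
print(split_gain(-4.0, 2.0, 4.0, 2.0))
# ... whereas a split between two similar halves is not.
print(split_gain(2.0, 2.0, 2.0, 2.0))
```

Splits with the highest positive gain are kept; increasing `reg_lambda` or `gamma` makes the model more conservative about splitting.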
LightGBM is an advanced version of gradient boosting, building on the strengths
of ensemble methods such as the previously mentioned Random Forest [61]. LightGBM
differs from Random Forest in how its decision trees are built: LightGBM uses an
implementation of Gradient Boosting Decision Trees (GBDT), which builds decision
trees sequentially, with each tree fitting the negative gradients (the residual
errors) of the previous trees, and it grows trees leaf-wise with the help of
Gradient-Based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB)
techniques [61], [62]. In contrast, Random Forest trees are built without iterative
refinement, and XGBoost builds trees by expanding them level by level [60], [62].
In LightGBM, GOSS prioritises data instances with larger gradients and downsamples
the rest, which maintains efficiency while preserving important information [61].
Meanwhile, EFB organises features into bundles, so that only a limited number of
features need to be considered at any given time, without compromising the quality
of the model [61]. Consequently, LightGBM is well-suited for modelling our
large-scale dataset and outperforms the previously discussed XGBoost in terms of
computation time and memory usage [61]. LightGBM’s leaf-wise growth strategy is
presented in Figure 3.9.
Figure 3.9: LightGBM’s leaf-wise growth strategy visualised. Black indicates terminal
nodes, leaves, while grey indicates the node selected to grow next
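The sampling step of GOSS can be sketched as follows. This is a simplified illustration rather than LightGBM's internal code: the function name and the rates are assumptions. Instances with the largest absolute gradients are always kept, a random share of the remaining instances is drawn, and the drawn instances are up-weighted by (1 - a) / b so that the gradient statistics stay approximately unbiased.

```python
import random


def goss_sample(gradients, top_rate, other_rate, rng):
    """Return (index, weight) pairs selected by one GOSS pass."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, min(n_other, len(rest)))
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


toy_gradients = [0.1 * i for i in range(10)]
picked = goss_sample(toy_gradients, top_rate=0.2, other_rate=0.3, rng=random.Random(0))
print(picked)
```

With these toy rates, only half of the instances are used for the next tree, while the two instances with the largest gradients are guaranteed to be among them.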
Easy Ensemble (EE), similarly to Balanced Random Forest, is an ensemble
learning method that generates balanced sub-problems, but it uses Adaptive
Boosting (AdaBoost) classifiers to solve them [64]. AdaBoost, in turn, is a
boosting algorithm that combines weak classifiers sequentially to correct errors made
by the previous classifiers by changing the weights of the misclassified instances
[64]. Easy Ensemble has been developed to handle data imbalance by training multiple
classifiers on different undersampled subsets of the majority class, thus avoiding
the loss of data that occurs with traditional undersampling methods [64]. Each
classifier is given a subset of the majority class together with all instances of the
minority classes to achieve balanced training data, and the previously mentioned
AdaBoost algorithm is used for training these classifiers [64]. The results of these
classifiers are combined into a single output [64]. A notable distinction between
Easy Ensemble and Balanced Random Forest lies in how they utilise the balanced
samples [63], [64]. While Balanced Random Forest utilises these samples for training
randomised decision trees, Easy Ensemble utilises them to generate boosted ensembles
[64]. Easy Ensemble’s high-level logic is presented in Figure 3.10.
Figure 3.10: AdaBoost with balanced bootstrap samples used to create Easy Ensemble
classifier [64]
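The reweighting of misclassified instances can be illustrated with one round of the classic two-class AdaBoost update. This is a sketch for intuition only: the function name is hypothetical, and multi-class variants, such as those needed for the four dose classes in this study, differ in detail.

```python
import math


def adaboost_round(weights, misclassified):
    """One binary AdaBoost update: return (alpha, renormalised weights)."""
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong) / sum(weights)
    if not 0.0 < error < 0.5:
        raise ValueError("the weak learner needs a weighted error in (0, 0.5)")
    # alpha is the vote weight of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - error) / error)
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four equally weighted instances, the first of which was misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [True, False, False, False])
print(alpha, new_weights)
```

After the update, the misclassified instance carries half of the total weight, so the next weak classifier is pushed to classify it correctly.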
4 Machine learning approach
This research uses the Python programming language (version 3.12) to process the data
and to train and evaluate the models. Libraries such as SciPy (v1.14.1), Scikit-learn
(v1.5.2), NumPy (v2.1.2), pandas (v2.2.3) and Matplotlib (v3.9.2) are also used in
support of this research [65]–[69].
In the following subchapters, the discussion will proceed as follows: firstly, the data
will be explored, and the processes that were utilised in its handling will be described.
This will build on the features and data labels that have been previously mentioned.
Subsequently, we proceed to the modelling stage, followed by the presentation of the
modelling results.
4.1 Data
The data utilised in this study was obtained from an internal database regarding
the aforementioned radiation management system currently employed at TVO. The
processing of this data, collected during the annual maintenance periods from 2012
onwards, began with database queries, mainly from the
radiation database. Additional data from other relevant databases, such as personnel,
skills management and time management databases, were then added to enrich the
dataset. The data was retrieved from the database as a CSV file, which was then
processed in a separate virtual environment. The initial data and its features are
presented in Table 4.1.
Raw Data Variables
IDENT                123456789
EXT_WORKER           0
ORG_NAME             Teollisuuden Voima Oyj
COURSE_ENTRY         1
COURSE_ADVANCED      1
WORK_START_DATE      24.05.2023 10:51
WORK_END_DATE        24.05.2023 11:15
TIME_USED_MINUTES    24
WORK_DOSE_CODE       200000
PLANT                2
SYSTEM               0
PROJECT              0
WORKER_CLASS         B
RADIATION_DOSE       0
DOSE_ALARM           300
DOSE_RATE_ALARM      250
INFO                 OL2 reaktorirakennus, yleiset työt (in English: OL2 reactor building, general work)
Table 4.1: Example data instance (here randomly generated) after the initial data
query. INFO describes the corresponding WORK_DOSE_CODE
During the preliminary data processing, it was observed that while the data
was of excellent quality, it did contain some anomalous values, including plant
visits that were inordinately lengthy. Consequently, all data points that exceeded
the 13-hour visit limit were removed. The 13-hour limit was selected as this is the
duration at which the electronic dosimeter issues an alert regarding the visit length.
The data was then scanned for duplicates and these were removed as a precaution.
The total number of data points removed was a few thousand. In addition to these
anomalies, the data lacked personnel classification information for approximately
100 plant visits. This was resolved by calculating the doses of the personnel in
question and comparing them with the legal regulations, thereby providing a rationale
for categorising the personnel who made the visit as either category A or
B. Category A personnel can receive a maximum effective dose of 20 mSv per year
and category B personnel can receive a maximum effective dose of 6 mSv per year
[6], [16]. In addition to these preprocessing steps, the timestamps of the plant visits
were tested to ensure that the exit time could not be prior to the entry time. No
such erroneous visits were found.
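The cleaning rules described above (removing duplicates, removing visits longer than 13 hours and checking that a visit does not end before it starts) could be sketched as follows. The column names follow Table 4.1, whereas the function name and the example rows are hypothetical. The sketch drops visits with an exit time before the entry time, although no such visits were found in the actual data.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d.%m.%Y %H:%M"  # matches e.g. "24.05.2023 10:51"
MAX_VISIT = timedelta(hours=13)  # limit at which the dosimeter alerts


def clean_visits(rows):
    """Drop duplicate rows, visits over 13 h and visits ending before they start."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        start = datetime.strptime(row["WORK_START_DATE"], DATE_FORMAT)
        end = datetime.strptime(row["WORK_END_DATE"], DATE_FORMAT)
        if end < start or end - start > MAX_VISIT:
            continue
        kept.append(row)
    return kept


visits = [
    {"IDENT": "1", "WORK_START_DATE": "24.05.2023 10:51", "WORK_END_DATE": "24.05.2023 11:15"},
    {"IDENT": "1", "WORK_START_DATE": "24.05.2023 10:51", "WORK_END_DATE": "24.05.2023 11:15"},
    {"IDENT": "2", "WORK_START_DATE": "24.05.2023 06:00", "WORK_END_DATE": "24.05.2023 20:30"},
    {"IDENT": "3", "WORK_START_DATE": "24.05.2023 12:00", "WORK_END_DATE": "24.05.2023 11:00"},
]
print(clean_visits(visits))
```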
Upon closer inspection of the preliminary filtered data, it was confirmed that the
radiation dose data deviated from a normal distribution, with a bias towards lower
doses. This is because most occupations do not involve high levels of exposure to
ionising radiation. This skewness can be observed in Figure 4.6, which shows the
classed radiation doses after data preparation.
The data were further processed by transforming the variables into
either categorical or continuous types. Among the variables, only ’dose alarm’ and
’radiation dose’ were defined as continuous, while the remainder were set as categorical.
The rationale behind setting the ’dose alarm’ as continuous is due to its dynamic
nature, whereby it is adjusted based on the characteristics of each individual’s visit
and the specific dose threshold, as previously outlined in this study. The radiation
dose was used to construct classes for the machine learning classification task, as
discussed later. After the type conversions, we proceeded to outlier detection and
removal, i.e. the removal of data points that are significantly different from the rest of the data
[70]. This was done because the aim of the study was not to model anomalies, but
specifically the visits and their respective typical doses. We used the Interquartile
Range (IQR) proximity rule in this task because the radiation doses were not normally
distributed [70]. The IQR is defined as the range of values between the first and third
quartiles of a data set, IQR = Q3 − Q1. A data point x is classified as an outlier if it
lies more than 1.5 times the IQR below the first quartile (25th percentile) or more than
1.5 times the IQR above the third quartile (75th percentile) [70]. The outlier
condition can be expressed as follows:
x < Q1 − 1.5 × (Q3 − Q1) or x > Q3 + 1.5 × (Q3 − Q1)
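A minimal sketch of the IQR proximity rule is given below. The function names and the toy values are illustrative assumptions, and the quartile convention (here the "inclusive" method of Python's statistics module) is also an assumption, since different conventions give slightly different bounds.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper outlier limits Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR limits."""
    low, high = iqr_bounds(values, k)
    return [x for x in values if low <= x <= high]


toy_doses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_bounds(toy_doses))
print(remove_outliers(toy_doses))
```

In the toy example, only the value 100 lies outside the limits and is removed.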
The identification of these outliers using the IQR method resulted in the removal
of tens of thousands of instances of data. The majority of these deleted instances, or
site visits per WDC, were for the generic WDCs: 100000 and 200000. These WDCs
accounted for more than half of the total number of instances that were deleted. This
suggests that the generic codes are being misused because of their ease of use: with a
generic code, personnel do not need to remember or know the exact WDC required for
the work. Alternatively, there may be no specific WDC in use that would separate
these outliers from the rest. Removing these outliers significantly improved the
performance of the machine learning models, which we discuss further in Chapter 4.2.
Figure 4.1 shows the percentage distribution of outlier counts per year for both sites.
This figure also demonstrates how the type of annual maintenance (fuel change and
maintenance, or fuel change only) varies from year to year and affects the number of
outliers.
Figure 4.1: Yearly share of the total number of outliers. Outliers are grouped by the
general WDCs, and the rest are bundled together by plant (OL1 or OL2)
The raw data and the visualisations derived from them are not presented
in this study, and all the Statistics, Figures and Tables from this point
onwards have been derived from preprocessed data.
Once the outliers have been identified and removed, the data can be examined in
greater detail for the published part of this study. Following the removal of outliers,
the size of the dataset remains in the hundreds of thousands. Firstly, the objective
is to visualise the trend between the time and dose. As demonstrated in Figure 4.2,
an increase in visit time does not appear to have a significant impact on the dose,
with the high-dose work being completed promptly. The heatmap representation
further reveals that the data is heavily skewed towards lower doses, which are also
more proportional to the time spent. The absence of prolonged high-dose visits is
evident in the heatmap, as there are no data points to model such visits.
Figure 4.2: Heatmap presentation of visit length and the corresponding radiation dose on
a logarithmic scale
At TVO, personnel undergo training in radiation work through two distinct
courses: firstly, there is a compulsory entry-level course, and secondly, there is an
advanced course, which is not compulsory but is attended when necessary. Figure
4.3 illustrates the compulsory and advanced courses in their respective boxplots.
However, the interpretation of these plots is complicated by the fact that the doses
received by both groups are very close to zero. To provide a more meaningful
interpretation, logarithmically scaled plots were created, which revealed that personnel
who had completed the advanced course received low-level radiation doses equivalent to
those of personnel who had only completed the entry-level course, yet in the higher
dose range, those who had completed the advanced course received fewer doses. This is
further supported by the statistical analysis, which showed that the mean dose received
by personnel who had only completed the entry-level course was approximately
2µSv higher than the mean dose received by personnel who had also completed the
advanced course.
Figure 4.3: Boxplot of radiation doses and courses completed by personnel. Because the
data were not normally distributed and the samples were related, the Wilcoxon signed-rank
test was used to show whether there were differences in doses. A p-value of 0.0 indicates a
significant difference in radiation doses between the groups
We then sought to determine whether any of the work dose codes exhibited a
high degree of significance with respect to doses. As illustrated in Figure 4.4, it was
evident that the general codes (X10000) for the containment building and the work
with the actuators for the control rods (WDC X22100) had the highest accumulated
doses since 2012, particularly for the OL1 plant. The generic codes were expected to
become over-represented in this statistic, as can be seen particularly for the OL2
plant.
Figure 4.4: Accumulated doses for WDCs visualised since 2012 annual maintenances
However, if we separate the system from the WDCs, we can see from Figure 4.5
that, for both the OL1 and OL2 plants, the cooling system of the shutdown
reactor (321) has produced the highest doses, exceeding even the general codes. Apart
from 321 and the general systems 100 and 0, systems 221, 331, 313 and 200 in
particular produce the highest doses. From the
general codes, it is not possible to deduce more specific reasons for the visits, which
is challenging for this study and for the doses gathered, as interpretability is reduced.
Therefore, an analysis of the variation in such doses was conducted, which revealed
significant variation, suggesting potential misuse of the codes.
Figure 4.5: Accumulated doses for OL1 and OL2 plant systems since 2012 annual
maintenances. System is parsed from the WDCs, for example 132100 would mean
OL1 321-system, which is the cooling system of the shutdown reactor
After exploring the data, we used feature generation to add informativeness
through domain-specific knowledge [58]. We therefore decomposed parts of the
WDCs into their own features, as demonstrated earlier. We also added a
variable for the type of annual maintenance (fuel change and maintenance, or fuel
change only) and a feature based on the time spent on the visit to indicate
whether the visit was short, medium or long (less than 2h, less than 8h, or more
than 8h). We also removed the variables related to organisations, WDC info, dates and
IDs, because we considered it dangerous to teach the model to predict
doses based on these variables, let alone by date or ID. We then created discrete
classes from the dose and removed the continuous radiation dose from the
data to prevent data leakage. This continuous value (expressed in microsieverts)
was classified into four distinct classes, labels 0 to 3, with their respective intervals
specified in Table 4.2. The class intervals were determined through a
combination of domain expertise and random search, with the objective of maximising
the informative value of the labels. This approach entailed randomly shifting the
interval boundaries while keeping them close to the intervals initially defined
through domain knowledge.
Class label    Class interval*    Ratio**
0              [0, 5)             53.8
1              [5, 25)            9.31
2              [25, 75)           2.45
3              [75, 546]          1.0
Table 4.2: Class labels and their respective intervals and ratios from 0 to the max dose
of 546µSv
* = expressed in microsieverts (µSv)
** = each class count scaled relative to the smallest class size to highlight
skewness after data preparation
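The mapping from a continuous dose to the class labels of Table 4.2, and the visit length category described earlier, can be sketched as follows. The function names are hypothetical, and the handling of visits lasting exactly 2 or 8 hours is an assumption, as the boundaries are not specified precisely.

```python
from bisect import bisect_right

DOSE_EDGES = [5, 25, 75]        # lower bounds of classes 1, 2 and 3 in µSv
TIME_EDGES = [120, 480]         # 2 h and 8 h expressed in minutes (assumed)
TIME_LABELS = ("short", "medium", "long")


def dose_class(dose_usv):
    """Map a dose in µSv to class 0-3 using the half-open intervals of Table 4.2."""
    return bisect_right(DOSE_EDGES, dose_usv)


def time_category(minutes):
    """Map a visit length in minutes to short, medium or long."""
    return TIME_LABELS[bisect_right(TIME_EDGES, minutes)]


print([dose_class(d) for d in (0, 4.9, 5, 24, 25, 75, 546)])
print(time_category(24), time_category(300), time_category(600))
```

Because `bisect_right` places a value equal to an edge into the upper interval, a dose of exactly 5µSv falls into class 1, matching the interval [5, 25).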
Based on the ratios presented in Table 4.2, it is clear that the distribution of
radiation doses is highly skewed. The mean dose is substantially higher than the
median and the standard deviation of 21.29µSv further indicates variability in the
data, with doses ranging from a minimum of 0µSv to a maximum of 546µSv. The
75th percentile of the doses falls below the 4µSv threshold, suggesting that higher
values have a significant impact on the overall distribution.
Subsequently, given the skewed and time-series related nature of the
data, we split it into the training, validation and testing sets (60/20/20) as
mentioned earlier. This split was done so that the oldest data instances ended up
in the training data and the most recent ones in the testing data, to avoid leaking
information from the future into the past and to use historical data to present how
doses have evolved through time [71]. The training data included data points from 2012 to
2018, validation from 2018 to 2021 and test data from 2021 to 2023. This analysis
would be seasonal if we used all the data available, but we removed this aspect by
modelling only the periods associated with annual maintenance [72]. Skewed dose
classes after splitting are illustrated in Figure 4.6.
Figure 4.6: Skewed radiation dose classes after train/validation/test split
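A chronological 60/20/20 split of the kind described above can be sketched as follows. The record layout (a dictionary with a `date` key) and the function name are assumptions made for illustration; the actual data schema is not shown in this study.

```python
from datetime import date


def chronological_split(records, train_pct=60, val_pct=20):
    """Sort records by date and cut them into train/validation/test parts.

    The oldest records go to training and the newest to testing, so no
    information from later visits is available when fitting the model.
    """
    ordered = sorted(records, key=lambda r: r["date"])
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical visit records, deliberately not in date order.
visits = [{"date": date(2012 + (i * 7) % 10, 5, 1), "visit_id": i} for i in range(10)]
train, val, test = chronological_split(visits)
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the cut points to avoid floating-point rounding at the boundaries.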
As a result of these data processing steps we have features that are delineated in
Table 4.3. Categorical variables were one-hot encoded before assigning the features
to be trained. Normalisation of continuous variables was not performed due to the
models employed. As a result, we ended up with a few hundred features.
Variable           Description                                    Data Type
External           Boolean indicating if the personnel is external   Categorical
Entry-course       Boolean indicating if the course is completed     Categorical
Advanced-course    Boolean indicating if the course is completed     Categorical
Time in minutes    Site visit length                                 Continuous
Work dose code     Code for entry to controlled area                 Categorical
Plant              Plant that is entered                             Categorical
System             System inside the plant                           Categorical
Project            Project for the specified system                  Categorical
Personnel class    Radiation work classification                     Categorical
Dose alarm         Dose alarm used for the entry                     Continuous
Dose rate alarm    Dose rate alarm used for the entry                Categorical
Time category      Low, medium or long visit time                    Categorical
Outage             Type of the annual maintenance                    Categorical
Table 4.3: Variables feature engineered from the raw data
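One-hot encoding of the categorical variables in Table 4.3 can be illustrated with the sketch below. The column names and values are hypothetical stand-ins for the real variables, and the helper is a plain-Python illustration rather than the encoder used in the study.

```python
def one_hot_encode(rows, categorical_columns):
    """Replace each categorical column with one 0/1 column per observed value."""
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        # Keep non-categorical (e.g. continuous) columns unchanged.
        out = {k: v for k, v in row.items() if k not in categorical_columns}
        for col, values in categories.items():
            for value in values:
                out[f"{col}={value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


# Hypothetical rows; the system codes are placeholders.
rows = [
    {"plant": "OL1", "system": "A", "time_minutes": 35},
    {"plant": "OL2", "system": "B", "time_minutes": 120},
    {"plant": "OL1", "system": "C", "time_minutes": 10},
]
encoded_rows = one_hot_encode(rows, ["plant", "system"])
print(sorted(encoded_rows[0]))
```

Because every distinct category value becomes its own column, variables with many values (such as systems and projects) quickly expand the feature set, which is consistent with the few hundred features mentioned above.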
4.2 Applying machine learning
In this section, the preprocessed data and the models presented earlier are used
for the machine learning experiments. The models presented in Table 3.1 were trained
on the training data with different parameters. These models were then evaluated
using validation data, with the test set kept strictly separate from the training and
validation phases throughout this process. Just under 500 different parameter
combinations were validated across the models, and the best combination per model
was selected based on the results obtained. The best-performing models with their
respective parameters were then tested using test data. The models and their
validated parameters are presented in Table 4.4.
Model                     Hyperparameters
Random Forest             max_depth: 3, 5, 7
                          n_estimators: 800, 1200, 1400, 1600
                          max_features: 'sqrt', None
                          min_samples_leaf: 1, 3, 10, 15
Balanced Random Forest    max_depth: 3, 5, 7
                          n_estimators: 800, 1200, 1400, 1600
                          max_features: 'sqrt', None
                          min_samples_leaf: 1, 3, 10, 15
XGBoost                   n_estimators: 800, 1200, 1400, 1600
                          learning_rate: 0.01, 0.05, 0.1
                          max_depth: 3, 5, 7
                          subsample: 0.8, 1.0
                          min_child_weight: 1, 2, 5
LightGBM                  n_estimators: 800, 1200, 1400, 1600
                          learning_rate: 0.01, 0.05, 0.1
                          bagging_fraction: 0.8, 1.0
                          feature_fraction: 0.8, 1.0
Easy Ensemble             n_estimators: 10, 20, 30, 50, 100
Table 4.4: Hyperparameters for the different models validated in this study
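As a consistency check, the grids in Table 4.4 can be enumerated directly. The sketch below assumes that every combination within each model's grid was validated (a full grid search); under that assumption the total is 461 combinations, which agrees with the "just under 500" stated above. The dictionary layout is an illustrative choice.

```python
from itertools import product
from math import prod

# Hyperparameter grids transcribed from Table 4.4.
tree_grid = {
    "max_depth": [3, 5, 7],
    "n_estimators": [800, 1200, 1400, 1600],
    "max_features": ["sqrt", None],
    "min_samples_leaf": [1, 3, 10, 15],
}
GRIDS = {
    "Random Forest": tree_grid,
    "Balanced Random Forest": tree_grid,
    "XGBoost": {
        "n_estimators": [800, 1200, 1400, 1600],
        "learning_rate": [0.01, 0.05, 0.1],
        "max_depth": [3, 5, 7],
        "subsample": [0.8, 1.0],
        "min_child_weight": [1, 2, 5],
    },
    "LightGBM": {
        "n_estimators": [800, 1200, 1400, 1600],
        "learning_rate": [0.01, 0.05, 0.1],
        "bagging_fraction": [0.8, 1.0],
        "feature_fraction": [0.8, 1.0],
    },
    "Easy Ensemble": {"n_estimators": [10, 20, 30, 50, 100]},
}


def iter_combinations(grid):
    """Yield every parameter combination of a grid as a dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


grid_sizes = {model: prod(len(v) for v in grid.values()) for model, grid in GRIDS.items()}
total_combinations = sum(grid_sizes.values())
print(grid_sizes)
print(total_combinations)  # 461
```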
While validating these parameters, it was found that the imbalance of the classes,
and in particular the limited number of data instances in the minority classes, made
these classes difficult to predict, as would be expected. The problem was addressed
by weighting the classes inversely proportional to their frequencies, which
resulted in improved validation results. These weights were imposed on all models
except Easy Ensemble and Balanced Random Forest, as their internal structures
are capable of handling the balancing [63], [64]. For the remaining models, internal
balancing was not used because it did not produce significantly different results.
Instead, these models were given the class weights together with the data.
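Weights that are inversely proportional to class frequency can be computed as in the sketch below. The exact formula used in this study is not stated, so the common convention n_samples / (n_classes * class_count) is assumed here. The class counts are illustrative values chosen only to be proportional to the ratios in Table 4.2; they are not the real counts.

```python
def inverse_frequency_weights(class_counts):
    """Return a weight per class that is inversely proportional to its frequency.

    Assumed formula: n_samples / (n_classes * class_count).
    """
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Illustrative counts proportional to the Table 4.2 ratios (not the real counts).
illustrative_counts = {0: 5380, 1: 931, 2: 245, 3: 100}
class_weights = inverse_frequency_weights(illustrative_counts)
for label, weight in class_weights.items():
    print(label, round(weight, 4))
```

With these weights, a misclassified instance of the rarest class contributes far more to the training loss than a misclassified majority-class instance, which is the intended counterbalance to the skew.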
An attempt was also made to resolve the data imbalance with a data-centred
approach, using naive down- and oversampling methods, such as random sampling, on
the training data, but these did not yield better results [73]. Rather, they worsened
the results, because downsampling discards information by removing data points
and oversampling introduces noise into the dataset and can lead to overfitting [73]. More
advanced approaches also exist, but these were not pursued in this study.
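Naive random resampling of the kind mentioned above can be sketched as follows. The function and its `target_per_class` parameter are hypothetical; the sketch only illustrates why downsampling discards data points and why oversampling merely duplicates existing ones.

```python
import random
from collections import Counter, defaultdict


def random_resample(samples, labels, target_per_class, seed=0):
    """Randomly down- or oversample every class to the same size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in sorted(by_class.items()):
        if len(items) >= target_per_class:
            # Downsampling: data points are discarded.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversampling: existing points are duplicated with replacement.
            chosen = items + rng.choices(items, k=target_per_class - len(items))
        out_samples.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_samples, out_labels


# Hypothetical imbalanced data: 8 instances of class 0 and 2 of class 3.
samples = list(range(10))
labels = [0] * 8 + [3] * 2
resampled, resampled_labels = random_resample(samples, labels, target_per_class=4)
print(Counter(resampled_labels))
```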
In Chapter 4.3 we will address the metrics used in this study to select the
best parameters during validation. The same metrics are also
used for testing, for which the training and validation data are combined and
the models are re-trained on the combined data. The models will then be tested against the
test data and analysed to identify the areas of success and the areas that require
further development in Chapter 5.
4.3 Evaluation
In this study, Area Under the Receiver-Operating Characteristic (ROC AUC) and
Weighted F1 scores were selected as the validation and testing metrics. The definitions
of these metrics can be found in Table 4.5. In addition to these metrics, the ROC
and Precision-Recall curves (PR-curves) were analysed as well as Confusion Matrices,
alongside hyperparameter validation and final testing.
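Metric               Definition

ROC AUC Score
\[ \mathrm{TPR}\ (\text{True Positive Rate}) = \frac{TP}{TP + FN} \]
\[ \mathrm{FPR}\ (\text{False Positive Rate}) = \frac{FP}{FP + TN} \]
\[ \mathrm{ROC\ AUC} = \int_{0}^{1} \mathrm{TPR}(x)\, d(\mathrm{FPR}(x)) \]
where TP is True Positives, FP is False Positives,
TN is True Negatives and FN is False Negatives

Weighted F1 Score
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]
\[ F1_{\mathrm{weighted}} = \sum_{i=1}^{C} w_i \cdot 2\, \frac{\mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \]
where the sum runs over the C classes, w_i = n_i / N, n_i is the number of
instances in class i, and N is the total number of instances

Table 4.5: ROC AUC and Weighted F1 metrics and their respective definitions [9],
[74]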
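The Weighted F1 definition in Table 4.5 can be worked through with a small plain-Python sketch. The function name and the example labels are illustrative; per-class Precision and Recall are computed one class against the rest, and each class F1 is weighted by its share of the true instances.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 weighted by the class share of true instances."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += support[c] / total * f1
    return score


# Hypothetical two-class example: class 0 has F1 = 0.8, class 1 has F1 = 2/3.
example_score = weighted_f1([0, 0, 0, 1], [0, 0, 1, 1])
print(round(example_score, 4))
```

In the example, class 0 holds three of the four true instances, so its F1 of 0.8 dominates the weighted score of roughly 0.767.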
The metrics selected were based on the premise that the dataset displayed
significant imbalance, a situation in which traditional accuracy metrics may prove
to be misleading due to the dominance of the majority classes [9], [75]. The ROC AUC
metric was selected as it summarises the trade-off between the True Positive Rate and
the False Positive Rate (the larger the ROC AUC the better: 0.5 corresponds to random
chance and 1.0 to a perfect classifier) [11], [74], [75]. In imbalanced datasets ROC AUC
remains informative due to its independence from changes in the class distribution [74],
[75]. ROC-curves, in turn, can visualise the performance of classifiers
regardless of class imbalance [75]. AUC can be interpreted as the probability that the classifier
will rank a randomly selected positive instance higher than a randomly selected
negative instance [74], [75]. The F1 score, in contrast, considers both Precision and
Recall [9]. High Precision means that the model rarely misclassifies negative instances
as positive, while high Recall means that the model detects as many positive instances
as possible; in other words, high Precision corresponds to few false positives and high
Recall to a low FNR (False Negative Rate) [9]. The Weighted F1 score functions similarly
but weights the F1 score of each class by the number of true instances in that class, so
that the overall score reflects the class distribution. Table 4.6 shows the
validation results for the best hyperparameters selected by ROC AUC score.
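The probabilistic reading of AUC given above (the chance that a randomly chosen positive instance is ranked above a randomly chosen negative one) can be computed directly by comparing all positive-negative pairs. The sketch below is for illustration on small inputs; the scores and labels are hypothetical, and ties are counted as half a win, which is the usual convention.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three of the four positive-negative pairs are ordered correctly.
example_auc = auc_by_ranking([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(example_auc)  # 0.75
```

Because only the ordering of the scores matters, the value does not depend on any decision threshold or on the proportion of positives, which is why the metric stays informative for imbalanced data.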
Model                     ROC AUC   F1 Weighted   Selected hyperparameters
Random Forest             0.8434    0.7607        max_depth: 7
                                                  n_estimators: 1600
                                                  max_features: 'sqrt'
                                                  min_samples_leaf: 1
Balanced Random Forest    0.8447    0.7603        max_depth: 7
                                                  n_estimators: 800
                                                  max_features: 'sqrt'
                                                  min_samples_leaf: 1
XGBoost                   0.8997    0.7708        n_estimators: 1200
                                                  learning_rate: 0.05
                                                  max_depth: 7
                                                  subsample: 0.8
                                                  min_child_weight: 5
LightGBM                  0.9009    0.7674        n_estimators: 1600
                                                  learning_rate: 0.01
                                                  bagging_fraction: 0.8
                                                  feature_fraction: 0.8
Easy Ensemble             0.7706    0.6458        n_estimators: 10
Table 4.6: Validation results for the models used, including hyperparameters and
performance metrics ROC AUC and Weighted F1 score
From the validation results in Table 4.6, LightGBM achieves the highest ROC AUC
score (0.9009), which is also reflected in the ROC AUC scores for the minority classes,
while XGBoost achieves a marginally higher Weighted F1 score (0.7708 against 0.7674).
There is also not much difference between the Balanced and traditional Random Forest
models, except for the predictions of class 3. However, it is already evident that the
models are struggling to accurately predict the minority classes. The best validation
result obtained with LightGBM was most affected by the change in learning rate,
which is visualised in Figure 4.7 using different metrics.
Figure 4.7: LightGBM learning rate parameter and its effect on the respective metrics
According to the validation results, in the tree-based models, a trend was identified
that suggests a positive relationship between tree depth and the ROC AUC score,
along with a variable impact on the Weighted F1 score. However, the number of
trees alone exhibited minimal influence on model performance, with the exception
of XGBoost. Notably, LightGBM demonstrated a decline in performance with
increasing tree depth when this parameter was considered independently. Among
the models examined, Easy Ensemble exhibited significantly poorer performance
compared to the other models, as reflected also in its weaker performance in majority
class classification.
5 Results
After acquiring the test results, it can be determined, as in the validation, that
the LightGBM model demonstrates the best performance in our classification task,
despite encountering challenges. These challenges relate to the model’s inability to
effectively differentiate between the minority classes. The majority class (0–5µSv)
can be distinguished from the other classes with a high degree of confidence, while
the higher doses (above 75µSv) can be distinguished with a medium degree of
confidence. However, the intermediate classes 1 and 2 remain challenging
to distinguish with the available data. This can also be seen in the PR-curve
behaviour shown in Figure 5.1. The PR-curve visualises the Precision-Recall tradeoff
at different thresholds [75]. The models were also found to be considerably more
effective after combining training and validation data prior to testing against test
data.
According to the PR-curves and Weighted F1 results, the models other than
LightGBM were roughly equally effective in the
classification of classes 1 and 2. Random Forest (also the balanced variant) performed
better in the classification of class 3, at the expense of the classification of class 0.
Figure 5.1: LightGBM Precision-Recall curve
The models were also validated and tested without the generic WDCs due to the noise
they contain, yet removing them did not improve the models’ performance.
On the contrary, including them increased the mean ROC AUC and
Weighted F1 score by approximately 20%. Furthermore, a scenario was
created in which outliers were not removed, which resulted in a substantial decline
in performance, both with and without the generic WDCs.
In comparison to other models, LightGBM and XGBoost achieved higher correct
classification rates for instances within classes 1 and 2. However, their performance
was suboptimal for class 3 predictions. The Confusion Matrix of LightGBM is
presented in Figure 5.2. In contrast, Random Forests showed a 10% decrease in the
correct classification rates for classes 1 and 2, yet exhibited almost a 15% increase
in class 3 predictions. For the class 0 classification, Random Forest demonstrated a
comparable performance to LightGBM and XGBoost, as illustrated by the Confusion
Matrix in Figure 5.3. However, no significant differences were observed between the
balanced and traditional Random Forest models. The classification performance
achieved by the Balanced Random Forest with classes 0 and 1 was marginally better
than that of the Random Forest.
Figure 5.2: LightGBM Confusion Matrix
Figure 5.3: Random Forest Confusion Matrix
However, when interpreting the Confusion Matrices alone, it can be observed that
only classes 0 and 3 achieve usable results, while the results for classes 1 and 2 are
comparable to random chance. It can also be observed that the models predict larger
classes on average, which suggests that the models are not able to distinguish between
classes, especially for the higher classes.
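How a Confusion Matrix and the per-class correct classification rates are derived from predictions can be sketched as follows. The labels below are hypothetical and chosen only to show the mechanics; they do not reproduce the results in Figures 5.2 and 5.3.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Share of each true class that was predicted correctly (row-wise)."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Hypothetical predictions for the four dose classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 0, 1, 0, 2, 3, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
recalls = per_class_recall(cm)
for row in cm:
    print(row)
print(recalls)
```

The diagonal holds the correct classifications; every off-diagonal cell is one specific type of error, for example a true class 2 visit predicted as class 3.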
As is evident, interpreting numerous classifications poses a challenge in managing
the entirety of the output space. Even with only four classes, the Confusion Matrix
becomes an $n \times n$ matrix, comprising $n^2 - n$ potential
error classifications [75]. In our case, $4^2 - 4 = 12$. This approach, however, is not
without its limitations. The prediction of classes in this case is based on argmax,
where the class with the highest predicted probability is taken, a method that may not
always be correct, especially if the probabilities are close to each other as a result of
imbalance. This is a weakness of Confusion Matrices, as they are threshold-specific.
To address these limitations and the resulting complexity, we created ROC
curves for each class using a One-vs-the-Rest (OvR) approach, where we designate
one class as positive and the others as negative [75]. This approach was also used
to estimate the AUC score [75]. Due to the imbalance, their average was estimated
by micro-averaging, a method that treats each class as a binary problem and pools
the individual decisions of all classes before computing the score. The ROC curves
are thus used to examine the model’s ability to distinguish between classes [75].
Through the comparison of all of the metrics, it is possible to assess the model’s
capacity to differentiate between classes and whether that ability translates into
predictions at the decision level. The ROC curves for LightGBM and Random Forest
are presented in Figures 5.4 and 5.5, respectively.
Figure 5.4: LightGBM ROC AUC curves, AUC and micro-average AUC scores
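The difference between argmax decisions and the One-vs-the-Rest ranking view can be illustrated with a small sketch. The predicted probabilities below are hypothetical, and the AUC is computed by exhaustive pairwise comparison, which is only practical for small examples. Micro-averaging is implemented by pooling the (score, one-hot label) pairs of all classes into one binary problem.

```python
def argmax_predict(probabilities):
    """Pick the class with the highest predicted probability for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def binary_auc(scores, labels):
    """AUC as the share of correctly ordered (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_auc(probabilities, y_true, n_classes):
    """One-vs-the-Rest AUC: each class in turn is positive, the rest negative."""
    return {
        c: binary_auc([row[c] for row in probabilities],
                      [1 if y == c else 0 for y in y_true])
        for c in range(n_classes)
    }


def micro_average_auc(probabilities, y_true, n_classes):
    """Pool the scores and one-hot labels of all classes into one binary problem."""
    scores = [row[c] for row in probabilities for c in range(n_classes)]
    labels = [1 if y == c else 0 for y in y_true for c in range(n_classes)]
    return binary_auc(scores, labels)


# Hypothetical probabilities; the second row (true class 1) is a close call.
probs = [
    [0.70, 0.20, 0.05, 0.05],
    [0.40, 0.35, 0.15, 0.10],
    [0.30, 0.25, 0.35, 0.10],
    [0.10, 0.10, 0.20, 0.60],
]
truth = [0, 1, 2, 3]
predicted = argmax_predict(probs)
per_class_auc = ovr_auc(probs, truth, n_classes=4)
micro_auc = micro_average_auc(probs, truth, n_classes=4)
print(predicted, per_class_auc, micro_auc)
```

In this example every per-class AUC is perfect, yet argmax still misclassifies the second visit because its two highest probabilities are close. This is the gap between ranking ability and decision-level predictions that the comparison of metrics is meant to expose.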
As can be seen, LightGBM achieves higher AUC values at all thresholds, which
is also reflected in the micro-average AUC value. In particular, LightGBM improves
significantly for class 0 and class 2. For both models, class 3 is the easiest to
distinguish. Similar behaviour can be observed for the other models considered, among
which Easy Ensemble stands out significantly with its weaker ranking capabilities.
Furthermore, when Confusion Matrices are taken into account, LightGBM achieves
a better balance across classes, while Random Forest relies heavily on its strong
performance for class 3. Based on these metrics, including Weighted F1 scores, it
can be concluded that LightGBM has more consistent performance across classes,
while Random Forest has class-specific strengths.
Figure 5.5: Random Forest ROC AUC curves, AUC and micro-average AUC scores
However, these findings suggest that by restructuring the experimental approach
as a binary task, with an exclusive focus on predicting extreme values, the performance
of the models could be significantly improved. While the current approach does not
yield optimal results, there is nonetheless considerable value in the prediction of
extreme outcomes.
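Such a binary restructuring could be sketched as follows. It is assumed here, purely for illustration, that the "extreme" class corresponds to class 3 of Table 4.2; the study does not fix this threshold, and the labels below are hypothetical.

```python
# Assumption for illustration: class 3 ([75, 546] µSv) is treated as "extreme".
EXTREME_CLASS = 3


def to_binary(labels, positive_class=EXTREME_CLASS):
    """Collapse the four dose classes into extreme (1) versus the rest (0)."""
    return [1 if y == positive_class else 0 for y in labels]


def positive_rate(binary_labels):
    """Share of positive instances, showing how imbalanced the binary task is."""
    return sum(binary_labels) / len(binary_labels)


# Hypothetical multiclass labels.
multiclass_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
binary_labels = to_binary(multiclass_labels)
print(binary_labels, positive_rate(binary_labels))
```

The binary task remains heavily imbalanced, so the weighting and threshold-independent metrics discussed in Chapter 4 would still be relevant.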
In the follow-up analysis, the feature importances are assessed to understand
which features the models prioritise to achieve these results. As illustrated in Figure
5.6, the feature importances of LightGBM reveal its significant reliance on time
spent, which is reasonable given that longer visit times result in a lower average
dose, while shorter visit times potentially lead to a higher dose on average (high-dose
work is completed promptly). It is noteworthy that the dose limit and worker
classification hold considerable importance, with the high-dose systems also affecting
the predictive outcomes, as we saw in the data analysis. A particularly noteworthy
feature is the OL2 plant indicator, which may imply that the doses at OL2
deviate from those at OL1, despite the two units being identical. The importance of the
external personnel feature may indicate that such contractors are exposed to higher
doses than regular personnel, because their specialised knowledge is often required for
specific work tasks which can result in higher doses.
Figure 5.6: LightGBM feature importances
However, Random Forest places importance on different features compared to
LightGBM. While both models prioritize the time spent on a visit as the most
significant feature, Random Forest distributes its reliance more evenly across other
features, as shown in Figure 5.7. Nevertheless, LightGBM and Random Forest
share the same systems as most important features, but this was also the case
with other models. Balanced Random Forest favours almost identical features
compared to traditional Random Forest, a result that was to be expected. In
contrast, XGBoost places significant emphasis on workload code 100001 and project
01, although this is distributed more uniformly across all features when compared
with LightGBM. Overall, LightGBM achieves the best or near-best results across
all metrics, suggesting that time is the most significant predictor and that the personnel
dose limit and category are of considerable importance. The presence of OL2 among
the most important features warrants further investigation, as it is an outlier that
other models fail to identify.
Figure 5.7: Random Forest feature importances
In summary, LightGBM achieves the best ROC AUC and micro-averaged scores,
while XGBoost achieves the best Weighted F1 score. The Random Forests achieved
comparable scores, while Easy Ensemble performed significantly worse in all metrics.
LightGBM achieves the best overall results, making it the most
effective model in this comparison. However, it falls short of delivering the level of
performance required for the classification task at hand. A summary of the results
can be found in Table 5.1. We therefore conclude that the objective of this study
could not be achieved with the available data.
Model                       ROC AUC   Weighted F1
Random Forest               0.8460    0.7325
Balanced Random Forest      0.8454    0.73088
XGBoost                     0.8931    0.7532
LightGBM (best overall)     0.8938    0.7519
Easy Ensemble Classifier    0.7705    0.6168
Table 5.1: Overall test results with ROC AUC and Weighted F1 scores
5.1 Evaluation analysis
In addition to the prior data extraction in Chapter 4.1, an attempt was made to
integrate the work order system data with the radiation data. This integration
would have enabled a significantly more accurate assessment of the scope of the
visits, but it proved to be a particularly challenging task. The system associated with
each site visit is known from the radiation database; however, integrating data from the
work order system would have provided more detailed insights into the specific tasks
performed, down to the component level, during each site visit. Unfortunately, the
attempt to link the data by personnel ID and the times of the visit and the work proved
to be unsuccessful. This was because a single person may have had
multiple work activities in progress simultaneously, which made it impossible
to determine the precise scope of the site visit. Linking the data regardless would have
generated numerous false positives, amounting to several million.
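The ambiguity described above can be reproduced with a small interval-overlap join. The field names (`person_id`, `entry`, `exit`, `start`, `end`, `order_id`) and the records are hypothetical; the actual schemas of the radiation database and the work order system are not shown in this study.

```python
from datetime import datetime


def match_work_orders(visit, work_orders):
    """Return all work orders of the same person that overlap the visit in time."""
    return [
        order for order in work_orders
        if order["person_id"] == visit["person_id"]
        and order["start"] <= visit["exit"]
        and visit["entry"] <= order["end"]
    ]


# Hypothetical visit and work orders for one person.
visit = {
    "person_id": 42,
    "entry": datetime(2022, 5, 10, 8, 0),
    "exit": datetime(2022, 5, 10, 9, 30),
}
work_orders = [
    {"order_id": "A", "person_id": 42,
     "start": datetime(2022, 5, 9, 7, 0), "end": datetime(2022, 5, 12, 16, 0)},
    {"order_id": "B", "person_id": 42,
     "start": datetime(2022, 5, 10, 6, 0), "end": datetime(2022, 5, 10, 18, 0)},
    {"order_id": "C", "person_id": 42,
     "start": datetime(2022, 5, 11, 7, 0), "end": datetime(2022, 5, 11, 15, 0)},
    {"order_id": "D", "person_id": 7,
     "start": datetime(2022, 5, 10, 7, 0), "end": datetime(2022, 5, 10, 15, 0)},
]
candidates = match_work_orders(visit, work_orders)
print([order["order_id"] for order in candidates])
```

The single visit matches two open work orders, and the join alone cannot tell which one the visit belonged to. Repeated over the whole dataset, every such ambiguous match becomes a potential false positive.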
This serves as a reminder that the quality and informativeness of data is not
always sufficient for machine learning. In this particular instance, it can be concluded
that the radiation data lacks the necessary level of detail to generate reliable and
informative results. The lack of integration between different databases, which would
have been essential in this case, highlights an opportunity for future improvements
to enhance the informativeness of the data.
In addition to the issue of data informativeness, it may also be necessary to
consider more advanced methods for over- and downsampling the data, especially if the
classification task is to be made more precise by reducing the size of the class intervals.
At present, this is a highly challenging task, as obtaining data for all classes and
splits may not be possible due to time dependencies in the data: at higher doses, the
necessary data points simply do not exist.
The situation is made even more complicated by the fact that the model in
question is meant to be used in a field that is critical to safety and health. This makes
the performance expectations for the model extremely high, and unfortunately, these
requirements were not met.
However, it is important to acknowledge that this was the first experimental
setting for such a problem, which also provided valuable insights into the limitations
and possibilities of applying machine learning to similar tasks or in a broader context.
Moreover, the limitations and findings of this study offer an important source of
insight for similar tasks at TVO.
5.2 Operational use
In practice, however, the implementation of machine learning and related approaches
does not work on a create-and-forget basis but demands continuous
development and monitoring. Figure 5.8 below illustrates the life cycle of an
ML-enabled software system. It is important to note that these stages do not necessarily
follow a waterfall pattern, as they can also occur in different orders or in parallel
[76]. The figure illustrates the essential steps required to deploy and maintain
the ML solution in production.
Figure 5.8: Lifecycle of ML-enabled software system [76], [77]
In our case, if a similar but better-performing machine learning model were
to be put into production use, the readiness and suitability of the data must first be
assured, which requires, as in this case, domain expertise [78]. Even if the data is
suitable, it is also important to ensure that it corresponds to operational realities,
for example, whether certain features are always available at prediction time [78].
Nevertheless, machine learning, its interpretability, the data and its processing
represent only a fraction of the entire system that requires consideration. At this
point in the research, we will focus on the steps following the model evaluation in
its broader context. It has previously been demonstrated that the quality of the
modelling itself is dependent on the data and its respective issues and as such, these
will not be revisited. However, it is still important to acknowledge the existence of
additional limiting factors associated with modelling, in addition to what we have
already described in this study, including computational resources, complexity, and
regularisation problems [76].
In this context, model deployment includes the integration, monitoring and
maintenance of the models [76], [77]. The next phase of cross-cutting concerns brings
broader ethical, legal, trust and security issues into the picture [76]. These will be
briefly discussed below.
Integration means creating the infrastructure and interfaces for the model, but
also the workflow itself, where it is important to involve the people who created the
model with those who maintain the runtime environment, which helps to deliver
benefits in terms of quality and speed of product delivery [76].
This brings us to the next stages of monitoring and maintenance, where we can
ensure that the model is actually functioning as it should and is not deviating from
normal and expected behaviour [76]. Over time, models may deviate
as incoming data no longer aligns with the data they were trained on [76]. This is
known as concept drift, where, for example, an external event affects the input data
over time [76]. Monitoring enables continuous learning, allowing retraining as the
model’s shortcomings emerge [76], [77].
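One simple way to monitor for such drift is to compare the distribution of an input feature between the training period and a recent period. The sketch below uses the total variation distance between two categorical distributions. The feature, the data and the alert threshold are all hypothetical choices for illustration, and this is only one of many possible monitoring checks.

```python
from collections import Counter


def total_variation_distance(reference, recent):
    """Half the sum of absolute differences between two category distributions."""
    ref_counts, new_counts = Counter(reference), Counter(recent)
    categories = set(ref_counts) | set(new_counts)
    return 0.5 * sum(
        abs(ref_counts[c] / len(reference) - new_counts[c] / len(recent))
        for c in categories
    )


DRIFT_THRESHOLD = 0.2  # hypothetical alert level, to be tuned in practice

# Hypothetical plant feature: training period versus a recent monitoring window.
reference_window = ["OL1"] * 50 + ["OL2"] * 50
recent_window = ["OL1"] * 80 + ["OL2"] * 20
drift = total_variation_distance(reference_window, recent_window)
needs_review = drift > DRIFT_THRESHOLD
print(round(drift, 3), needs_review)
```

A value of 0 means identical distributions and 1 means no overlap at all. A value above the chosen threshold would prompt a closer look and possibly retraining.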
In the process of analysing the cross-cutting aspects of the operational use, it
becomes visible that ethical concerns result from biases that are embedded in the
training data, which in turn can lead to unintended discrimination [76]. Conversely,
legal challenges stem from the necessity for regulatory frameworks, such as the
General Data Protection Regulation (GDPR), which are designed to protect sensitive
data, but often lag behind the rapid pace of AI development [76]. In the context
of building trust with end-users, it is important to emphasise transparency, clear
communication, and the interpretability of models that are tailored to specific use
cases [76], [79], [80]. The use of black-box models should also be
considered carefully, as these are very difficult, if not impossible, to interpret [79]. Security
concerns, such as adversarial attacks including data poisoning, model stealing and
model inversion, must be addressed when deploying such models [76].
In the case of TVO, this signifies that in the future, the implementation of
comparable AI solutions will require dedicated infrastructure and computational
resources. This emphasises the need for not only proficient personnel in infrastructure
administration but also specialised knowledge in data analysis and the field of AI. It
is also important to address the role of model monitoring in ensuring the reliability of
future predictions, a matter with direct consequences for the quality of the predictions.
Because of the previously mentioned security issues, the use of models, particularly
deep learning models, calls for cooperation with information security, even if the
models are run on the intranet. However, it is apparent that the most significant initial step is the
recruitment of skilled personnel to carry out similar projects in the future.
6 Conclusions
The objective of this study was to address radiation and its effects in the nuclear power
plant environment according to current standards and ultimately to predict radiation
doses during personnel visits to OL1 and OL2 plants. Furthermore, we considered
from a holistic perspective what such modelling would require for production use.
The study demonstrated that, in terms of the data used, doses measured by
electronic dosimeters at the time of visits are suitable for analysis. Specifically,
the radiation types measured in this context are γ (gamma) and X-ray
radiation, which are collectively measured in microsieverts and classified as neutral
radiation, on which the modelling was consequently based. These types of radiation
are indirectly ionising, meaning they do not directly ionise matter.
The study found that the resulting accumulation of radiation doses can be
considered from a stochastic point of view, especially in terms of genetics. The
assessment of radiation effects at low doses, however, poses significant challenges due
to a lack of data. Partly because of these potential effects, the whole process, from
the measurement of the doses themselves to the actual visit, is closely controlled and
regulated by both TVO and STUK.
The data analysed in this study proved to be of high quality, with the exception
of a few weaknesses, but nevertheless not sufficiently informative to provide reliable
predictions. Consequently, it was determined that the data requires further enrichment
from alternative data sources. Five different machine learning models
were considered: 1) Random Forest, 2) Balanced Random Forest, 3) XGBoost,
4) LightGBM and 5) Easy Ensemble with AdaBoost. Of these, LightGBM and
Random Forest emerged as the most effective models. However, each model had its
limitations, with LightGBM demonstrating better overall performance and Random
Forest indicating class-specific strengths.
The findings of this study indicate a need for further research, particularly in
the area of data collection. It has been observed that enhancing the data might be
feasible, though this can be quite challenging in certain instances. Additionally, it
would be beneficial to explore a more comprehensive setup that also incorporates data
from outside the annual maintenance periods. In such cases, the utilisation
of deep learning systems could be a relevant approach. The use of real-time data
for model maintenance is another essential aspect that, due to the study’s limited
time and scope, could not be explored. This highlights the complex nature
and significant time demands of the subject.
However, the OL3 plant is in the process of generating similar data, and due
to the plant’s design, the accumulated doses will be lower, resulting in reduced
variability. Theoretically, this should allow prediction at smaller class intervals
without manipulating the data. In particular, it would be beneficial to develop a
model that compares the OL3 data with that generated at OL1 and OL2, with the
objective of achieving better results. However, this is only a future possibility, as at
the time of writing, there is only data from the first year of operation.
This research highlights the key problem with AI solutions, which is data
dependency. The emphasis is on quality over quantity: data must tell a story and not just
repeat itself. In similar projects, the first step is always to identify the needs and
the capabilities to meet those needs, because an AI solution can only be as good as
the data applied to it.
References
[1] O. A. Osibote, Ionizing and Non-ionizing Radiation. London, United Kingdom:
IntechOpen, 2020, pp. 1–10, 101–116. doi: 10.5772/intechopen.77474.
[2] H. Domenech, Radiation Safety: Management and Programs. Miami, FL, USA:
Springer International Publishing AG, 2016, pp. 1–71, 111–118, 143–182, 259–274.
doi: 10.1007/978-3-319-42671-6.
[3] R. L. Murray and K. E. Holbert, Nuclear Energy - An Introduction to the
Concepts, Systems, and Applications of Nuclear Processes (7th Edition). San
Diego, CA, USA: Elsevier, 2015, pp. 5–6, 72–82, 89–98, 139–149, 155–163. doi:
10.1016/C2012-0-02697-X.
[4] M. Hurlbert, L. Shasko, J. Condor, and D. Landrie-Parker, “Radiation Workers
and Risk Perceptions: Low Dose Radiation, Nuclear Power Production, and
Small Modular Nuclear Reactors”, Journal of Nuclear Engineering, vol. 4,
pp. 258–277, 2023. doi: 10.3390/jne4010020.
[5] Teollisuuden Voima Oyj. “OL1 & OL2 Ydinvoimalaitosyksiköt”. (2023), [Online].
Available: https://tvo.fi/material/sites/tvo/pdft/k9su4vcbz/OL1_ja_OL2_-laitosyksikot._Tekninen_esite.pdf
(visited on 09/05/2024).
[6] Radiation and Nuclear Safety Authority. “Radiation Protection and Exposure
Monitoring of Nuclear Facility Workers”. YVL C.2. (2019), [Online]. Available:
https://www.stuklex.fi/en/ohje/YVLC-2 (visited on 08/12/2024).
[7] “Assessment of Compliance with YVL Instructions C.2”, Teollisuuden Voima
Oyj, Tech. Rep. 160021, 2020, OL1/OL2 (unpublished).
[8] General Principles for the Radiation Protection of Workers. Oxford, United
Kingdom: International Commission on Radiological Protection (ICRP), 1997,
pp. 5–27, 43–46, 49–51. doi: 10.1016/S0146-6453(97)88275-9.
[9] S. Marsland, Machine Learning: An Algorithmic Perspective, Second Edition.
Boca Raton, FL, USA: CRC Press LLC, 2014, pp. 6–11, 6–11, 22–25, 8–458.
doi: 10.1201/b17476.
[10] Q. An, S. Rahman, J. Zhou, and J. J. Kang, “A Comprehensive Review on
Machine Learning in Healthcare Industry: Classification, Restrictions, Opportunities
and Challenges”, Sensors, vol. 23, no. 9, 2023. doi: 10.3390/s23094178.
[11] M. R. Berthold, C. Borgelt, F. Höppner, and F. Klawonn, Guide to Intelligent
Data Analysis: How to Intelligently Make Sense of Real Data. London, United
Kingdom: Springer Nature, 2010, pp. 1–35, 97–105. doi: 10.1007/978-1-84882-260-3.
[12] Teollisuuden Voima Oyj. “Ydinvoimalaitosyksikkö Olkiluoto 3”. (2023), [Online].
Available: https://www.tvo.fi/material/sites/vanhattvo/20220825132746/7bmHsNHjV/ydinvoimalaitosyksikko_ol3_fin.pdf
(visited on 09/05/2024).
[13] Teollisuuden Voima Oyj. “Nuclear power plant units Olkiluoto 1 and Olkiluoto
2”. (2007), [Online]. Available:
https://www.tvo.fi/uploads/File/nuclear-power-plant-units.pdf (visited on 09/13/2024).
[14] “General Procedure for Radiation Protection”, Radiation Protection Manual,
Teollisuuden Voima Oyj, Tech. Rep. 103296, 2019, OL1/OL2/OL3 (unpublished).
[15] Ministry of Justice. “Radiation Act 9.11.2018/859”. (2018), [Online]. Available:
https://www.finlex.fi/en/laki/kaannokset/2018/en20180859 (visited
on 08/12/2024).
[16] Ministry of Justice. “Government Decree on Ionizing Radiation 22.11.2018/1034”.
(2018), [Online]. Available: https://finlex.fi/en/laki/kaannokset/2018/en20181034
(visited on 08/12/2024).
[17] “Activities in the Controlled Area”, Radiation Protection Manual, Teollisuuden
Voima Oyj, Tech. Rep. 138101, 2024, OL1/OL2/OL3/Posiva (unpublished).
[18] “Operation and Quality Assurance of the TL Dosimeter System”, Radiation
Protection Manual, Teollisuuden Voima Oyj, Tech. Rep. 105092, 2023, General
(unpublished).
[19] “Portable Radiation Monitoring Equipment Final Safety Datasheet”,
Teollisuuden Voima Oyj, Tech. Rep. 107779, 2024, 556/JYC - OL1/OL2/OL3
(unpublished).
[20] “User Instructions for Work Dosimeter System”, Radiation Protection Manual,
Teollisuuden Voima Oyj, Tech. Rep. 139765, 2021, OL1/OL2/OL3 (unpublished).
[21] Radiation and Nuclear Safety Authority. “Radiation and Nuclear Safety
Authority Regulation on Measurements of Ionizing Radiation”. Annexes 1 and 2. (2021),
[Online]. Available: https://www.stuklex.fi/en/maarays/stuk-s-7-2021
(visited on 09/13/2024).
[22] MPG Instruments. “DMC2000S Operating Manual”. (2021), [Online]. Available:
https://ps-irrad.web.cern.ch/ps-irrad/assets/doc/info/DMC2000S_Operating_Manual.pdf
(visited on 09/12/2024).
[23] Mirion Technologies. “DMC3000 Data Sheet”. (2023), [Online]. Available:
https://assets-mirion.mirion.com/prod-20220822/cms4_mirion/files/pdf/spec-sheets/dmc-3000-personal-electronic-dosimeter-data-sheet.pdf
(visited on 09/12/2024).
[24] “Mirion DMC 2000S SA Dosimeter Inspection and Operation Manual”,
Maintenance Manual, Teollisuuden Voima Oyj, Tech. Rep. 139928, 2023,
OL1/OL2/OL3 (unpublished).
[25] “Mirion DMC-3000 Electronic Dosimeter Inspection and Operation Manual”,
Maintenance Manual, Teollisuuden Voima Oyj, Tech. Rep. 179876, 2023,
OL1/OL2/OL3 (unpublished).
[26] “Alarm Limits in Electronic Dosimeters”, Radiation Protection Manual,
Teollisuuden Voima Oyj, Tech. Rep. 119131, 2023, OL1/OL2/OL3/Posiva
(unpublished).
[27] “ALARA Program”, Radiation Protection Manual, Teollisuuden Voima Oyj,
Tech. Rep. 108286, 2023, General (unpublished).
[28] Dose Control at Nuclear Power Plants: (Report No. 120). Bethesda, MD, USA:
National Council on Radiation Protection and Measurements (NCRP), 1994,
pp. 1–20, 46–51, 57–61, 88–95. [Online]. Available:
https://app.knovel.com/hotlink/toc/id:kpDCNPPRN2/dose-control-at-nuclear/dose-control-at-nuclear.
[29] W. R. Hendee and F. M. Edwards, “ALARA and an Integrated Approach
to Radiation Protection”, Seminars in Nuclear Medicine, vol. 16, pp. 142–150,
1986. doi: 10.1016/S0001-2998(86)80027-7.
[30] International X-Ray and Radium Protection Committee or Commission (IXRPC),
“International Recommendations for X-ray and Radium Protection”, British
Journal of Radiology, vol. 1, pp. 358–363, 2014. doi: 10.1259/0007-1285-1-10-358.
[31] “Use of the Radiation Work Permit”, Radiation Protection Manual, Teollisuuden
Voima Oyj, Tech. Rep. 105107, 2023, General (unpublished).
[32] J. Marttila, “Ydinenergian käytön turvallisuusvalvonta: Vuosiraportti 2023”,
STUK-B, vol. 315, pp. 1–86, 2023. [Online]. Available:
https://urn.fi/URN:ISBN:978-952-309-597-7.
[33] V. Sinikka, V. Vesa-Pekka, T. Jani, T. Mikko, T. Tiina, and M. Aleksi,
“Monitoring of Radioactivity in the Environment of Finnish Nuclear Power Plants:
Annual Report 2023”, STUK-B, vol. 318, pp. 1–47, 2023. [Online]. Available:
https://urn.fi/URN:ISBN:978-952-309-598-4.
[34] C. Schroeer, F. Kruse, and J. M. Gomez, “A Systematic Literature Review on
Applying CRISP-DM Process Model”, Procedia Computer Science, vol. 181,
pp. 526–534, 2021. doi: 10.1016/j.procs.2021.01.199.
[35] J. S. Saltz, “CRISP-DM for Data Science: Strengths, Weaknesses and Potential
Next Steps”, in 2021 IEEE International Conference on Big Data (Big Data),
New York, NY, USA: IEEE, 2021, pp. 2337–2344.
doi: 10.1109/BigData52589.2021.9671634.
[36] S. Studer, T. B. Bui, C. Drescher, et al., “Towards CRISP-ML(Q): A
Machine Learning Process Model with Quality Assurance Methodology”, Machine
Learning and Knowledge Extraction, vol. 3, pp. 392–413, 2021.
doi: 10.3390/make3020020.
[37] Ministry of Justice. “Nuclear Energy Decree 12.2.1988/161”. (1988), [Online].
Available: https://www.finlex.fi/en/laki/kaannokset/1988/en19880161
(visited on 08/12/2024).
[38] Ministry of Justice. “Nuclear Energy Act 11.12.1987/990”. (1987), [Online].
Available: https://www.finlex.fi/en/laki/kaannokset/1987/en19870990
(visited on 08/12/2024).
[39] Ministry of Justice. “Ministry of Social Affairs and Health Decree on Ionising
Radiation 22.11.2018/1044”. (2018), [Online]. Available:
https://www.finlex.fi/en/laki/kaannokset/2018/en20181044 (visited on 08/12/2024).
[40] K. Janne, “Chernobyl Accident as a Turning Point of the Developing of the
Radiation Monitoring System: Radiation Monitoring Before and After
Chernobyl Accident”, PhD Thesis, University of Eastern Finland, 2021, pp. 28–30.
[Online]. Available: https://urn.fi/URN:ISBN:978-952-61-3736-0.
[41] “Personal Dosage Control and Implementation of Health Examination”,
Radiation Protection Manual, Teollisuuden Voima Oyj, Tech. Rep. 104246, 2023,
OL1/OL2/OL3/Posiva (unpublished).
[42] “Work Dose Codes”, Radiation Protection Manual, Teollisuuden Voima Oyj,
Tech. Rep. 112406, 2023, OL1/OL2/Posiva (unpublished).
[43] International Commission on Radiological Protection (ICRP), “The 2007
Recommendations of the International Commission on Radiological Protection”,
Annals of the ICRP, vol. 37, pp. 49–58, 2007. doi: 10.1016/j.icrp.2007.10.003.
[44] U.S. Nuclear Regulatory Commission. “High Radiation Doses”. (2020), [Online].
Available: https://www.nrc.gov/about-nrc/radiation/health-effects/high-rad-doses.html
(visited on 10/07/2024).
[45] Y. J. Kim, J. W. Lee, Y. H. Cho, Y. J. Choi, Y. Lee, and H. W. Chung,
“Chromosome Damage in Relation to Recent Radiation Exposure and Radiation
Quality in Nuclear Power Plant Workers”, Toxics, vol. 10, no. 94, 2022. doi:
10.3390/toxics10020094.
[46] M. Gillies, D. B. Richardson, E. Cardis, et al., “Mortality from Circulatory
Diseases and Other Non-cancer Outcomes Among Nuclear Workers in France,
the United Kingdom and the United States (INWORKS)”, Radiation Research,
vol. 188, pp. 276–290, 2017. doi: 10.1667/rr14608.1.
[47] T. Azizova, E. Grigoryeva, G. Zhuntova, E. Kirillova, and C. Loffredo, “Database
of Families of Workers Chronically Exposed to Radiation: Data and Biospecimen
Resources”, Health Physics, vol. 120, pp. 201–211, 2021.
doi: 10.1097/HP.0000000000001300.
[48] A. Hasegawa, K. Tanigawa, A. Ohtsuru, et al., “Health Effects of Radiation
and Other Health Problems in the Aftermath of Nuclear Accidents, with
an Emphasis on Fukushima”, The Lancet, vol. 386, pp. 479–488, 2015. doi:
10.1016/S0140-6736(15)61106-0.
[49] C. Song, T. Y. Kong, S. Kim, et al., “High-Radiation-Exposure Work in Korean
Pressurized Water Reactors”, Nuclear Engineering and Technology, vol. 5,
pp. 1874–1879, 2024. doi: 10.1016/j.net.2023.12.048.
[50] V. V. Kosterev, A. G. Tsov’yanov, A. G. Sivenkov, and Y. N. Bragin, “Worker
Radiation Exposure”, Atomic Energy (New York, N.Y.), vol. 120, pp. 148–152,
2016. doi: 10.1007/s10512-016-0110-2.
[51] M. S. Pathan, S. M. Pradhan, T. P. Selvam, and B. K. Sapra, “A Multi-stage
Machine Learning Algorithm for Estimating Personal Dose Equivalent Using
Thermoluminescent Dosimeter”, Machine Learning: Science and Technology,
vol. 5, no. 9, 2024. doi: 10.1088/2632-2153/ad1c31.
[52] M. Sasaki and Y. Sanada, “Improvement of Training Data for Dose Rate
Distribution Using an Artificial Neural Network”, Journal of Advanced Simulation in
Science and Engineering, vol. 9, pp. 30–39, 2022. doi: 10.15748/jasse.9.30.
[53] H. Breitkreutz, J. Mayr, M. Bleher, S. Seifert, and U. Stöhlker, “Identification
and Quantification of Anomalies in Environmental Gamma Dose Rate Time
Series Using Artificial Intelligence”, Journal of Environmental Radioactivity,
vol. 259–260, no. 9, 2023. doi: 10.1016/j.jenvrad.2022.107082.
[54] Y. Park, J. H. Choi, J.-B. Choi, and M. K. Kim, “A Stress Intensity Predictive
Model for Reactor Pressure Vessel via Coupled Signal Processing and Machine
Learning Model”, Journal of Mechanical Science and Technology, vol. 37,
pp. 2881–2890, 2023. doi: 10.1007/s12206-023-0514-6.
[55] P. Ramirez-Hereza, D. Ramos, D. T. Toledano, J. Gonzalez-Rodriguez, A.
Ariza-Velazquez, and N. Doncel, “Score-based Bayesian Network Structure
Learning Algorithms for Modeling Radioisotope Levels in Nuclear Power Plant
Reactors”, Chemometrics and Intelligent Laboratory Systems, vol. 237, no. 5,
2023. doi: 10.1016/j.chemolab.2023.104811.
[56] S. A. Balanya, D. Ramos, P. Ramirez-Hereza, et al., “Gaussian Processes for
Radiation Dose Prediction in Nuclear Power Plant Reactors”, Chemometrics
and Intelligent Laboratory Systems, vol. 230, no. 19, 2022.
doi: 10.1016/j.chemolab.2022.104652.
[57] M. K. Baek, Y. S. Chung, S. Lee, I. Kang, J. J. Ahn, and Y. H. Chung,
“Design of a Nuclear Monitoring System Based on a Multi-sensor Network
and Artificial Intelligence Algorithm”, Sustainability, vol. 15, no. 7, 2023. doi:
10.3390/su15075915.
[58] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Fourth Edition.
Burlington, MA, USA: Academic Press, 2009, pp. 1–9, 262–265, 323–326,
570–577. doi: 10.1016/B978-1-59749-272-0.X0001-2.
[59] S. Raschka, “Model Evaluation, Model Selection, and Algorithm Selection in
Machine Learning”, arXiv, pp. 1–49, 2018. doi: 10.48550/arXiv.1811.12808.
[60] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System”, in
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, New York, NY, USA: Association for Computing
Machinery, 2016, pp. 785–794. doi: 10.1145/2939672.2939785.
[61] G. Ke, Q. Meng, T. Finley, et al., “LightGBM: A Highly Efficient Gradient
Boosting Decision Tree”, in Proceedings of the 31st International Conference
on Neural Information Processing Systems, Red Hook, NY, USA: Curran
Associates, Inc., 2017, pp. 3149–3157. [Online]. Available:
http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf.
[62] L. Breiman, “Random Forests”, Machine Learning, vol. 45, pp. 5–32, 2001. doi:
10.1023/A:1010933404324.
[63] C. Chen, A. Liaw, and L. Breiman, “Using Random Forest to Learn Imbalanced
Data”, pp. 1–12, 2004. [Online]. Available:
https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf.
[64] X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory Undersampling for
Class-Imbalance Learning”, IEEE Transactions on Systems, Man, and Cybernetics,
Part B (Cybernetics), vol. 39, pp. 539–550, 2009. doi: 10.1109/TSMCB.2008.2007853.
[65] P. Virtanen, R. Gommers, T. E. Oliphant, et al., “SciPy 1.0: Fundamental
Algorithms for Scientific Computing in Python”, Nature Methods, vol. 17,
pp. 261–272, 2020. doi: 10.1038/s41592-019-0686-2.
[66] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: Machine
Learning in Python”, Journal of Machine Learning Research, vol. 12, pp. 2825–2830,
2011. doi: 10.48550/arXiv.1201.0490.
[67] C. R. Harris, K. J. Millman, S. J. van der Walt, et al., “Array Programming
with NumPy”, Nature, vol. 585, pp. 357–362, 2020. doi: 10.1038/s41586-020-2649-2.
[68] W. McKinney, “Data Structures for Statistical Computing in Python”, in
Proceedings of the 9th Python in Science Conference, Austin, TX, USA: SciPy,
2010, pp. 56–61. doi: 10.25080/Majora-92bf1922-00a.
[69] J. D. Hunter, “Matplotlib: A 2D Graphics Environment”, Computing in Science
& Engineering, vol. 9, pp. 90–95, 2007. doi: 10.1109/MCSE.2007.55.
[70] S. Galli, Python Feature Engineering Cookbook, Second Edition. Birmingham,
United Kingdom: Packt Publishing, Limited, 2022, pp. 1–358,
isbn: 978-1804611302.
[71] G. Bontempi, S. Ben Taieb, and Y.-A. Le Borgne, “Machine Learning Strategies
for Time Series Forecasting”, in Business Intelligence: Second European Summer
School, eBISS 2012, Brussels, Belgium, July 15-21, 2012, Tutorial Lectures.
Berlin, Germany: Springer Berlin Heidelberg, 2013, pp. 62–77.
doi: 10.1007/978-3-642-36318-4_3.
[72] M. P. Ptotic, M. B. Stojanovic, and P. M. Popovic, “A Review of Machine
Learning Methods for Long-Term Time Series Prediction”, in 2022 57th
International Scientific Conference On Information, Communication And Energy
Systems And Technologies (ICEST), Ohrid, North Macedonia: IEEE, 2022,
pp. 205–208. doi: 10.1109/ICEST55168.2022.9828618.
[73] H. Kaur, H. S. Pannu, and A. K. Malhi, “A Systematic Review on Imbalanced
Data Challenges in Machine Learning: Applications and Solutions”, ACM
Computing Surveys, vol. 52, pp. 1–36, 2019. doi: 10.1145/3343440.
[74] A. Gossmann. “Probabilistic interpretation of AUC”. (2018), [Online]. Available:
https://www.alexejgossmann.com/auc/ (visited on 10/08/2024).
[75] T. Fawcett, “An introduction to ROC analysis”, Pattern Recognition Letters,
vol. 27, pp. 861–874, 2006. doi: 10.1016/j.patrec.2005.10.010.
[76] A. Paleyes, R.-G. Urma, and N. D. Lawrence, “Challenges in Deploying Machine
Learning: A Survey of Case Studies”, ACM Computing Surveys, vol. 55,
pp. 1–29, 2023. doi: 10.1145/3533378.
[77] J. Chandrasekaran, C. Tyler, N. McCarthy, E. Lanus, and L. Freeman, “Test &
Evaluation Best Practices for Machine Learning-enabled Systems”, arXiv,
pp. 1–20, 2023. doi: 10.48550/arXiv.2310.06800.
[78] N. Polyzotis, S. Roy, S. E. Whang, and M. Zinkevich, “Data Lifecycle Challenges
in Production Machine Learning: A Survey”, SIGMOD Record, vol. 47, pp. 17–28,
2018. doi: 10.1145/3299887.3299891.
[79] R. I. Mukhamediev, Y. Popova, Y. Kuchin, et al., “Review of Artificial
Intelligence and Machine Learning Technologies: Classification, Restrictions,
Opportunities and Challenges”, Mathematics, vol. 10, no. 15, 2022.
doi: 10.3390/math10152552.
[80] U. Bhatt, A. Xiang, S. Sharma, et al., “Explainable Machine Learning in
Deployment”, in Proceedings of the 2020 Conference on Fairness, Accountability,
and Transparency, Barcelona, Spain: Association for Computing Machinery,
2020, pp. 648–657. doi: 10.1145/3351095.3375624.