WAR MACHINE LEARNING
AI in Defence
Lauri Vasankari
TURUN YLIOPISTON JULKAISUJA – ANNALES UNIVERSITATIS TURKUENSIS
SARJA - SER. F OSA - TOM. 83 — TECHNICA - INFORMATICA— TURKU 2026
University of Turku
Faculty of Technology
Department of Computing
Information and Communication Technology
Doctoral Programme in Technology
Supervised by
Professor, Jukka Heikkonen
University of Turku
PhD, Paavo Nevalainen
University of Turku
PhD, Luca Zelioli
University of Turku
Reviewed by
Professor emeritus, Mika Hyytiäinen
National Defence University
Professor, Miklós Krész
University of Szeged, Hungary
Opponent
Professor, Juha Röning
University of Oulu
The originality of this publication has been checked in accordance with the University
of Turku quality assurance system using the Turnitin OriginalityCheck service.
ISBN 978-952-02-0645-1 (PRINT)
ISBN 978-952-02-0646-8 (PDF)
ISSN 2736-9390 (PRINT)
ISSN 2736-9684 (ONLINE)
Painosalama, Turku, Finland, 2026
I should dedicate this dissertation to many, but it would not repay any of them.
Instead, I dedicate this work to pigheadedness, relentlessness, and perseverance.
UNIVERSITY OF TURKU
Faculty of Technology
Department of Computing
Information and Communication Technology
VASANKARI, LAURI: War Machine Learning
Doctoral dissertation, 312 pp.
Doctoral Programme in Technology
April 2026
ABSTRACT
This dissertation examines the development and integration of machine learning within the
military domain, arguing that the primary constraint on, and the greatest opportunity for,
advancing military Artificial Intelligence (AI) is the data ecosystem. Across research in
computer vision (CV), reinforcement learning (RL), federated learning (FL), and generative
AI (GenAI), the analyses consistently show that progress is limited by systemic issues
related to data availability, quality, and infrastructure.
The work synthesizes findings from six original publications to demonstrate that practical
military AI requires a shift from an algorithm-centric view to a holistic, system-focused
perspective that treats data as a first-class operational capability. To bridge the gap
between high-level strategy and granular technical research, this dissertation adapts the
Cross-Industry Standard Process for Data Mining (CRISP-DM) as a framework for assessing
military AI applications.
Key findings from the studies support this thesis. A CV study on sonar imagery highlighted
model failure due to poor-quality sensor data, underscoring the need for integrated data
pipelines. RL research revealed that a lack of high-fidelity simulators and operational data
hampers real-world transfer. The investigation into GenAI identified a dependency on
proprietary models misaligned with military needs and proposed FL as a secure, collaborative
paradigm for developing military-specific foundation models. Finally, an ethical analysis
addresses the “reliability-oversight paradox” in autonomous systems, proposing a new
human-machine teaming model based on human support rather than simple oversight.
In conclusion, this dissertation claims that the effective integration of AI into military
forces depends on building a robust data ecosystem. Such an ecosystem comprises expertise and
understanding at the doctrinal and policy-making levels, an understanding of data and
algorithms at the technical level, governance, operator-in-the-loop feedback and annotation
mechanisms, and interoperable infrastructure.
KEYWORDS: artificial intelligence, defence, military, machine learning, deep learning
UNIVERSITY OF TURKU (TURUN YLIOPISTO)
Faculty of Technology
Department of Computing
Information and Communication Technology
VASANKARI, LAURI: War Machine Learning
Doctoral dissertation, 312 pp.
Doctoral Programme in Technology
April 2026
TIIVISTELMÄ (Finnish abstract, in English translation)
This dissertation examines the use of machine learning and its integration into the
activities of armed forces and the military operating environment. The central claim of the
work is that the data ecosystem is both the most significant constraint on and the greatest
opportunity for the development of military artificial intelligence (AI). Publications
covering the research areas of computer vision (CV), reinforcement learning (RL), federated
learning (FL), and generative AI (GenAI) consistently show that progress is limited by
systemic problems related to the availability and quality of data and to the surrounding
information technology infrastructure.
The work synthesizes the results of six original publications to show that practical
military AI requires a shift from an algorithm- and model-centric perspective to a holistic,
system-focused approach in which data is treated as a core operational capability. To narrow
the gap between high-level strategy and hands-on technical research, this dissertation
applies the CRISP-DM framework (Cross-Industry Standard Process for Data Mining) to the
assessment of military AI applications.
The key results of the studies support this thesis. A computer vision study on sonar imagery
showed models failing because of poor-quality sensor data, which underlines the need for
integrated data pipelines. The reinforcement learning research revealed that a lack of
high-quality simulators and operational data hampers the transfer of methods to real
operating environments. The generative AI study identified a dependency on commercial models
that are ill-suited to military needs and proposed federated learning as a secure and
collaborative approach to developing foundation models intended for military use. The ethical
analysis addresses the paradox between reliability and oversight in autonomous and
intelligent systems and proposes a new model of human-machine teaming based on human support
rather than mere oversight.
In conclusion, this dissertation argues that the effective integration of AI into armed
forces depends on building a robust data ecosystem. This ecosystem requires expertise and
understanding at the level of doctrine and policy, an understanding of data and algorithms at
the technical level, as well as governance models, mechanisms that account for operator
feedback and actions, and a unifying infrastructure.
KEYWORDS: artificial intelligence, defence, armed forces, machine learning, deep learning
Foreword and Acknowledgements
Apart from my supervisors, reviewers, opponent and custos, I am quite sure that the other
people I ought to thank for aiding me in this endeavor will never read this dissertation.
I do not blame them.
Instead, as a published internal monologue, I wish to thank my supervisor, professor Jukka
Heikkonen, for his unyielding assistance and support throughout my entire academic journey,
through two Master’s Degrees to this Doctor of Technology degree. We are still some years
away from the Star Wars moment where the circle is complete, but without Jukka, this
circulation might never have started. My fellow supervisors, Luca Zelioli and Paavo
Nevalainen, deserve acknowledgment for their support in this endeavor, and Luca also as a
fellow researcher who made some of the original publications possible.
I also owe my interest in AI to my father, who gave me the initial push, now more than seven
years ago, that has since served as the pivot point in my career, from a naval officer to an
AI professional. My life would not be on this track without him, even if we discount the
initial onset of life provided. The same applies to my mother, albeit the scope of her
influence is less poignant on subject matter expertise.
I owe thanks to my superiors and colleagues within the military as well as within my current
company, the most impactful being Petteri Hemminki, Christian Andersson and my collaborating
researchers Aapo Koski, Kalle Saastamoinen, and Adrian Borzyszkowski, as well as Mark Rempel,
Maarten Schadd and Maximilian Moll. Special thanks are owed to my friend, colleague, and
research partner Jan Joutsi. Support from Matti Ristimäki and Heikki Härkönen has also been
invaluable.
Finally, I owe my apologies to my loved ones for being unavailable in this never-ending
pursuit of something still out of reach. Twisting the words of Elaine Rich, this is a journey
towards things that, at the moment, remain unreached. I hope none of you hold a grudge
against me.
Just like Uncle Scrooge, I find resemblance in Robert W. Service’s poem The Spell of the
Yukon:
“Yet it isn’t the gold that I’m wanting so much as just finding the gold.”
27.3.2026
Lauri Vasankari
Table of Contents
Abbreviations
List of Original Publications
1 Introduction
1.1 Background and definitions
1.2 Motivation, objectives, and methodology
1.3 Organization of the thesis
2 Literature review
2.1 Academic publications
2.2 Governmental policies and strategies
2.3 Think tanks
2.4 Other literature
2.5 Summary
3 AI and Military
3.1 The Intelligence Artifice
3.2 Military Domain
3.2.1 Tasks
3.2.2 Capabilities
3.2.3 Organization
3.2.4 Decision-making processes
3.2.5 Military Information Systems
3.2.6 On Complexity
3.3 Domain features of data
3.4 Main application areas
4 Machine Learning research areas
4.1 Computer Vision background
4.1.1 CV in military domain
4.2 Reinforcement Learning
4.2.1 RL in military domain
4.3 Federated Learning
4.3.1 FL in military domain
4.4 Generative Artificial Intelligence
4.4.1 Natural Language Processing
4.4.2 Generative Models
4.4.3 GenAI in military domain
4.5 Ethical considerations regarding AI systems
4.6 Testing, evaluation, validation, and verification
4.7 Field observations from Ukraine
4.8 Summary of Findings
5 Contribution of this thesis
5.1 Publication I: Deep Mix: AI in Littoral Sonar Operations
5.1.1 Summary
5.1.2 Methods and Data
5.1.3 Results and contribution
5.1.4 Author’s contribution
5.2 Publication II: Strategizing the Shallows: Leveraging Multi-Agent Reinforcement Learning for Enhanced Tactical Decision-Making in Littoral Naval Warfare
5.2.1 Summary
5.2.2 Methods and Data
5.2.3 Results and contribution
5.2.4 Author’s contribution
5.3 Publication III: Reinforcement Learning for decision support in defense and security: A systematic review
5.3.1 Summary
5.3.2 Methods and Data
5.3.3 Results and contribution
5.3.4 Author’s contribution
5.4 Publication IV: Emerging trends in federated learning: from model fusion to federated X learning
5.4.1 Summary
5.4.2 Methods and Data
5.4.3 Results and contribution
5.4.4 Author’s contribution
5.5 Publication V: GenAI in Military: Trends and Opportunities
5.5.1 Summary
5.5.2 Methods and Data
5.5.3 Results and contribution
5.5.4 Author’s contribution
5.6 Publication VI: The dilemma of AI reliability
5.6.1 Summary
5.6.2 Methods and Data
5.6.3 Results and contribution
5.7 Conceptual framework
5.8 Methodological Framework for Synthesis
6 Conclusion
6.1 Summary of Key Findings
6.2 Implications
6.3 Limitations and Future Research
Declarations
Bibliography
Original Publications
Abbreviations
Abbreviation Meaning
AGI Artificial General Intelligence
AI Artificial Intelligence
API Application Programming Interface
AR Augmented Reality
AUC Area Under Curve
BERT Bidirectional Encoder Representations from Transformers
C2 Command and Control
CNN Convolutional Neural Networks
COA Course of Action
COP Common Operating Picture
CONOPS Concept of Operations
CoT Chain-of-Thought
CRISP-DM Cross-Industry Standard Process for Data Mining
CV Computer Vision
DARPA Defense Advanced Research Projects Agency
DDQN Double Deep Q-Networks
DIANA Defence Innovation Accelerator for the North Atlantic
DL Deep Learning
DoD Department of Defense
DP Differential Privacy
DSS Decision Support System
EW Electronic Warfare
FedRL Federated Reinforcement Learning
FL Federated Learning
FMTL Federated Multi-Task Learning
FPCA Federated Principal Component Analysis
FTL Federated Transfer Learning
GAN Generative Adversarial Network
GenAI Generative Artificial Intelligence
GNSS Global Navigation Satellite System
GOFAI Good Old-Fashioned AI
IID Independent and Identically Distributed
IRL Inverse Reinforcement Learning
IS Information System
ISR Intelligence, Surveillance and Reconnaissance
IT Information Technology
KD Knowledge Distillation
KL Kullback-Leibler
LAWS Lethal Autonomous Weapon Systems
LBP Local Binary Patterns
LLM Large Language Model
LRM Large Reasoning Model
MADDQN Multi-Agent Double Deep Q Network
MAPPO Multi-Agent Proximal Policy Optimization
MARL Multi-Agent Reinforcement Learning
MCM Mine Countermeasure
MCTS Monte Carlo Tree Search
MDP Markov Decision Process
MDMP Military Decision-Making Process
MILCO Mine-like Contact
ML Machine Learning
MLP Multilayer Perceptron
MoE Mixture of Experts
NLM Neural Language Model
NLP Natural Language Processing
NN Neural Network
OODA Observe-Orient-Decide-Act
OR Operations Research
OT&E Operational Test and Evaluation
PCA Principal Component Analysis
PLA People’s Liberation Army
POMDP Partially Observable Markov Decision Process
POSG Partially Observable Stochastic Game
R-CNN Regions with CNN features
RAG Retrieval Augmented Generation
RF Random Forest
RL Reinforcement Learning
RLHF Reinforcement Learning from Human Feedback
RNN Recurrent Neural Network
ROC Receiver Operating Characteristic
SOP Standard Operating Procedure
SoR System-of-Record
SSM Soft Systems Methodology
SSS Side Scan Sonar
STO Science and Technology Organization
SVM Support Vector Machine
T&E Test and Evaluation
TEMP Test and Evaluation Master Plan
TEVV Test, Evaluation, Validation and Verification
UAV Unmanned Aerial Vehicle
UGV Unmanned Ground Vehicle
UMAP Uniform Manifold Approximation and Projection
USV Unmanned Surface Vehicle
UUV Unmanned Underwater Vehicle
VAE Variational Autoencoder
VGG Visual Geometry Group
ViT Vision Transformer
VR Virtual Reality
WCSS Within-Cluster Sum of Squares
XAI Explainable Artificial Intelligence
YOLO You Only Look Once
List of Original Publications
This dissertation is based on the following original publications, reproduced with the
permission of the copyright holders, which are referred to in the text by their Roman
numerals:
I. Lauri Vasankari, Adrian Borzyszkowski, Luca Zelioli, Jukka Heikkonen. Deep
Mix: AI in Littoral Sonar Operations. J. Marine. Sci. Appl. (2025).
https://doi.org/10.1007/s11804-025-00695-4
II. Lauri Vasankari, Kalle Saastamoinen. “Strategizing the Shallows: Leveraging
Multi-Agent Reinforcement Learning for Enhanced Tactical Decision-Making
in Littoral Naval Warfare,” In: Maglogiannis, I., Iliadis, L., Macintyre, J.,
Avlonitis, M., Papaleonidas, A. (eds) Artificial Intelligence Applications and
Innovations. AIAI 2024. IFIP Advances in Information and Communication
Technology, vol 712, Springer, Cham, pp 129–141, 2024.
https://doi.org/10.1007/978-3-031-63215-0_10
III. Maarten Schadd, David S. Berman, Carolyn Chen, Mika Cohen, John Dorsch,
Alexander Gegov, Maximilian Moll, Oliver Rose, Anna Rösner, Kalle Saastamoinen,
Thomas Schiller, Andreas Strand, Lauri Vasankari, Mark Rempel.
“Reinforcement Learning for decision support in defense and security: A
systematic review,” Annals of Operations Research, Springer, 2025. In publication.
IV. Shaoxiong Ji, Yue Tan, Teemu Saravirta, Zhiqin Yang, Yixin Liu, Lauri Vasankari,
Shirui Pan, Guodong Long, Anwar Walid, “Emerging trends in federated learning:
from model fusion to federated X learning”, International Journal of Machine
Learning and Cybernetics, Springer, pages 3769–3790, 2024.
https://doi.org/10.1007/s13042-024-02119-1
V. Lauri Vasankari, Aapo Koski, “GenAI in Military: Trends and Opportunities”,
Scandinavian Journal of Military Studies, 8(1), pages 416–434. 2025.
https://doi.org/10.31374/sjms.415
VI. Lauri Vasankari, “The Dilemma of AI Reliability,” in Research Papers on
Artificial Intelligence in the Military Operational Environment and Wargaming,
Saulius Keturakis, Arto Mutanen, Antti Rissanen & Jouko Vankka (eds.), National
Defence University, Series 2: Research Reports No. 9, Helsinki, 2026,
pp. 38–48. ISBN 978-951-25-3585-9.
1 Introduction
Artificial Intelligence (AI) is widely regarded as the next revolution in warfare, primarily
as the enabler of autonomous weapon systems [1]. Beyond its role in autonomous systems, AI is
a multi-use, general-purpose technology with broad applications in warfighting across
different scales and tasks. Research institutions such as RAND have conducted in-depth
analyses of the military impact of AI, highlighting that current development is commercially
driven rather than led by states or the defence industry [2]. Consequently, the influence of
AI is pervasive, affecting all military functions and not only battlefield operations.
The application of AI in the military domain is not a new phenomenon. It can be argued that
its first use in a military context occurred during World War II. At that time, Alan Turing
and the Hut 8 team at Bletchley Park employed the Banburismus procedure, a method involving
sequential Bayesian probability, electromechanical Bombe machines [3], and manual analysis,
to improve on the work of Polish cryptologists led by Marian Rejewski and decrypt messages
enciphered with the Enigma machine used by the armed forces of Nazi Germany [4]. Whether this
constitutes a true application of AI remains a subject of debate and hinges on a precise
definition of AI, for which no universal standard has been agreed upon. This matter is
elaborated upon in Section 1.1.
The current, continuing trend of AI is built on its subfield known as Machine Learning (ML).
In essence, ML is predictive modeling that uses an iterative training loop to approximate a
function that maps inputs to outputs. Hence, it is described as learning from data. The aim
is to create a model that performs well on unseen data. Essentially, there exists a
hypothesis space $\mathcal{H}$ of applicable functions or features, namely hypothesis maps
$h$ that project an input $x \in X$ onto an output $y \in Y$ [5]. Therefore, if the
annotations, i.e., the actual outputs, are known, a supervised learning ML algorithm can be
described as
$$h(x) \rightarrow \hat{y} \sim y, \tag{1}$$
where the distance between the predicted output $\hat{y}$ and the actual, known output $y$ is
calculated with a cost or loss function $\mathcal{L}$, which is then used to update the
function $h \in \mathcal{H}$ to minimize the error. A classical loss function is the
Euclidean distance, $\mathcal{L}(\hat{y}, y) = \sqrt{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$.
For an unsupervised learning task, where the annotated $y$ does not exist, the error is
usually calculated as the distance between inputs $x$ and then used to combine similar inputs
into groups, a technique known as clustering. A hybrid solution known as semi-supervised
learning has annotations for some of the data, while the annotations for the rest have to be
inferred by the applied model or algorithm in the hypothesis space.
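The supervised case can be illustrated with a minimal Python sketch. The linear hypothesis map h(x) = w·x + b, the toy data, the learning rate, and all function names are assumptions made for illustration only; they are not taken from the original publications. Gradient descent on the squared error stands in for the generic update step, since it shares its minimizer with the Euclidean distance but has a simpler gradient.

```python
import math


def euclidean_loss(y_pred, y_true):
    """Euclidean distance between predicted outputs and known annotations."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)))


def train_linear(xs, ys, lr=0.05, epochs=1000):
    """Fit the hypothesis h(x) = w * x + b by gradient descent.

    Assumption: the mean squared error is minimized instead of the
    Euclidean distance itself; both have the same minimizer.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical annotated data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear(xs, ys)
loss = euclidean_loss([w * x + b for x in xs], ys)
print(f"w={w:.4f}, b={b:.4f}, loss={loss:.2e}")
```

Each pass through the loop corresponds to one iteration of the training loop: predict, measure the error against the annotations, and adjust the hypothesis to reduce it.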
When the hypothesis map employed is a neural network with multiple layers, the technique is
referred to as Deep Learning (DL), a term referencing the network’s depth [5; 6].
Fundamentally, DL neural networks, of which Multilayer Perceptrons (MLPs) are the simplest
and most straightforward form, function as universal approximators applicable to a vast range
of tasks. Most, if not all, modern breakthroughs in AI stem from the use of very deep Neural
Networks (NNs). Their large number of neurons, and hence of network parameters, enables them
to learn highly complex patterns from massive datasets and to generalize effectively across
diverse application domains, from image analysis to natural language processing.
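As a small illustration of why layering matters, the following sketch evaluates a two-layer MLP on the XOR function, which no single linear layer can represent. The weights are hand-picked for the example rather than learned, and the function names are hypothetical.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hand-picked parameters (an assumption for illustration, not trained values).
W1, B1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]   # hidden layer, 2 neurons
W2, B2 = [[1.0, -2.0]], [0.0]                    # output layer, 1 neuron


def mlp(x):
    """Forward pass: input -> hidden layer with ReLU -> linear output."""
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)[0]


inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
for x in inputs:
    print(x, "->", mlp(x))
```

In practice the parameters would be found by the training loop rather than set by hand; the point here is only that the hidden layer with its non-linearity makes the mapping representable.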
Fueled by advancements over the past three decades and now largely driven by the private
sector [7], the use of AI has also proliferated within armed forces worldwide. As noted
above, in past decades the initial military interest in AI focused primarily on machine
autonomy [8]. However, since the introduction of the transformer architecture with its
self-attention mechanism [9] and the subsequent launch of general-use natural language
interfaces [10; 11], the focus has expanded to treat AI as a transformative capability in its
own right. The potential to process ever-growing volumes of data is another significant
driver, offering the ability to enhance situational awareness and facilitate faster, more
accurate decision-making for a strategic advantage. AI is now recognized as a fundamental
technology that may alter all aspects of human life, including warfare. NATO, for instance,
has classified AI as a key disruptive technology and aims to maintain its technological edge
by advancing AI [12]. Other strategic initiatives and public statements portray AI as a
critical game-changer, potentially the proverbial “silver bullet” for achieving or
maintaining military supremacy.
Reflecting the growing interest in and the emergence of novel military applications for AI,
this thesis investigates the practical application of AI, specifically ML, across various
military and defence domains and contexts. The investigation focuses on specific applications
within the subfields of Reinforcement Learning (RL), Federated Learning (FL), Computer Vision
(CV), and Generative Artificial Intelligence (GenAI).
1.1 Background and definitions
Common terminology and agreed-upon definitions are crucial for communicating complex ideas, a
principle articulated by thinkers such as Francis Bacon [13] and conceptually mirrored by
René Descartes’ pursuit of foundational certainty [14; 15]. Bacon cautioned that imprecise
words hinder the exchange of information and the advancement of knowledge, while Descartes
advocated for clear and distinct ideas instead of relying solely on words that can easily
become disconnected from the subject itself. For centuries, this pursuit of precision and
shared understanding has been essential to science and society. The absence of a universally
accepted definition of AI, coupled with the variety of terms in use and the apparent
disconnect in common discourse between the word and the distinct, original idea, presents a
significant obstacle to its systematic adoption.
The field of military AI is particularly affected by this ambiguity, as the act of defining
terms is a strategic decision in itself. The Defense Acquisition University under the U.S.
Department of Defense (DoD), recently renamed the Department of War, has highlighted a
critical need to “align around logical AI definitions and terminology” [16], publishing its
own analyses and glossaries. In academia, Russell and Norvig [17] provide a comprehensive
overview of various definitions for AI, discussing the merits and drawbacks of each. For the
purposes of this thesis, AI is defined as a computational system that applies logic and
probabilities to solve problems that traditionally require human intelligence or are beyond
human capabilities. This definition builds upon Elaine Rich’s assertion that “AI is the study
of how to make computers do things at which, at the moment, people are better” [18]. By this
standard, the efforts of Hut 8 can be classified as an early AI application; despite the
absence of digital computers, the computational processes employed exceeded the human
capabilities of the era. Since then, the field has advanced rapidly, with modern AI solutions
demonstrating near-ubiquitous applicability, as documented in the AI Index Report 2025 [7].
The report indicates that AI has matched or surpassed human baseline performance in many
narrow fields, leaving only the most complex reasoning tasks as the domain of human
intelligence.
In contemporary discourse, the term AI is often used to describe an active, human-like entity
or system, a tendency known as anthropomorphism, which means the attribution of human
characteristics to non-human entities [19]. One would not, for example, use the term
mathematics in such a sense. To maintain terminological consistency, this thesis avoids
treating AI as an anthropomorphic agent that interacts with its environment in a human-like
manner. McDermott [20] and Mitchell [21] have discussed this issue in terms of wishful
mnemonics: terms that falsely suggest human-like properties in AI applications. A prominent
current example is the term hallucination, used to describe the phenomenon where Large
Language Models (LLMs) produce fallacious or nonsensical outputs due to their
non-deterministic architecture [22; 23]. While the term evokes a human-like subjective
experience, the underlying computational causes are well understood. A more precise term
might be confabulation, though it could also be seen as anthropomorphic. Therefore, a
neutral, non-anthropomorphic term such as erroneous generation is preferable.
At the highest level, AI can be divided into symbolic and connectionist approaches. The
symbolic field, which relies on programmed rules and knowledge bases, is often referred to as
Good Old-Fashioned AI (GOFAI) and has seen diminished focus in the modern era, though its
principles remain crucial to software development [17]. In contrast, the connectionist field,
which focuses on learning patterns from data, has dominated recent decades. The most
prominent connectionist approach, ML, is a subfield of AI that augments the parent definition
with the concept of learning from data without explicitly programmed rules [17]. In this
context, “learning” is the iterative process of tuning a model’s parameters to minimize error
on a given dataset, thereby improving the accuracy of its hypothesis map $h$ as shown in
Equation 1.
Instead of referring to AI as a monolithic entity, this dissertation considers its applicable
forms to be AI models. Building on this framework, an AI system is created when an AI model
is integrated into a broader Information System (IS). Alternatively, in the symbolic GOFAI
paradigm, an AI system can be built by algorithmically replicating the knowledge and
decision-making processes of human experts, a leading approach from the 1950s to the 1980s
[17]. Although some modern regulatory frameworks, such as the EU AI Act [24], explicitly
exclude such rule-based systems from their scope, this dissertation considers them AI systems
because they fit the thesis’s definition of a computational system that applies logic to
solve problems that traditionally require human intelligence.
The term model itself carries multiple meanings. In Operations Research (OR), it refers to a
mathematical formulation of a problem to be optimized [25; 26]. In symbolic AI, a logical
model is a set of rules expressed in formal logic [17]. In ML, a model is the mathematical
function produced by a training algorithm, defined by its architecture and a set of learned
parameters. In RL, however, the term model typically refers to a representation of the
environment, while the AI artifact is called an agent [27]. An RL agent learns a policy
through trial-and-error interaction with the environment. To avoid ambiguity in this
dissertation, AI model will refer to an ML artifact or a system’s core logic, while
environment model will be used exclusively in the context of RL. This convention ensures that
“AI model” consistently refers to the mechanism by which an AI system maps inputs to outputs.
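The distinction between the environment and the agent can be made concrete with a minimal tabular Q-learning sketch. The five-cell corridor, the reward of 1 at the goal, the hyperparameters, and all names are assumptions chosen for illustration; the function `step` plays the role of the environment, while the learned table and the policy derived from it constitute the agent.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (assumed toy task)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor
ALPHA = 0.5         # learning rate


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=200, seed=0):
    """Learn action values by trial and error with random exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely exploratory behaviour
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# The policy is the greedy action in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because Q-learning is off-policy, the agent can explore at random and still recover a greedy policy afterwards; in this corridor that policy moves right in every cell.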
Integral to the development of an AI model are the processes of training, validation and
evaluation. Training is the iterative process where model parameters are tuned to minimize
error on a training dataset. A separate validation set is used to monitor the training
process. This dataset allows the model’s performance on unseen data to be assessed during
training. While providing insight into the training accuracy, it also helps to detect
overfitting, a situation where the model memorizes the training data at the expense of
generalization. Conversely, underfitting occurs when a model is too simple to capture the
underlying patterns in the training data. Finally, an evaluation set, also known as a test
set, is used to provide an unbiased assessment of the final model’s expected real-world
performance. The need for larger datasets for more complex models is explained by concepts
such as Rademacher complexity, which measures a model’s capacity to fit random noise [28].
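The roles of the three datasets can be sketched in a few lines of Python. The synthetic data, the 60/20/20 split, and the three candidate models (a constant mean as an underfitting example, a least-squares line, and a nearest-neighbour memorizer as an overfitting example) are assumptions chosen for illustration, not methods from the original publications.

```python
import random


def split(data, train=0.6, val=0.2, seed=0):
    """Shuffle and partition data into training, validation and test sets."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def mse(model, rows):
    """Mean squared error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


def fit_mean(rows):
    """Too simple: ignores x entirely (prone to underfitting)."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Least-squares line, matching the structure of the synthetic data."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return lambda x: my + slope * (x - mx)


def fit_memorizer(rows):
    """Nearest-neighbour lookup: reproduces the training set exactly."""
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]


# Hypothetical noisy data around y = 2x.
rng = random.Random(42)
data = [(i / 10, 2.0 * (i / 10) + rng.gauss(0.0, 1.0)) for i in range(100)]
train_set, val_set, test_set = split(data)

fitters = [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]
models = {name: fit(train_set) for name, fit in fitters}
for name, model in models.items():
    print(f"{name:10s} train={mse(model, train_set):.3f} "
          f"val={mse(model, val_set):.3f}")

# Model selection uses the validation set; the test set is touched only once.
best = min(models, key=lambda name: mse(models[name], val_set))
print("selected:", best, "test MSE:", round(mse(models[best], test_set), 3))
```

The memorizer reaches zero training error yet a non-zero validation error, which is the signature of overfitting that the validation set is meant to expose.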
To properly scope this research, the terms national security, defence and military must also
be clarified. National security is the broadest of these, encompassing the safeguarding of a
nation-state against all existential threats to its core values, territory, and population
through the coordinated use of diplomatic, informational, military, and economic power [29].
It addresses a wide array of challenges, including but not limited to military threats.
Defence is the specific subset of national security concerned with countering
external military threats. As a concept, it encompasses the full spectrum of state
measures to resist a military attack. Its primary political objective is the preserva-
tion of the state, distinguishing it from offense, which aims at conquest [30]. While
the distinction can blur at the tactical or operational level, where offensive actions
may serve a strategic defensive goal, the overarching purpose remains preservation.
For this thesis, defence is defined as the comprehensive framework of national
capabilities (including military forces, strategic doctrines, industrial resources, and
technological systems) oriented toward deterring aggression and protecting the nation’s
sovereignty and interests from external threats. The military, in turn, is the principal
instrument of the defence framework. It refers to the state-sanctioned armed forces
that constitute the state’s monopoly on the legitimate use of physical force [31]. The
military serves as an instrument of political will, capable of applying force, or the
threat of force, to achieve political objectives [30]. This definition excludes non-state
actors such as private military contractors (PMCs), insurgents, and terrorist organi-
zations.
A doctrine describes the fundamental principles that guide how military forces
conduct their operations and actions to reach their objectives, of which the U.S.
doctrine for the armed forces [32] acts as an example. A Concept of Operations
(CONOPS), as described in Joint Operation Planning [33], is a more concise
expression of a commander’s intent and the path to execution, usually developed for a
specific task, operation or mission. A Standard Operating Procedure (SOP) is, as
described in ADP 5-0 The Operations Process [34], a detailed, step-by-step instruction
that describes how to perform a recurring, preplanned or routine task or action, for
example a river crossing for maneuverable ground forces.
Regarding the premise of AI in the military domain, the emphasis is often on
the speed and cognitive intelligence that can be acquired through deploying novel
AI-based technology. These features are often framed as decision advantage [35]:
making better decisions faster, and being more resilient against an adversary’s
developments and actions. Decision advantage and resilience are, however, not new or
revolutionary ideas, as these attributes have been highlighted by decorated strategists
since Sun Tzu [36]. He highlighted foreknowledge, deception, speed, momentum
and formlessness, which correlate with intelligent actions, intelligence, speed and
resilience. As another example, Alexander Suvorov highlighted three principles:
speed, “eye judgment”, and onslaught [37]. The eye judgment or “eye measure”
refers to a commander’s battlefield intuition, the ability to assess a situation instantly
and accurately, taking in the whole context that encompasses the terrain, the
enemy, own troops, decision points and timing. From this perspective, the
adoption of AI in the military domain can be seen as the latest frontier at which these
recognized factors are taken to the next level in applying the same ancient principles
of successful warfighting.
1.2 Motivation, objectives, and methodology
To address the research gap described in detail in Chapter 2, this research investigates
the practical applicability of ML to the military domain from multiple perspectives and
application levels to identify solutions, challenges and future trends that will define
the battlefields of the upcoming decades. There has been a considerable push in
military AI research in the past years, as noted in Chapter 2, of which the AI vision
of the United States Department of Defense serves as an example [38]. Simultaneously,
perceptions of AI and its implications in warfare vary widely, as experts and
recognized spokespeople for AI make ever more extravagant promises for the near
future.
The underlying, system-level hypothesis is that AI can and will act as a core ca-
pability that will largely contribute to the success of the party that better utilizes it
in operations, as predicted by NATO [12; 39] and other major military stakeholders
[40; 41]. Hence, the purpose of this study is to provide expert insight into
the imminent implications of AI in the military context and into which future
applications and impacts can be deemed most likely according to the current state of
research, development and deployment. The research contribution lies in rooting the
current scope in the scientific background and quantifying the claims and future visions
into respective truth anchors. Quantification, in this case, refers to projecting the
theories into real-life use cases that provide concrete indications of the actual
applicability and capabilities beyond hypotheses. The underlying motivation and required
expertise stem from the author’s military background and experience, which are
combined with a perspective on the implications of AI and ML. Failure to understand
the premise, requirements and limitations of AI as a technology can and
arguably will have a tremendous impact on future conflicts that may shape the future
of, for example, Western democracies. Therefore, the primary motivation for this
dissertation is to serve as exploratory research that aids in estimating the maturity
and the depth of reach of the use and applicability of AI in Western military forces.
While the research is general in its methodology and aims to produce generally
applicable results, the research perspective adopts a European point of view owing to
the author’s background and motivation to partake in and enhance the development of
European capabilities in this regard.
The objectives of this thesis are:
• Performing a system-level investigation into the applicability, impact and needs
of military AI capabilities.
• Providing empirical evidence on the performance and shortcomings of a selec-
tion of state-of-the-art ML methods in narrow military problems.
• Providing insight into future directions and challenges of novel ML research
areas within the military domain.
• Integrating ethical considerations as a system-level factor into military AI ca-
pability development.
• Adapting the Cross-Industry Standard Process for Data Mining (CRISP-DM)
framework as a methodological tool to bridge technical AI research with capability
development and strategic-level decision-making.
The objectives are divided into two groups, technology assessment and applica-
bility assessment, that build understanding on different levels.
Technology assessment utilizes a systematic review on ML paradigms and meth-
ods, namely RL, FL and GenAI, to provide background information on current state
of the art as well as challenges related to each technology.
In applicability assessment, the mathematical premise, computational considera-
tions and the possibilities and challenges related to these technical factors are exam-
ined through applicatory research. ML methods, in this case from the fields of CV
and RL, are applied to solve constrained and narrow military problems to provide
concrete evidence and generalizable insights.
Together, these points of view are combined into a high-level understanding of
the possibilities and limitations of AI in the military, enhanced by practical issues
such as data availability, security constraints, problem complexity, explainability of
outcomes and ethical considerations. The applicability of the CRISP-DM process
from industry to defence is evaluated as a methodological approach and an objective.
The justification for this selection of objectives lies in the system-level
point of view of this research, which itself stems from the hypothesis that neither
narrow applicability studies nor systematic reviews on a certain subtopic can tackle the
fundamental bottlenecks of military AI development. This hypothesis was articulated
in the author’s previous thesis on military sciences, which explored the use and
development of AI from the perspective of the Finnish Navy, and stated in 2022 that “the
degree to which the data could be utilized was very low, and collecting the data
was challenging. This was not due to organizational resistance: the reception to
all information requests was very positive, and there was expressed interest in the
study’s results. The difficulties stem from data capture and archiving practices, the
distributed locations of databases, fragmentation of information, and security clas-
sifications. This is a similar challenge to that faced by the U.S. Navy, for which
assembling sufficiently large data volumes into a usable form is identified as a crit-
ical need to enable research and development work” [42]. These observations gave
grounds to formulate the hypothesis that over different paradigms and research areas,
the system-level issues are largely similar, and the bottleneck is less in the AI meth-
ods and algorithms and more in the overall digital readiness to develop and deploy
such capabilities.
In summary, this research does not dive particularly deep into a particular tech-
nology within the field of AI, but instead examines the broad scope of methods and
solutions in the military context from a system level perspective, providing grounded
and thought-through insight on the military domain, and an assessment of applicable
methodologies to create insight and establish functioning frameworks and policies
to enable and exploit novel AI solutions. The overarching scientific contribution is
the definition and validation of the ’Military Data Ecosystem’ as a primary capa-
bility. This thesis demonstrates that the effective integration of AI is not primarily
an algorithmic or AI model capability challenge, but a systemic one, providing a
new theoretical framework that redefines data from an ephemeral byproduct into a
managed operational asset.
To achieve these objectives, this thesis is built upon a series of targeted studies.
The objective of technology assessment is primarily addressed through systematic
reviews of Reinforcement Learning (Publication III) and Federated Learning
(Publication IV). The applicability assessment is realized through hands-on research
in applying Computer Vision to sonar imagery (Publication I), using Multi-Agent
Reinforcement Learning for tactical decision-making (Publication II), exploring
opportunities in Generative AI (Publication V), and examining AI deployment ethics
(Publication VI).
The research presented in this dissertation is conducted as a cumulative work
based on six original, peer-reviewed publications. This compilation thesis allows
for an in-depth exploration of multifaceted research questions through a series of fo-
cused studies. The individual methodologies employed in each study, ranging from
systematic literature reviews to the empirical application of machine learning mod-
els, are detailed within the respective publications (I-VI).
The overarching methodology for this dissertation is synthesis. It follows a struc-
tured approach to build a comprehensive understanding of the application of ML in
the military domain. The research strategy was designed to align with the objectives
outlined above, progressing from foundational technology assessments to practical
applicability assessments.
The process involved:
1. Identifying Core Research Areas: The primary research areas of CV, RL, FL,
Natural Language Processing (NLP), and GenAI were selected based on their
emerging prominence and disruptive potential within the defence sector, as
well as the fundamental differences between the paradigms. Essentially, CV
and NLP can be viewed as ML paradigms on different modalities that can be
turned into generative methods as GenAI, while RL is a different approach
to learning altogether, and FL functions as a possible umbrella for distributed
application of any of these paradigms.
2. Systematic Investigation: Each core area was investigated through one or more
original publications. This involved systematic reviews of existing literature
to establish the state-of-the-art and challenges (as in Publication III
and Publication IV), applicatory research where ML models were developed
and tested against specific military problems (as in Publication I,
Publication II, and Publication V), and an ethical, epistemic analysis of the
non-technical challenges with AI (as in Publication VI).
3. Synthesizing Findings: The final step, which is the primary work of this the-
sis’s introductory and concluding chapters, is to synthesize the findings from
these individual publications. This synthesis aims to construct a holistic view,
connecting the low-level technical insights from narrow problem-solving to
the high-level strategic and operational implications for military forces.
The theoretical framework through which the original publications are synthesized
is CRISP-DM. While not strictly academic in origin, it resembles well-established
theoretical methodologies such as Soft Systems Methodology (SSM)
[43] and OR [26]. While SSM and OR are more deeply grounded scientific methods, CRISP-DM
brings the theoretical background into a concrete and applicable process that is meant
to provide concrete results for businesses and enterprises. Hence, for the scope of this
research, CRISP-DM is transformed into a military-compatible format to assess the
findings of the original papers. The layout of the CRISP-DM framework is displayed
and described in Chapter 3.
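For orientation, the six phases of the standard, industry-oriented CRISP-DM cycle can be written down as a small sketch. This lists only the generic phases; the military-compatible adaptation used in this thesis is described in Chapter 3 and is not reproduced here. The class and function names are illustrative assumptions.

```python
from enum import Enum


class CrispDmPhase(Enum):
    """The six phases of the generic CRISP-DM reference model."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase):
    """Return the nominal successor of a phase.

    The process is cyclic, so deployment leads back to business
    understanding. Real projects also jump backwards between phases,
    which this simplified helper does not model.
    """
    members = list(CrispDmPhase)
    return members[(members.index(phase) + 1) % len(members)]
```

The cyclic ordering is the relevant design point: findings from evaluation and deployment are expected to reshape the original problem understanding.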
The analysis was supported by a research visit to Ukraine, an unfortunate but
prime example of modern nationwide warfare in effect. The author visited several
sites in the Kyiv region between July 15 and July 25, 2025, while participating in a
defence-focused event to meet with local and international startups and military
personnel and to gather insights into the current state of technology and
innovation, and the applicability of AI in the contemporary forms of tactical warfare.
1.3 Organization of the thesis
This thesis is structured to guide the reader from fundamental concepts of AI and
the framework of military context towards specific, tangible research contributions,
and finally to a high-level synthesis of the findings. An in-depth literature review
in Chapter 2 provides an overview of the field from multiple publication perspec-
tives including academia, governmental publications, think tanks, and non-scientific
expert literature. Chapter 3 provides the necessary background on AI, defence and
military organizations. Chapter 4 introduces the key ML research areas that form
the basis of the original publications. Chapter 5 then details the core contribution of
this thesis by summarizing and synthesizing the findings of the six included publica-
tions, demonstrating a bottom-up approach where practical, low-level insights from
specific applications are used to inform high-level considerations about the future of
AI in defence. Finally, Chapter 6 concludes the dissertation by summarizing the key
outcomes and discussing their broader implications and future research directions.
2 Literature review
This chapter reviews some of the most influential work on AI in the military context,
focusing on expert publications, dedicated think tanks and recognized scientific
papers. This literature review creates the background for this dissertation and
highlights the research gap that is addressed: despite the breadth of research, there is a
lack of operational-level bridging to fully integrate AI capabilities into the
warfighting reality.
2.1 Academic publications
The AI research field is expanding at an accelerating pace, including in the defence
and military fields. For example, a ScienceDirect database query
("artificial intelligence" OR "machine learning") AND (military OR warfare OR
"armed forces" OR navy OR "air force" OR army) for title, abstract or author-specified
keywords, filtered to include engineering and computer science papers, returned 408
results from 1992 to mid-2025 when queried on 1 August 2025. The number
of research papers per year grows exponentially, as shown in Figure 1. The words
“defence/defense” and “security” are excluded as they induce hits on cyber defence
and security, which are often unrelated to a strictly military context, albeit being just
as applicable to the military. The sheer volume and rapid growth of this technical
literature make a comprehensive review intractable, but also highlight a critical
challenge: the potential for a widening gap between the highly specialized academic
research and the strategic-level policy discussions reviewed later.
Figure 1. ScienceDirect publications per year for military AI indicating a growing trend.
To narrow down the results, and in keeping with the ML and AI focus of this
dissertation, the BibTeX records with abstracts were downloaded, combined and
processed with three small, locally run language models: Llama 3.1 8B [44],
Mistral 7B [45] and Gemma 3 4B [46]. The task of the language models was to determine,
based on meta information and abstracts, whether the paper is actually focused on the
military field and not just mentioning it in some context, according to the following
prompts:
# Placeholder schema shown to the model; the values describe the expected types.
PROMPT_SCHEMA_EXAMPLE = {
    "name": "<string: article title>",
    "is_military_aiml": "<boolean: true if the abstract is about "
                        "AI/ML in a military context>",
    "topic": "<one of TOPIC_CHOICES>",
    "method_type": "<one of METHOD_CHOICES>",
    "key_findings": "<string: 1-3 concise points summarizing the "
                    "key findings>",
}

# System prompt shared by all three models.
SYSTEM_INSTRUCTIONS = (
    "You are a meticulous research analyst. Read the abstract and metadata. "
    "Decide if the work is about artificial intelligence or machine learning "
    "in a MILITARY context. "
    "Choose ONE topic and ONE method_type from the provided choices. "
    "Answer STRICTLY as minified JSON matching the schema. "
    "Do not include explanations, Markdown, or backticks."
)

# Per-paper prompt; {topics}, {methods}, {schema} and {paper_json} are filled in.
USER_TEMPLATE = (
    "Paper metadata as JSON follows. Return a single JSON object with keys: "
    "name, is_military_aiml, topic, method_type, key_findings.\n\n"
    "ALLOWED TOPICS: {topics}\n"
    "ALLOWED METHOD TYPES: {methods}\n\n"
    "JSON SCHEMA EXAMPLE (values are placeholders):\n{schema}\n\n"
    "PAPER: {paper_json}"
)
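How such a template might be filled in and how a model reply might be validated can be sketched as follows. This is an illustrative assumption rather than the pipeline actually used: the helper names are hypothetical, and the template passed in only needs to contain the four placeholders named in the listing above.

```python
import json


def build_user_prompt(template, topics, methods, schema, paper):
    """Fill the four placeholders of a user template for one paper."""
    return template.format(
        topics=", ".join(topics),
        methods=", ".join(methods),
        schema=json.dumps(schema),
        paper_json=json.dumps(paper, ensure_ascii=False),
    )


def parse_reply(raw_reply, required_keys):
    """Return the reply as a dict, or None if it is not usable.

    A reply is rejected when it is not valid JSON, is not a JSON object,
    or lacks any of the required keys.
    """
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    if not set(required_keys) <= set(parsed):
        return None
    return parsed
```

Rejecting malformed replies instead of repairing them keeps the later relevance decision conservative: a model that fails to answer in the requested format simply contributes no positive vote.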
After the processing, the results were examined for all 408 papers, and if a
paper was marked as relevant (is_military_aiml = True) by at least two out of the three
small models, it was examined more closely. This approach narrowed down the
search to 72 papers.
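The two-out-of-three rule described above amounts to a simple majority vote. A minimal sketch, with hypothetical model labels and paper titles, could look like this:

```python
def is_relevant(votes, min_agree=2):
    """True if at least `min_agree` models flagged the paper as relevant.

    Missing or unparseable answers can be passed as None and count as
    'not relevant'.
    """
    return sum(bool(vote) for vote in votes) >= min_agree


def shortlist(votes_by_paper, min_agree=2):
    """Keep the titles of papers that pass the majority vote."""
    return [
        title
        for title, votes in votes_by_paper.items()
        if is_relevant(votes.values(), min_agree)
    ]


# Hypothetical example: the keys and titles are placeholders, not real data.
example_votes = {
    "Paper A": {"llama": True, "mistral": True, "gemma": False},
    "Paper B": {"llama": False, "mistral": True, "gemma": None},
    "Paper C": {"llama": True, "mistral": True, "gemma": True},
}
selected = shortlist(example_votes)
```

Requiring agreement between independent small models reduces the influence of any single model's misclassification, at the cost of possibly discarding papers that only one model recognizes as relevant.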
This analysis of 72 scholarly articles reveals a clear and concentrated focus
within the domain of computational intelligence in military applications. The litera-
ture is predominantly characterized by applied research aimed at developing tangible
solutions, particularly in the areas of autonomous systems, intelligence gathering,
and decision support. The distribution of research topics underscores a significant
academic and practical interest in three primary areas, which together account for
39 of the 72 papers, or about 54% of the reviewed literature.
Command and Control (C2) and decision support is the most dominant theme,
with 20 articles. The research explores a wide scope of topics, from systems that
simulate operational procedures [47] to the provision of early warnings to improve
international stability [48].
Autonomous systems & Robotics are the second-largest category with 10 articles
followed by Intelligence, Surveillance and Reconnaissance (ISR) with 9 articles. ISR
research highlights the critical role of data processing and analysis in modern military
operations. The focus is on leveraging computational methods to extract actionable
intelligence from vast amounts of sensor data. Uncrewed systems research is heavily
focused on practical applications, such as using machine learning for real-time object
recognition for unmanned vehicles, e.g., Buluswar and Draper [49], and predicting
structural responses to blast loads [50], indicating a drive towards creating more
resilient and intelligent unmanned platforms.
Other topics such as Logistics and maintenance (8 articles), Cybersecurity (6 ar-
ticles), Medical (3 articles), and Personnel (3 articles) represent smaller but notable
areas of research. In contrast, foundational domains like Electronic Warfare (1 ar-
ticle) and Communications (1 article) appear significantly underrepresented in this
body of literature. For Communications, it has to be stated that three papers concern
networks, but under another topic, such as Unmanned Aerial Vehicles (UAVs) or
cyber security [51; 52; 53]. The analyzed papers are summarized by topic in
Table 1.
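The per-topic counts reported in Table 1 can be tallied to confirm that they add up to the 72 shortlisted papers and to express each topic as a share of the sample. The counts below are copied from Table 1; the function name is an illustrative assumption.

```python
# Paper counts per topic, as listed in Table 1.
TOPIC_COUNTS = {
    "Command & control / decision support": 20,
    "Autonomous systems & robotics": 10,
    "Intelligence, surveillance & reconnaissance (ISR)": 9,
    "Logistics & maintenance": 8,
    "Cybersecurity & electronic warfare": 6,
    "Personnel": 3,
    "Medical": 3,
    "Human-machine teaming / HCI": 2,
    "Air operations": 2,
    "Small arms": 2,
    "Wargaming & simulation": 2,
    "Targeting & fire control": 2,
    "Communications": 1,
    "Electronic Warfare": 1,
    "Innovation & adaption": 1,
}


def topic_shares(counts):
    """Return each topic's share of the sample in percent."""
    total = sum(counts.values())
    return {topic: 100 * n / total for topic, n in counts.items()}


total_papers = sum(TOPIC_COUNTS.values())
shares = topic_shares(TOPIC_COUNTS)
```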
The methodological landscape of this particular sample is overwhelmingly skewed
towards practical implementation, reinforcing the applied nature of the research field.
Application and implementation was the methodological approach for just over half
of the papers in Table 1, covering 51.39% of the papers. This indicates that the
Topic | Count | Citations
Command & control / decision support | 20 | Ângelo Lellis Moreira et al. [54]; de Araújo Costa et al. [55]; James and Herget [56]; Jiang et al. [57]; hsien Liao [47]; Zabala-López et al. [58]; Sánchez-Ruiz and Miranda [59]; Liebowitz and Davis [60]; Liu et al. [61]; Masud et al. [62]; Nõmm and Venables [63]; Oh et al. [64]; Perry et al. [65]; Mechergui and Jayakumar [66]; Mendonça et al. [67]; Xia et al. [68]; Yadav and Kim [69]; Aha [70]; Scrimgeour [48]; Zhao et al. [71].
Autonomous systems & robotics | 10 | Altinors et al. [72]; Batista et al. [73]; Buluswar and Draper [49]; Fualdes and Barrouil [74]; Gilmore [75]; Hosseinzadeh et al. [51]; Kaur et al. [53]; Sutton and Roberts [76]; Rahmani et al. [77]; Cai et al. [78].
Intelligence, surveillance & reconnaissance (ISR) | 9 | Akbal et al. [79]; Wei et al. [80]; Guo et al. [81]; Zhao and Morikawa [82]; Hashemi and Hall [83]; Kılıç et al. [84]; Kwon and Lee [85]; Mehta and Shah [86]; Petrov et al. [87].
Logistics & maintenance | 8 | Baker et al. [88]; Mohril et al. [89]; Candelieri et al. [90]; Bortolan Neto et al. [50]; Li et al. [91]; Malkoff [92]; Vasilikis et al. [93]; Boutselis and McNaught [94].
Cybersecurity & electronic warfare | 6 | Akbani et al. [52]; Whelan et al. [95]; Sojitra et al. [96]; Maathuis and Cools [97]; Almaslukh [98]; Shamshirband et al. [99].
Personnel | 3 | Hoecherl et al. [100]; Wasilefsky et al. [101]; Zhang et al. [102].
Medical | 3 | Satava [103]; Ahamed et al. [104]; Gondalia et al. [105].
Human–machine teaming / HCI | 2 | Canan et al. [106]; Sheridan [107].
Air operations | 2 | Wittig and Onken [108]; De Giorgi and Quarta [109].
Small arms | 2 | Chandan et al. [110]; Yang et al. [111].
Wargaming & simulation | 2 | Klahr [112]; Knapp et al. [113].
Targeting & fire control | 2 | Govindarajan et al. [114]; Li et al. [115].
Communications | 1 | Aloqaily et al. [116].
Electronic Warfare | 1 | Wang et al. [117].
Innovation & adaption | 1 | Kagiwada [118].
Table 1. Topics with representative citations.
field is primarily concerned with building, testing, and implementing computational
solutions to specific, narrow military problems rather than exploring foundational
concepts. 16.67% of the papers introduced methodological novelty in AI or ML,
although the novelty and methodological impact are open to interpretation, especially
in comparison with some application papers. 9.72% of the papers proposed framework or
architecture solutions, with or without experimentation, some akin to methodological
novelty papers. Review and survey papers constituted 20.83% of the papers, and
the innovation paper by Kagiwada [118] is in fact an essay from personal experience
in the field.
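As a consistency check, the reported percentages can be converted back into paper counts for the sample of 72. The category labels below paraphrase the text, and the helper name is an assumption; only the percentages and the sample size come from the passage above.

```python
TOTAL_PAPERS = 72

# Shares of the sample as reported in the text, in percent.
REPORTED_SHARES = {
    "application and implementation": 51.39,
    "methodological novelty": 16.67,
    "framework or architecture": 9.72,
    "review and survey": 20.83,
}


def implied_count(share_percent, total=TOTAL_PAPERS):
    """Number of papers implied by a rounded percentage of the sample."""
    return round(share_percent / 100 * total)


implied_counts = {name: implied_count(p) for name, p in REPORTED_SHARES.items()}

# The single essay [118] is reported separately from the four categories.
accounted_for = sum(implied_counts.values()) + 1
```

Under these assumptions the four categories correspond to 37, 12, 7 and 15 papers, which together with the single essay account for all 72 papers.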
For a short look-through, in the top 25 ScienceDirect query results, sorted by
relevance, there are 16 open access papers with direct military relevance. These can be
classified into C2 and Decision Support Systems (DSS), small arms, autonomous systems
& robotics, personnel-related topics, and cyber defence. The reviewed
technical literature reveals a strong focus on enhancing existing military functions
rather than creating revolutionary new ones. In warfighting, concrete low-level prob-
lems such as weapon target assignment [115], small arm firing skill evaluation [110],
chemical weapon detection [80], and assessment of urban destruction [82] have been
examined. In higher-level decision support, interest in AI- or ML-based military
DSSs is trending [58; 55; 59]. Another trend, research related to uncrewed (previously
known as unmanned) systems, includes non-Global Navigation Satellite System
(GNSS) based navigation for UAVs [73], intelligent communication solutions
[116], swarming [78], and UAV related cyber capabilities [95]. The digitally cross-
sectional cyber security perspective has been researched from, e.g., Explainable Ar-
tificial Intelligence (XAI) point of view on malicious data classification [96] and
overall evaluation of AI-based cyber security solutions [97]. Applicability of ML
techniques to personnel-related tasks, such as retention [100], candidate selection
[101] and harm prevention [102], can be hypothesized to be related to the availability
of structured data and close resemblance to civilian problems in similar areas. The
focus on discrete, solvable problems within the technical literature exemplifies one
side of the research gap, showcasing deep but narrow progress that is often difficult
to translate into broad strategic advantage.
In summary, the scientific literature on ML and AI in military contexts is heavily
focused on the practical implementation of algorithms to solve problems in C2 &
DSS, autonomy, and ISR. The primary goal of engineering and computer science
efforts appears to be the creation of functional, real-world applications. The field
seems to prioritize the development of intelligent systems that can perceive the
environment, process information, and support or supplant human decision-making. The
relative scarcity of theoretical and foundational research reflects the focus on a
specific field, i.e., the military: foundational research happens in a more general
scope, and military applications follow those innovations. This focus on applied,
narrow problems
becomes even more apparent when examining the most prominent papers from this
cohort, which reveal a clear pattern of enhancing existing military functions rather
than creating revolutionary new ones.
2.2 Governmental policies and strategies
The policy documents from NATO and the US DoD operate at a high level of ab-
straction, articulating principles and goals that often lack a clear connection to the
granular technical capabilities currently being developed, thus representing the other
side of this gap. From a governmental perspective, the 2021 NATO Artificial Intelligence
Strategy and the 2024 revised strategy set out aims and outcomes for the Alliance,
underpinned by principles of responsible use such as lawfulness, traceability,
reliability, governability and bias mitigation [39; 119]. In the United States, the DoD
Ethical Principles for AI provide similar ethical principles and guidelines for acquisition
[120]. The Data, Analytics and AI Adoption Strategy [40], superseding the 2018 AI strategy,
aims to improve the organizational environment to enable achieving decision
advantage with AI. The nation-wide AI competitiveness and defence innovation priorities
have been outlined in the National Security Commission on AI final report [121]. Within
the NATO Defence Innovation Accelerator for the North Atlantic (DIANA), accelerator
hubs and test centers have been established across the Alliance, focusing on AI,
autonomy, quantum, and other revolutionary technologies [122]. U.S. Department
of Defense [123] demonstrates rapid trilateral deployment of AI and autonomous
systems. Regarding Lethal Autonomous Weapon Systems (LAWS), DoD Directive
3000.09 [124] establishes a framework of legal and technical safeguards, for
example, the responsibilities and requirements regarding the use of lethal force. For the
United Nations, the International Committee of the Red Cross [125] expands the LAWS
discussion to humanitarian-law concerns and recommendations for the UN. The United
Kingdom’s Defence AI Strategy [126], along with [127], outlines UK-specific governance
and assurance mechanisms. Actual, enforceable regulation and laws around AI are
largely absent. The European Union’s landmark AI Act [24], in a deliberate
policy choice, excludes military applications from its scope. This allows the EU to
advance market-focused regulatory policies while pursuing defence AI separately.
Hence, defence-related AI is mainly unregulated, and while there is considerable
research on the subject, the regulatory and legislative framework is in its infancy.
From a non-Western point of view, the People’s Liberation Army (PLA) incorporates
AI into its modernization strategy, labeled “intelligentization”, which aims to develop
a world-class military that leverages AI for new forms of warfare and for transforming
key areas such as situational awareness, decision-making, unmanned systems
and the cognitive warfare domain [41]. PLA operational concepts [128], inferred from
strategic guidelines and recent examples of the PLA in combat, show that the desired
modernized status is to create information dominance, deal with new realities between
combat space and war space, and be able to defeat an adversary’s operational system through
target-centric warfare. In Chinese military writings, combat space is the geographic
area where physical conflict occurs, while war space encompasses all domains of
war from physical to non-physical, including political, economic, diplomatic and in-
formation spheres. In the operational concept perception, the war space is expanding
while combat space is shrinking. Target-centric warfare denotes the use of precision-
strike capabilities and intelligent munitions to surgically impact the combat space.
These advances underline the ongoing race towards military supremacy between
competing superpowers, namely the United States and China. Borchert et al. [129]
have published a very thorough case study into the defence AI that covers 25 coun-
tries, highlighting that defence AI is at the center of geopolitical and geoeconomic
competition. The introductory chapter notes that the case study aims to fill the re-
search gap of how countries think about defence AI, how they prepare for its adop-
tion, and how they develop existing concepts and processes and related capabilities.
The study identifies three strategic motives for defence AI: threat-based, fear or
falling behind, and AI as a capability multiplier. Most of the countries reside in the
third category, while the countries at the bleeding edge such as China and United
States are in the first, accompanied by smaller countries that aim to maintain their
strategic edge against prominent opponents. These countries include Greece, South
Korea, India and Ukraine. From capability categories, the most sought-after is the
combination of AI with uncrewed systems, followed by predictive maintenance, C2
combined to data analytics and data management, Electronic Warfare (EW), wargam-
ing and, in minority, mission planning and tactics development. The research high-
lights that Russia and China seem to considerably prioritize capabilities that aim
towards autonomous reconnaissance-strike complexes. The research indicates that
almost all nations are focused on data-driven and correlational learning, which can
be effectively translated as ML. The approach is denoted the second wave of AI,
in accordance with the Defense Advanced Research Projects Agency’s (DARPA’s) three
waves of AI technology [130], while the United States is the only nation exploring
the third wave by focusing on contextual reasoning and self-learning under uncertainty.
The tension between sovereignty and cooperation is also highlighted, as national
interests compete with collaborative resources. Despite the recognized impact,
most countries focus on training military personnel to handle specific AI systems
instead of advancing their general AI talent and competence. The insights highlight
that current military innovation in AI is largely emulation, mainly mimicking the US,
and that it usually aims to enhance existing practices and systems.
Human-in-the-loop and human-on-the-loop solutions are emphasized globally, and
the case study points out that the “valley of death” problem, in which moving promising
AI technologies into practice is far from straightforward and often fails,
is a common struggle for most nations.
2.3 Think tanks
The RAND Corporation, a non-profit American think tank that conducts research and
analysis on a wide range of public policy issues, including defence and national
security, has published research reports on AI since the 1960s. In its database,
211 research papers have been published, spanning from 1962 until today. The
“AI winter” and the “dotcom” bubble appear to correlate with the
publication frequency, as there are gaps from 1973 to 1989, from 1993 to 2001 and
from 2002 to 2012. The vast majority of the research, 193 reports in total, has been
published since 2017, while only 18 papers cover the interval from 1962 to 2012.
This also correlates with the progress of digitalization, AI and ML, including the AI
winters when AI failed to meet expectations. In a similar manner, SIPRI (Stockholm
International Peace Research Institute), which focuses on research into conflict,
armaments, arms control and disarmament, has published reports on military AI since
1987. Its database returns 21 publications when querying for “artificial intelligence”,
of which 20 are linked to armed forces and the military.
RAND usually employs a scenario-based and policy-oriented methodology, while
SIPRI leans towards arms control and international law. For both institutions, early
research addresses issues such as problem-solving programs [131], NLP and sym-
bolic AI [132; 133], as well as early neural networks [134], advanced computing
[135] and strategies [136]. There is a major shift in later research, where the
conceptual and theoretical approach gives way to more concrete applications and
implications, and where specific military topics from the strategic to the tactical
level are introduced, akin to the categories presented above. The identified categories include DSS
and C2, logistics [137], cyber defence [138], space technologies [139], human re-
sources, uncrewed and autonomous systems [140; 141], responsible use [142; 143]
and governance [144] as well as strategic principles [145] and bias mitigation [146].
The DSS and C2, in this case, include wargaming [147; 148], as wargames can be
defined as ”representations of conflict or competition in a safe-to-fail environment,
in which people make decisions and respond to the consequences of those decisions”
[149]. Wargaming can thus be seen as an exercise or test-time environment that precedes
the decision-making itself. In addition, a conceptual review on the future of C2 systems [150] has
been conducted. C2 systems can be perceived as systems that produce the situational awareness
needed for decision-making while also providing the means to act on the decision. In human
resource management, ML is also related to decision-making [151] and its fairness
[152]. Notably, the work of Schulker et al. [151] aligns with that of Wasilefsky et al.
[101], as both focus specifically on Air Force applications, creating a targeted body
of research in this area. Due to the scale and gravity of nuclear capabilities and the
nature of, e.g., SIPRI’s methodology, the impact on stability, deterrence, and nuclear
risks is well represented in the related reports [153; 154; 155; 156; 157; 158; 159].
Both institutions have researched similar and complementary topics on military AI,
which accumulates into a comprehensive high-level understanding of the impact and
reach of this cross-sectional technology. The low-level impact is less researched,
and the results are often policy suggestions to better enable, govern, and manage
the agreed-upon change that AI, as a capability, brings to different warfighting and
military functions.
2.4 Other literature
In addition to academic literature, governmental policies and strategies as well as
think tank reports, there is a small number of expert books that have been especially
influential in shaping how policymakers, defence practitioners and the wider audience
discuss AI, autonomous weapons and the future of warfare. These sources offer
substantial insider subject-matter expertise and synthesis value, although
they also present a subjective point of view that is likely to
include author bias, agenda-setting and selective framing. Despite these limitations,
they remain useful for capturing the dominant practitioner narratives that frequently
guide real-world discourse. This thesis selects a sample of books by Brose and Scharre
and a professional anthology edited by Tangredi and Galdorisi, chosen for their wide
engagement in defence policy and professional military discourse; together they
provide complementary analyses of the subject.
In this category, the Colby Award-winning “Army of None” [160] by Paul Scharre
was one of the most widely cited early syntheses describing the advance of AI and
especially autonomous systems in military forces, focusing on the United States [161].
With field experience and a long career in military affairs, Scharre manages to describe
the history, level of development, and trends in a manner suitable for wide
audiences. Highlighting the premise of early drones and the underlying AI technology,
the book has proven prescient of the military reality of the 2020s.
Scharre’s book was followed by a practitioner anthology ”AI at War: How Big
Data, Artificial Intelligence, and Machine Learning Are Changing Naval Warfare”,
edited by Tangredi and Galdorisi [162]. It serves as an effort to provide a balanced
and practical overview that seeks to demystify the technology for a non-technical
audience of national security professionals, policymakers, and concerned citizens,
examining both the promising applications and the inherent limitations of AI in a
defence context. Addressed topics include high-level strategy, policy, doctrine, spe-
cific weapon systems, and pressing ethical concerns. Although its subtitle speci-
fies a focus on naval warfare, its themes and findings are abstract and thus relevant
across all military services and the broader defence community. The book also aims
to identify the significant real-world barriers to AI adoption, including entrenched
institutional cultures, inter-service rivalries, and the political realities of defence ac-
quisition. It explicitly cautions against over-reliance on AI, particularly in contested
environments where systems could be vulnerable to cyberattacks or electronic de-
ception. This skepticism is balanced with a sense of urgency, as the book frames AI
development within the context of great power competition.
Brose [163], a former Senate Armed Services Committee staff director, argues
that the US armed forces are falling behind in the critical areas
of AI, autonomous systems, and networked warfare when compared to possible adversaries,
mainly China. Brose claims that the victor of future conflicts will be the
side with the faster, more resilient, and more intelligent kill chain, the namesake of
the book, which refers to the decision-making loop from observation to action. In
other words, the United States risks losing future wars, not because of insufficient
defence spending, but due to outdated systems, slow procurement processes,
and institutional inertia. In his view, modern warfare will be defined by speed,
autonomy, and decision-making advantage, all of which are areas where AI can be a
crucial component. The book has been reviewed by RAND [164] and National Defense
University Press [165].
The latest expert addition to these insights is Scharre’s newest book, “Four Battlegrounds:
Power in the Age of Artificial Intelligence” [166], reviewed by United
States Army War College Press [167]. The book is an overall review of AI in a strategic
and military context, as the work focuses on the fundamental rivalry between competing
nations and the technological race for advantage that follows. It covers aspects
from historical analogies to modern equivalents, data and hardware insights, as well
as concrete and foreseeable applications, benefits, and threats of AI.
2.5 Summary
In summary, the literature paints a clear picture: the strategic imperative for military
AI is widely accepted [163; 38; 168; 126; 41; 119], and a broad consensus exists
on the most promising application areas [129; 162; 40]. However, this review has
also highlighted a persistent ’valley of death’ [129] fueled by institutional inertia
[163] and a nascent understanding of the associated risks and governance require-
ments [153; 24]. Consequently, a significant gap remains between the high-level
strategic discourse and the granular, practical research needed to bridge concept with
capability. The identified research gap is twofold: a disconnect between concept and policy
research on the one hand and applications on the other, and a disconnect between AI
research and development and real-world military requirements.
This dissertation directly addresses this disconnect. The six original articles that
form the basis of this work are designed to bridge the gap between strategic concepts
and operational reality. Publication I and Publication II are applied studies
of technical, algorithmic solutions to military problems at different levels, providing
insight not only into the problems themselves but also into the challenges that such
adaptation of AI capabilities faces in the military context. Publication III introduces similar
issues on an abstract level of military DSS, and Publication V highlights the existing
gap between R&D and deployment for GenAI solutions in the military. Publication IV
introduces FL as an ML paradigm that provides solutions to some of the
presented challenges. Publication VI concludes the analysis with an ethical discussion
on human-machine teaming and roles of responsibility. Together, these original
papers build the framework that the literature shows is currently
lacking. By examining ML research areas from this perspective, this research provides
a synthesized model for how military AI can move from a set of disconnected
applications and abstract policies to a truly integrated warfighting capability.
3 AI and Military
This chapter provides a brief background on the field of AI and a generalized glimpse
into the fields of defence, warfare and the military framework.
3.1 The Intelligence Artifice
The history of AI starts in the 19th century, pioneered by Ada Lovelace [169]
and George Boole [170]. The closely related Bayesian conditional probabilities [171] were
introduced a century earlier, and, for example, the aforementioned Euclidean distance
takes its name from Euclid [172] and his work around 300 BC. The foundation
laid by these theories before digital computers exemplifies the incremental scientific
method, where even the current state-of-the-art leverages centuries-old science to
come up with novel solutions.
Figure 2 shows the subfields of AI and one way to understand the field through its leading
paradigms, their intersections, and the modern research areas, most of which are
addressed in this dissertation. The figure is based on the works of Russell and Norvig
[17], Bishop and Bishop [173], and Pearl [174]. The symbolic AI era of dominance
spanned from the 1950s to the 1980s [175; 176; 17; 177], during which the symbolic
approach was considered the paradigm that would eventually lead to human-like
intelligence through the manipulation of symbols and rules. Among the most prominent
realizations of this paradigm were expert systems [178; 179]. The paradigm
has not vanished but is instead a staple of common information processing and programming
languages, and it also bridges numeric methods to symbolic domains and
vice versa, creating the neuro-symbolic approach also shown in Figure 2. The probabilistic
approach started its rise in the 1980s, reached maturity by the 2000s, and remains
central to probabilistic machine learning, Partially Observable Markov Decision
Processes (POMDPs), and overall uncertainty modelling.
The numeric approach started out in parallel with the dominant era of symbolic
AI, in the 1950s, with early discoveries such as the perceptron [180] as the basic component
of current linear neural networks. Due to theoretical scrutiny [181] and
limitations in computational capacity, the paradigm lay nearly dormant from the 1970s to
the late 1980s, experiencing a re-emergence during the 1990s through statistical reasoning,
probabilistic models, kernel methods, and classic ML algorithms such as
Support Vector Machines (SVMs) [182]. The era of deep learning dominance made its
breakthrough during the 2010s and has been the leading paradigm since.
As explained in Section 1.1, despite the long and mathematical history of AI,
there is no generally accepted definition for the term. Even the term AI itself was disputed:
after John McCarthy coined it in the 1950s [183], it supplanted
competing terms such as computational intelligence. As there is no general
definition, the understanding of the subject varies more than is usual even
in a living language. It should be noted that the term AI has persistently referred to
something that has not yet been achieved. Elaine Rich defined AI as a field of science
that aims to make computers do things that people are currently better at [18], which
is a robust definition as it accommodates both the advance of the technology and the
resulting shift in meaning.
However, after the release of world-renowned GenAI applications such as ChatGPT
[11], Gemini [184], Claude [185], Llama [44], Grok [186], Mistral [45] and
others, the term AI has become synonymous with transformer-based generative
applications. This is somewhat troublesome, as the understanding of the broader
scope of AI and its foundations is now blurred by the familiarity of the human-like interaction
exhibited by applications that usually combine several AI methods and research
areas to provide a suitable full-stack solution.
In this research, the term AI is used in an old-fashioned way, as explained in
Section 1.1, describing a field of science that aims to create intelligent computer systems
that can execute tasks that formerly required humans. ML is understood as the
most promising subfield within AI, one that has enabled leveraging data and computation
to create models without the need to build their logic from scratch. The logical
approach, also known as symbolic AI, was the most prominent paradigm from the
1950s to the 1990s [17], after which ML and so-called numeric solutions became the
winning paradigm [17]. Unlike statistical methods, which aim to explain data,
ML in general aims to create predictions based on data: this division is somewhat
blurred on occasion, but it is generally a good distinction.
As shown in Figure 2, DL is a subfield of ML where the difference is in the
use of sophisticated, multi-layer neural networks in a multitude of different network
architectures to solve problems. The term deep refers to the depth created by these
layers, although mathematically the layers are a chain of functions, or a function
composition. For a three-layer neural network $F(x)$, each layer performs a function
$f_i$, so that the initial input $x$ is transformed into the output $o_i$, which is then fed
to the next layer function $f_{i+1}$, as displayed in Equation 2:
$$F(x) = f_3(f_2(f_1(x))) \rightarrow y \qquad (2)$$
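As a minimal sketch of the function composition in Equation 2, the following Python example builds three small dense layers and chains them. The weights, biases, activation functions and the helper names `dense` and `compose` are illustrative assumptions and are not taken from the original publications.

```python
from functools import reduce
from typing import Callable, Sequence

Vector = list[float]
Layer = Callable[[Vector], Vector]


def relu(z: float) -> float:
    """Rectified linear unit, a common (assumed) activation function."""
    return max(0.0, z)


def identity(z: float) -> float:
    """No activation, used here for the output layer."""
    return z


def dense(weights: list[list[float]], biases: list[float],
          activation: Callable[[float], float]) -> Layer:
    """Return a layer function f_i(x) = activation(W x + b)."""
    def layer(x: Vector) -> Vector:
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


def compose(layers: Sequence[Layer]) -> Layer:
    """Chain layers so that the output o_i of f_i is fed to f_{i+1}."""
    def network(x: Vector) -> Vector:
        return reduce(lambda out, f: f(out), layers, x)
    return network


# Hypothetical, hand-picked parameters for a 2-2-2-1 network.
f1 = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
f2 = dense([[1.0, 1.0], [-1.0, 1.0]], [0.0, 1.0], relu)
f3 = dense([[2.0, -1.0]], [0.5], identity)

F = compose([f1, f2, f3])

x = [2.0, 1.0]
y = F(x)  # same value as the nested call f3(f2(f1(x)))
```

Each intermediate output $o_i$ is simply the argument passed to the next layer, so the composed network is exactly the nested call written in Equation 2.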
The approach, in ML or DL, is not different from other forms of functional op-
timization, as the trained models are optimized to perform with certain data and cer-
tain tasks. Before the launch of very large transformer-based models, DL solutions
[Figure 2 placeholder. Labels in the figure: Artificial Intelligence; Symbolic, Probabilistic, Numeric; Neuro-Symbolic, Bayesian Deep Learning; Computability, Uncertainty handling, Knowledge representation; First-order Logic, Theorem Provers, Planning Systems, Graphs / Ontologies, Expert Systems; POMDP, Hidden Markov Models, Bayesian networks; Statistical Methods, Machine Learning, Supervised Learning, Unsupervised Learning, Reinforcement Learning, Deep Learning; Contemporary Research areas: Computer Vision, Explainable AI, Generative AI, Federated Learning, Natural Language Processing, Ethics.]
Figure 2. The Paradigms, Methods and Research areas of AI
were still considered narrow, as they were usually fitted to a certain task to solve a
certain, narrow problem. Once the available computational resources and
the availability of data were exploited to train extremely large language-based transformers,
also known as LLMs, it became possible to state that the latest AI models are able to
generalize to a multitude of tasks without further training or fine-tuning. This has
had a tremendous impact on the field of AI. As a result, the current trend resembles
a race toward the best model that could, hypothetically, be an Artificial General Intelligence
(AGI), meaning that it would be able to solve any problem in a human-like
manner. The feasibility of AGI remains debatable, as no paradigm, research
field or theory has been shown to offer a proven path to AGI, which indicates
that we lack the architecture to reach it, as suggested by the decorated AI scientist Yann
LeCun [187]. He has also stated that the future of AI will not be generative [188].
The majority of scientists in the field tend to agree [189], while industry leaders
tend to anticipate AGI within a few years at the earliest [190; 191; 192]. Furthermore,
the whole discussion is distorted by the fact that there is no agreed definition for
AGI either.
Despite advances, there are still clear limitations in AI models that hinder their
performance. Recently, researchers have demonstrated that LLMs and Large Reasoning
Models (LRMs) merely give the illusion of thinking and understanding [193;
194]. The research of Shojaee et al. [193] was criticized for placing constraints on the
models, such as limiting context-window size and disabling code-based solutions. However,
a larger context enables reiteration and thus arguably a form of brute forcing, and
code-based solutions can be retrieved from the memory of the model, so these design choices
can also be defended. Mancoridis et al. [194] showcase the illusion of thinking by
examining the way LLMs fail to understand concepts in a human manner, which leads
to memorization instead of deeper knowledge and renders certain benchmarks invalid
for measuring the models’ so-called cognitive performance. The ARC-AGI leaderboard
[195] shows that both the size and the performance of the models have indeed
increased, leaving a decreasing gap between human-level performance and the current
state-of-the-art. It can be argued that the last mile is the hardest, but it
can also be stated that novel innovations may take the field by surprise.
As presented in Chapter 1, ML methods aim to fit a model to the data
by decreasing the prediction error as measured by the chosen evaluation
metric. This applies to state-of-the-art LLMs and LRMs as well. The key is to
have adequate high-quality data that enables training the model, a sophisticated model
architecture that is able to exhibit intelligent properties when used in inference, and
a validation and evaluation methodology to assess whether the model performance is
good in the first place. As a common framework, CRISP-DM, displayed in Figure 3
[196], is as valid as ever.
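The idea of fitting a model by decreasing an error metric, and then checking the result on held-out data, can be illustrated with a minimal sketch. The example below assumes a one-dimensional linear model, mean squared error as the evaluation metric, plain gradient descent as the optimizer, and synthetic data; all of these choices, as well as the function names `mse` and `fit`, are assumptions made for illustration only.

```python
import random


def mse(w: float, b: float, data: list[tuple[float, float]]) -> float:
    """Mean squared error of the linear model y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(train: list[tuple[float, float]], lr: float = 0.05,
        epochs: int = 2000) -> tuple[float, float]:
    """Fit w and b by gradient descent on the training MSE."""
    w, b = 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        # Step against the gradient to decrease the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data (assumed): y = 3x - 1 plus small Gaussian noise.
rng = random.Random(0)
data = [(i / 10, 3.0 * (i / 10) - 1.0 + rng.gauss(0, 0.1)) for i in range(40)]
rng.shuffle(data)
train, validation = data[:30], data[30:]

initial_error = mse(0.0, 0.0, train)
w, b = fit(train)
train_error = mse(w, b, train)
validation_error = mse(w, b, validation)  # performance on unseen data
```

The validation error plays the role of the evaluation step: a low training error alone does not show that the model performs well on data it has not seen.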
CRISP-DM necessitates the existence of relevant data. The data is leveraged through
business understanding that translates into data understanding. In other words, the
Figure 3. CRISP-DM
business understanding precedes interpreting the data and creates the insight needed to derive
some new performance or capability from the data. After data understanding
has been reached, in collaboration with the business understanding, the data is prepared,
which usually includes cleaning and transforming it into a suitable format.
Then, the modeling occurs. The modeling is iterative and can also be incremental,
consisting of training and tuning phases. The resulting models have to be evaluated
with regard to the original business understanding, for example by comparing the model
performance to the actual, recognized need. After evaluation has been
completed, the model can be deployed.
Importantly, the CRISP-DM process may halt before deployment. For
example, if there is no relevant data available, or if the quality of the data is insufficient,
the process will not proceed to deployment. Likewise, if the evaluation fails,
the process returns to the start and may necessitate an alternative approach. Data
itself is the informational prerequisite for knowledge, but it is not knowledge by itself.
Also, data can be viewed from a multidimensional perspective, as it can possess
different qualities, such as temporality, granularity, and structure, displayed as general
trade-offs in Figure 4. In addition to the qualities shown in the figure,
veracity and completeness, for example, are factors that greatly affect the usability of data: how
reliable is the source or the data itself, and is it complete or does it require fusion with
other sources in order to be useful?
In essence, all ML development and deployment projects follow a
similar process. There are variations: for example, the data may prompt a new need
that has not been recognized by the business understanding, which may still prove
Figure 4. General data quality trade-offs
to be valuable and deployable. Likewise, the business understanding should not be
understood in a narrow sense. In the scope of this research, business understanding
can be translated into doctrines, military knowledge and standard operating procedures.
As such, CRISP-DM functions as the ML framework through which the military
problems are viewed. To transform the framework into a suitable assessment
methodology, it can be translated into:
• Domain understanding: Identifying the problem areas within military opera-
tions and warfighting suitable for AI applications
• Data understanding: Identifying the data, how it is accumulated and stored,
what it enables and what is lacking
• Data preparation: What is required to transform the data into a usable, high-value
format
• Modeling: What models from the hypothesis space seem applicable and what
are the limitations
• Evaluation: How is the model evaluated, in which environments and how is
the performance measured
• Deployment: What are the implications of deployment, including regulatory
and ethical perspectives as well as hardware and personnel resource perspec-
tives
This translated version is displayed in Figure 5, where the main difference is un-
derstanding the domain, i.e., warfighting, through doctrines, concepts, procedures,
rules and regulations, capabilities, resources, caveats, and crucially, all the afore-
mentioned aspects also from the perspective of the adversary and the adversarial
impact.
Figure 5. Military adaptation of the CRISP-DM
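The control flow of the adapted process, including the early halt on missing or insufficient data and the return to the start after a failed evaluation, can be sketched as a simple gated loop. The sketch below is an assumed simplification for illustration: the phase names follow the list above, while `run_process`, `Outcome`, the per-phase check functions and the iteration limit are hypothetical and not part of CRISP-DM itself.

```python
from dataclasses import dataclass, field
from typing import Callable

PHASES = [
    "domain understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]


@dataclass
class Outcome:
    deployed: bool
    reason: str
    trace: list[str] = field(default_factory=list)


def run_process(checks: dict[str, Callable[[int], bool]],
                max_iterations: int = 3) -> Outcome:
    """Run the phases in order; each check receives the iteration number.

    A failed evaluation sends the process back to the start, while a
    failure in any other phase halts the process before deployment.
    Phases without a check are assumed to pass.
    """
    trace: list[str] = []
    for iteration in range(1, max_iterations + 1):
        restart = False
        for phase in PHASES:
            trace.append(f"{iteration}:{phase}")
            passed = checks.get(phase, lambda _i: True)(iteration)
            if passed:
                continue
            if phase == "evaluation":
                restart = True  # loop back and try an alternative approach
                break
            return Outcome(False, f"halted at {phase}", trace)
        if not restart:
            return Outcome(True, "deployed", trace)
    return Outcome(False, "evaluation never passed", trace)


# Evaluation fails on the first pass and succeeds on the second.
first = run_process({"evaluation": lambda i: i >= 2})

# No relevant data is available, so the process halts early.
second = run_process({"data understanding": lambda i: False})
```

In the first run the process loops once before deployment, and in the second it never reaches modeling, which mirrors the two failure modes described for CRISP-DM above.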
The technical details of AI and ML will be examined in further depth in the
respective sections for the original publications. Now that the AI premise has been established,
the following sections introduce the military domain and its particularities
regarding the subject.
3.2 Military Domain
This section introduces the military domain in general, providing context, background
information and “business understanding” for the synthesis, through which
the arising problems and the suitability of AI methods are assessed.
As a demonstrative use case, consider a military group that
operates in a designated operations area, tasked to produce intelligence and compile a
recognized operational picture of the area of responsibility with available assets, to
maintain readiness to deter or repel adversary aggression, and to sustain operational
capability over a period of months in liaison with logistics support. By definition,
a group consists of units, which have troops and platforms, such as infantry and
armoured vehicles. This demonstrative example does not consider a particular military
branch, but the group consists of a headquarters and subordinate units that have
a selection of sensors, effectors, and personnel that together create the capabilities
available to the group. This use case is reflected in the following subsections.
3.2.1 Tasks
The classic Roman maxim si vis pacem, para bellum, translated as “if you want
peace, prepare for war”, captures the universal rationale for developing and main-
taining a military force and capability [197]. Machiavelli advances the same logic
regarding the political necessity of military readiness in The Prince and The Art of
War [198; 199]. Another often-cited theorist of war, Carl von Clausewitz, describes
war as merely an extension of politics used to compel the enemy to do one’s will
[30], which does not exclude the idea that if the will is to have peace, military power must be
maintained to achieve or preserve it. These notions justify the existence and national
function of military forces, where specific operational tasks are the concrete means
of executing the primary function of serving national interests.
As stated by Wilén and Strömbom [200], contemporary roles and tasks of military
forces comprise warfighting and irregular warfare, military assistance and international
crisis management, as well as aid to the nation, which includes disaster
relief, military support to internal security forces and, for example, epidemic support.
While these tasks differ greatly in their purpose and execution, the military
capabilities do not change drastically according to the task. Instead, the orders and constraints
regarding the use of said capabilities differ. Military operations and exercises
do not exclusively exhibit warfighting functions, as peacetime operations may, for
example, consist of patrols, training, and maintaining or building readiness in order
to activate warfighting functions when necessary. Even in this case, warfighting
capability is the main output, even if not used to full effect.
For the purposes of this research, it is not necessary to assess the differences between
tasks, as AI solutions can be technically suitable for any task if they serve a purpose
for a given capability within the military. Therefore, the main focus of this research
is on warfighting, but the findings and conclusions are not limited to this primary
class of tasks for a military organization.
In the aforementioned demonstrative use case, the exemplary task is to monitor
an area and maintain readiness to use force should the situation evolve negatively
through adversarial actions. To accomplish this task, the headquarters must plan
the use of subordinate units and their rotation in the operation, as well as ensure
logistic support to replenish the units. Likewise, the group must compile intelligence
information and situational awareness of the theatre to enable such planning and
proactive tasking of units and deployment of capabilities.
3.2.2 Capabilities
The term capability is not self-explanatory in a military context. As described by
Finnish Admiral Anteroinen [201], the term can have several definitions depending
on the context: it can refer to an effect, a function to execute tasks, physical weapon
systems, platforms, fighting power, or a system model. In this research, the term ca-
pability reflects the ability to execute tasks with given systems and platforms, which
ultimately generates the fighting power of a military force, holistically encompassing
all auxiliary functions that contribute to that power.
While systems and platforms provide the physical means to execute a task, mak-
ing them a fundamental component of a capability, they are not, by this definition,
capabilities themselves. For example, a warship is a platform equipped with multi-
ple systems, effectively operating as a system of systems. The platform itself is the
asset; the capabilities it produces include, for example, the ability to compile a tac-
tical picture from the air, sea surface, and subsurface domains, effectively creating a
multi-domain surveillance capability. Likewise, the effectors on the platform enable
the execution of air defence or surface warfare tasks, thereby providing air defence
and surface warfare capabilities. As stated in the demonstrative use case definition,
the subunits have and operate the assets, which create the capabilities available to the
headquarters and the group as a whole. These capabilities can be extended beyond
the hierarchical structure, for example in an effort to support another group, or vice
versa in receiving support.
3.2.3 Organization
Military organizations have a hierarchical structure that differs between
nations and coalitions. Fundamentally, there are high-level headquarters under
which lower echelons are established. The highest command level is usually the joint
command, which has service branches and other high-level establishments, such as military
intelligence, under it. Under the branches and their respective headquarters are the hierarchy
trees of subordinate units, which go all the way down to unit and platoon
level.
This organizational composition enables the military to have a rather pure rational-legal
bureaucratic leadership structure, described originally by Max Weber [202].
This bureaucratic structure creates a clear chain of command and division of effort,
resulting in an effective, commander-led structure that is, in theory, nimble and adaptive,
as the decision-making is concentrated in the officer in command. While other
leadership systems that favour distributed leadership have shown promise,
the clear chain of command is straightforward and, when effective, the most decisive
way to lead and manage military forces.
Each level of the military organization is characterized by the scale it operates in.
The levels are usually described as tactical, operational and strategic [203; 204]. The
concept of dividing war into these levels has historical roots in the Napoleonic Wars
and the American Civil War, was formally developed by Prussian and Soviet mili-
tary theorists [205], and was formally adopted into U.S. doctrine in 1982 via Field
Manual 100-5 [206]. The tactical level can be underpinned by a “technical” level, meaning
individual unit maneuvers and actions. In the lower echelons of the hierarchy, i.e., at the
tactical level, the tempo is higher and the time span of decision-making shorter. For
example, at platoon or unit level, operational decisions are made
in a time frame from seconds to hours, while the unit headquarters plan and execute
in a time span of days or weeks. Likewise, the actions that
result from these decisions have near-immediate effects. In higher echelons, the actions
may have effects over the course of years to come.
As an example, modern U.S. Army doctrine refines this traditional three-tier
model into four distinct levels of warfare: the national strategic, theater strategic,
operational, and tactical levels [207]. This updated framework serves to clarify the
relationship between broad national objectives, the operational approach, and the ex-
ecution of tactical tasks. At the highest echelon, the national strategic level involves
the government formulating policy goals and global strategy using all instruments of
national power. Subordinate to this, the theater strategic level focuses on combatant
commanders synchronizing activities to fulfill those policy aims within an assigned
region. The operational level acts as the vital link between these strategic goals and
tactical force employment, where campaigns and major operations are planned, con-
ducted, and sustained over broader aspects of time and space. Finally, the tactical
level is where forces directly plan and execute battles and engagements to achieve
assigned military objectives. While tactical actions at the corps or division level
might span days or months in the form of battles, lower-echelon engagements ex-
ecuted by brigades and below are typically resolved in minutes or hours, reflecting
the immediate, high-tempo impact characteristic of the lowest levels of the hierarchy
[207].
Referring back to the demonstrative use case, the headquarters needs to plan days
ahead, while the operational units act in real-time while executing the tasks that span
from hours to days.
3.2.4 Decision-making processes
The organizational decision-making processes are denoted as Military Decision-Making
Processes (MDMPs), which often follow a similar structure. The MDMP has been defined by
US Army Colonel Mueller, director of the Center for Army Lessons Learned, as “a systematic
process that enables commanders and their staffs to apply critical and creative thinking
and doctrine to solve problems and establish the framework and conditions for commanders
to make effective decisions” [208]. A process structure similar to that in [208] is
described in NATO APP-28 [209], which also references several other MDMPs that are very
much alike. The key points to highlight are:
• Receipt of Mission: What is to be done, what is the task
• Analysis of the situation: What are the resources, what is the adversary, what
is the environmental impact
• Development of Courses of Action: Creating alternatives for decision-making
• Evaluating the alternatives: War gaming to decide on the best course of action
identified
• Orders and Execution
When compared to CRISP-DM, the MDMP is, on an abstract level, very similar.
In CRISP-DM, business understanding evokes a need to understand the data. Then, the data
is preprocessed, a model is fitted, alternatives are evaluated and finally the end result is
deployed. However, when considering AI solutions, CRISP-DM can be applied to each of
these MDMP phases separately: Can AI aid in analyzing the mission and the situation,
developing courses of action, evaluating alternatives and enhancing execution?
A decision-making framework that is widely adopted in the military and scales
from individual fighters to staff headquarters has been proposed by US Colonel
Boyd and goes by the name of Observe-Orient-Decide-Act (OODA) loop [210; 211].
Colonel Boyd used OODA to describe the decision-making process of an individual
or a group of individuals. This framework provides a more ground-level approach to
the application of AI when individual operators and warfighters are examined. Just
as with the MDMP, the same questions can be asked of OODA: Can AI be used
to aid in observing the environment? Can it aid the orientation, where the individual
synthesizes the observed data against his or her knowledge base and experience?
Can the decision be better informed, faster, more concise? Chapter 4 answers
these questions from the point of view of each original research paper.
It is to be noted that the OODA loop has given rise to a simplistic interpretation in
which speed is the primary characteristic for achieving military superiority, but this
point of view narrows the focus of larger-scale warfighting and has garnered criticism
[212]. While speed and closing the kill chain are key factors in tactical success, the
overall success of an operation or military campaign relies on the quality and scope of
the decisions made.
In the use case example, headquarters employ a process like APP-28 or MDMP,
while the units in the area of operation execute their own OODA loop in real-time,
under the constraints and autonomy issued by headquarters. For example, the head-
quarters have activated certain rules of engagement, which dictate the way the units
are to respond to different situations. Usually in low-risk phases of an operation the
use of force is allowed only in self-defence for all units, and the headquarters retain
the authorization in all other scenarios. When the situation evolves, the headquarters
may delegate this authority to the units as well, with new limitations such as a list of
accepted targets that can be engaged without further notice. Assessment of the sit-
uation and tasking the units is done via the decision-making process, which collects
the information from the units to develop and evaluate possible courses of action to
choose the best execution from.
3.2.5 Military Information Systems
Despite the proliferation of information technology, information systems do not have
a succinct, singular definition or a concept, although there are widely recognized and
utilized definitions. Checkland and Holwell [213] have proposed general concepts
of information systems, of which a combined interpretation is displayed in Figure 6.
The key distinction between an IS and an Information Technology (IT) system is that
IT systems consist of hardware, software and networks, while IS includes humans in
the system view.
Essentially, applying the systems methodology of Checkland, there are elements
that lead to actions and an IS. The IS, in this case, is the system which serves, or
supports, the system that executes actions. The actors that execute actions have infor-
mation needs that need to be met to perform purposeful actions and inflict changes in
the elements. This process can be viewed as a generalized, schematic interpretation
of any IS, encompassing also IT systems in it. Military information systems can be
examined from the perspective of this particular concept.
Figure 6. Information system concept [213]
[Diagram: elements lead to purposeful action, which creates changes in the elements. The
information system and professional knowledge form the system which serves (processing
of data relevant to people undertaking action) and support the system which is served
(actors that have information needs).]
The fundamental nature of military operations and the related IS has undergone a
profound transformation over the past several decades, shifting from platform-centric
warfare, where individual tanks, ships, or aircraft operated as largely independent
and isolated entities, to network-centric warfare, which is defined, e.g., as an
“information superiority-enabled concept of operations that generates increased combat
power by networking sensors, decision makers, and shooters to achieve shared
awareness, increased speed of command, higher tempo of operations, greater lethality,
increased survivability, and a degree of self-synchronization” [214]. In this modern
operational paradigm, the decisive advantage on the battlefield is no longer derived
solely from kinetic mass, armor thickness, or raw firepower. Instead, superiority
is linked to information dominance.
As stated above, one of the most widely recognized concepts underpinning mil-
itary information systems is the OODA loop [210]. It has seemingly transcended its
origins as a cognitive model for individual pilots and been scaled up to encompass
entire military organizations, automated defensive grids, and global sensor networks.
In essence, it can be stated that
• Observation involves the collection of System-of-Source (SoS) data, such as
raw or lightly processed, mainly unfiltered data from the operational environ-
ment via sensors, as well as System-of-Record (SoR) data such as human-
generated or automated reports and messages that provide an insight into the
environment and its actors.
• Orientation requires the synthesis of the newly observed data with prior knowl-
edge, context, historical intelligence, and strategic objectives to form a coher-
ent, accurate operational picture of the battlespace.1
• Decision involves selecting a specific course of action based on this orienta-
tion.
• Action is the execution of that choice, usually physically or electronically.
This view of the military information system is depicted in Figure 7, where the
domains exist within the contested environment. Objects, elements and phenomena
denote artifacts within the environment that can be observed, thus creating the infor-
mation flow through the observation layer. The observers and data sources consist
of units that encompass sensors and troops, denoting humans, as well as internet
sources, fixed sensors from radars to weather stations, and third parties such as partners
and co-operators. The observations, and the reporting based on them, create SoS and SoR
data that exist in a variety of types and qualities, from structured to unstructured,
real-time to sporadic, and fine-grained to high-level, as in Figure 4. This data is then
processed, aggregated and displayed for C2 purposes such as maintaining the operational
picture to monitor and guide the operation.
1 A combined, preferably cross-domain picture is known as a Common Operating Picture (COP).
Operating requires that the C2 is linked back to the units in order to react to changes
within the environment. Overall, the aggregation and processing of data enables orientation,
where the surrounding context is combined with the observations to create situational
awareness and understanding. This gives grounds for further analysis, automated or manual,
and for decision-making that results in courses of action, plans and orders, which are
promulgated back to the units to have a desired effect on the environment and the objects
or elements within it. The actions ought to create changes that then result in novel
observations, and the loop is reiterated.
On the right side of Figure 7 is a depiction of hardware regarding the IT compo-
nents of the system, although technology resides in the IS part as well. In the hard-
ware, there are certain capabilities at the level which executes actions, either kinetic
or otherwise, requiring those capabilities and usually some form of sensors, mobility,
local computing and a power source. On the supporting level, which serves the ac-
tors, the emphasis is on information processing in contrast to mechanical equipment
and real-world effects. The connectivity requirement, consisting of radios, cables,
satellites, and fiber optics as well as all the other related hardware, stretches from the
acting front to the supporting layer to ensure the information flow.
Despite the simplified view in Figure 7, military information systems do not op-
erate in a flat, decentralized hierarchy. Instead, they are strictly organized around the
aforementioned three echelons of warfare: Strategic, Operational, and Tactical [204].
Each echelon requires fundamentally different types of information, processed at dif-
ferent speeds, and presented at vastly different levels of granularity. Additionally,
the more granular hierarchy of units, formations, and commands dictates how this
information physically and structurally flows. To execute operations across these
levels, forces are organized, for example, into a standardized unit echelon hierarchy
as displayed in US Army Field Manual 3-0 [207]. In the field manual, strategic ob-
jectives are managed by Theater Armies or Joint Commands, which pass operational
directives to Field Armies and Corps. These operational echelons then translate cam-
paigns into actionable missions for tactical formations, cascading from Divisions and
Brigades down to Battalions, Companies, and individual Platoons. Consequently, an
information system at an operational headquarters aggregates, filters, and transmits
long-term logistical and campaign data at a different scope and scale than a system
used by a tactical company or platoon, which requires real-time targeting data.
Additionally, the military IS comprises multiple information security environ-
ments [215], which are not displayed in Figure 7. Namely, the environments or
information classification levels are Unclassified, Restricted, Confidential, Secret,
Top Secret and, in some instances, Cosmic Top Secret [215]. In practice, this means
that, depending on the gravity of the information, it has to be stored and used within a
proper environment to ensure that it is not disclosed in an uncontrolled manner. In
Figure 7, the whole IS can exist within one classification, or several, depending on
the gravity of the information.
Figure 7. Military information system view
[Diagram: the information system sits within a contested environment spanning the land,
sea, air, space and cyber domains. Objects, elements and phenomena are observed by units
(sensors, troops), fixed sensors, the internet (open sources), and partners and cooperators.
Observations and data generation produce system-of-source and system-of-record data
(streams, batch, messages, reports, lists). Data processing and orientation (filtering,
aggregation) feeds C2 and the operational picture. Decisions and plans (analysis, plans,
orders, rules and regulation) return to the units as orders, directions and guidance, while
reports and analysis flow back; actions create changes in the environment. The hardware
column lists platforms and assets, (kinetic) effectors, mobility, sensors, computing, power,
networks and connectivity, and servers, data centers and computation.]
As such, when military information systems are mapped to operational echelons
and actual organizational hierarchies, the overarching architecture becomes a nested
structure of codependent ISs in different physical and digital environments. As with
the OODA loop, each individual entity can be viewed as its own IS, whether it is
a single tactical unit, a field army, or a maritime component. At every level, there
are elements that provide information, which requires processing to support purpose-
ful actions. For example, a military group consists of subordinate units, which are
themselves separate ISs that feed information upward to the group headquarters—a
higher-echelon IS. In turn, the subordinate units receive the necessary command
guidance and support to conduct their purposeful actions, creating localized changes
within the environment in order to meet the broader goals of the operation.
3.2.6 On Complexity
It has now been established that each level of a military organization can be considered
a system with inputs and outputs. At the lowest level, the inputs are real-world
events that are monitored with available sensors and systems. The data sources include
sensors that survey some wavelength range in the electromagnetic spectrum, from
infrared to communication frequencies, as well as ISs that process data, nowadays
mainly in the digital domain. The SoS and SoR data can be raw or processed, structured
or unstructured, and they provide their primary user insight into a particular, narrow
problem while contributing to the bigger picture. This information is processed into
an output to higher echelons, which gather and aggregate the data, analyze it, review
their mission objectives with regard to the data and either give their outputs as actions
to the lower echelons or as inputs to higher-echelon decision-making.
The amount of data, the complexity and the uncertainties increase when moving
from the tactical to the strategic level. When the number of variables increases
simultaneously with the time span of planning and execution, the number of possible
combinations, and with it the complexity, grows exponentially. Additionally, the military
landscape is characterized by the incompleteness of data and only partial observability
of the opposing force’s capabilities, composition and intent, which create uncertainty in
the development of the situation towards the desired end state.
The complexity of military operations is governed by what is computationally
known as the “curse of dimensionality”. As the number of, e.g., units, weapon systems,
and environmental factors increases, the possible states of the battlefield and the
available actions scale exponentially. Furthermore, unexpected events, often
referred to as the “fog of war”, introduce severe stochasticity. While these factors are
described conceptually here, they are formally quantified and mapped to mathematical
state spaces and probability distributions later in Section 4.2, where they form
the baseline for applying RL to the military domain.
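The exponential scaling described above can be made concrete with a small counting sketch. It assumes, purely for illustration, a gridded area in which each unit occupies one of `cells` positions and independently picks one of `actions_per_unit` actions per step; the function names and the numbers are hypothetical and are not the formalization used in Section 4.2.

```python
def joint_state_count(units: int, cells: int) -> int:
    """Joint placements when each of `units` units may occupy any of `cells` cells."""
    return cells ** units


def joint_action_count(units: int, actions_per_unit: int) -> int:
    """Joint actions when every unit independently picks one action per step."""
    return actions_per_unit ** units


if __name__ == "__main__":
    # Hypothetical toy setting: a 10 x 10 grid (100 cells) and 5 actions per unit.
    for units in (1, 2, 4, 8):
        states = joint_state_count(units, cells=100)
        actions = joint_action_count(units, actions_per_unit=5)
        print(f"{units} units: {states:.3e} joint states, {actions} joint actions")
```

Doubling the number of units squares both counts, which is why exhaustive enumeration becomes infeasible long before realistic force sizes are reached.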
In summary, a military organization is created in a way that the units operating
or producing military capabilities provide headquarters with information. The infor-
mation functions as the input for decision-making, analyzed in a suitable decision-
making process. The decision-making is an analytical workflow that results in or-
ders for actions to be executed. The execution then serves a goal that is related to
the particular task, such as warfighting or crisis management. Data plays a critical
role throughout the military organization, but as demonstrated later in Section 4.2, the
scale of complexity in the decision space calls for abstraction and heuristics, sometimes
a rule of thumb, to be able to make decisions in due time. This has been recognized
before, as the ability to grasp the essential is one of the key principles of, for example,
Alexander Suvorov [37]. This motivates the next sections, which examine data in the
military domain as well as the identified application areas.
3.3 Domain features of data
Data is a difficult subject in the military domain. It is simultaneously both abundant
and scarce. There are many factors behind this, which relate mainly to sensitivity
and security, organizational culture and doctrines.
It is self-evident that a lot of military data is classified to keep, for example, capa-
bility and performance information secure from adversaries [215]. A lot of military
systems are air-gapped [216; 217; 218], which means that they are neither directly nor
indirectly connected to the internet. This approach secures critical systems such as
C2 systems from cyber attacks and data leakage. Simultaneously, it prevents collect-
ing the data in the same manner that can be done for public, commercial and open
source data, and requires considerably more complex architectures and integration to
get the data from these systems into a database that can be used to train AI models.
Combining the data into a large, all-encompassing data set might not be wanted at
all, as distributed and fragmented data prevents potential adversaries from gaining
the whole picture despite receiving some parts of it.
As a result from the fragmentation, the data is often in silos, and reaching a holis-
tic understanding of what data is available, where, and how is very difficult. This is
highlighted by Brose [163], stating that ”platforms rarely cohere into one battle net-
work that can share information effectively”, quoting one US military officer saying
that ”The main problem is that none of my things can talk to each other”. The CEO
of Anduril, a US company focused on drones and autonomous systems, has stated as
an industry observation in December 2024 that ”Exabytes of defense data, indispens-
able for AI training and inferencing, are currently evaporating” [219]. These points
highlight the fact that while military forces generate and process vast quantities of
data, it is used ephemerally for a singular use case. If it is stored, the operator insight
that arose is usually lost, as the systems do not support recording it or recording is not
part of the modus operandi. Likewise, if analyzed data is stored with its metadata,
the raw data might not be, which again may have an impact on usability. If there
is no ground truth, the supposedly iterative process between data understanding and
data preparation is limited, and backtracking to the source is not possible.
Additionally, in contrast to the civilian domain, the military environment is contested.
In industry, data security is critical and industrial espionage is a factor, but on the
military side these aspects are amplified, as the stakeholders are governmental and
national security is at stake. The peacetime environment is contested in a different
sense, as the methods are more subtle than during open conflict, but they may still include
adversarial actions such as data poisoning [220; 221]. Other measures such as jamming,
information falsification and noise, in different mediums, may also affect both the
accumulation and exploitation of data. EW aims, in short, to maintain own C2 capabilities
while breaking the connectivity of the adversary. EW deception operations aim to decrease
the accuracy of the adversary’s intelligence gathering, surveillance, target acquisition and
reconnaissance [222]. Referring back to data and its qualities, completeness is likely to be
affected in both peacetime and wartime. A widely known electronic warfare method to
precede an aerial operation is to gradually increase the background noise with suitable
jammer platforms: in time, this raises the detection thresholds of surveillance radars,
which enables the attacker to get closer before being detected [223]. A similar approach
can be applied to other operations and data types, for example exercises and general
activity. As an example, an adversary could operate in different manners, publish
misleading doctrines, or use different camouflage equipment to prevent the accumulation
of data for accurate ML purposes. Therefore, the data is greatly different from, for
example, medical data that is collected in vitro and more accurately describes, e.g., the
state of health of a certain populace, as both the medical institutions and the populace
usually share the same goal and do not possess an adversarial stance.
Being part of a process instead of being a capability or an enabler in itself underlines
the cultural and doctrinal issues towards data. Data is mainly seen as an ephemeral
input that results in an equally ephemeral output in a reactive system. This does not
mean that insights are not drawn from the data, but that there are limited resources
and limited capabilities to effectively store the data and find use for it beyond the
current input-output loop. While modern technology giants and platform providers such as
Amazon, Meta and Google have built their entire business models on collecting data
[224; 225], modern militaries have yet to integrate the understanding and
utilization of data into their operational procedures and processes.
3.4 Main application areas
Building upon the architectural foundation of military information systems established
in Subsection 3.2.5 and the data constraints outlined in Section 3.3, the integration
of AI can now be mapped into this ecosystem. While the previous sections
illustrated the holistic, multi-echelon complexity of military operations, applying AI
requires abstracting these systems into a functional pipeline.
From a dichotomous perspective, AI can be viewed as either enhancing current processes
and workflows or creating entirely new ones [226; 227; 228]. Enhancing
current processes and workflows is usually easier, as it is simpler to analyze a process,
find the issues that can be improved, apply changes, and evaluate the result than to
invent a completely new process or a meaningful workflow. It is
also noteworthy that AI is still very much a tool instead of a source of innovation,
despite advancing inference and reasoning capabilities [229]. Hence, it is up to humans
to come up with the new processes suitable to be executed with AI methods, or to identify
the processes that can be discarded thanks to the capabilities that AI provides.
The aforementioned frameworks, MDMP and the OODA loop, can be used to estimate
the main application areas for AI methods within the military context. Both of
these frameworks or processes can be abstracted into the process shown in Figure 8.
Essentially, there is information regarding the physical world, which is observed with
either biological senses or sensors. This information covers thermal, mechanical,
chemical, magnetic and electromagnetic phenomena [230; 231], of which the electromagnetic
is the most significant in the military sense: radars, optoelectronics and signal
intelligence utilize electromagnetic transmitters and receivers, and so do communication
devices. This importance has been emphasized by military leaders, who have described the
control and management of the electromagnetic spectrum as a key component of victory in
modern warfare [232; 233].
In addition to the information received directly from the physical world, there is the
digital world, which technically includes all the rest of the available information, from
documents to sensory information. The digital information includes knowledge
and models of the physical world, which enable analyzing the raw sensory data to
a greater extent. Likewise, orders, regulations, guidelines, plans and doctrines exist in
the digital domain, contributing to the understanding of the world, both digital and
physical.
For both the OODA loop and MDMPs, there is a task that needs to be executed,
and it usually requires having an effect on the physical or digital domain, or both.
Achieving the desired effect requires an understanding of both domains. Therefore,
based on Figure 8, there are five areas that can be viewed separately:
1. Collecting and accumulating observations.
2. Preprocessing and analyzing the data.
3. Analyzing and synthesizing the situation.
4. Evaluation of alternatives and finalizing decision.
5. Executable actions that affect the desired domain.
This list is not exhaustive, but serves as a high-level abstraction of a generalizable
information handling process that can be analyzed from the perspective of AI
applicability.
Figure 8. Information flow and decision-making
The accumulation of information through observations happens separately
in the digital and physical worlds, and physical-world observations are in general
digitized for processing and communication. It is also notable that through technology
the digital and physical domains are converging, as the amount of digital data
increases and, for example, digital twins [234] aim to bridge the gap. Simultaneously,
the independent importance of the digital domain has increased, due to digitalization
and the amount of information. Therefore, in Figure 8, the decided actions can have
effects in both domains, encompassing cyber and information warfare [235; 236]
capabilities.
After and while information is being accumulated, it goes through pre-processing
and post-processing before influencing decision-making. In this case, the pre-processing
transforms the data into a desired, usable format that enables further processing and
analysis. The post-processing, i.e., analysis, can be small-scale individual analysis
executed by a sole operator with a narrow focus, or large-scale data analysis with
vast amounts of data from different sources. The so-called big data analysis [237]
is required to handle large quantities of data to produce impact in compressed time.
Big data analysis requires multiple technologies from distributed computing to AI
and ML methods to find patterns, identify anomalies and trends as well as perform
predictions.
Once the data has been analyzed, conclusions can be drawn from it. These conclusions
are used to reach a decision between alternatives, which includes the details
of the chosen course of action. In the military context, the decision can be a short-term
individual decision by a single warfighter or fighter pilot, or a long-term strategic
decision of a joint commander. The lowest level is usually denoted as technical,
under tactical, and it includes the technical and immediate actions. The complexity
of decisions increases with the level, as the number of affecting features, phenomena,
uncertainties and variables increases both due to the level and the time span of the
decision.
From an AI perspective, as partially shown in Chapter 2, AI can be leveraged in
each phase, from observation to action. Using the same sources as in Chapter 2,
the observations, from urban destruction [82] to chemical agent identification [80]
can be enhanced with ML, automating the preprocessing phase and enabling faster
analysis and synthesis. The analysis can be tasked to an intelligent DSS, which
can then support tasks from troop deployment [58] to outcome prediction [59]. The
resulting actions, from weapon assignment [115] to effect evaluation [110] are also in
the scope of AI and ML applications. Hence, it can be stated that AI and specifically
ML has a proven potential to enhance individual, narrow tasks, through all the phases
of decision-making and information flow, on all levels of a military organization.
Instead of technological hindrance, the constraints to the use of AI are the data,
available computation, and expertise, both technical and military.
4 Machine Learning research areas
This chapter provides the necessary technical and theoretical background for the key
ML paradigms addressed in this dissertation. By establishing the foundations of CV,
RL, and other fields, it aims to provide a common framework for understanding the
contributions of the six original publications, which are then synthesized in Chap-
ter 6.
As explained in other words in Chapter 1, a basic ML model can be expressed as
a mapping function $f_\theta$,
$$y = f_\theta(x), \quad \text{e.g. } y = w^{T}x + b \text{ in linear models}, \tag{3}$$
where $\theta$ denotes the parameters of the function, or model, $f$, which in the linear case
simplifies to the feature or weight vector $w$ that maps the input $x$ to the predicted
output $y$, corrected with the bias or intercept term $b$. Essentially, as a generalization,
ML is always learning an approximation from inputs to outputs and then using the
resulting model, $f_\theta$ or $w$ in Equation 3, for inference.
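As a minimal sketch of the linear case in Equation 3, the following evaluates $y = w^{T}x + b$ for one input vector. The parameter values are hypothetical placeholders chosen for illustration; in practice they would be learned from data.

```python
from typing import Sequence


def linear_model(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Evaluate y = w^T x + b for a single input vector x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(w_i * x_i for w_i, x_i in zip(w, x)) + b


# Hypothetical parameters (w, b); inference is a single evaluation of f.
w = [0.5, -1.0, 2.0]
b = 0.25
x = [2.0, 1.0, 0.5]
y = linear_model(x, w, b)  # 0.5*2.0 - 1.0*1.0 + 2.0*0.5 + 0.25
```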
However, the ability of a model to learn from a finite dataset and make accurate
predictions on unseen data is not guaranteed. The theoretical framework that answers
whether learning is feasible was largely established by the seminal work of Vapnik
and Chervonenkis on statistical learning theory [238; 239]. As later explained by
Abu-Mostafa et al. [240], the goal is to ensure that the error a model makes on the
training data, the in-sample error ($E_{\text{in}}$), is a good proxy for the error it will make on
future data, the out-of-sample error ($E_{\text{out}}$).
Probabilistic tools, such as Hoeffding’s inequality [241], show that for a single,
fixed hypothesis, $E_{\text{in}}$ will likely be close to $E_{\text{out}}$ under the assumption that there is
enough data. Machine learning algorithms, however, do not test a single hypothesis;
they search through an entire family of functions, also known as the hypothesis set
$\mathcal{H}$, to find the one $h \in \mathcal{H}$ that best minimizes the error.
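For a single fixed hypothesis with errors bounded in $[0, 1]$, Hoeffding’s inequality is commonly written as $P[|E_{\text{in}} - E_{\text{out}}| > \epsilon] \le 2e^{-2\epsilon^{2}N}$. The sketch below evaluates this standard form and inverts it to obtain the sample size needed for a given tolerance $\epsilon$ and failure probability $\delta$; the function names and the example values are illustrative assumptions.

```python
import math


def hoeffding_bound(n: int, epsilon: float) -> float:
    """Upper bound on P[|E_in - E_out| > epsilon] for one fixed hypothesis."""
    return min(1.0, 2.0 * math.exp(-2.0 * epsilon ** 2 * n))


def samples_needed(epsilon: float, delta: float) -> int:
    """Smallest N for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Illustrative values: tolerance of 0.05 with at most a 5 % chance of failure.
n_required = samples_needed(epsilon=0.05, delta=0.05)
```

The bound holds only for a hypothesis fixed before the data is seen, which is why the search over a whole hypothesis set requires the additional machinery discussed next.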
This is where the groundbreaking concept of the Vapnik-Chervonenkis (VC) dimension
becomes essential [238]. The VC dimension, denoted $d_{VC}$, measures the
capacity or expressive power of a hypothesis set. It quantifies the model’s ability
to split, or shatter, data points into all possible dichotomies. A model with a finite
$d_{VC}$ can be proven to generalize, given enough data. The VC generalization bound,
as depicted by Abu-Mostafa et al. [240], formalizes this relationship:
$$E_{\text{out}} \le E_{\text{in}} + \sqrt{\frac{8}{N} \ln \frac{4\, m_{\mathcal{H}}(2N)}{\delta}} \tag{4}$$
where $N$ is the number of data points, $\delta$ is the probability that the bound fails,
and $m_{\mathcal{H}}$ is the growth function, which is bounded by a polynomial in $N$ if the VC
dimension is finite. This bound reveals the fundamental trade-off in ML:
• A more complex model with a higher $d_{VC}$ can achieve a lower $E_{\text{in}}$ but faces
a larger penalty term, increasing the risk that $E_{\text{out}}$ will be high. This is known
as overfitting.
• A simpler model with a lower $d_{VC}$ has a smaller penalty and generalizes better
but may be too simple to capture the underlying patterns, resulting in a high
$E_{\text{in}}$. This is known as underfitting.
This insight leads to the principle of Structural Risk Minimization (SRM) [238],
a formal strategy for selecting a model that balances low training error with con-
trolled model complexity to achieve the best possible out-of-sample performance.
This theoretical foundation justifies the entire learning process and informs practical
techniques like regularization, which implicitly penalize model complexity.
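To make the trade-off concrete, the penalty term of Equation 4 can be evaluated numerically. The following is a minimal sketch, not part of the original analysis: it assumes the standard polynomial bound on the growth function, m_H(N) ≤ N^d_VC + 1, and the function names and the default 𝛿 = 0.05 are illustrative choices.

```python
import math


def growth_bound(n: int, d_vc: int) -> float:
    """Polynomial upper bound on the growth function: m_H(n) <= n**d_vc + 1."""
    return float(n) ** d_vc + 1.0


def vc_penalty(n: int, d_vc: int, delta: float = 0.05) -> float:
    """Penalty term of Equation 4, with m_H(2N) replaced by its polynomial bound."""
    return math.sqrt(8.0 / n * math.log(4.0 * growth_bound(2 * n, d_vc) / delta))


# The penalty shrinks as N grows and widens as d_vc grows.
for n in (1_000, 10_000, 100_000):
    print(n, round(vc_penalty(n, d_vc=3), 3), round(vc_penalty(n, d_vc=10), 3))
```

The bound is loose in practice, but the qualitative behavior matches the two bullet points above: for a fixed amount of data, a higher 𝑑𝑉𝐶 enlarges the gap that must be tolerated between 𝐸𝑖𝑛 and 𝐸𝑜𝑢𝑡.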
All ML requires three fundamental components:
1. Data,
2. Model,
3. Loss or Objective,
and the distinctions between different paradigms result from variations around
these three components.
A key component that enables updating the model with regard to the received loss
is the update mechanism, a rule or an optimizer. While updates can be random
deviations from which the most suitable is picked for the next iteration, the most
widespread and effective method relies on gradient-based optimization. This approach
calculates the gradient of the loss function with respect to each weight parameter.
Formally, if a model is parameterized by weights 𝜃 and its performance is eval-
uated by a loss function 𝐿(𝜃), the goal is to find the parameters that minimize this
loss. In standard gradient descent, the weights are iteratively updated in the direction
of the negative gradient, i.e., the direction of steepest descent:
𝜃𝑡+1 = 𝜃𝑡 − 𝜂∇𝐿(𝜃𝑡), (5)
where 𝜃𝑡 represents the parameters at iteration 𝑡, ∇𝐿(𝜃𝑡) is the gradient of the
loss function with respect to those parameters, and 𝜂 is the learning rate, a hyperpa-
rameter that controls the step size of the update [5; 28; 6].
This fundamental rule moves the weights in the direction in which the loss decreases
most steeply, provided that the learning rate is sufficiently small. More advanced optimizers,
such as Adam [242] or RMSprop [243], build upon this foundational principle by
introducing adaptive learning rates or momentum to accelerate and stabilize conver-
gence, but the underlying mathematical premise remains the same.
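The update rule in Equation 5 can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the quadratic loss, the learning rate 𝜂 = 0.1, and the function names are hypothetical and chosen only so that the minimum is known in advance.

```python
def gradient_descent(grad, theta, eta=0.1, steps=100):
    """Apply theta_{t+1} = theta_t - eta * grad(theta_t) for a fixed number of steps."""
    for _ in range(steps):
        g = grad(theta)
        theta = [t - eta * g_i for t, g_i in zip(theta, g)]
    return theta


def grad_quadratic(theta):
    """Gradient of the toy loss L(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


# Starting from the origin, the iterates approach the minimizer (3, -1).
theta_star = gradient_descent(grad_quadratic, [0.0, 0.0])
print(theta_star)
```

For this loss, each step shrinks the distance to the minimizer by a constant factor of 1 − 2𝜂; a learning rate above 1.0 would make the same rule diverge, which is why 𝜂 is treated as a hyperparameter.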
Supervised learning is the most straightforward approach, where the data set 𝐷
has annotations, i.e., [𝑥𝑖, 𝑦𝑖] ∈ 𝐷 and the loss calculation is straightforward between
the predictions of the trained model and the known labels (annotations) in the data.
A classic example is linear regression [244; 245], where an input vector
𝑋 = [𝑋1, 𝑋2, . . . , 𝑋𝑝] is used to predict a real-valued output 𝑦.
The linear regression model has the form
\[ f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j, \tag{6} \]
where 𝛽𝑗 are unknown parameters or coefficients. Unsurprisingly, linear regression
closely resembles Equation 3. The supervised learning approach and more sophisticated
algorithms are introduced in more detail within the field of CV in Section 4.1.
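As an illustration of Equation 6, the coefficients can be estimated from annotated data with ordinary least squares. The sketch below is restricted to a single input feature (𝑝 = 1) so that the closed-form solution fits in a few lines; the data values and function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of beta_0 and beta_1 for f(X) = beta_0 + X * beta_1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    beta_0 = mean_y - beta_1 * mean_x
    return beta_0, beta_1


def predict(x, beta_0, beta_1):
    """Evaluate the fitted model of Equation 6 for p = 1."""
    return beta_0 + beta_1 * x


# Toy annotated data generated from y = 2 + 0.5 * x without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.0, 3.5, 4.0]
beta_0, beta_1 = fit_simple_linear_regression(xs, ys)
print(beta_0, beta_1, predict(5.0, beta_0, beta_1))
```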
If there are no annotations and the data is simply 𝑥𝑖 ∈ 𝐷, unsupervised meth-
ods are applicable to gain understanding of the data, such as identifying underlying
patterns and doing component analysis to identify most impactful features [246]. A
classic example of an unsupervised method is 𝑘-means [247], which starts with a
preselected number of groups, 𝑘, that are represented by a single, random data point
in each. Then, iteratively, a new data point is added to a group according to the
closest proximity of the new point and the mean of previous data points within the
groups, after which the mean of the group is adjusted accordingly. In the seminal
work by MacQueen [247], for a given set of 𝑘 centers 𝑥 = (𝑥1, 𝑥2, . . . , 𝑥𝑘),
the region of points closest to center 𝑥𝑖 is defined as
\[ T_i(x) = \left\{ \xi : \xi \in E_N,\ |\xi - x_i| \le |\xi - x_j|,\ j = 1, 2, \ldots, k \right\}, \tag{7} \]
where 𝑇𝑖(𝑥) is known as the Voronoi cell [248] for the center 𝑥𝑖. These regions
are used to build the final partition sets in a sequential manner as
\[ S_k(x) = T_k(x)\, S'_1(x)\, S'_2(x) \cdots S'_{k-1}(x), \tag{8} \]
so that the final partition 𝑆(𝑥) has the property that every point in a cluster 𝑆𝑖(𝑥)
is closer to its own center than to the other established centers. Independently, a
similar approach was proposed by Lloyd [249] at Bell Labs, where the repeating steps
of the iterative process were the assignment step shown in Equation 9 and the update
step shown in Equation 10.
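\[ S_i^{(t)} = \left\{ x_p : i = \operatorname*{argmin}_{j \in \{1, \ldots, k\}} \left\| x_p - \mu_j^{(t-1)} \right\|^2 \right\} \tag{9} \]
\[ \mu_i^{(t)} = \frac{1}{|S_i^{(t)}|} \sum_{x_p \in S_i^{(t)}} x_p. \tag{10} \]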
Lloyd’s algorithm works by repeatedly applying these two steps until the partition
and centroids no longer change. The modern formulation synthesizes MacQueen’s
property and Lloyd’s procedure by framing 𝑘-means as an optimization problem.
The goal is to find the partition 𝑆 that minimizes the Within-Cluster Sum of Squares
(WCSS), also known as inertia. In modern notation, the objective of minimizing the
sum of squared distances between the points and their cluster centroids over all
clusters can be stated as
\[ \min_{S} J = \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2. \tag{11} \]
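The assignment step (Equation 9), the update step (Equation 10), and the WCSS objective (Equation 11) can be sketched together as follows. This is an illustrative implementation under stated assumptions, not the procedure of any cited work: the random initialization from the data points, the handling of empty clusters, and the toy data are hypothetical choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def lloyd_kmeans(points, k, seed=0, max_iter=100):
    """Alternate the assignment and update steps until the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialize from k random points
    clusters = []
    for _ in range(max_iter):
        # Assignment step (Equation 9): each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step (Equation 10): each centroid becomes the mean of its cluster.
        # An empty cluster keeps its previous centroid (a simplifying assumption).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Objective of Equation 11: within-cluster sum of squares for the final centroids.
    wcss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, clusters, wcss


# Two well-separated toy groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters, wcss = lloyd_kmeans(points, k=2)
print(sorted(centroids), wcss)
```

Lloyd’s procedure only guarantees a local minimum of the objective, so practical implementations typically repeat it from several initializations and keep the partition with the lowest WCSS.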
Another widely adopted and useful unsupervised method is feature reduction
through, for example, Principal Component Analysis (PCA), for which the mathematical
background and formal name were introduced by Pearson [250] and Hotelling
[251] approximately a century ago. PCA can be used to identify the most relevant
features in the data to reduce computational complexity for subsequent training tasks
or to support other analysis of the underlying structure.
While unsupervised learning is effective and useful for, e.g., large scale data anal-
ysis, this research does not cover unsupervised tasks as a primary research problem,
although the methods are affected by FL, introduced in Section 4.3.
If data is not available at all, it can in some instances be generated or gathered
during training. This applies to RL, elaborated in Section 4.2, as it relies on an
exploratory agent that interacts with an environment, usually virtual. By accumulating
experience from these interactions and using a reward mechanism instead of a loss
function, the agent learns to predict the most favorable choice of action from one
state to the next [27]. While data may not be required to start training an RL
algorithm, the algorithm still has to generate data on the go. Additionally, data is
usually required to formulate the model or simulation that is used to generate the data.
As described in Section 3.1, evaluation is a critical part of the ML development.
It relies on quantitative criteria such as F1-score [252], Receiver Operating Char-
acteristic (ROC) curve [253; 254], Area Under Curve (AUC) [255], and qualitative
criteria consisting of, e.g., interpretability assessment.
To evaluate the quantitative performance of classification models, the ROC and
AUC are utilized as robust performance metrics that require formal definition.
The ROC curve [253; 254] illustrates the diagnostic ability of a binary classifier
system as its discrimination threshold, 𝜏 , is varied. It is created by plotting the True
Positive Rate (TPR) against the False Positive Rate (FPR) across all possible thresh-
old settings. The TPR, also known as sensitivity or recall, represents the proportion
of actual positives that are correctly identified, and is defined as:
\[ \mathit{TPR} = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FN}}, \tag{12} \]
where 𝑇𝑃 is the number of true positives and 𝐹𝑁 is the number of false nega-
tives. The FPR, also known as the fall-out or the probability of false alarm, represents
the proportion of actual negatives that are incorrectly identified as positives, and is
defined as:
\[ \mathit{FPR} = \frac{\mathit{FP}}{\mathit{FP} + \mathit{TN}}, \tag{13} \]
where 𝐹𝑃 is the number of false positives and 𝑇𝑁 is the number of true nega-
tives.
While the ROC curve provides a graphical representation of the trade-off be-
tween sensitivity and specificity, the AUC [255] provides an aggregate measure of
performance across all possible classification thresholds. Mathematically, it is the
two-dimensional area underneath the entire ROC curve from (0, 0) to (1, 1), com-
puted as the integral of the TPR with respect to the FPR:
\[ \mathit{AUC} = \int_{0}^{1} \mathit{TPR}\left( \mathit{FPR}^{-1}(x) \right) dx \tag{14} \]
An AUC value ranges from 0.0 to 1.0. An AUC of 1.0 represents a perfect model,
while an AUC of 0.5 denotes a model performing no better than random guessing. In
probabilistic terms, the AUC value represents the probability that the classifier will
rank a randomly chosen positive instance higher than a randomly chosen negative
one [255]. This makes it an especially valuable metric in military AI applications,
such as target recognition or anomaly detection, where the cost of false positives
(false alarms) and false negatives (missed threats) must be carefully weighed.
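The relationship between Equations 12 to 14 and the probabilistic interpretation of the AUC can be checked on a small example. The sketch below is illustrative only: the scores and labels are made-up values, both classes are assumed to be present, and the integral is approximated with the trapezoidal rule over the empirical ROC points.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold tau over all scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for tau in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve (Equation 14) via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and ground-truth labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```

Both computations agree on this data, which reflects the equivalence between the geometric and the ranking interpretation of the AUC.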
As a well known ML metric, the F1-score was first introduced as a composite
E-measure by Van Rijsbergen [252], which is denoted as
\[ E = 1 - \frac{1}{\alpha \left( \frac{1}{P} \right) + (1 - \alpha) \left( \frac{1}{R} \right)}, \tag{15} \]
where 𝑃 is precision, calculated as |𝐴 ∩ 𝐵|/|𝐵|, where 𝐴 is the set of relevant samples
and 𝐵 is the set of retrieved samples. The intersection 𝐴 ∩ 𝐵 is the set of true
positives. 𝑅 is recall, defined as |𝐴 ∩ 𝐵|/|𝐴|, and 𝛼 ∈ [0, 1] denotes the composite
weighting between precision and recall; in Van Rijsbergen [252], it is set to 1/2 for
equal weight.
In modern terms, precision is the number of true positives divided by the number
of all predicted positives, true and false, while recall is the number of true positives
divided by the sum of true positives and false negatives. Precision describes the
model’s accuracy in its positive predictions, while recall measures how well the model
is able to detect all the positive instances. The F1-score, which corresponds to 1 − 𝐸
with 𝛼 = 1/2, combines both of these measurements into one evaluation measure,
formulated as
\[ F_1 = 2 \times \frac{P \times R}{P + R}. \tag{16} \]
The precision, recall and F1-score have originated from binary classification or
prediction tasks, but can be scaled to more complex scenarios by, e.g., averaging over
classes.
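The link between the E-measure in Equation 15 and the F1-score in Equation 16, as well as averaging over classes, can be sketched as follows. The confusion counts and function names are hypothetical, and the sketch assumes non-zero denominators; macro averaging is shown as one possible averaging scheme.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives, false negatives."""
    return tp / (tp + fp), tp / (tp + fn)


def e_measure(precision, recall, alpha=0.5):
    """Composite E-measure of Equation 15."""
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (Equation 16)."""
    return 2.0 * precision * recall / (precision + recall)


def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1-scores, one (tp, fp, fn) triple per class."""
    scores = [f1_score(*precision_recall(tp, fp, fn)) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)


# Hypothetical binary result: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
f1 = f1_score(p, r)
print(p, r, f1, 1.0 - e_measure(p, r))
print(macro_f1([(8, 2, 4), (5, 5, 5)]))
```

With 𝛼 = 1/2 the value 1 − 𝐸 coincides with the F1-score, which is the sense in which the F1-score is a special case of the E-measure.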
In the following sections, each ML research area from CV to FL is explored in depth
together with the respective original publication. Each section starts with an
introduction to the research area, after which the publication is reviewed to form an
analytical insight into the research problem of this dissertation. Notably, the separation
between paradigms is fluid, as different model architectures and approaches span
several paradigms, usually in a manner where novel solutions are discovered
under one paradigm and extended to others. Hence, the separation is somewhat
arbitrary and artificial, but correlates well with the original publications as such.
4.1 Computer Vision background
The genesis of CV as a field can be traced back to the 1950s and 1960s. Early re-
search drew inspiration from neurobiology, notably the work of David Hubel and
Torsten Wiesel [256; 257] on the mammalian visual cortex, which revealed a hi-
erarchical structure of neurons responsible for detecting edges and orientation. In
1963, Lawrence G. Roberts’ Ph.D. thesis [258], often cited as a pioneering work,
showed how to derive 3D information about a "blocks world" from 2D images. Early
CV was largely based on three-dimensional projective geometry. Hand-crafted features
were constructed with, for example, edge detection [259] or Local Binary Patterns (LBP)
[260] and then used as inputs to simple learning algorithms [261]. Feature detection
and extraction were dominated by hand-crafted feature descriptors that were designed
and tuned to be invariant to changes in scale, rotation and illumination, such as
SIFT [262] and SURF [263].
The fundamental goal of computer vision is to extract meaningful information
from visual data. This separates it from image processing, where the image is edited
and modified. The CV process is typically structured as a sequence of steps:
• Image acquisition, which happens with a sensor such as a camera, converts
sensory data from the physical world into a numerical representation as a grid of
pixel values.
• Image processing, where low-level processing techniques are applied to the
raw pixel data to prepare it for analysis.
• Feature detection and extraction, a crucial step to identify salient patterns or
points of interest in the image.
Core tasks in CV include image classification, object detection, image segmenta-
tion, caption generation, synthesis, inpainting, style transfer, super-resolution, depth
prediction and scene reconstruction [173]. In this thesis, the focus is on classification
and object detection. In caption generation a caption is automatically generated for
an image, meaning that the model has been trained with data consisting of images
and their captions. Synthesis refers to generation of new images, which is discussed
later, while inpainting is an image editing application to, for example, remove un-
wanted objects. Style transfer refers to converting, for example, a photograph to an
oil painting, while super-resolution improves the image resolution by increasing the
number of pixels through generation. Depth prediction estimates the distance from the
camera to the objects from one or more views, and scene reconstruction recovers the
three-dimensional structure of a scene from two-dimensional images.
In classification, a fundamental task, whole images are assigned a single label
from a set of categories, with ”cat” and ”dog” as classical examples. Object detection
is a more complex task where the goal is to detect objects and their location within
an image, which can then be classified. Image segmentation can be divided into
semantic segmentation, where each pixel of the image is classified as belonging to a
particular category, such as the ”cat”, or instance segmentation where each instance
of the same object class is distinguished.
The field was one of the first to be greatly transformed by modern DL methods,
predominantly using the Convolutional Neural Networks (CNN) architecture [264].
CNNs are a class of neural networks specifically designed for processing grid-like
data, such as images [6; 173]. Key components of a CNN include the namesake,
convolutional layers, which apply a convolution operation, a special linear operation
that replaces the general matrix multiplication of fully connected neural network
layers. It is usually denoted with an asterisk, as in 𝑠(𝑡) = (𝑥 * 𝑤)(𝑡) [6]. The
convolution is usually used over more than one axis of the grid, which means that for
a two-dimensional image 𝐼 and a two-dimensional convolution kernel 𝐾 the formulation
becomes
\[ S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n), \tag{17} \]
where 𝐼 is the input image, (𝑖, 𝑗) are the grid coordinates of the output 𝑆, and the
sums run over the indices (𝑚, 𝑛) for which the image and kernel entries are defined.
While this is the mathematical convolution, neural network programming libraries
usually implement a related function called cross-correlation [6], which is similar to
convolution but omits the kernel flipping:
\[ S(i, j) = (K \star I)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n). \tag{18} \]
The cross-correlation is still called convolution by convention, and the mathe-
matical difference is mainly irrelevant for the intended purpose in CV.
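The difference between Equations 17 and 18 can be made visible with a small numerical sketch. The implementation below is illustrative and assumes a "valid" output region (only positions where the kernel fully overlaps the image, without padding or stride); the convolution is obtained by flipping the kernel along both axes before cross-correlating, which matches Equation 17 up to a shift of the output indices.

```python
def cross_correlate2d(image, kernel):
    """Equation 18 on the 'valid' region: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flip(kernel):
    """Reverse the kernel along both axes."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """Equation 17 on the 'valid' region: cross-correlation with a flipped kernel."""
    return cross_correlate2d(image, flip(kernel))


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]  # asymmetric, so flipping changes the result

print(cross_correlate2d(image, kernel))
print(convolve2d(image, kernel))
```

For a learned kernel the flip carries no practical meaning, since training would simply learn the flipped weights, which is why the two operations are used interchangeably in CV.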
Convolutional layers are followed by pooling layers. These layers reduce the
spatial dimensions (width and height) of the feature maps, which helps to decrease
computational complexity and control overfitting. By stacking multiple convolu-
tional and pooling layers, CNNs can learn a hierarchy of features. Early layers learn
simple features like edges and colors, while deeper layers combine these to learn
more complex patterns like shapes, object parts, and eventually, entire objects. Ar-
chitectures like ResNet (Residual Network) [265] later enabled the training of much
deeper networks, further pushing the performance on various computer vision tasks.
The trajectory of computer vision was fundamentally altered in 2012. The in-
troduction of a deep CNN named AlexNet [266] resulted in a dramatic reduction in
error rates on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
[267], a prominent benchmark for image classification. This event marked the be-
ginning of the deep learning era in computer vision, rendering handcrafted feature
engineering largely obsolete.
Region-based methods aim to identify regions of interest in an image for a subsequent
classification task. A pioneering solution was Regions with CNN features (R-CNN)
[268], which combines region proposals with CNNs. In contrast, one of the most
prominent single-stage CV methods is You Only Look Once (YOLO) [269], which frames
object detection as a regression problem to spatially separated bounding boxes,
i.e., the areas where objects are detected, and associated class probabilities,
unifying these separate components into a single NN. The original
model architecture comprises 24 convolutional layers and two fully connected layers,
of which 20 convolutional layers, a pooling layer and a fully connected layer
were used for pretraining before modifying into the final detection architecture. The
design divides an input image into an 𝑆 × 𝑆 grid, in which each cell predicts bounding
boxes as 𝑥, 𝑦, 𝑤 and ℎ, along with a confidence defined as
$\Pr(\text{Object}) * \text{IOU}^{\text{truth}}_{\text{pred}}$.
The confidence represents the Intersection over Union (IOU) between the predicted
box and the ground truth. The 𝑥 and 𝑦 represent the center of the box relative to
the grid cell, while 𝑤 and ℎ are predicted relative to the whole image. During
inference, the conditional class probabilities and individual box confidence predictions
are multiplied as
\[ \Pr(\text{Class}_i \mid \text{Object}) * \Pr(\text{Object}) * \text{IOU}^{\text{truth}}_{\text{pred}}, \tag{19} \]
which gives each box a class-specific score. Due to the unified structure,
the YOLO architecture is fast enough to process video streams in real time,
making it a go-to solution for a plethora of CV applications.
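The two quantities that appear in Equation 19 can be sketched in code. This is a simplified illustration rather than the YOLO implementation: both boxes are assumed to be given as (𝑥, 𝑦, 𝑤, ℎ) with a center point and size in the same normalized image coordinates, and the class probabilities and confidence are made-up values standing in for network outputs.

```python
def to_corners(box):
    """Convert a (center_x, center_y, width, height) box to corner coordinates."""
    x, y, w, h = box
    return x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0


def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x, y, w, h) center format."""
    ax0, ay0, ax1, ay1 = to_corners(box_a)
    bx0, by0, bx1, by1 = to_corners(box_b)
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0.0 else 0.0


def class_scores(class_probs, confidence):
    """Equation 19: conditional class probabilities times the box confidence."""
    return [p * confidence for p in class_probs]


predicted = (0.5, 0.5, 0.4, 0.4)
ground_truth = (0.6, 0.5, 0.4, 0.4)
print(iou(predicted, ground_truth))

# Hypothetical outputs for one box: Pr(Class_i | Object) and the box confidence.
print(class_scores([0.7, 0.3], confidence=0.5))
```

During training the IOU with the ground truth serves as the target for the confidence output, whereas at inference time no ground truth exists and the predicted confidence is used directly.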
More recently, the Transformer architecture, which achieved state-of-the-art results
in natural language processing, has been adapted for computer vision tasks. The
transformer architecture and its attention mechanism are further studied in the NLP
and GenAI context in Section 4.4. The Vision Transformer (ViT) [270] model,
introduced in 2020, demonstrated that a pure transformer architecture could perform on
par with or better than CNNs on image classification.
Unlike CNNs, which have a strong inductive bias for locality, ViTs treat an image
as a sequence of patches. To process 2D images, the ViT reshapes an input image
$x \in \mathbb{R}^{H \times W \times C}$ into a sequence of flattened 2D patches
$x_p \in \mathbb{R}^{N \times (P^2 \cdot C)}$, where $(H, W)$ is the resolution of the
original image, $C$ is the number of channels, $(P, P)$ is the resolution of each image
patch, and $N = HW/P^2$ is the resulting number of patches. These patches are then
mapped to a constant latent vector size $D$ using
a trainable linear projection, effectively treating the image patches as tokens in a
sequence. Conceptually, the idea of splitting the image into patches is, in a way,
similar to the YOLO functionality of splitting the image into the 𝑆 × 𝑆 grid, as both
methods move away from having a computationally taxing sliding window solution
or a pixel-by-pixel approach.
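The reshaping and projection described above can be sketched with plain nested lists. The example is illustrative only: the tiny 4 × 4 image, the patch size, the latent size 𝐷 = 5, and the random projection weights are assumptions, and in an actual ViT the projection matrix is learned during training.

```python
import random


def patchify(image, patch):
    """Split an H x W x C image into N = HW / P^2 flattened patches of length P*P*C."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                value
                for row in range(top, top + patch)
                for col in range(left, left + patch)
                for value in image[row][col]
            ])
    return patches


def project(patches, weights):
    """Map each flattened patch to a D-dimensional token with a linear projection."""
    dim = len(weights[0])
    return [
        [sum(x * w_row[d] for x, w_row in zip(p, weights)) for d in range(dim)]
        for p in patches
    ]


H, W, C, P, D = 4, 4, 3, 2, 5
# Toy image whose entries encode their own position, so patches are easy to inspect.
image = [[[r * W * C + c * C + ch for ch in range(C)] for c in range(W)] for r in range(H)]
patches = patchify(image, P)

rng = random.Random(0)
weights = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(P * P * C)]
tokens = project(patches, weights)
print(len(patches), len(patches[0]), len(tokens), len(tokens[0]))
```

The resulting token sequence has length 𝑁 = 4 and width 𝐷 = 5, which is the form in which the image enters the transformer layers.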
The network then uses a self-attention mechanism [9] to weigh the importance
of different patches when creating a representation of the image. The core of this
mechanism is the scaled dot-product attention, which computes a weighted sum of
values (𝑉 ), where the weight assigned to each value is determined by the dot-product
of a query (𝑄) with all keys (𝐾). This is formally defined as:
\[ \operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V, \tag{20} \]
where $Q$, $K$, and $V$ are matrices derived from the input token embeddings multiplied
by learned weight matrices, and $d_k$ is the dimension of the keys. The scaling
factor $1/\sqrt{d_k}$ is applied to prevent the dot products from growing too large in
magnitude, which would otherwise push the softmax function into regions with vanishing
gradients [9].
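Equation 20 can be written out step by step for a single attention head. The sketch below is a minimal illustration: the query, key, and value matrices are small hand-picked values rather than learned projections of patch embeddings, and batching, masking, and multiple heads are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Two tokens with d_k = 2; each query matches its own key most strongly.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
output, weights = attention(q, k, v)
print(weights)
print(output)
```

Each output row is a weighted average of the value rows, with the weights of a row summing to one, so every token can draw on every other token in a single layer.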
This global attention mechanism allows ViTs to capture long-range dependencies
within an image more effectively than standard CNNs, as every patch can theoret-
ically attend to every other patch in the image sequence from the very first layer.
Hybrids and variants like the Swin Transformer [271] have further improved effi-
ciency and performance on tasks like object detection and segmentation.
While YOLO excels at real-time inference, its reliance on large annotated datasets
is the primary bottleneck. To overcome this, there have been advances in
self-supervised Vision Transformers, such as the DINO (Self-Distillation with No
Labels) family of models [272; 273]. Unlike YOLO, DINO does not require bounding
box annotations, as it leverages self-supervised learning to extract rich, universal
feature representations from unlabeled imagery. These self-supervised models can
subsequently be fine-tuned for specific operational tasks using only a fraction of the
labeled data that traditional supervised networks require.
The progress driven by DL has enabled a vast array of real-world applications.
Autonomous systems, such as self-driving cars and drones, rely heavily on CV to
perceive their environment, detect obstacles, recognize traffic signs, and navigate
safely [269]. In medical imaging, CV algorithms assist radiologists in analyzing
medical scans (X-rays, CTs, MRIs) for the early detection and diagnosis of diseases
like cancer, segmenting tumors, and quantifying anatomical structures [274]. Facial
recognition technology is used for identity verification and access control [275; 276].
Video surveillance systems employ computer vision to detect anomalous activities,
count people, and monitor crowds. In agriculture, drones and cameras equipped with
CV systems monitor crop health and diseases [277; 278]. Augmented Reality (AR)
and Virtual Reality (VR) technologies use computer vision for spatial mapping, ob-
ject tracking, and surface detection to seamlessly blend digital content with the real
world [279; 280; 281]. In manufacturing, automated quality control on production
lines uses CV to inspect products for defects at high speed, surpassing human capa-
bilities [282; 283; 284; 285].
Despite significant progress, CV faces several ongoing challenges and is evolv-
ing in new directions. Training state-of-the-art DL models often requires massive,
meticulously labeled datasets. A key area of research is developing methods that
can learn from less labeled data, such as few-shot learning [286; 287; 288], self-
supervised learning [289; 290; 291; 272], and the use of synthetic data [292; 293].
Models can be brittle and fail when presented with inputs that differ slightly
from their training data due to, e.g., changes in lighting, viewpoint, or context. They
can also be affected by the intentional injection of suitable noise that is imperceptible
to the human eye, known as adversarial examples [294]. Improving model robustness to these
variations is a critical challenge. Likewise, moving beyond 2D images to understand
the 3D world is a major frontier. This includes tasks like 3D object reconstruction
from images, depth estimation, and scene understanding from point clouds. In addi-
tion, the integration of vision with other modalities, such as language and audio, is a
growing trend. This leads to more comprehensive AI systems that can, for example,
answer questions about an image or generate textual descriptions of a video.
From a practical standpoint, deploying complex computer vision models on
resource-constrained devices like mobile phones and embedded systems requires model
optimization, efficient hardware and preferably co-optimizing both in a hardware-aware
manner [295]. Edge AI is the research area that focuses on running these models
locally for real-time processing and improved privacy. The rise of generative models,
including Generative Adversarial Networks (GANs) [6] and diffusion models [173],
has enabled the creation of highly realistic synthetic images and videos. This has
applications in data augmentation and media creation, but also raises societal
challenges related to deepfakes and misinformation.
In conclusion, computer vision has evolved from a nascent field of academic
inquiry into a mature and impactful technology. While the DL paradigm currently
dominates, the field continues to advance rapidly, driven by new architectures, larger
datasets, and the pursuit of more robust, efficient, and comprehensive visual under-
standing.
4.1.1 CV in military domain
In the military context, CV is one of the most mature and widely applied areas of
ML, used for tasks ranging from automated target recognition (ATR) in satellite and
drone imagery to facial recognition and vehicle identification. The core challenge
often lies in acquiring sufficient high-quality, labeled data for training and ensuring
the model’s robustness in diverse and adverse environmental conditions.
This thesis first explores the practical application of CV to the challenging naval
domain of littoral warfare. In Publication I, ”DeepMix: AI in Littoral Sonar Oper-
ations,” a novel approach is presented for detecting objects from sonar images. The
study addresses the significant challenges inherent in the underwater domain, such as
high levels of noise, varying environmental conditions, and the scarcity of available,
high-quality sonar data.
The research detailed in Publication I applies DL techniques to enhance object
detection capabilities in shallow-water sonar operations. This serves as a foundational
applicability assessment for the dissertation, grounding the research in a tangible
and difficult military problem that highlights not only the results but also the
underlying issues in creating AI capabilities for military units. The research
investigates how advanced AI techniques can be leveraged to process complex sensor data,
thereby aiming to improve situational awareness and reduce the cognitive load on
human operators in critical maritime environments. The findings from this study
provide a low-level, practical perspective on the opportunities and limitations of ap-
plying CV in a data-scarce, high-stakes military setting.
From a CRISP-DM point of view, the purpose of the study was grounded in the
doctrinal need to speed up data processing to enhance Mine Countermeasure (MCM)
operations. The current process of MCM was assessed and the relevant data iden-
tified, observing that the MCM units produce a lot of post-processed data that has
been annotated by the subject matter experts, i.e., the operators. Therefore, this was
a suitable research area to apply state-of-the-art ML methods such as ViT [270] and
Mixture of Experts (MoE) [296; 297] in modeling to create novel value from old
data.
The data understanding was created in collaboration with military personnel,
while data preparation resembled that of any CV problem: the data was cropped and
harmonized for modeling.
The modeling itself tested several hypotheses, utilizing individual AI models
ranging from conventional ML models, specifically Random Forest (RF) [298] and
SVM [182; 299], to novel DL architectures.
The conventional models were selected to serve as robust, non-deep-learning
baselines, establishing a performance floor for standard classification before
introducing more complex neural architectures. Formally, an SVM is a maximum-margin
classifier that seeks the optimal hyperplane to separate classes by minimizing
$\frac{1}{2}\|w\|^2$ subject to the constraint $y_i(w^T x_i + b) \ge 1$ for separable
data. Meanwhile, an RF
operates as an ensemble learning method that constructs a multitude of decision trees
during training, utilizing bootstrap aggregating (bagging) and random feature selec-
tion to output the mode of the classes, thereby reducing variance and overfitting.
The selection of the individual DL models was methodologically deliberate to
contrast distinct computer vision paradigms. Specifically, the comparison included
Visual Geometry Group (VGG) [300] and ViT [270] to highlight the transition from
strictly localized feature extraction to global contextual representation.
VGG serves as a robust, well-understood and widely used baseline that repre-
sents deep CNN architectures which rely heavily on spatial inductive bias to capture
local pixel hierarchies. In contrast, the ViT architecture acts as the state-of-the-art
alternative that lacks this inherent spatial bias as explained above, instead using self-
attention to capture long-range, global dependencies across the entire sonar image.
Comparing these two fundamentally different approaches provides critical insight
into whether the heavily obscured and noisy nature of underwater sonar imagery
benefits more from strict local feature extraction or global contextual awareness.
The most promising approach combined the performance of several models using
a MoE architecture [297]. Formally, an MoE system consists of 𝑁 expert networks,
𝐸1, . . . , 𝐸𝑁 , and a gating network, 𝐺. For a given input 𝑥, each expert produces an
output 𝐸𝑖(𝑥), and the gating network outputs an 𝑁 -dimensional vector 𝐺(𝑥) repre-
senting the probability distribution over the experts. The final aggregated prediction
𝑦 is computed as the linearly weighted sum of the experts’ outputs:
\[ y = \sum_{i=1}^{N} G(x)_i\, E_i(x) \tag{21} \]
In Publication I, this approach was used to weight the classification predictions
of the best-performing individual models. By allowing the gating network 𝐺 to learn
which underlying models (experts) were most reliable for specific types of features in
the sonar data, the MoE ensemble effectively mitigated the difficulty of the problem,
surpassing individual model performances on all metrics.
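The weighting in Equation 21 can be sketched as follows. This is a generic illustration and not the architecture used in Publication I: the two experts are hypothetical stand-ins that return fixed class-probability vectors, and the gating function is a hand-written rule where a trained gating network would normally produce the logits.

```python
import math


def softmax(logits):
    """Turn gating logits into a probability distribution over the experts."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x, experts, gate):
    """Equation 21: gate-weighted sum of the expert outputs for input x."""
    gate_weights = softmax(gate(x))
    outputs = [expert(x) for expert in experts]
    n_classes = len(outputs[0])
    return [
        sum(gate_weights[i] * outputs[i][c] for i in range(len(experts)))
        for c in range(n_classes)
    ]


# Hypothetical experts: each returns class probabilities for two classes.
def expert_a(x):
    return [0.9, 0.1]


def expert_b(x):
    return [0.2, 0.8]


# Hypothetical gate: trusts expert_a for large x[0] and expert_b otherwise.
def gate(x):
    return [x[0], -x[0]]


y = moe_predict([2.0], [expert_a, expert_b], gate)
print(y)
```

Because the gate outputs a probability distribution and each expert outputs class probabilities, the aggregated prediction is itself a valid probability vector.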
The evaluation was done with 𝑘-fold cross-validation (𝑘 = 10), using a tenth of the
data as a test set for each round and taking the average performance over ten rounds of
modeling for each solution. Therefore, the evaluation was operationally relevant and used the
small data set to maximum extent.
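The evaluation scheme can be sketched as a generic 𝑘-fold split. This is an illustration under stated assumptions rather than the exact protocol of Publication I: the shuffling seed, the absence of stratification, and the `evaluate` callback are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(evaluate, n_samples, k=10, seed=0):
    """Average a user-supplied evaluate(train, test) score over the k rounds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k, seed)]
    return sum(scores) / len(scores)


# With 300 samples and k = 10, every round trains on 270 and tests on 30 samples.
for train, test in k_fold_splits(300, k=10):
    print(len(train), len(test))
```

Every sample is used for testing once and for training in the remaining rounds, which is what allows a small data set to be used to the maximum extent.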
The key finding was that a mixture of several capable DL models was able to
mitigate the difficulty of the problem to an extent, but the resulting models did not
exhibit adequate reliability to be deployed for operational use. This is due to two
main reasons. First of all, the data set was very small, only hundreds of images.
Secondly, the data was of poor quality, as it was extracted from a post-processing
system. This decreased the resolution of the data, potentially resulting in losing
usable features. In addition, the data did not include the accurate location of the
objects but a noisy location, so even evaluation for accurate object detection was
difficult.
As such, the research motivates the hypothesis that military units create vast amounts
of data without a data governance architecture that would allow and mandate using it
for the development and fine-tuning of AI models afterwards or even hypothetically
in a continuous setting. In order to increase the organizational maturity to leverage
AI solutions, the data pipelines need to be considered holistically, so that the work of
the operators who initially process the data is not lost in the process. The data
pipeline, in this instance, is not just the software-based data integration and
processing but rather covers the operational process from the user or operator to the
AI engineers and developers. This poses requirements for the operated systems, which
must allow saving the operator insight and data annotations as metadata to enable
effective post-processing and model development with AI methods.
On a conceptual level, this hypothesis calls for the development of a data capa-
bility, where the systemic exploitation of data is treated as an active military function
rather than a temporary or ephemeral input-output feed.
4.2 Reinforcement Learning
RL is a paradigm of machine learning where an agent or several agents learn to
make sequential decisions by interacting with an environment [27]. Unlike supervised
learning, the agent is not told which actions to take but instead discovers which ac-
tions yield the most reward or least penalty through a process of trial and error. The
core components include an agent, the environment, states, actions, and rewards,
with the agent’s goal being to learn an optimal policy that maximizes cumulative
rewards over time. This framework is exceptionally well-suited for modeling the
dynamic and uncertain nature of military operations, which are fundamentally se-
quential decision-making problems in a dynamic environment.
The origins of RL are closely connected to the history of optimization. Powell [301]
has advocated a unified framework that encompasses both stochastic optimization
and RL. Despite their differences, stochastic optimization and RL share a lot in
common, although stochastic optimization emphasizes model-based approaches
while RL is largely focused on model-free learning and value-based methods.
The foundation of modern RL is the framework developed by Sutton and Barto
[27]. In their formulation, RL is presented as a problem of learning to map situations
to actions to maximize a numerical reward signal. The agent is not provided with
explicit instructions on which actions to take. Key aspects of this problem are that
actions may affect not only the immediate reward but also subsequent situations and,
consequently, all future rewards. These two characteristics, trial-and-error search
and delayed reward, are the distinguishing features of reinforcement learning.
The core components of an RL problem are
• The policy (𝜋), which defines the learning agent’s behavior at a given time. It
maps perceived states of the environment to the actions to be taken
• The reward (𝑅𝑡), which signals immediate desirability, whereas a value func-
tion estimates long-term desirability. The value of a particular state represents
the total amount of reward an agent can expect to accumulate over the future
of sequential actions beyond that current state
• The model of the environment (optional)1, which captures the behavior of
the environment, allowing inferences to be made about how it will respond to
actions.
The RL problem is formalized through a Markov Decision Process (MDP) [27].
An MDP is defined as a tuple (𝒮,𝒜,𝒫,ℛ, 𝛾), where 𝒮 is the set of states, 𝒜 is the
set of actions, 𝒫 is the transition probability function that gives the probability of
each next state 𝑠′ for every state-action pair, ℛ is the reward function, and 𝛾 ∈ [0, 1]
is a discount factor that weights the rewards accumulated beyond the immediate
return. For example, 𝛾 = 0 would neglect all rewards past the transition to the next
state 𝑠′. An MDP possesses the Markov property, meaning that the future is
independent of the past given the present.
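The role of the discount factor can be made concrete with a short sketch. The function
name `discounted_return` and the reward sequence are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * R_{t+k+1} over a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0, 1.0]                        # hypothetical R_{t+1}, R_{t+2}, ...
myopic = discounted_return(rewards, gamma=0.0)        # only the immediate reward counts
balanced = discounted_return(rewards, gamma=0.5)      # 1 + 0.5 + 0.25 + 0.125
undiscounted = discounted_return(rewards, gamma=1.0)  # plain sum of all rewards
```

With 𝛾 = 0 the agent values only the reward of the next transition, whereas values
closer to one make it increasingly far-sighted.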
To place this formalization in the context of the military complexity described
conceptually in Subsection 3.2.6, note that the state space 𝒮 and action space 𝒜 of
military operations are subject to the curse of dimensionality. When the operational
environment is quantified, the state space expands exponentially with the number
of units, capabilities, and environmental variables. The size of the state space of a
single unit can be formally defined as:
|𝑆unit| = |𝑉sensors| × |𝑉vars|, (22)
where |𝑉sensors| and |𝑉vars| represent the number of possible combined states for
the unit’s sensors and for its internal and environmental variables, respectively. The
internal variables include human aspects and technical requirements such as power
and maintenance, while the environmental variables include, for example, the
weather and the impact of the opposing force.
1There is always an environment, but a model of the environment is optional.
Similarly, the action space 𝒜 available to a commander scales combinatorially
with the forces deployed. For a headquarters managing 𝑛 similar units, the total state
space, 𝑆HQ, grows exponentially with 𝑛, as the headquarters theoretically has to deal
with all possible combinations of sensors and variables for each unit that are not dealt
with at the lower level. In reality, the unit has been delegated the autonomy to act on
certain inputs on its own, but even these autonomous actions create inputs to higher
echelons. The independent actions 𝑎𝑖 in the set of all actions 𝒜 can be assumed not to
alter the rate of change in |𝑆unit|, and each change may or may not concern the
headquarters. As the headquarters decides for itself whether all or only a subset of
the changes in state are a concern, this relationship is the primary driver of
complexity.
Furthermore, the transition probability function 𝒫 in a military MDP is highly
volatile due to the friction of war and unexpected events. Rather than treating these as
static probabilities, the introduction of unexpected events over time can be modeled
as a Poisson process. Let 𝜆 be the average rate of “shock” events (such as equipment
failure or enemy contact) per unit of time for a single unit. The probability of
observing exactly 𝐾 = 𝑘 shocks over a time horizon of length 𝑇 is:
\[ P(K = k, T) = \frac{(\lambda T)^{k} e^{-\lambda T}}{k!} \tag{23} \]
Hence, for example, if 𝑇 = 1 and 𝜆 = 0.5, 𝑃 (𝐾 = 1, 𝑇 = 1) ≈ 0.3 for one
event (𝑘 = 1). For a system of 𝑛 units, the aggregate event rate is 𝜆HQ = 𝑛𝜆,
assuming that shocks occur independently across units. The expected number of
shocks, 𝐸[𝐾], i.e., the mean of the Poisson distribution 𝑃 (𝐾 = 𝑘, 𝑇 ) in Equation 23,
increases linearly with both the number of units and the time horizon:
𝐸[𝐾] = 𝜆HQ𝑇 = 𝑛𝜆𝑇 (24)
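The worked example can be checked numerically with a minimal sketch. The function
names are illustrative, and the headquarters-level calculation assumes that shocks
occur independently across units so that the rates add.

```python
import math

def poisson_pmf(k, rate, horizon):
    """Probability of exactly k shocks in a window of length `horizon` (Equation 23)."""
    mean = rate * horizon
    return (mean ** k) * math.exp(-mean) / math.factorial(k)

def prob_at_least_one(n_units, rate, horizon):
    """P(K >= 1) for n units, assuming independent shocks with aggregate rate n * rate."""
    return 1.0 - math.exp(-n_units * rate * horizon)

p_single = poisson_pmf(1, rate=0.5, horizon=1.0)     # 0.5 * exp(-0.5), roughly 0.30
p_hq = prob_at_least_one(10, rate=0.5, horizon=1.0)  # ten units, same rate and horizon
```

For a single unit a shock within the horizon is far from certain, but for a
headquarters with ten such units at least one shock is almost guaranteed.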
The total complexity of the planning problem, 𝒞, is therefore a function of the
exponential growth in the state-action space and the temporal expansion of the
decision tree due to uncertainty. If each unit has 𝑎 actions available, the total number
of joint actions is $a^{n}$. The complexity can thus be conceptualized as:
\[ \mathcal{C} \propto f\left(|S_{\text{unit}}|^{n},\; a^{n},\; n\lambda T\right). \tag{25} \]
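To give a sense of scale, the three arguments of the complexity function can be
computed for a small hypothetical force. All numbers below are invented for
illustration and are not taken from the publications.

```python
def planning_scale(v_sensors, v_vars, actions_per_unit, n_units, shock_rate, horizon):
    """Return the three drivers of planning complexity for n similar units."""
    unit_states = v_sensors * v_vars                        # Equation 22
    return {
        "unit_states": unit_states,
        "joint_states": unit_states ** n_units,             # |S_unit|^n
        "joint_actions": actions_per_unit ** n_units,       # a^n
        "expected_shocks": n_units * shock_rate * horizon,  # Equation 24
    }

# Hypothetical headquarters with four units.
scale = planning_scale(v_sensors=10, v_vars=20, actions_per_unit=5,
                       n_units=4, shock_rate=0.5, horizon=8.0)
```

Even this toy configuration yields 1.6 billion joint states, which illustrates why a
table with one entry per state is out of reach for realistic force sizes.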
As formulated in Equations 22 through 25, the sheer mathematical scale of mil-
itary operations necessitates abstraction and sophisticated approximation methods,
as exact tabular solutions are infeasible to compute. Instead of calculating every
possible future permutation within this exponentially growing decision tree, an au-
tonomous agent must learn to estimate the expected long-term utility of its current
state and chosen actions under uncertainty. The mathematical foundation for break-
ing down these infinitely complex sequential decisions into recursive, evaluable steps
is the Bellman equation for the optimal state-value function, 𝑣* [302], of which the
original notation is shown in Equation 26:
\[ f(p) = \sup_{q} \left[ g(p, q) + h(p, q)\, f(T(p, q)) \right] \tag{26} \]
This early version provided by Bellman gives a mathematical formulation for
his principle of optimality, which states that “An optimal policy has the property
that whatever the initial state and initial decisions are, the remaining decisions must
constitute an optimal policy with regard to the state resulting from the first decision”
[302]. Here 𝑓(𝑝) denotes the value function at state 𝑝, 𝑞 is an action variable, 𝑔(𝑝, 𝑞)
is the immediate reward function for state 𝑝 and action 𝑞, 𝑇 (𝑝, 𝑞) is the transition
function that gives the new state for state 𝑝 and action 𝑞, 𝑓(𝑇 (𝑝, 𝑞)) is the value of
that next state, ℎ(𝑝, 𝑞) is the discount function, and sup denotes the supremum over
actions, i.e., the maximum whenever it is attained. The modern, widely recognized
versions of the same equation are
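\begin{align}
v_*(s) &= \max_{a} \mathbb{E}\left[ R_{t+1} + \gamma v_*(S_{t+1}) \,\middle|\, S_t = s, A_t = a \right] \tag{27} \\
       &= \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_*(s') \right], \tag{28}
\end{align}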
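where, in the latter form, the original 𝑓(𝑝) corresponds to 𝑣*(𝑠), the action variable
𝑞 to 𝑎, ℎ(𝑝, 𝑞) to 𝛾, 𝑔(𝑝, 𝑞) to the immediate reward 𝑟, and 𝑓(𝑇 (𝑝, 𝑞)) to 𝑣*(𝑠′),
with the deterministic transition 𝑇 (𝑝, 𝑞) replaced by an expectation over the
transition probabilities 𝑝(𝑠′, 𝑟|𝑠, 𝑎).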
Sutton and Barto [27] have expanded the original Bellman equation to incorporate
a full stochastic MDP model as well as the policy and expectations over actions. The
optimal action-value function is defined as
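\begin{align}
q_*(s, a) &= \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \,\middle|\, S_t = s, A_t = a \right] \notag \\
          &= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right] \tag{29}
\end{align}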
where $q_*(s, a)$ is the optimal action-value function and $p(s', r \mid s, a)$ is the
probability of ending in state $s'$ and receiving reward $r$ given the current state $s$
and action $a$, with the discounted optimal future value $\gamma \max_{a'} q_*(s', a')$.
The introduction of the probability distribution $p$ induces stochasticity. For a
state-value function $v_\pi(s)$ under policy $\pi$, the formulation is
\[ v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_\pi(s') \right], \tag{30} \]
which states that the value of a state, 𝑣𝜋(𝑠), under policy 𝜋 is the expected dis-
counted return from that starting state onward following the policy.
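The Bellman equations above can be turned into iterative updates on a small example.
The sketch below uses a hypothetical two-state MDP (the state names, probabilities
and rewards are invented) and applies the optimality backup and the policy
evaluation backup until the values stop changing.

```python
# TRANSITIONS[state][action] -> list of (probability, next_state, reward); toy values.
TRANSITIONS = {
    "patrol": {
        "hold":    [(1.0, "patrol", 0.0)],
        "advance": [(0.8, "contact", 1.0), (0.2, "patrol", 0.0)],
    },
    "contact": {
        "hold":    [(1.0, "contact", 0.5)],
        "advance": [(0.5, "patrol", 2.0), (0.5, "contact", -1.0)],
    },
}

def backup(values, state, action, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[nxt])
               for p, nxt, r in TRANSITIONS[state][action])

def value_iteration(gamma=0.9, tol=1e-10):
    """Repeat the optimality backup (maximum over actions) until convergence."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        updated = {s: max(backup(values, s, a, gamma) for a in TRANSITIONS[s])
                   for s in TRANSITIONS}
        if max(abs(updated[s] - values[s]) for s in TRANSITIONS) < tol:
            return updated
        values = updated

def policy_evaluation(policy, gamma=0.9, tol=1e-10):
    """Repeat the policy backup (policy-weighted average over actions) until convergence."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        updated = {s: sum(policy[s][a] * backup(values, s, a, gamma)
                          for a in TRANSITIONS[s])
                   for s in TRANSITIONS}
        if max(abs(updated[s] - values[s]) for s in TRANSITIONS) < tol:
            return updated
        values = updated

uniform = {s: {a: 1.0 / len(acts) for a in acts} for s, acts in TRANSITIONS.items()}
v_star = value_iteration()
v_uniform = policy_evaluation(uniform)
```

The optimal values are at least as large as the values of the uniformly random policy
in every state, which is the sense in which the optimal policy dominates.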
Combining the Bellman equation and its variations with the (stochastic) MDP is the
foundation of RL, which aims to optimize a target value in a sequential problem. RL
introduces another fundamental concept, exploratory trial-and-error learning. The
learning aims at an optimal policy, which in other areas of ML would be called a
model, and the learning of the policy is usually guided or updated according to a
value function.
In the definitions of Sutton and Barto [27], RL algorithms can be classified into tabular
and approximation methods. Tabular methods relate to problems that are discrete
in nature or can be discretized to form a table that encompasses the values related
to each state or state-action pair. If the state or action spaces are continuous, an
approximation method such as an NN is needed. Additionally, algorithms can be
classified into direct and indirect methods, where direct refers to learning a policy
from direct interactions with the environment, and indirect approach involves learn-
ing a model of the environment that is used to learn the policy from. The common
understanding, which deviates slightly from this interpretation, is that model-based
learning is regarded as indirect and model-free learning is direct. The third
distinction is between on-policy and off-policy learning. On-policy learning means
that the algorithm updates the same policy that is used to explore the state-action
space, while off-policy learning uses a separate behavior policy for exploration, and
the experience it gathers is used to update the target policy that is the end result
of the training.
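The on-policy and off-policy distinction can be seen in two classic tabular update
rules, SARSA and Q-learning. They are used here only as textbook illustrations and
are not the algorithms of the publications; the states, actions and values are
invented.

```python
def sarsa_update(q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: bootstrap from the action a2 that the exploring policy actually took."""
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])

def q_learning_update(q, s, a, r, s2, actions, alpha, gamma):
    """Off-policy: bootstrap from the greedy action, whatever was actually taken next."""
    best_next = max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

actions = ["hold", "advance"]

def fresh_table():
    """Toy Q-table in which only ('s1', 'advance') has a non-zero value."""
    q = {(s, a): 0.0 for s in ("s0", "s1") for a in actions}
    q[("s1", "advance")] = 10.0
    return q

# Same transition for both rules; the exploring policy chose 'hold' in the next state.
q_on = fresh_table()
sarsa_update(q_on, "s0", "advance", 1.0, "s1", "hold", alpha=0.5, gamma=0.9)

q_off = fresh_table()
q_learning_update(q_off, "s0", "advance", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
```

The on-policy update reflects the exploratory choice and moves the estimate to 0.5,
while the off-policy update assumes greedy behavior afterwards and moves it to 5.0.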
With its lengthy history, rooted in optimization and dynamic programming, RL
has made a tremendous impact in recent years. Sophisticated RL solutions have
surpassed human performance in games like Go [303] and StarCraft II [304]. These
advances have been enabled by combining RL with sophisticated methods such as
Monte Carlo Tree Search (MCTS) [305; 306; 307] and neural networks from DL.
Most recently, RL has been extended to Reinforcement Learning from Human
Feedback (RLHF) [308]. In essence, human comparisons between trajectory
segments are used to train a reward model, which is then used as a proxy reward
function in RL. This has been extended to GenAI, where pretrained language models
are fine-tuned on smaller datasets of human-generated responses. Another approach
is Inverse Reinforcement Learning (IRL) [309], which aims to infer the reward
function of an agent given its policy or observed behavior. In the military setting,
the approach could be used to analyze the goals and the commander’s intent of an
adversary from the observed behavior of troops, given a suitable environment to
enable such an analysis.
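How pairwise human comparisons can train a reward model is sketched below. The
sketch assumes a Bradley-Terry style preference model over the summed predicted
rewards of two trajectory segments, which is a common choice in the RLHF literature;
whether it matches the exact formulation of the cited work is an assumption, and the
function names are illustrative.

```python
import math

def preference_probability(rewards_a, rewards_b):
    """P(segment A is preferred) from the summed predicted rewards of both segments."""
    diff = sum(rewards_a) - sum(rewards_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    exp_diff = math.exp(diff)           # numerically stable branch for negative diff
    return exp_diff / (1.0 + exp_diff)

def preference_loss(rewards_a, rewards_b, human_prefers_a):
    """Cross-entropy between the predicted preference and the human label."""
    p_a = preference_probability(rewards_a, rewards_b)
    return -math.log(p_a if human_prefers_a else 1.0 - p_a)

# Hypothetical per-step rewards predicted by a reward model for two segments.
tie = preference_probability([1.0, 1.0], [2.0, 0.0])
clear = preference_probability([2.0], [0.0])
loss_when_tied = preference_loss([1.0, 1.0], [2.0, 0.0], human_prefers_a=True)
```

Minimizing this loss over many labeled comparisons pushes the reward model to assign
higher total reward to the segments that humans prefer.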
4.2.1 RL in military domain
This research focuses on the adaptability and adaptation of RL to the military domain
and its problems. Publication II explores the integration of Multi-Agent
Reinforcement Learning (MARL) into the domain of littoral naval warfare, focusing
on its capacity to generate tactical Courses of Action (COAs) under littoral,
adversarial maritime conditions. The study utilizes a simplified, bespoke simulator
to model naval engagements as a Partially Observable Stochastic Game (POSG) and
tests two MARL algorithms, Multi-Agent Double Deep Q Network (MADDQN) and
Multi-Agent Proximal Policy Optimization (MAPPO).
The selection of these two specific algorithms is methodologically deliberate to
contrast two different RL paradigms. MADDQN serves as a well-established, value-
based, off-policy baseline that operates over a discrete action space, representing a
standard Deep Q-learning solution in multi-agent form, while MAPPO represents
a state-of-the-art, on-policy actor-critic approach that natively utilizes a continuous
action space. By comparing a value-based off-policy model against an advanced
policy-gradient method, the research evaluates how different underlying learning me-
chanics adapt to the uncertainties of the naval POSG environment.
The POSG formulation expands the stochastic MDP to include the partial
observability characteristic of real-life military scenarios, where the transparency of
the environment is clouded by at least three features: the lack of insight into
adversarial planning, force composition, and capabilities; the changing surrounding
environment, which encompasses, for example, the weather, non-combatants and
third parties; and the uncertain outcomes of different interactions, from warfighting
functions to equipment malfunctions.
With MADDQN, the policy update is based on the Bellman equation shown
in Equation 26, where the update target combines the current policy network and the
target network as
\[ Y_t = R_{t+1} + \gamma\, Q\left(S_{t+1}, \operatorname*{arg\,max}_{a} Q(S_{t+1}, a; \theta_t); \theta_{t-1}\right), \tag{31} \]
where $R_{t+1}$ is the reward, $\gamma$ is the discount factor, $S_{t+1}$ is the next state,
$\theta_t$ denotes the parameters of the policy network, and $\theta_{t-1}$ those of the
target network, which is used to stabilize the original DQN training. Double Deep
Q-Networks (DDQN) [310] decouples the maximum operation of the target into action
selection, done with the policy network, and action evaluation, done with the target
network, as depicted in Equation 31.
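The difference between the standard and the double target can be shown for a single
transition. The Q-values below are invented, the function names are illustrative, and
the `done` flag for terminal transitions is an assumption that is not part of
Equation 31.

```python
def dqn_target(reward, next_q_target, gamma, done=False):
    """Standard DQN: the target network both selects and evaluates the next action."""
    return reward if done else reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma, done=False):
    """Double DQN: the policy network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]

# Hypothetical Q-values for three actions in the next state.
next_q_online = [1.0, 3.0, 2.0]   # policy network prefers action 1
next_q_target = [5.0, 0.5, 4.0]   # target network overrates action 0

y_single = dqn_target(1.0, next_q_target, gamma=0.9)
y_double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
```

Because the action is selected by one network and evaluated by the other, a single
overestimated value in the target network no longer dominates the update.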
MAPPO, the multi-agent version of a PPO algorithm [311], is an actor-critic
algorithm that has two separate NNs, the actor which executes the policy and the
critic that serves as a value function. In this case, all the agents on the same side
shared their actor and critic networks, and the multi-agent setting comes from having
two opposing sides with their respective settings. As in research by Yu et al. [312],
the actor network is trained to maximize the objective function
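\[ L(\theta) = \left[ \frac{1}{Bn} \sum_{i=1}^{B} \sum_{k=1}^{n} \min\left( r_{\theta,i}^{(k)} A_i^{(k)},\; \operatorname{clip}\left( r_{\theta,i}^{(k)}, 1-\epsilon, 1+\epsilon \right) A_i^{(k)} \right) \right] + \sigma \frac{1}{Bn} \sum_{i=1}^{B} \sum_{k=1}^{n} \mathcal{H}\left[ \pi_\theta\left( o_i^{(k)} \right) \right], \tag{32} \]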
where $\pi_\theta$ is the actor network, $o$ denotes observations and $a$ actions,
$r_{\theta,i}^{(k)} = \pi_\theta(a_i^{(k)} \mid o_i^{(k)}) / \pi_{\text{old}}(a_i^{(k)} \mid o_i^{(k)})$
is the ratio between the current and old network output values, $\theta$ denotes the
actor network parameters, $B$ is the batch size, $n$ is the number of agents, the
advantage $A_i^{(k)}$ is computed using Generalized Advantage Estimation, which is
common for PPO algorithms [313], $\mathcal{H}$ is the policy entropy, $\sigma \in [0, 1]$
is the entropy coefficient parameter, and $\epsilon \in [0, 1]$ is the clipping ratio
used to stabilize the training and prevent excessive updates.
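The critic network, $V_\varphi$, which maps states to value estimates, i.e., expected
returns, is trained to minimize the loss function
\[ L(\varphi) = \frac{1}{Bn} \sum_{i=1}^{B} \sum_{k=1}^{n} \max\left[ \left( V_\varphi(s_i^{(k)}) - \hat{R}_i \right)^2,\; \left( \operatorname{clip}\left( V_\varphi(s_i^{(k)}),\, V_{\varphi_{\text{old}}}(s_i^{(k)}) - \epsilon,\, V_{\varphi_{\text{old}}}(s_i^{(k)}) + \epsilon \right) - \hat{R}_i \right)^2 \right], \tag{33} \]
where $\hat{R}_i$ is the discounted reward-to-go, meaning the reward that is accumulated
from the current timestep onward.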
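The clipping in the actor objective and in the critic loss can be illustrated on plain
numbers. The sketch flattens the batch and agent dimensions into one list of samples
and assumes, as is common in PPO implementations, that the value prediction is
clipped to within 𝜖 of the old prediction. The function names and inputs are
illustrative.

```python
def clip(x, low, high):
    """Limit x to the interval [low, high]."""
    return max(low, min(high, x))

def ppo_actor_objective(ratios, advantages, entropies, epsilon, sigma):
    """Mean clipped surrogate plus the entropy bonus, over all B*n samples."""
    surrogate = [min(r * a, clip(r, 1.0 - epsilon, 1.0 + epsilon) * a)
                 for r, a in zip(ratios, advantages)]
    return sum(surrogate) / len(surrogate) + sigma * sum(entropies) / len(entropies)

def ppo_critic_loss(values, old_values, returns, epsilon):
    """Mean of the larger of the unclipped and clipped squared value errors."""
    losses = []
    for v, v_old, ret in zip(values, old_values, returns):
        v_clipped = clip(v, v_old - epsilon, v_old + epsilon)
        losses.append(max((v - ret) ** 2, (v_clipped - ret) ** 2))
    return sum(losses) / len(losses)

# Two hypothetical samples: one with an inflated ratio, one with a deflated ratio.
objective = ppo_actor_objective(ratios=[1.5, 0.5], advantages=[1.0, -1.0],
                                entropies=[0.0, 0.0], epsilon=0.2, sigma=0.0)
critic_loss = ppo_critic_loss(values=[1.0], old_values=[0.0], returns=[2.0], epsilon=0.2)
```

In the first sample the gain from raising the ratio is capped at 1.2, and in the second
the pessimistic clipped term is chosen, so neither sample can drive an excessive
policy update.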
The simulation environment is a 100 × 100 grid based on the northern Baltic Sea,
with terrain and navigability derived from an aerial image. The simulation accounts
for observation sharing, anti-surface warfare in the form of missile salvos and naval
artillery, electronic warfare, and littoral tactics such as utilizing the cover of the
archipelago. The agents share data between the units of their own side to create
observations of the environment. A CNN interprets the terrain and environment data,
its output is aggregated with additional input data, and a fully connected MLP then
produces either the action prediction or the approximated probability distribution,
depending on the chosen algorithm.
The study demonstrates that transitioning from standard RL environments to
military POSGs requires careful algorithmic selection, as different paradigms handle
partial observability and sparse military rewards differently. The comparative
evaluation of how these models perform in generating naval COAs is detailed in
Chapter 5.
Consequently, the research shows that, with proper reward engineering, RL methods
can discover valid tactics and strategies without doctrinal encoding, and may provide
an edge in tactical decision-making if novel solutions emerge in the process. Beyond
that, the research highlights and validates the upcoming findings of Publication III:
the lack of suitable simulators and operational data inhibits the advancement of
research and favors proofs of concept over an operationally relevant path from
hypothetical solutions to applications in real-world systems and their proper
evaluation.
To address the technology assessment objective of this thesis, a systematic review
of RL for decision support in defence and security was conducted. This systematic
literature review, presented in Publication III, “Reinforcement Learning for
decision support in defence and security”, was authored by a multinational research
task group, SAS-181, convened under NATO’s Science and Technology Organization
(STO). The aim is to examine how RL has been applied to support decision-making
within defence and security contexts across NATO member states. In the search
strategy of the survey, articles were selected with structured queries focused on three
topics: decision support, RL, and defence and security. The results were filtered to
include only relevant peer-reviewed works with clear descriptions of RL
methodologies.
The 20 selected articles were classified across 19 dimensions, including time horizon,
decision type and scale, uncertainty modeling, use of simulators, RL algorithm, and
the type and presence of explainability features. UMAP [314] was used for
dimensionality reduction to visualize the research landscape and identify clusters
of trends.
The survey highlights the applicability of RL to the military domain, where
decision-making is characteristically partially observable, uncertain and
time-sensitive, yet can be formulated as sequential decision-making problems. As such, RL
is viewed as a tool that can support the decision-making by learning policies from
simulations or operational data.
As with this dissertation, the framework of Publication III is strongly rooted in
the MDMP and the OODA loop, which align RL cycles with real-world command
decision cycles. Another theoretical framework used is Powell’s universal modeling
framework for sequential decisions [301; 315]. Against these frameworks, the survey
covers 20 articles from application domains including defence planning, force
generation, force projection and force employment, over multiple warfare domains,
albeit lacking in Multi Domain Operations.
As RL usually translates, in Powell’s framework, to value function approximation
(VFA) algorithms, VFA algorithms were the dominant policy type. The most
frequent algorithms were Q-learning and Proximal Policy Optimization (PPO),
mirroring the solutions in Publication II, which notably is also one of the 20 articles
that were analyzed. Nearly all implementations relied on bespoke simulators, with
few interfaces to standard RL environments. This underlines the lack of a common
simulation architecture for defence use cases.
According to the findings, most reviewed systems had direct policy usage, meaning
that the RL model output directly supports or suggests a decision. Indirect use,
where the RL model informs a broader planning process, was in the minority.
Likewise, single-agent setups were in the majority, but several adversarial and
cooperative multi-agent approaches were also explored.
From an evaluation perspective, all articles used simulation for policy training, but
real-world validations were non-existent, and results were evaluated either
hypothetically or qualitatively instead of empirically in a relevant environment.
The research contributed to ML and military AI by mapping the state of RL
research in military decision-support systems and identifying priority areas for
research and technical innovation. Critical needs exist in explainability, simulator
integration, continual learning and multi-agent coordination. Regarding simulators,
standardization is called for to achieve interoperable simulators and shared data
formats that enable multinational collaboration.
While the survey confirms the potential of RL to improve military decision-making
despite uncertainty, partial observability and adversarial dynamics, several
challenges persist. These include operational integration, simulation-to-real-world
transfer, and ethical and legal considerations. The future of RL will likely be
characterized by continued experimentation through simulations and military
exercises, and by the development of modular, interoperable and explainable RL
systems designed in a bottom-up manner. Operational integration calls for trusted,
transparent and explainable systems, and for closer collaboration between
researchers and stakeholders to ensure these aspects.
In summary, Publication III provides the big picture of RL in military
decision-making in a cross-sectional manner, providing a technology assessment.
Publication II is included in the reviewed articles of Publication III, where it is
highlighted as including the most uncertainty factors of all the 20 reviewed articles.
It brings the conceptual insights into a single decision-support task within one
specified application and warfare domain from technical and practical perspectives.
Consequently, these publications are complementary studies of the RL paradigm in
the military context, covering a wide spectrum of problem areas at different levels
of granularity.
From the CRISP-DM perspective, the major findings of these two publications
relate to business understanding and data. In both cases, it is highlighted that the
data is scarce, siloed, and often classified, which hampers the research efforts. From
the business understanding perspective, the heterogeneity of research caused by
these limitations does not promote the impact that would be achievable with more
focused efforts and better infrastructure for executing meaningful, operationally
relevant research in close collaboration with military stakeholders. Hence, while
RL shows promise in solving complex decision-making problems, advancing from
theory to impact requires further effort, funding and a common framework that
enables research collaboration between all stakeholders and incorporates end-user
requirements, so that research and application can advance beyond the current
state.
As pointed out in Section 4.1, data should be viewed as a capability, and this also
applies to RL. Collecting data from exercises and enabling RLHF [308] in military
systems would be crucial for major advancements. Therefore, the collection,
processing and storing of data should be mandated in the procurement and upgrade
processes of military systems, together with the interface requirements that enable
learning from human feedback. In this sense it is important to understand that data
is not just the data gathered from sensors, as shown in Figure 8; the human insight
should be gathered as well.
4.3 Federated Learning
FL, established by McMahan et al. [316], is a machine learning paradigm that en-
ables collaborative model training across multiple decentralized devices or servers
holding local data samples, without exchanging the data. This approach is particu-
larly critical in contexts where data privacy, security, and governance are paramount.
The FL framework addresses three core challenges: it reduces the required commu-
nication overhead and cost as only model parameters are exchanged instead of entire
datasets, it enables mitigating issues with heterogeneity and non-Independent and
Identically Distributed (IID) data, and promotes privacy protection.
As a technology assessment, a comprehensive survey of this rapidly evolving
learning paradigm was conducted. Publication IV, “Emerging trends in federated
learning: from model fusion to federated X learning,” reviews the field. While not
military specific, the study investigates the progression of FL from Federated
Averaging (FedAVG) [316] to more advanced concepts, establishing the state of the
art and future trends. In its basic form, the FedAVG objective is formulated as
\[ f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \quad \text{where} \quad F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{P}_k} f_i(w). \tag{34} \]
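In Equation 34, the global objective $f(w)$ is the average of the local objectives
$F_k(w)$ over $K$ clients, where $\mathcal{P}_k$ is the partition of the data held by
client $k$, $n_k = |\mathcal{P}_k|$, and $n$ is the total number of data points. The
global model is updated with the correspondingly weighted average of the client
parameters. Each client is thus weighted by the amount of data it holds, and the
partitions are not assumed to result from uniform random sampling, which is the
non-IID setting that FedAVG is aimed to tackle.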
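The aggregation step implied by Equation 34 is a data-weighted average of the client
parameters. The sketch below shows it for plain lists of numbers; the function name
and the two hypothetical clients are assumptions for illustration.

```python
def fedavg(client_weights, client_sizes):
    """Average the client parameter vectors, each weighted by its share n_k / n."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum((n_k / total) * w[j] for w, n_k in zip(client_weights, client_sizes))
            for j in range(dim)]

# Two hypothetical clients: the second holds three times as much data as the first.
client_weights = [[1.0, 0.0], [3.0, 2.0]]
client_sizes = [100, 300]
global_weights = fedavg(client_weights, client_sizes)
```

A uniform average would give [2.0, 1.0], whereas the data-weighted average gives
[2.5, 1.5] and therefore leans towards the client with more data.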
Subsequent model fusion techniques, which are elaborated in Publication IV, include
advanced methods for aggregating local client models to create an improved
global model and vice versa, moving beyond the standard FedAVG algorithm. The
goal is to create a more robust and optimal combined model despite the statistical
heterogeneity of client data. Key approaches in model fusion include:
• Adaptive and Attentive Aggregation: These methods assign client contribution
weights based on performance metrics, such as model parameter distance or
accuracy, rather than simply the quantity of client data.
• Regularization Methods: To mitigate the ”client drift” caused by non-IID data,
regularization terms are added to the local or global objectives to constrain the
local training process and enhance convergence.
• Clustered Methods: This approach groups clients with similar data distribu-
tions into clusters and performs model aggregation within each cluster, better
capturing the heterogeneity across the entire network.
• Bayesian Methods: Probabilistic approaches are used for model fusion, such
as aggregating neurons based on, e.g., maximum a posteriori estimation of
global neurons and minimization of Kullback-Leibler (KL) divergence be-
tween global and local distributions to handle the architectural diversity of
neural networks.
• Fairness: To prevent the global model from being skewed towards over-represented
clients, fairness-aware algorithms adjust the optimization objective to ensure a
more uniform distribution of performance gains across all participants.
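As an illustration of the regularization approach in the list above, the sketch below
adds a proximal term to a local gradient step so that the client model is pulled back
towards the global model. This is one well-known way to constrain local training; the
specific algorithms covered in Publication IV are not reproduced here, and the function
name and numbers are assumptions.

```python
def proximal_local_step(w, w_global, grad, lr, mu):
    """One local gradient step on F_k(w) + (mu / 2) * ||w - w_global||^2."""
    return [wi - lr * (gi + mu * (wi - wgi))
            for wi, wgi, gi in zip(w, w_global, grad)]

# Hypothetical one-parameter client model that has drifted away from the global model.
w_local, w_global, grad = [2.0], [0.0], [1.0]
plain_step = proximal_local_step(w_local, w_global, grad, lr=0.1, mu=0.0)
regularized_step = proximal_local_step(w_local, w_global, grad, lr=0.1, mu=0.5)
```

With `mu = 0` the step is an ordinary gradient step, while a positive `mu` moves the
parameter further towards the global model and thereby limits client drift.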
Publication IV explores the integration of FL with other machine learning paradigms
to create more flexible and powerful systems. This combination allows FL to be
adapted to a wider range of real-world challenges. Federated Transfer Learning
(FTL) and Knowledge Distillation (KD) are used to handle statistical and model
heterogeneity. FTL transfers knowledge between clients with different datasets and
feature spaces, while KD allows smaller client ”student” models to learn from a
larger, more powerful server model, accommodating diverse hardware capabilities.
Federated Multi-Task and Meta learning approaches treat each client as a distinct but
related task, which enables greater personalization. In multi-task learning, separate
models can be trained for each client with some shared structure between models
exploiting related tasks, while meta learning aims to adapt a model to a new task
by, for example, learning an initial shared model and a meta updating scheme. This
allows for the development of models that can be rapidly adapted to new tasks with
minimal data. Federated unsupervised, semi-supervised learning and RL integrations
adapt the FL framework to scenarios where data labels are scarce or non-existent, as
well as for distributed agent-based learning where clients learn from rewards of their
actions.
While the baseline FL architecture inherently improves privacy by keeping raw
data localized, the exchange of model parameters, i.e., gradients or weights, is not
immune to privacy leakage. Adversaries can reverse-engineer sensitive information
from these updates through model inversion [317] or membership inference attacks
[318]. To robustly counter this, FL is frequently augmented with advanced privacy-
enhancing technologies. Differential Privacy (DP) [319] is a prominent mathematical
framework utilized in FL to provide formal privacy guarantees. By systematically in-
jecting calibrated statistical noise into the local model updates before transmission,
or into the global aggregation process, DP obscures the contribution of any single
data point. This ensures that the aggregated model does not
memorize or leak individual records, albeit introducing a trade-off between strict pri-
vacy and model accuracy. Furthermore, cryptographic techniques such as secure ag-
gregation [320] or homomorphic encryption [321] are commonly employed. These
methods allow the central server to mathematically compute the aggregated global
model without ever decrypting or exposing the individual, client-specific updates.
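Both ideas can be sketched in a few lines. The first part clips a client update and
adds Gaussian noise, which is the basic mechanism behind differentially private
updates; the privacy accounting that turns the noise level into a formal guarantee is
omitted. The second part shows pairwise masks that cancel in the sum, which is the
core idea of secure aggregation; real protocols use key agreement and finite-field
arithmetic instead of shared random floats. All names and values are illustrative
assumptions.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise proportional to the clipping norm."""
    sigma = noise_multiplier * clip_norm
    return [u + rng.gauss(0.0, sigma) for u in clip_update(update, clip_norm)]

def pairwise_masked(updates, rng):
    """Add a shared random mask to one client and subtract it from another, per pair."""
    masked = [list(u) for u in updates]
    dim = len(updates[0])
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] += mask[d]   # client i adds the shared mask
                masked[j][d] -= mask[d]   # client j subtracts it, so the sum is unchanged
    return masked

rng = random.Random(0)
clipped = clip_update([3.0, 4.0], clip_norm=1.0)
noisy = privatize([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.0, rng=rng)

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = pairwise_masked(updates, rng)
true_sum = [sum(u[d] for u in updates) for d in range(2)]
masked_sum = [sum(m[d] for m in masked) for d in range(2)]
```

The server sees only masked updates, yet their sum equals the sum of the original
updates up to floating-point error, so the aggregate can still be computed.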
Future directions include on-device personalization, unified benchmarks for
evaluation, and a focus on unsupervised methods, which require less manual data
processing, i.e., labeling, to be effective; label scarcity is a fundamental challenge.
Other challenges include combining various techniques within the FL framework,
the mitigated but persistent heterogeneity of the data, the models and the devices,
security and privacy issues, and communication efficiency. The research points out
recommendation systems, healthcare, and, e.g., open banking as real-world
applications that benefit from FL. As the military domain faces the same challenges,
FL positions itself as a good fit for military purposes as well.
4.3.1 FL in military domain
The key characteristics of FL align well with many fields, such as healthcare and
personal devices, but one can argue that their importance is possibly the greatest for
military use cases. In a military setting, the FL paradigm is highly relevant for
training models on data that is distributed across different units, platforms, or even
allied nations, and where security constraints and policies prevent data centralization
and sharing. Likewise, reduced communication overhead is critical in contested
environments, as military forces face far worse connectivity conditions than civilian
or enterprise users. Military units are usually spread out over a large theater of
operations, where communications utilize a cascade of methods but are constantly
challenged by the conditions and very likely by the adversary.
In addition, military data is at least as heterogeneous as data in any other real-world
domain, and so are the use cases between different warfare branches, operational
theaters and individual users. To elaborate on previously addressed ML paradigms, FL
can be used, for example, for
• training shared CV target recognition models across allied nations, warfare
branches or units, without sharing classified sensor data
• training RL decision support models in collaboration between several head-
quarters, possibly having also broader supply for RLHF that is explained in
Section 4.4
Hence, the premises of FL, namely better security and data privacy, the handling of
heterogeneous data combined with the possibility of personalized local models and
their presumably better fit to local user needs, and the reduction of communication
overhead, all serve well-recognized military requirements and promote secure, robust,
and highly relevant ML adoption through collaboration within
military forces.
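The collaboration pattern described above can be illustrated with a minimal sketch of federated averaging, one common FL aggregation scheme. It is not the method of any specific publication discussed here. The three clients, their synthetic datasets, the linear model, and all function names are assumptions made for illustration; only model parameters, never the local data, are passed to the aggregation step.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """One client's local training: gradient descent on y = w*x + b (squared error)."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def fed_avg(client_weights, client_sizes):
    """Server step: average client parameters weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def make_client_data(rng, n, x_low, x_high):
    """Hypothetical local dataset drawn from y = 2x + 1 with small noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]

rng = random.Random(0)
# Three clients with different input ranges and sizes (heterogeneous local data).
clients = [make_client_data(rng, 20, -1.0, 0.0),
           make_client_data(rng, 50, 0.0, 1.0),
           make_client_data(rng, 30, -0.5, 0.5)]

global_weights = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = fed_avg(updates, [len(d) for d in clients])
```

In this toy setting the shared model approaches the underlying relation (slope 2, intercept 1) although no client ever reveals its samples; each round transfers two numbers per client.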
As Publication V will propose in the next section, FL is a powerful solution
for developing a state-of-the-art military base model, i.e., a family of shared global
models. The analyzed solutions, such as FTL, KD, and meta learning, answer the
individual, national, or force-specific needs of allied countries while simultaneously
preserving data privacy and reducing communication overhead. Therefore,
FL offers a framework in which forces with different features but a common objective
can each improve on their individual needs, and FL can be deemed a potent research
area for enhancing the development and adoption of military AI systems. Likewise,
enabling distributed training can help solve the data scarcity issue to an extent. The
distributed, heterogeneous computational resources and hardware requirements also
support the claim, while simultaneously providing a more robust, distributed
architecture that can withstand a possible conflict scenario, in which the adversary is
very likely to target concentrated AI capabilities such as large data centers.
From the CRISP-DM perspective, FL can be seen as the collaboration and co-
operation of multiple parallel processes, which all handle their respective data in
their respective systems, but share at least some business understanding or doctrinal
task that can be enhanced by distributing the modeling efforts to all participants. FL
counters several issues from the data understanding to modeling, as it aims to han-
dle heterogeneous, non-IID data, in a distributed manner, but requires governance,
standardization, and infrastructure to be realized effectively.
4.4 Generative Artificial Intelligence
Generative AI is a subfield of ML that aims to produce ML models which generate
new, synthetic data based on their training data. Bishop and Bishop [173] do not treat
GenAI as a single technique but as a family of modeling strategies whose common
goal is to learn a data-generating distribution capable of synthesizing novel sam-
ples, optionally under user control through conditioning variables. This perspective
makes “generation” a probabilistic task: the object of learning is 𝑝(𝑥) or condition-
ally 𝑝(𝑥|𝑐), while the practical question is to define, train, and sample from models
that approximate such distributions well enough to produce convincing data or to
support downstream inference.
This section first explores NLP, as it is closely linked to generative text models
and the current trend of utilizing LLMs, after which the focus shifts to generative
models.
4.4.1 Natural Language Processing
NLP refers to processing human language with a computer. Many NLP applications
are based on a probability distribution over sequences of words, characters, or bytes
in natural language. For example, 𝑛-grams [6] are fixed-length sequences of 𝑛 words,
or tokens, and 𝑛-grams of different lengths, for example 2-grams and 3-grams, can be
used in parallel. In essence, 𝑛 can be viewed as the context
window from which the context is derived. These combinations of length 𝑛 are
used to define the conditional probability of the 𝑛-th token, a discrete entity
representing, e.g., a word or a character, given the preceding 𝑛 − 1 tokens, as estimated
from the training set. The fundamental limitation of this simple approach is
dimensionality: even with a large training set and a modest 𝑛, most 𝑛-grams will not
occur in the training set. This is due to the sparsity of natural language, where words
can be combined in combinatorially vast and creative ways. A combination that is not
included in the training data receives a zero probability, and the model fails to
generate a sensible continuation. The succeeding Neural Language Models (NLMs) were
designed to overcome this curse of dimensionality with a distributed representation of
words, called embeddings, which, not unlike the hidden layers of a CNN, represent
words as dense vectors in a continuous,
multi-dimensional space.
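The zero-probability problem of 𝑛-grams can be made concrete with a minimal sketch of a 2-gram model estimated by counting. The toy corpus and the function names are assumptions chosen for illustration only.

```python
from collections import Counter

def train_bigram(tokens):
    """Count bigrams and their one-token contexts in a token list."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    return bigrams, contexts

def bigram_prob(bigrams, contexts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if never observed."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, nxt)] / contexts[prev]

# Toy training corpus (hypothetical).
corpus = "the unit moves north the unit holds the line".split()
bigrams, contexts = train_bigram(corpus)

p_seen = bigram_prob(bigrams, contexts, "the", "unit")     # observed twice after "the"
p_unseen = bigram_prob(bigrams, contexts, "unit", "line")  # plausible but never observed
```

Here "unit line" is a perfectly plausible continuation, yet the count-based estimate assigns it zero probability because the pair never appears in the training data.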
For neural machine translation, the early approach was to use an MLP for
input-output training, so that the model would translate the received input
into the target language. The problem is that an MLP requires sequences of
fixed length, which is gravely suboptimal for natural language. Recurrent
Neural Networks (RNNs), however, can accommodate variable input and output
lengths, as the recurrent network outputs the next token based on a sliding
window that it uses to process the input [6]. For natural language, an RNN or
CNN can be used to encode an input sequence to capture the context, which is then
decoded into the target language with an RNN. This approach also requires a
fixed-size representation, which makes it difficult to capture all semantic details of a
long input. The alternative is to produce the output in the sequential manner
described, but to focus on different parts of the input for each output so that the
semantic details are considered effectively. This mechanism, where different parts of
the input affect the output, is called attention, essentially a weighted average over
feature vectors, with a weight associated with each input position [6; 322; 9]. An early
transformer-based solution that utilizes self-attention was Bidirectional Encoder
Representations from Transformers (BERT) [323], which considerably improved results
on a range of language understanding tasks and acted as a precursor to the larger
generative language models that are addressed
below.
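The notion of attention as a weighted average can be sketched with scaled dot-product attention for a single query, which is one common formulation and not necessarily the exact variant used in each cited work. The query, key, and value vectors below are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into weights that are positive and sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # one weight per input position
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context

# Three input positions with hypothetical key and value (feature) vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, context = attend([4.0, 0.0], keys, values)
```

The query aligns with the first and third keys, so those positions receive most of the weight and dominate the resulting context vector.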
4.4.2 Generative Models
The conceptual base for GenAI is latent-variable modeling. With discrete latent
variables, Bishop and Bishop [173] show how marginalizing over hidden assignments
yields mixture distributions. The hidden assignment is the process of assigning each
individual data point to one of the discrete categories the model has learned. When using
continuous latent variables, however, the conceptual framework shifts to a manifold
viewpoint. This approach posits that complex data, such as realistic images, does
not fill its high-dimensional space randomly but is concentrated on a much simpler,
smoother surface, i.e. a manifold, with the continuous latent variables serving as a
coordinate system for this underlying structure.
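For the discrete case mentioned above, the marginalization over hidden assignments can be written out explicitly. The notation is an assumption for illustration: 𝑧 denotes the discrete latent variable with 𝐾 categories, and the component densities are left unspecified.

```latex
p(x) = \sum_{k=1}^{K} p(z = k)\, p(x \mid z = k),
\qquad \sum_{k=1}^{K} p(z = k) = 1 .
```

With Gaussian components this is the familiar Gaussian mixture model; the continuous case replaces the sum with an integral over 𝑧.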
A GAN is a nonlinear latent-variable approach, where the generator transforms latent
noise 𝑧 ∼ 𝑝(𝑧) into data space while a discriminator provides the training signal
by learning to distinguish real from synthesized examples. The generator learns
adversarially by aiming to minimize the difference detected by the discriminator [173].
While GANs do not typically offer a tractable likelihood, their implicit distribution
and adversarial objective can yield high-fidelity samples.
The second approach, normalizing flows, constrains the generator to be invertible,
allowing exact log-likelihoods via the change-of-variables formula
$p_x(x \mid w) = p_z(g(x, w))\,|\det J(x)|$, where $g$ maps data to the latent space,
$p_z$ is the latent density, and $J(x)$ is the Jacobian matrix of partial derivatives
$\partial g(x, w) / \partial x$ [173]. It enables straightforward maximum-likelihood
training without likelihood approximation. The latent and data spaces must then have
the same dimensionality, placing a structural cost on the approach. The third
alternative approach includes autoencoders and Variational Autoencoders (VAEs).
Autoencoders are a useful, non-probabilistic precursor to VAEs, which restore a
generative interpretation by learning both an approximate posterior
$p_\varphi(z \mid x)$ and a generative model $p_\theta(x \mid z)$. The
final approach, diffusion models, also known as denoising diffusion probabilistic
models, inverts a multi-step corruption process. In the process, data is gradually
perturbed towards a simple noise distribution and a NN is trained to reverse this process
step by step. It can be viewed as a hierarchical VAE with a fixed encoder defined
by the noise process and a learned decoder, and its training is stable and scales
effectively on parallel hardware. Diffusion models are currently the state of the art in
image domains.
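The change-of-variables formula used by normalizing flows can be checked on a one-dimensional example. The sketch assumes a standard normal latent density and a hypothetical affine flow 𝑥 = 𝐴𝑧 + 𝐵, for which the resulting data density is known in closed form.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

A, B = 2.0, 1.0  # hypothetical flow parameters w = (A, B); data x = A*z + B

def g(x):
    """Inverse map from data space to latent space: z = g(x, w)."""
    return (x - B) / A

def flow_density(x):
    """p_x(x | w) = p_z(g(x, w)) * |det J(x)|; in 1-D the Jacobian is dg/dx = 1/A."""
    return normal_pdf(g(x)) * abs(1.0 / A)

x = 2.5
density_via_flow = flow_density(x)
# An affine map of a standard normal is a normal with mean B and std |A|.
density_closed_form = normal_pdf(x, mean=B, std=abs(A))
```

Both routes give the same value, which is the exact likelihood that makes maximum-likelihood training of flows straightforward.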
While diffusion models are the current state of the art for images, transformer-based
LLMs are the text equivalent. Text models, specifically autoregressive transformers,
are generative in the strict probabilistic sense. These models factorize a
sequence distribution into token-level conditionals and sample one token at a time,
where a token is, in this case, the discrete representation of a word or a syllable
[173]. LLMs are generative models over discrete token sequences. Given a prefix,
denoted $x_{1:t}$, the LLM defines a conditional distribution $p(x_{t+1} \mid x_{1:t})$,
which means that the probability of the next token $x_{t+1}$ is calculated based on the
preceding tokens from 1 to $t$. The model is trained to maximize the likelihood of the
next token across vast training corpora. Sampling from the learned distribution
produces open-ended text, which can be conditioned on prompts, as demonstrated
in practice in the literature review analysis in chapter 2.
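The autoregressive factorization and the corresponding training objective described above can be written compactly. The sequence length 𝑇 and the parameter symbol 𝜃 are notational assumptions; the prefix is taken to be empty for the first token.

```latex
p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{1:t-1}) .
```

Minimizing the negative log-likelihood on the right is equivalent to maximizing the likelihood of each next token given its prefix.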
A decisive architectural shift was brought by the introduction of a self-attention
mechanism [9], which replaces recurrence with parallelized attention over token
positions and adds positional encodings to retain word order. Decoder-only trans-
formers implement the autoregressive factorization used for text generation while
encoder-decoder variants handle conditional generation such as translation and sum-
marization within the same attention framework. Still, the core learning signal is
next-token prediction.
Tokenization bridges raw text and model inputs by providing the said representations
of words or other inputs. The choice of tokenization scheme, for example
between byte-pair encoding [324; 325] and SentencePiece [326], balances
open-vocabulary coverage against manageable vocabulary sizes and shapes the discrete
space over which the distribution is learned.
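The merge procedure behind byte-pair encoding can be sketched as follows. This is a simplified illustration of the core idea, repeatedly merging the most frequent adjacent symbol pair, and not a reproduction of any production tokenizer; the word-frequency table is a toy assumption.

```python
from collections import Counter

def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair across all words."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy word-frequency table; each word starts as a tuple of characters.
words = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
merges = []
for _ in range(3):  # the number of merges controls the vocabulary size
    pair = most_frequent_pair(words)
    merges.append(pair)
    words = merge_pair(words, pair)
```

Each merge adds one symbol to the vocabulary, so the number of merges directly trades vocabulary size against how finely rare words are split.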
Empirically, increasing model size, data, and compute improves LLM perplexity
and downstream generalization, as described by scaling laws [327]. Scaling, along
with broad, self-supervised pretraining, has resulted in foundation models. These
models are systems trained on broad data and at scale, which has enabled adaptation
to many tasks through prompting or lightweight fine-tuning. For foundation
models, transformers with the attention mechanism are currently the dominant
paradigm [173].
Pretrained, raw LLMs optimize the aforementioned likelihood of the next token, not
usability. To enable a system that can, for example, follow guidelines and execute
tasks, the pretrained model needs post-training stages such as instruction tuning or
multitask prompted training on instruction-formatted datasets [173].
Likewise, RLHF [308] can be used to align the model with human preferences
over outputs for which the base model is optimized. The core of RLHF relies on
training a reward model 𝑟𝜑(𝑥, 𝑦) parameterized by 𝜑, which takes a prompt 𝑥 and
a generated response 𝑦 to output a scalar reward. In their seminal work, Christiano
et al. [308] followed the Bradley-Terry (BT) model, formulating the cross-entropy loss
between the predictions and human labels as
\[
\mathcal{L}_{BT}(\varphi) = -\sum_{(\sigma^1, \sigma^2, \mu) \in \mathcal{D}} \left[ \mu(1) \log P[\sigma^1 \succ \sigma^2] + \mu(2) \log P[\sigma^2 \succ \sigma^1] \right], \tag{35}
\]
where
\[
P[\sigma^1 \succ \sigma^2] = \frac{\exp \sum_t r(o_t^1, a_t^1)}{\exp \sum_t r(o_t^1, a_t^1) + \exp \sum_t r(o_t^2, a_t^2)},
\]
and $r$ is the reward function estimate, interpreted as a preference predictor,
$\mathcal{D}$ is a database of triples $(\sigma^1, \sigma^2, \mu)$, where $\sigma^1$ and
$\sigma^2$ are the two segments, i.e., sequences of observations $o_t$ and actions $a_t$
over a segment of time, as Christiano et al. [308] applied this to a continuous RL
environment, and $\mu$ is the distribution over $\{1, 2\}$ that indicates which segment
the user preferred.
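The BT loss can be evaluated numerically with a minimal sketch. The per-step reward values and the two labeled segment pairs are hypothetical; in practice the rewards would come from the learned reward model.

```python
import math

def preference_prob(rewards_1, rewards_2):
    """P[segment 1 preferred]: softmax over the summed per-step rewards of two segments."""
    s1, s2 = sum(rewards_1), sum(rewards_2)
    m = max(s1, s2)  # subtract the max for numerical stability
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return e1 / (e1 + e2)

def bt_loss(dataset):
    """Cross-entropy between predicted preferences and human labels mu = (mu(1), mu(2))."""
    loss = 0.0
    for rewards_1, rewards_2, mu in dataset:
        p12 = preference_prob(rewards_1, rewards_2)
        loss -= mu[0] * math.log(p12) + mu[1] * math.log(1.0 - p12)
    return loss

# Hypothetical per-step reward estimates r(o_t, a_t) for two labeled segment pairs.
dataset = [
    ([0.5, 0.7, 0.9], [0.1, 0.2, 0.1], (1.0, 0.0)),  # annotator preferred segment 1
    ([0.2, 0.1, 0.0], [0.2, 0.1, 0.0], (0.5, 0.5)),  # judged equally preferable
]
loss = bt_loss(dataset)
```

The loss decreases as the reward estimate assigns a higher summed reward to the segments that the annotators preferred.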
To improve training efficiency and prevent overfitting in LLMs, modern ap-
proaches like InstructGPT [328] generate 𝐾 different responses for a single prompt.
Given a dataset where a human annotator ranks these 𝐾 responses, the reward model
is trained using a pairwise ranking loss, denoted here as $\mathcal{L}_{rank}$. The loss
function is formulated to evaluate all $\binom{K}{2}$ pairs for a given prompt,
maximizing the difference
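in expected reward between the preferred response $y_w$ and the rejected response $y_l$
in each pair:
\[
\mathcal{L}_{rank}(\varphi) = -\frac{1}{\binom{K}{2}} \, \mathbb{E}_{(x, y_w, y_l) \sim D} \left[ \log \sigma\left( r_\varphi(x, y_w) - r_\varphi(x, y_l) \right) \right], \tag{36}
\]
where $D$ is similarly the dataset of human comparisons, $r_\varphi(x, y)$ is the scalar
output of the reward model for prompt $x$ and completion $y$ under parameters
$\varphi$, $\sigma$ denotes here the logistic sigmoid function rather than a segment, and
$y_w$ is the preferred completion between $y_w$ and $y_l$. Once the reward model is trained, a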
reinforcement learning algorithm uses this reward signal to fine-tune the LLM policy
to generate responses that maximize human preference.
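The pairwise ranking loss for a single prompt can be sketched as follows. The reward scores are hypothetical, and the responses are assumed to be listed from the most to the least preferred, so that every pair drawn in order has the preferred response first.

```python
import math
from itertools import combinations

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def rank_loss(ranked_rewards):
    """Pairwise ranking loss for one prompt.

    ranked_rewards: reward-model scores for K responses, ordered from the
    most preferred to the least preferred by the annotator.
    """
    pairs = list(combinations(ranked_rewards, 2))  # (r_w, r_l): preferred score first
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)

# Hypothetical scores for K = 4 ranked responses, i.e., 6 comparison pairs.
consistent = rank_loss([2.0, 1.0, 0.5, -1.0])  # scores agree with the human ranking
reversed_ = rank_loss([-1.0, 0.5, 1.0, 2.0])   # scores contradict the human ranking
```

A reward model whose scores agree with the human ranking obtains a lower loss than one whose scores contradict it.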
To avoid repetitive and dull responses from the LLM, stochastic truncation methods
such as top-𝑘 and top-𝑝 sampling are used to improve diversity by sampling
only from the most probable region of the distribution. The output can be further
controlled by adjusting the temperature parameter, which controls the randomness of
the output, and by applying repetition
penalties and other constraints.
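Temperature scaling and the two truncation methods can be combined in a short sketch. The logits are hypothetical next-token scores, and the function names are illustrative rather than taken from any particular library.

```python
import math
import random

def apply_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def truncate(probs, top_k=None, top_p=None):
    """Keep the top-k tokens and/or the smallest set whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token index from the truncated, temperature-scaled distribution."""
    dist = truncate(apply_temperature(logits, temperature), top_k, top_p)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical next-token scores
kept_k = truncate(apply_temperature(logits), top_k=2)
kept_p = truncate(apply_temperature(logits), top_p=0.9)
token = sample_token(logits, temperature=0.7, top_k=3, rng=random.Random(0))
```

With these scores, top-𝑘 with 𝑘 = 2 keeps the two most likely tokens, while top-𝑝 with 𝑝 = 0.9 keeps three, since the first two cover only about 83 % of the probability mass.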
Furthermore, the responses can be grounded using Retrieval-Augmented Generation
(RAG) [329], which is also explored in the context of military applications in
Publication V. RAG is a paradigm that mitigates hallucinations [22], i.e., erroneous
generation, and data staleness. Formally, RAG consists of two core components: a
retriever 𝑝𝜂(𝑧|𝑥) with parameters 𝜂, and a generator 𝑝𝜃(𝑦|𝑧, 𝑥) with parameters 𝜃.
Given an input query sequence 𝑥, the retriever searches a large document index to
return a set of top-𝐾 relevant text chunks, denoted as 𝑧. The generator then condi-
tions its output sequence 𝑦 on both the original query 𝑥 and the retrieved context 𝑧.
The probability of generating a target sequence 𝑦 is approximated by marginalizing
over the highest-scoring retrieved documents:
\[
p(y \mid x) \approx \sum_{z \in \text{top-}K} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1}). \tag{37}
\]
By injecting external, verifiable knowledge 𝑧 into the generative process, RAG
anchors the probabilistic text generation closer to the source data without requiring
other measures such as retraining the parameter weights 𝜃. The same pattern of
conditioning the output on external results can be extended to tools such as external
Application Programming Interfaces (APIs) for lookup, calculation, and code execution.
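The marginalization over retrieved documents can be illustrated with a toy calculation. All probabilities below are hypothetical placeholders for the outputs of a retriever and a generator; no actual retrieval or generation is performed.

```python
import math

def rag_sequence_prob(retrieval_probs, token_probs_per_doc):
    """Sum over retrieved chunks z of p(z|x) times the product of per-token probabilities."""
    total = 0.0
    for z, p_z in retrieval_probs.items():
        total += p_z * math.prod(token_probs_per_doc[z])
    return total

# Hypothetical retriever scores for the top-K = 2 chunks.
retrieval_probs = {"doc_a": 0.7, "doc_b": 0.3}
# Hypothetical generator probabilities of each target token given the chunk.
token_probs_per_doc = {
    "doc_a": [0.9, 0.8, 0.9],  # relevant chunk: the target tokens are likely
    "doc_b": [0.2, 0.1, 0.3],  # irrelevant chunk: the target tokens are unlikely
}
p_y = rag_sequence_prob(retrieval_probs, token_probs_per_doc)
```

The relevant chunk contributes almost all of the probability mass, which illustrates how the quality of retrieval directly bounds the quality of the grounded output.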
4.4.3 GenAI in military domain
GenAI represents a recent and significant development in the field of AI, character-
ized by models capable of creating novel content and following instructions rather
than simply analyzing or classifying existing data. These models, most notably foun-
dation LLMs built on the transformer architecture, can produce and process human-
like text, images, code, and other forms of data. This has led to a proliferation of
applications that are rapidly altering both civilian and military domains. In the mil-
itary context, the potential of GenAI extends from enhancing intelligence analysis
and report generation to creating synthetic training data and augmenting command
and control systems.
This dissertation explores this disruptive technology by identifying its current tra-
jectory and potential impact on defence. Publication V, ”GenAI in Military: Trends
and Opportunities”, provides a dedicated analysis of this topic. The structured re-
view of recent literature (2022–2025) on GenAI in military applications reveals a
significant disconnect between state-of-the-art academic and private innovations and
practical defence adoption. While GenAI shows clear potential to enhance, for ex-
ample, decision-making, simulation, and cyber security, current research is largely
experimental and highlights two major obstacles: reliance on partially unsuitable
proprietary models and lack of military-specific resources and infrastructure, such as
datasets and military-secure computation capacity. To bridge this gap, a collabora-
tive approach to developing a family of military-specific base models using federated
learning is proposed as a key path forward.
Analysis of recent application studies shows a dual trend in military GenAI re-
search. The first involves leveraging large, proprietary, off-the-shelf models (e.g.,
GPT-4) for high-level decision-support tasks. These models have been used to ac-
celerate COA generation and to create multi-agent simulations for analyzing histor-
ical battles and strategic conflicts. The second trend involves fine-tuning smaller,
open-weight models for narrow, domain-specific tasks, such as military equipment
entity extraction and automated data tagging for cybersecurity frameworks. Concep-
tual research complements these applications by proposing strategic frameworks for
GenAI’s role in military competition, cross-domain ethical principles, and collabo-
rative architectures.
A primary finding is the disconnect between the cutting edge of GenAI research,
driven by breakthroughs in industry and academia such as the transformer with its
attention mechanism, MoE architectures, unsupervised reinforcement learning,
and models capable of test-time learning, and the limited adoption of these advances
in military studies. This gap is largely due to significant operational, ethical, and
security barriers. The analysis identifies two major obstacles to successful GenAI
adoption in the military:
• Reliance on proprietary models: Large proprietary models, while powerful,
are unsuitable for critical military use as-is. They lack explainability due to
their vast size and undisclosed training data, and they exhibit undesirable be-
haviors, such as a tendency toward escalation in simulations, because they are
not necessarily aligned with military doctrine.
• Lack of resources and infrastructure: The immense data and computation re-
quirements for training foundational models from scratch are significant bot-
tlenecks. Even when fine-tuning smaller models, their performance falls short
of larger systems, and their applicability is limited. This is compounded by the
scarcity of military-specific datasets and benchmarks for proper evaluation.
These limitations do not concern large, technologically advanced nations in the
same manner as they concern smaller ones or those lacking sovereign computation and
data resources. For example, Anthropic has deployed Claude Gov models for national
security customers in the United States [330], stating that these were built on
direct feedback from government customers. Similar statements have been given
by other large AI companies, including OpenAI, Google, and Meta [331; 332; 333].
While this highlights the inherent issues of proprietary, off-the-shelf models, such
a service and supply of governmental models is only practical for a large
customer whose national hyperscaler companies can provide the services and
deployments without risking sovereignty. Both of the aforementioned obstacles hold for
nations with more limited resources and technological sovereignty.
From the CRISP-DM perspective, the obstacles can be perceived from multi-
ple points of view. It can be argued that the current process of creating capable
GenAI models with readily available internet data inherently lacks the operational
understanding, the data understanding, the modeling insight and perhaps foremost
the evaluation. While these models have been proven useful in military tasks as well,
the exhibited military understanding is shallow, starting from the corpus and tactical
insight that exists in the available training data. To improve such models, military-
specific data sets for base or foundation model training and instruction-tuning cannot
be fully produced on behalf of military forces, as the required data has to serve the
desired tasks, the proper corpus and the thought-through military intent.
To overcome these obstacles, the development of a family of military-specific
base, or foundation, models, such as a ”NATO base model”, is identified as a top pri-
ority for allied nations. This would require collaboration with industry and academia,
as well as investment in data, computation, and personnel. A key enabler for this is
FL, which is proposed as a secure, system-level architecture for training LLMs col-
laboratively. Under an FL paradigm, allied nations could train a shared global model
using their own local, sensitive military data without ever exposing the data itself.
This approach would address critical issues of data privacy and ownership while re-
ducing communication overhead and issues with heterogeneous data, enabling the
creation of robust, client-specific military-grade AI capabilities. In the best-case
scenario, a military-specific foundation model could be utilized to serve narrow and
user-specific problems with greater efficiency than aiming to leverage larger, general
models in a variety of tasks.
Additionally, the instruction-tuning data should be generated and stored in oper-
ation. This means that military forces would require information systems that store
the data that leads to a conclusion to create proper instruction-output datasets. For
a simplistic example, within a particular MDMP, there are certain inputs that lead
to certain types of COAs. In order to enhance the COA preparation, the data from
inputs to suggested alternatives for actions should be stored. This, of course, neglects
to an extent the chance to produce greater variance and makes the model converge
towards a certain way of tactical or strategic “thinking”. To enable further deviation,
the systemic approach can be split into two parallel approaches: a legacy-based
approach that aims for feasibility according to known doctrines, and an exploratory
approach that aims to leverage an at least partially different knowledge base to
generate less doctrine-bound solutions.
4.5 Ethical considerations regarding AI systems
In the wake of the widespread adoption of AI-based or AI-enabled systems in the
military, the ethics of automation in warfare have come under consideration. As shown
in chapter 2, the regulatory field is still in its infancy, and legal guidelines are
scarce. However, Western nations widely share the idea presented in U.S. Department of
Defense [124]: a human must be responsible for the use of lethal force. This has been
echoed in other nations as well, for example, in the Finnish government defence report,
which states that “While the decision to use lethal force must always be made by a
human, the appropriate level of human involvement will not in all situations require
on-line communications connection if responsible behaviour is ascertained in other
ways.” [334]. The very same report declares that ethical and juridical challenges of
artificial intelligence must be resolved. Despite the consensus on the matter, the
stated intentions have not produced ethical guidelines other than the mandated human
oversight.
The form of “meaningful human control” has been debated and researched [335],
but propositions in general still share the idea that human control and oversight are
fundamental concepts for the safe deployment of autonomous weapon systems. For
example, meaningful human control is perceived to include three components:
that humans make informed, conscious decisions; that humans have sufficient
information to ensure compliance with the requirements of law, weapons, and context;
and that the weapon itself is designed and tested in a realistic operational environment
and humans are properly trained, to ensure its use in a judicious manner [336].
Publication VI claims that the use of AI in military systems creates a fundamen-
tal ”Catch-22”-like paradox regarding reliability and human oversight. This dilemma
places human operators in an untenable position, making them responsible for sys-
tems that are designed to outperform them. A proposed solution involves an ideolog-
ical shift in the human-machine relationship, which reframes the human’s role from
one of scrutiny to one of support.
The central paradox emerges from two conflicting requirements:
• The need for perceived reliability: For an autonomous system to be deployed
and trusted, it must be perceived as exceptionally reliable, to the point that it
surpasses human judgment.
• The requirement for human oversight: Legal and ethical mandates, such as the
U.S. Department of Defense Directive 3000.09, require that a human operator
exercise ”appropriate levels of human judgment over the use of force”.
This creates a paradox: the perception of high reliability needed for deployment
undermines the rigorous questioning required for meaningful human oversight. An
operator is unlikely to question a system that seems to operate flawlessly in most
cases. This leaves the human operator in an impossible role: either they apply too
much distrust and nullify the AI’s performance advantage, or they apply too little
and become a ”surrogate scapegoat” for the system’s errors. This results in a critical
failure in the deployment and evaluation stages of CRISP-DM, but not from a tech-
nical perspective. The current model for deployment, requiring human oversight, is
flawed as it misunderstands the operational goal of leveraging superhuman perfor-
mance instead of replicating human performance. Likewise, the evaluation stage is
flawed if it focuses on the AI system instead of the actual human-machine team that
creates the operational capability in cohesion.
The root of this dilemma is identified as the ”perceived sanctity of human intel-
ligence”. This analysis contrasts the ideal of ”humane” action with the reality that
human cognition is flawed by biases, emotion, and fatigue. A properly programmed
AI, free from these distortions, could potentially adhere more objectively to ethical
guidelines. Furthermore, machines are often held to a much higher standard of flaw-
lessness than people, with every autonomous accident receiving harsh scrutiny while
widespread human errors are more accepted.
To resolve this dilemma, an ideological shift in the human-AI relationship is
proposed, outlined in a three-step model:
• Acknowledging the limitations of human cognition
• Aligning idealistic expectations with realistic AI capabilities, basing accep-
tance on measurable improvement over the human baseline
• Shifting focus from the ”means”, i.e., technology, to the ”results”, i.e., the
improved outcomes
This shift reframes the interaction from the human oversight of a machine to hu-
man support for an intelligent system. In this model, the human’s primary role is not
to second-guess the AI’s every decision but to cover its ”blind spots” by providing
additional, context-rich information that the system might lack. This process creates
a feedback mechanism not unlike RLHF, but one performed during inference
with a direct impact on the current output. The critical question for the human
becomes: “Do I know something the system does not?”, i.e., what are the
limitations of the data pipeline and how can they impact the system. This approach
aims to leverage the distinct strengths of both human and machine intelligence for
more effective and genuinely ”humane” outcomes.
It can be argued that the very notion of having humans safeguard systems that
are supposed, and expected, to outperform them in the same task is more idealistic
than realistic. The approach, by design, hinders the ability to use the system to the
capacity where it exceeds human performance: it can be said to cap the benefits of AI
and ML at the maximum performance enabled by humans. The more beneficial, and
thus arguably favorable from a military perspective, approach is to recognize the limits
of both the human and the machine, and to enable both to contribute to the task at
their maximum respective capacity. As such, the suggested solution also guides toward
accepting the imperfection of all systems and models, which allows for a more
transparent and realistic conversation regarding AI systems such as autonomous drones
and AI agents. This approach could also enable advancing the regulatory and juridical
aspects without the risk of falling behind in the so-called AI arms race.
4.6 Testing, evaluation, validation, and verification
Test, Evaluation, Validation and Verification (TEVV) practices have been a foundation
for many fields, from robotics to software engineering, and they are just as crucial for
ML and AI. Within the CRISP-DM framework, TEVV is confined to the evaluation step,
but the complete process includes a variety of angles in addition to mere evaluation
and testing of a particular ML model [337].
Testing, as in ML training, is usually done by executing the model on a set
of inputs to enable analysis of the resulting outputs. In evaluation, the empirical test
results are reviewed against testing criteria to assess the performance of the model
from quantitative and qualitative perspectives. After the metrics have been evaluated,
verification aims to demonstrate that the system conforms to specifications, i.e., that
it was “built right”, and validation that it fits the intended operational context.
Overall, TEVV is not a single event but a continuous process that aligns with, e.g.,
CRISP-DM. As a simple example, an inference model can be used to classify a test set of
images to produce empirical evidence, which is then analyzed with testing criteria, and
then verified and validated for the intended context.
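The testing and evaluation steps of the simple example can be sketched in code. The labels, predictions, class names, and acceptance thresholds are all hypothetical; actual testing criteria would be derived from the system specification and the operational context.

```python
def evaluate(predictions, labels, positive="target"):
    """Compute accuracy, precision, and recall for one class of interest."""
    tp = sum(p == positive and y == positive for p, y in zip(predictions, labels))
    fp = sum(p == positive and y != positive for p, y in zip(predictions, labels))
    fn = sum(p != positive and y == positive for p, y in zip(predictions, labels))
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def meets_criteria(metrics, criteria):
    """Return the per-metric pass/fail verdict against minimum thresholds."""
    return {name: metrics[name] >= minimum for name, minimum in criteria.items()}

# Hypothetical ground truth and model outputs for a small image test set.
labels      = ["target", "target", "clutter", "clutter", "target", "clutter", "target", "clutter"]
predictions = ["target", "clutter", "clutter", "clutter", "target", "target", "target", "clutter"]
criteria = {"accuracy": 0.70, "precision": 0.70, "recall": 0.80}  # assumed thresholds

metrics = evaluate(predictions, labels)      # testing: empirical evidence
verdict = meets_criteria(metrics, criteria)  # evaluation: comparison against criteria
```

In this toy case the model passes the accuracy and precision thresholds but fails on recall, which is the kind of evidence that verification and validation would then weigh against the specification and the intended context.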
As an ML-specific twist to TEVV, deviating from traditional software, ML learns
patterns from data, and as a result the data is an integral part of the specification.
Quality, representativeness, provenance, and lineage of the data must be tested and
documented for proper TEVV. ML performance is distribution dependent, and systems
employing ML models may fail under distribution shift caused, for example, by different
sensors, weather, or deception. Likewise, the calibration of out-of-distribution
detection for anomaly
detection and mitigation is important.
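One simple way to flag a distribution shift in a single numeric feature is the two-sample Kolmogorov-Smirnov statistic, sketched below. This is only one of many possible shift indicators, and the synthetic samples and the scenario of a changed sensor are assumptions for illustration.

```python
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past x so that i and j count the values <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap

rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]  # feature under training conditions
same_cond = [rng.gauss(0.0, 1.0) for _ in range(500)]  # new data, same conditions
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]    # new data after a hypothetical sensor change

d_same = ks_statistic(reference, same_cond)
d_shift = ks_statistic(reference, shifted)
```

A clearly larger statistic for the shifted sample than for the unshifted one indicates that the operational data no longer follows the distribution on which the model was tested.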
AI and ML solutions also bring about new threat vectors to systems, for example
adversarial examples [294], data poisoning [220; 338], and inversion attacks [317].
Likewise, accumulating new data and updating models invalidates prior assurance or,
at least, calls for re-evaluation and re-testing.
Drawing from this widely applied TEVV practice with a long history in other
technical fields, there are currently multiple, complementary standards and policies
that anchor TEVV for AI:
• NIST AI Risk Management Framework (AI RMF 1.0) [339] provides outcomes
and practices to drive trustworthy AI, which ought to be valid, reliable,
safe, secure, resilient, explainable, privacy-enhanced, and fair with harmful bias
managed, and explicitly recommends testing and monitoring across the lifecycle
with a risk-based approach.
• ISO/IEC 23894:2023 [340] gives process guidance for AI-specific risk man-
agement (hazard identification, risk analysis/evaluation, treatment, and moni-
toring), integrating TEVV evidence into risk decisions.
• ISO/IEC 42001:2023 (AIMS) [341] establishes an AI management system
standard that institutionalizes governance, role ownership, and continuous im-
provement, providing organizational scaffolding for repeatable TEVV.
• EU Artificial Intelligence Act (2024) [24] requires risk management, high-
quality data governance, technical documentation, transparency, human over-
sight, and post-market monitoring for high-risk systems; conformity assess-
ment hinges on TEVV evidence.
• Defence acquisition policy DoDI 5000.89 [342] defines Test and Evaluation
(T&E) policy across acquisition pathways, while DoDI 5000.98 [343] gov-
erns Operational Test and Evaluation (OT&E) and live-fire T&E, emphasizing
science-based Test and Evaluation Master Plan (TEMP) or T&E strategies, se-
quential testing with Bayesian or similar inference methods, and integration of
developmental, cyber, and operational evidence.
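To illustrate what sequential testing with Bayesian inference can look like in practice, the following sketch tracks a success rate with a Beta posterior and stops testing once the evidence is strong enough. It is a simplified illustration under stated assumptions: the requirement, the accept and reject levels, and all function names are hypothetical and are not drawn from the cited policies.

```python
import math


def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with new trial outcomes."""
    return alpha + successes, beta + failures


def prob_rate_exceeds(alpha, beta, requirement, steps=10000):
    """Posterior probability that the true success rate exceeds the
    requirement, by midpoint integration of the Beta density."""
    total = 0.0
    above = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        density = math.exp((alpha - 1) * math.log(x) + (beta - 1) * math.log(1.0 - x))
        total += density
        if x > requirement:
            above += density
    return above / total


def sequential_decision(alpha, beta, requirement, accept=0.95, reject=0.05):
    """Stop early when the posterior is decisive, otherwise keep testing."""
    p = prob_rate_exceeds(alpha, beta, requirement)
    if p >= accept:
        return "accept"
    if p <= reject:
        return "reject"
    return "continue testing"


# Usage: start from a uniform prior and fold in each test event as it arrives.
alpha, beta = update_beta(1, 1, successes=18, failures=2)
decision = sequential_decision(alpha, beta, requirement=0.8)
```

The same posterior can carry evidence from developmental tests into operational tests, provided the test conditions are judged comparable.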
These frameworks converge on a consistent view that TEVV must produce auditable
claims, arguments, and evidence that a system is sufficiently safe, effective,
fair, secure, and reliable for its intended use. However, defence AI is mainly
excluded. As pointed out in chapter 2, the laws and regulations for defence AI are
still in their infancy and close to non-existent, apart from certain guidelines for
LAWS. For example, the EU AI Act [24] excludes military AI. TEVV for defence AI
therefore has to draw applicable procedures and solutions from the existing
frameworks and adapt them to the military and defence context.
In order to execute TEVV for defence AI, the operational context has to be defined
so that, e.g., the intended environment, platform, sensors, conditions, and human
roles are considered, as well as the current doctrine, CONOPS, and possible
SOPs. The test design and scenario coverage have to rely on this thorough context,
simultaneously enabling all the TEVV steps.
4.7 Field observations from Ukraine
The author spent ten days in the Kyiv region of Ukraine, between 15 and 25 July 2025,
meeting and collaborating with local and international entrepreneurs, military
personnel, academia, and defence technology startups. The main purpose of the visit
was to participate in a startup event that aimed to connect novel ideas to battlefield
needs as well as suppliers to procurement managers.
While most of the details cannot be disclosed, there were several notable
observations regarding the use of AI and ML models.
First and foremost, the perceived use of GenAI capabilities was nonexistent. The
main focus of effort was on CV, GNSS-independent navigation, and tracking. As
the current frontline combat is dominated by drones, electronic warfare capabilities
have become extremely important. This is because the drones are piloted by soldiers,
and the most cost-effective countermeasure is to sever the connection by, for
example, jamming. The same applies to GNSS-based navigation towards long-range
targets: jamming GPS signals is relatively easy even on a large scale.
Due to these combat realities, the main focus for drone development was on
GNSS-independent navigation, which is usually based on CV methods that identify
the map location from the available visual information. This usually requires
uploading satellite mapping or drone imagery of the area of operations beforehand, so
that the drone can perform cross-checking navigation on the edge, without
connectivity back to a pilot. Essentially, the idea is to compare the drone's camera
view to the uploaded map and infer the most probable location. The accuracy can be
enhanced with sporadic GNSS signals (fixes) or ground beacons that enable
triangulating the position.
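The map comparison idea can be illustrated with a deliberately small sketch: a camera view is slid over a stored reference map and the position with the highest normalized cross-correlation is taken as the most probable location. This is only an illustration of the principle, assuming an aligned, same-scale, single-channel view; fielded systems need to handle rotation, scale, perspective, and seasonal change, and the names below are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally long value lists.
    Invariant to brightness offset and positive gain."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch carries no matching information
    return sum(x * y for x, y in zip(da, db)) / denom


def locate(map_grid, view):
    """Return (row, col, score) of the best match of view inside map_grid."""
    vh, vw = len(view), len(view[0])
    flat_view = [v for row in view for v in row]
    best_score, best_r, best_c = -2.0, None, None
    for r in range(len(map_grid) - vh + 1):
        for c in range(len(map_grid[0]) - vw + 1):
            patch = [map_grid[r + i][c + j] for i in range(vh) for j in range(vw)]
            score = ncc(patch, flat_view)
            if score > best_score:
                best_score, best_r, best_c = score, r, c
    return best_r, best_c, best_score


# Usage: a toy reference map and a view taken at row 1, column 2,
# with a simulated brightness and contrast change.
REFERENCE_MAP = [
    [3, 8, 1, 9, 4, 7],
    [6, 2, 9, 5, 1, 8],
    [7, 4, 0, 3, 9, 2],
    [1, 9, 6, 8, 2, 5],
    [5, 3, 7, 1, 6, 0],
]
camera_view = [[2 * REFERENCE_MAP[1 + i][2 + j] + 1 for j in range(3)] for i in range(3)]
row, col, score = locate(REFERENCE_MAP, camera_view)
```

Sporadic GNSS fixes or beacon ranges, when available, can be used to restrict the search window and thereby reduce both computation and false matches.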
For both situational picture compilation and engagement, the automatic
identification of targets is an essential task. All the companies met that were
developing automated tracking and engagement capabilities relied on YOLO models, or
possible derivatives and alternatives, that were tuned for the particular purpose,
such as engaging enemy drones. After classifying and identifying a target, the drone
could approach it by keeping it in the center of the field of view while proceeding
directly along the camera direction. In an interception mode, when trying to catch a
fast target mid-flight, a simple heuristic can be used. For example, at sea, a
collision is deemed visually imminent if another ship is in the starboard or port
quarter and the bearing to that ship does not change over time. A similar heuristic
can be applied to drones: keeping the target at a fixed angle provides a simple rule
for steering onto a collision course with the target.
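The constant-bearing heuristic described above can be sketched as a check over two position tracks: if the line-of-sight angle stays steady while the range keeps shrinking, the platforms are on a collision course. The 2D setting, the tolerance value, and the function names are assumptions made for illustration only.

```python
import math


def bearing(own, target):
    """Line-of-sight angle from own position to target, in radians."""
    return math.atan2(target[1] - own[1], target[0] - own[0])


def angle_diff(a, b):
    """Smallest signed difference between two angles, in radians."""
    d = a - b
    return math.atan2(math.sin(d), math.cos(d))


def on_collision_course(own_track, target_track, tol_rad=0.01):
    """True when the bearing is steady and the range is decreasing
    over the observed samples."""
    bearings = [bearing(o, t) for o, t in zip(own_track, target_track)]
    ranges = [math.dist(o, t) for o, t in zip(own_track, target_track)]
    steady = all(abs(angle_diff(b, bearings[0])) <= tol_rad for b in bearings)
    closing = all(later < earlier for earlier, later in zip(ranges, ranges[1:]))
    return steady and closing


# Usage: the interceptor flies (1, 1) per step, the target (-1, 1) per step.
own = [(0, 0), (1, 1), (2, 2), (3, 3)]
target = [(10, 0), (9, 1), (8, 2), (7, 3)]
intercepting = on_collision_course(own, target)
```

An interceptor can turn the check into a steering rule by correcting its heading in proportion to any observed bearing drift, which is the basic idea behind proportional navigation.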
Only a few companies stated that they focus on collecting data from the field
and utilizing it in their AI or ML efforts. This seemed to be a highly sensitive
topic, so the analysis is largely speculative. At the same time, the cooperation
between the industry and frontline units seemed seamless but heavily siloed.
The coordination of efforts, to avoid overlap and achieve a concentrated effect, was
not evident. Meanwhile, despite its tragic nature, the conflict itself is the most
prominent field for gathering operationally relevant data for AI development. The
configurations of drone pilot stations did support collecting video data, for
example, but it was also indicated that the data is not exploited afterwards, at
least not in a coordinated manner.
While these observations are bound to cover only a fraction of the activities on-
site, both spatially and temporally, they still confirm certain findings of this dis-
sertation. First, ML development relies heavily on transfer learning by leveraging
pretrained foundation models, especially in CV, and fine-tuning those to perform in
a military-specific task such as navigation or engagement. Second, the path
from state-of-the-art breakthroughs to battlefield deployment is not a plug-and-play
fit. Third, all of the observed development and innovation was focused on identified
bottlenecks in current operations, echoing a classic weapon-countermeasure
development cycle. This is most likely due to the reality of the ongoing conflict,
where innovations stem bottom-up from concrete problems on the battlefield and in
tactics. While this is understandable and effective on the battlefield for immediate
survival, realizing a broader revolution in AI-enhanced warfare requires systemic,
far-reaching efforts that aim to integrate and transform the higher levels of the
military information system of systems.
4.8 Summary of Findings
Across all the analyses, the decisive factor is not a specific algorithm but the system
around the model: data itself, data pipelines (including feedback), interoperable
simulators and tooling, privacy-preserving collaboration, non-IID data, heterogeneous
environments and use cases, as well as evaluation in relevant environments. When
these are given little consideration, the results are meager and even strong AI
models underperform. When these factors are considered and amplified, the researched
paradigms (CV, RL, GenAI, FL) can be combined to deliver comprehensive military
capability with a decisive impact. As the fundamental aspects and root causes are
similar for each paradigm, this finding can be generalized to all existing
and emerging AI paradigms and research areas. The unifying imperative is to treat
data as a first-class capability and engineer the end-to-end learning loop displayed in
Figure 9.
[Figure 9. The Learning Loop. Diagram labels: Sensors; External Data; Collect and
Store; Refine Data; Curation, Annotation, Insight; Train or update; Evaluate;
Registry; Deploy; Platform; Embedded system; AI model Inference; Data storage;
User interface; User; Operational feedback.]
The dominant constraint through all research areas is data quantity, quality and
availability. CV in sonar results, RL in littoral warfare as well as overall decision
support systems, and GenAI for defence all converge on the same bottleneck: scarce,
fragmented, or unsuitable data. Therefore, data capability is the central thesis. As
noted in Subsection 3.2.2, a capability is defined not as a platform or system itself,
but as the ability to execute a military task and generate fighting power. While data
is simply an inert asset and cannot be defined as a capability in itself, the concept
of a data capability transforms it into an active military function. A data
capability denotes the systemic ability to collect, govern, and exploit data. Data
capability should enable the execution of algorithmic tasks and the continuous
improvement of systems, or systems of systems, with AI. In operational terms, this
data-driven, continuous improvement translates into strategic, operational, and
tactical superiority, a goal framed in contemporary discourse as closing the kill
chain [163], compressing the OODA loop, and securing a decision advantage.
It has also been suggested that data quality is more crucial than data quantity,
as well-curated, carefully crafted small data sets can yield impressive results [344;
345; 346] despite training with considerably fewer samples. The adversarial
counterpart is that threats such as data poisoning are also feasible with very small,
targeted data sets, which, at least for LLMs, are able to bypass safety measures and
enable the model to comply with harmful requests [338].
All paradigms and research areas require feedback signals, whether labels,
rewards, preferences or federated gradients. This creates concrete requirements for
systems to establish data capability, specifically the infrastructure to collect
operational data and train AI models. Novel warfare and military information systems
depicted in Subsection 3.2.5, from combat to managerial functions and from technical
to strategic levels, should be designed to support the accumulation and exploitation
of high-quality data, preferably with annotations and other metadata, at scale.
Extending these observations on data, evaluation requires benchmarks. For
example, foundation models in GenAI are measured over various benchmarks, but
the military requires its own benchmarks to test both proprietary models and
possible own developments in the field. This is conceptually similar to traditional
military field tests, though the validation of intelligent, probabilistic systems is
considerably more complex than testing conventional military systems such as kinetic
assets.
On the other hand, while benchmarks or other test sets provide a shortcut to
evaluate model performance, they may also create a spiral of convergence of solutions,
as the benchmark results create requirements that need to be met and preferably
surpassed, as is intended. Recently, Gu et al. [347] pointed out that medical
benchmarks seem to be measuring the wrong things, and that LLM/LRM performance is
very brittle. Their paper worryingly notes that LLMs provided correct answers even
when critical parts of the input, such as images, were removed. This indicates that
the models have learned the answers “by heart”, i.e., via rote memorization, instead
of understanding, a test-taking strategy that is shunned in human higher education.
The TEVV and benchmark approach needs to be carefully considered for each
use case, especially with regard to high-level decision support, as the other upside
of applying AI consists of creating novel solutions. In the field of defence, novel
solutions in, for example, decision making may create a considerable tactical,
operational or even strategic edge over the adversary. If the AI models are
constrained to a certain, narrow distribution of known solutions, the evaluation
ensures that novel solutions cannot be encouraged by the deployed AI systems. The
tradeoff between the measurable certainty of outcomes and the unforeseeable benefits
of variance needs to be carefully considered, preferably by implementing the solution
on two parallel rails: the approved go-to solution and the possible, creative
solution that stems from the margins of the probability distribution.
Regarding other weaknesses, distribution shift is a shared issue across all research
areas. CV brittleness, RL generalization failures, GenAI erroneous generation, and
FL client drift all stem from a mismatch between training, deployment and use.
Augmentation, self-supervision, and uncertainty estimation are crucial elements in
bridging AI capabilities to operational reality. Additionally, the infrastructure is
lacking. RL requires simulators, CV and GenAI need curated data in vast quantities or
of impeccable quality, as well as metadata schemes, while FL requires secure
orchestration as well as network and device management. The lack of standards blocks
multinational collaboration and repeatable evaluation, as seen in the absence of
unified military benchmarks.
Rapid adaptation via transfer learning and fine-tuning generally outpaces the
building of narrow base models in all fields, provided that the expertise, domain data
and infrastructure exist. Likewise, explainability and trust are universal requirements
for all research areas, as users require trust, confidence calibration and rationale.
Otherwise, systems stall at the proof-of-concept level. It is proposed that
human–machine teaming should be designed for support rather than brittle “oversight”,
which necessitates trust calibration but also insight into how the AI systems
function. This in turn requires technical understanding and expertise from the
personnel, so that advanced AI systems are not perceived as black-box automatons but
rather as inference machines that operate under certain limitations, requirements and
constraints.
On a final note, the influential essay by Richard Sutton, “The Bitter Lesson”
[348], posits that the two methods that “seem to scale arbitrarily” are search and
learning. He argues, with the academic weight of being one of the most celebrated
pioneers of RL, that sophisticated solutions that mimic human knowledge or
cognition usually fail, in time, against search and learning backed up by large enough
computation capacity. The essay highlights “the great power of general purpose
methods” as the key takeaway from the 70 years of history in AI and ML. An
obvious counterargument would highlight the transformer architecture and its current
triumphs, but it can also be described as a novel way to search vast knowledge bases
after learning a comprehensive representation of them. Sutton's idea underpins the
example laid out in Figure 9 and the conclusions of the following chapters: there has
to be data to learn from, and there has to be an environment to search within.
Whatever the algorithm or the model architecture, these principles hold, and according
to Sutton, in due time, given computation, robust data capability, and suitable
simulation environments, it is inevitable that general methods scale into the next
generation of AI achievements.
5 Contribution of this thesis
The research presented in this dissertation addresses the challenge of developing and
integrating machine learning capabilities within the military domain. The contribu-
tions are not confined to a single technical area but span conceptual, technical, appli-
catory, and ethical dimensions and considerations, backed up by field observations
in Ukraine described in Section 4.7.
5.1 Publication I: Deep Mix: AI in Littoral Sonar Operations
5.1.1 Summary
Publication I investigates the application of deep learning techniques to enhance
naval MCM operations in the challenging littoral environment of the Baltic Sea. The
primary goal is to automate the detection and classification of underwater objects
from Side Scan Sonar (SSS) imagery, aiming to reduce the workload on human
operators and speed up the overall mine hunting process. The authors explore the
effectiveness of several AI models, including CNNs and ViT, for this task. A key focus
of the research is the development of a MoE system, which combines the strengths
of multiple models to improve classification accuracy. The paper also emphasizes
the importance of explainable AI (XAI), using methods like Grad-CAM to provide
visual feedback to operators, ensuring the system is transparent and operationally
useful.
5.1.2 Methods and Data
The research utilized a unique, operationally relevant dataset provided by the Finnish
Navy, consisting of 1,299 processed side-scan sonar images collected with a Klein
5500 system. The images are categorized into four classes: Mines, Mine-like Contacts
(MILCOs), Rocks, and Wrecks. Due to the limited size of the dataset, data
augmentation techniques were employed to expand the training set.
The study’s methodology involved several machine learning models:
• Baselines: An SVM and an RF were used as classical baseline models for
performance comparison.
• Deep Learning Models: The core of the research involved transfer learning
with pre-trained models, specifically the VGG16 and VGG19 CNN architec-
tures and a ViT (ViT-B/16).
• MoE: The primary contribution was testing an MoE framework that combines
the outputs of the best-performing models (VGG16, VGG19, ViT, and SVM)
using a gating network to weigh their predictions to improve overall accuracy.
The models were evaluated on two classification tasks: a four-class problem and
a three-class problem where the ’Mine’ and ’MILCO’ classes were combined to
reflect operational procedures.
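The gating idea behind an MoE can be sketched in a few lines: each expert outputs class probabilities, a gate produces one weight per expert, and the final prediction is the weighted mixture. The sketch below assumes fixed gate logits for readability, whereas a gating network would compute them from the input; the numbers and names are hypothetical and do not reproduce the configuration of Publication I.

```python
import math


def softmax(values):
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(expert_probs, gate_logits):
    """Mix per-expert class probabilities with softmax gate weights.

    expert_probs: one probability vector per expert.
    gate_logits: one raw score per expert (here supplied directly).
    """
    weights = softmax(gate_logits)
    n_classes = len(expert_probs[0])
    mixed = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return mixed, weights


# Usage: three experts voting over three classes.
experts = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
mixed, weights = moe_predict(experts, gate_logits=[2.0, 0.0, 0.0])
predicted_class = mixed.index(max(mixed))
```

Because the gate weights sum to one, the mixture remains a valid probability distribution, and the weights themselves indicate which expert drove a given decision.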
5.1.3 Results and contribution
The experimental results demonstrated the difficulty of the classification task, with
individual models achieving modest performance due to the challenging nature and
limited size of the dataset. The Vision Transformer (ViT) generally outperformed
the VGG models and the baseline SVM.
The main contribution of this work is the successful application of the MoE
framework, which significantly improved classification performance over any sin-
gle model. The best-performing configuration, a three-expert MoE (MoEv3) on the
three-class task, achieved an overall accuracy of 73.29%. This result, while lower
than accuracies reported in studies using cleaner or synthetic data, is significant given
it was achieved using real-world, operational data from the cluttered Baltic Sea en-
vironment.
Furthermore, the paper successfully demonstrated the feasibility of adding an
explainability tool for operators. By using Grad-CAM and attention heatmaps, the
system can visually highlight the regions in a sonar image that led to a particular
classification, providing valuable and transparent decision support for MCM oper-
ations. This research is reportedly the first to apply a transformer architecture and
an MoE framework to mine detection using real-world sonar data from this specific
region.
5.1.4 Author’s contribution
The author was in charge of the research project and the methodology, contributing to
model implementation and evaluation as well as the writing, editing and submission of
the final report. In cooperation with Professor Heikkonen, the author also applied for
the research funding, which enabled the research in the first place.
5.2 Publication II: Strategizing the Shallows: Leveraging
Multi-Agent Reinforcement Learning for Enhanced
Tactical Decision-Making in Littoral Naval Warfare
5.2.1 Summary
Publication II develops and evaluates a MARL approach for generating COAs in a
littoral warfare context representative of the Baltic Sea. The combat environment
is formalized as a POSG with reactive agents that base decisions on local and joint
observations under explicitly modeled uncertainties. Two families of RL methods
are instantiated and compared in the same environment: (i) a discrete-action pipeline
centered on DDQN, where the state-space consists of surrounding terrain and map
objects as well as other values such as radar state, number of missiles, location of
friendly units and enemy bearings, and is processed with a CNN and linear NN to
produce actions, and (ii) a continuous-control or probability distribution pipeline us-
ing MAPPO with reward normalization and parameter-space noise to address sparse
rewards. Experiments (100 × 100 grid with ≈ 2.7 nm cells, 15-minute time
steps) produce plausible, tactically interpretable COAs. While MADDQN tends to
converge to simple policies under uncertainty, MAPPO yields more stable learning
and qualitatively stronger COAs aligned with established littoral principles.
5.2.2 Methods and Data
The RL environment instantiates a POSG, an MDP extension, with multiple agents
per side, partial observability, and stochastic effects reflecting naval operations in
cluttered, shallow waters. The state is rendered as a 2D “game grid” over the Baltic
Sea (100 × 100 grid, ≈ 2.7nm cells), advanced in 15-minute ticks. Obstacles
and electronic-warfare considerations are encoded; inter-cell movement feasibility
is checked with A* pathfinding. Exogenous uncertainties (e.g., sensing and
engagement outcomes) are injected as probability distributions. Agents receive
observations, communicate within their own side, and select actions that produce
rewards tied to mission outcomes (including victory signals and penalties for losses).
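A movement feasibility check of the kind mentioned above can be sketched with A* on a small occupancy grid. The example assumes 4-connected movement, unit step costs, and a Manhattan-distance heuristic; the grid encoding and function name are illustrative and not taken from the publication's environment.

```python
import heapq


def astar_path_length(grid, start, goal):
    """Return the shortest path length between two cells, or None if the
    goal is unreachable. grid[r][c] == 0 is navigable, 1 is an obstacle."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None


# Usage: an island chain forces a detour around its eastern end.
sea = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
steps = astar_path_length(sea, (0, 0), (2, 0))
```

A move is treated as feasible when a path exists and its length fits within the distance a unit can cover during one time step.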
Two algorithms from different RL methodologies are implemented and compared.
For discrete action spaces, a DDQN is trained with a CNN that processes a slice of
the observation state as a 2D image; the convolution output is then concatenated with
other feature inputs and fed to fully connected layers. For continuous control, MAPPO
is employed in a similar manner, but using only a deeper fully connected NN, together
with PopArt normalization of return (value) signals and parameter-space noise for
exploration under sparse rewards; clipped, zero-mean Gaussian perturbations are
adapted during training. Training tactics include:
• Adjusting the learning rate when repeated victories emerge to curb overfitting
and guide the exploration towards the found solutions.
• Increasing per-rollout epochs when success is detected to exploit promising
policies.
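The PopArt idea named above can be illustrated for a single scalar value head: running statistics of the return targets are updated, and the last-layer weight and bias are rescaled so that the unnormalized prediction stays unchanged. This is a simplified scalar sketch under stated assumptions; the class name, the step size, and the initial values are hypothetical and do not reproduce the publication's implementation.

```python
import math


class PopArtScalar:
    """Scalar value head with PopArt-style target normalization."""

    def __init__(self, step_size=0.1):
        self.step_size = step_size
        self.mu = 0.0  # running mean of targets
        self.nu = 1.0  # running second moment of targets
        self.w = 1.0   # last-layer weight (normalized space)
        self.b = 0.0   # last-layer bias (normalized space)

    @property
    def sigma(self):
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def value(self, feature):
        """Unnormalized value prediction for a scalar feature."""
        return self.sigma * (self.w * feature + self.b) + self.mu

    def normalize(self, target):
        """Target expressed in the normalized space used for the loss."""
        return (target - self.mu) / self.sigma

    def update_stats(self, target):
        """Update running statistics, then rescale w and b so that
        value() returns the same outputs as before the update."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.step_size) * self.mu + self.step_size * target
        self.nu = (1 - self.step_size) * self.nu + self.step_size * target ** 2
        new_sigma = self.sigma
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


# Usage: a large return arrives, yet existing predictions are preserved.
head = PopArtScalar()
before = head.value(0.7)
head.update_stats(50.0)
after = head.value(0.7)
```

Preserving outputs in this way lets the scale of sparse, occasionally large victory rewards change during training without destabilizing the value estimates already learned.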
For scenarios, core experiments use 3-vs-3 Blue/Red surface combatants alter-
nating the turns of trained agents, first with predetermined Red policies for initial
training of Blue agents, then optimizing Red tactics without updating Blue agents
and finally tuning Blue tactics against learned Red tactics. Multiple rollouts per seed
are executed to cope with stochasticity. Resulting COAs are visualized as trajectories
from start points to nearest engagement clusters over the Baltic grid.
5.2.3 Results and contribution
In the discrete-action setting, MADDQN converges fast but struggles with
generalization, exhibiting a tendency to fixate on overly simple policies in the
face of non-stationary, uncertain dynamics. In the reported runs,
Blue achieves victory in 33.9% of trials while Red wins 11.7%, the remainder being
draws. The aggressive, straightforward maneuver that disregards complex environ-
mental interaction in favor of immediate engagement conceptually mirrors Admiral
Horatio Nelson’s famous tactical doctrine at the Battle of Trafalgar, where he dis-
carded traditional parallel formations to sail perpendicularly, straight at the adver-
sary’s line to force a decisive close quarter combat [349; 350]. In contrast, MAPPO
demonstrates improved training stability and performance, adapting better to sparse
rewards and environment complexity. The exploration was aided by parameter-space
noise instead of action noise, which improved the results. The learned COAs for Blue
are tactically sensible, showcasing emergent behavior that correlates with established
tactical naval doctrine, e.g., holding units back to exploit the cover of the
archipelago while spreading some units towards the open sea to enable tracking. The
agents utilized terrain, coordinated unit movements and distributed force posture to
balance concealment and target acquisition. This resembles established littoral
doctrine and offers interpretable options for a commander. Direct quantitative
comparison between the results of the algorithms is not suitable, as the measured
metrics highlight engagements and victories while the policies differ greatly in
aggressiveness. This renders the more cautious MAPPO policies less effective on these
metrics, despite being considerably more robust in survival rates and in
qualitatively evaluated tactical coherence.
The publication provides an operationally grounded MARL testbed for littoral
warfare with uncertainty modeling and POSG formalization. As a result, it produces
a practical comparison between MADDQN and MAPPO in the same environment,
including network architecture, reward handling, and exploration design choices that
address sparse signals.
The research gives evidence that MARL can synthesize plausible, commander-useful
COAs rather than only maximizing abstract scores, enabling visual products
that support human reasoning. Reproducibility details, such as scenario, grid scale,
timing, and hardware, enable follow-on experimentation. Collectively, the work shows
how MARL can augment COA development for higher-level decision-making in
shallow-water naval contexts.
5.2.4 Author’s contribution
The author created the research setting, developed the Python simulation environment
as well as the RL algorithm implementation, model training and testing, and wrote
the initial research report. PhD Saastamoinen summarized, edited, submitted and
presented the research at the 20th AIAI conference in June 2024.
5.3 Publication III: Reinforcement Learning for decision
support in defense and security: A systematic review
5.3.1 Summary
This article is a systematic review of how RL is being used to provide decision sup-
port in defence and security. It surveys public, post-2000 work with authorship from
NATO SAS-181 nations, and complements the review with primers on military deci-
sion making (MDMP/OODA), DSS, RL fundamentals, simulation, and explainabil-
ity. The core contributions are:
• A curated set of 20 defence decision-support applications using RL.
• A 19-dimension classification mapped to Powell’s unified framework for se-
quential decisions.
• A Uniform Manifold Approximation and Projection (UMAP) landscape of the
field.
• A synthesis of gaps and recommendations.
The paper anchors readers with an OODA-to-RL mapping and an extended uni-
fied framework that adds TEVV and explainability to Powell’s original pipeline. The
literature landscape is then summarized via UMAP, with tables cataloging studies
and cross-tabs.
5.3.2 Methods and Data
The review uses a multi-national search strategy over national databases, Web of
Science and Scholar, combining three facets: decision support, RL/ADP, and
defence/security. Inclusion criteria were public availability, year ≥ 2000, defence
or security focus, explicit use of RL for support, and at least one SAS-181 author.
Screening removes behavior-automation work (no human-in-the-loop support), non-RL
optimizers, and entertainment-only game studies. Each retained article is labeled
on 19 characteristics (including application domain, policy class, uncertainty
handled, simulator type, maturity, and explanation capability), which the authors map
to stages of the extended framework that includes modeling, uncertainty
quantification, policy design, algorithm strategy, TEVV, and explainability. The field
is then projected with UMAP: nominal features are one-hot encoded, ordinal ones are
ordinal-encoded, and two projections are unioned to reveal clusters. Methodological
scaffolding includes an OODA–RL process diagram, the extended framework, and a
complete study matrix, with targeted cross-tabs for maturity by domain, uncertainty
coverage, simulation horizon versus human time available, and direct and indirect
policy usage versus MARL setting.
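The encoding step described above can be sketched as follows: nominal characteristics become one-hot columns and ordinal characteristics become a single rank column. The field names, category values, and ordering are hypothetical examples rather than the review's actual coding scheme, and the projection itself is omitted because it relies on a third-party UMAP implementation.

```python
def one_hot_column(values):
    """One-hot encode a nominal column; categories are sorted for stability."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return rows, categories


def ordinal_column(values, order):
    """Encode an ordinal column as the rank of each value in `order`."""
    return [[float(order.index(v))] for v in values]


def encode_studies(studies, nominal_keys, ordinal_orders):
    """Build a numeric feature matrix with one row per study."""
    matrix = [[] for _ in studies]
    for key in nominal_keys:
        rows, _ = one_hot_column([s[key] for s in studies])
        for out, row in zip(matrix, rows):
            out.extend(row)
    for key, order in ordinal_orders.items():
        rows = ordinal_column([s[key] for s in studies], order)
        for out, row in zip(matrix, rows):
            out.extend(row)
    return matrix


# Usage: two illustrative study records.
studies = [
    {"domain": "naval", "maturity": "theoretical"},
    {"domain": "air", "maturity": "proof of concept"},
]
features = encode_studies(
    studies,
    nominal_keys=["domain"],
    ordinal_orders={"maturity": ["theoretical", "proof of concept", "fielded"]},
)
```

Keeping ordinal features as ranks preserves their ordering for the distance computation, whereas one-hot encoding treats all nominal categories as equally distant from each other.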
5.3.3 Results and contribution
Results show that most decision-support applications concentrate on force employ-
ment at the tactical level. Value-function policies dominate the solutions, and nearly
all studies use bespoke simulators. Exogenous uncertainty is commonly modeled
but other forms are underrepresented, and algorithmic explainability is rare. Maturity
skews towards theoretical work and proofs of concept, with no fielded systems. The UMAP
view separates studies primarily by presence/absence of evaluation, with a secondary
split by direct versus indirect policy usage. Additional relationships include a largely
aligned simulation horizon and human decision window, and that indirect uses more
often coincide with adversarial MARL settings.
The paper raises four practitioner challenges:
1. Complex, multi-actor, non-stationary scenarios.
2. Scarce, siloed, and sensitive data.
3. Weak RL–simulator interoperability.
4. Human trust and explainability.
These findings are translated into concrete recommendations: model to the decision,
which can be described as fidelity by purpose or the Occam's razor principle;
perform early, iterative TEVV; standardize data sharing, e.g., through the
NATO Alliance Data Sharing Ecosystem; pursue a wargaming cloud and RL-ready
interfaces; and operationalize explainability. Collectively, the review provides a
reproducible taxonomy, a field map, and a deployment-oriented agenda for RL-based
decision support in defence and security.
5.3.4 Author’s contribution
The author was in charge of sections 2.1 and 5.2 while contributing, as part of the
research group, to the whole review article, its writing, related meetings, working
groups and submission. Weekly meetings revolved around reporting and discussing the
progress while the writing progressed. The work also included week-long workshops that
focused solely on reviewing and progressing the paper. All sections were reviewed
and commented on by all the participants.
5.4 Publication IV: Emerging trends in federated learning:
from model fusion to federated X learning
5.4.1 Summary
Publication IV is a focused survey on FL viewed through the lens of model fusion
and its intersections with other paradigms, hence labeled as “federated X learning.”
It organizes advances beyond FedAvg into five fusion families that include adaptive
and attentive aggregation, regularization, clustered, and Bayesian methods, with
a sixth, cross-cutting focus on fairness. The research then maps how FL couples
with transfer learning, knowledge distillation, multi-task and meta-learning, adver-
sarial, semi- and unsupervised learning, and reinforcement learning. The survey
contrasts this vantage point with general FL surveys, emphasizes statistical hetero-
geneity, communication and privacy as core drivers, and closes with challenges and
future directions such as label scarcity, on-device personalization, unsupervised/self-
supervised FL, combining paradigms, benchmarks, and production readiness.
5.4.2 Methods and Data
This is a narrative, scope-delimited literature review instead of a PRISMA-style sys-
tematic review. The authors formalize the standard FL objective and FedAvg work-
flow to anchor comparisons, propose a taxonomy of model-fusion strategies, and
survey “federated X” couplings with concrete formulations, e.g., transfer objectives,
KD losses, multi-task formulations over client-task matrices, meta-learning updates,
adversarial learning for bias mitigation, semi- and unsupervised losses, and FedRL
coordination. Evidence is drawn from peer-reviewed venues and well-cited preprints
up to early 2024, summarized in tables and topical sections, with brief mathematical
formulations to clarify algorithmic families. Application highlights cover
recommendation systems, healthcare, IoT, and edge scenarios. As a note, the author of
this thesis promoted the military as a prime application field to be mentioned, but
the focus was kept on the aforementioned subjects due to their universally accepted
and understood nature.
5.4.3 Results and contribution
Across fusion methods, the survey finds that
• Adaptive and attentive aggregation can temper non-IID drift by learning client
weights from parameter distance, recency, accuracy, or attention.
• Regularization (proximal, momentum/mime-style, dynamic penalties, contrastive,
prototype) aligns local and global objectives and mitigates client drift and pri-
vacy noise.
• Clustered FL (two-stage, multi-center, IFCA-style, ensembles) yields multiple
globals to capture client subpopulations.
• Bayesian approaches (neural matching, variational personalization, ensem-
bles) address permutation invariance and uncertainty.
• Fairness objectives (MiniMax/Q-fairness, collaborative/group fairness) rebal-
ance gains for under-represented clients.
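As an illustration of the regularization family in the list above, the following sketch adds a proximal penalty (mu / 2) * ||w - w_global||^2 to a client's local loss, in the style commonly associated with FedProx. Whether the survey uses exactly this form is an assumption. The linear model, the function names, and the values of `mu` and `lr` are hypothetical and serve only to show how the penalty limits client drift.

```python
import math
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (features, target); hypothetical toy data format


def proximal_local_update(global_w: Vector, data: List[Sample], mu: float = 0.1,
                          lr: float = 0.05, epochs: int = 5) -> Vector:
    """Local SGD on least squares plus a proximal pull toward the global weights."""
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of the data term plus gradient of (mu / 2) * ||w - global_w||^2
            w = [wi - lr * (err * xi + mu * (wi - gi))
                 for wi, xi, gi in zip(w, x, global_w)]
    return w


def drift(w: Vector, global_w: Vector) -> float:
    """Euclidean distance between a local model and the global model."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, global_w)))
```

With `mu = 0` the update reduces to ordinary local SGD. Larger values of `mu` keep the local model closer to the global one, trading local fit for alignment between local and global objectives.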
For “federated X” and learning paradigms, the review catalogs workable pat-
terns and proposed solutions for FTL, KD, Federated Multi-Task Learning (FMTL),
meta-learning, adversarial learning, semi- and unsupervised FL as well as Federated
Reinforcement Learning (FedRL).
The paper contributes a taxonomy centered on model fusion and FL’s couplings
that complements broad FL surveys, and a curated map of algorithm research with
concise objective forms that clarify where resilience to heterogeneity, security and
privacy, and communication efficiency are introduced. It also outlines future
directions and challenges regarding label scarcity, on-device personalization, the
proliferation of unsupervised learning, and the combination of multiple federated
paradigms. The final conclusions call for a unified benchmark to better enable
research and improvement, and for a production FL agenda that showcases practical
applications of label-efficient training and on-device personalization. Such
applications would draw on the aforementioned unsupervised or self-supervised
pretraining, combined paradigms, unified benchmarks and tooling, and deployment
patterns robust against real-world problems such as drift, diurnal effects, and
cold starts. These elements provide a deployment-oriented framework that can be used
to position other contributions within model-fusion choices and “federated X”
integrations.
5.4.4 Author’s contribution
The author organized the framework and architecture of the paper, collected,
categorized, and analyzed the background data from esteemed conference papers, and
reviewed the manuscript. Originally, a hundred high-level conference papers with
methodological significance were retrieved from, for example, AAAI and
NeurIPS proceedings, which were then analyzed by the author and clustered into
different groups depending on the methodology, proposed solutions, performance
metrics and applicability.
5.5 Publication V: GenAI in Military: Trends and Opportunities
5.5.1 Summary
Publication V surveys how Generative AI (GenAI) is entering military use and where
it realistically adds value now. It opens with a concise state-of-the-art
introduction that covers Transformers with the self-attention mechanism, MoE, RAG,
Chain-of-Thought (CoT), knowledge distillation, unsupervised RL, Titans architectures,
and agentic AI. The research then reviews 29 military-relevant publications from
2022 to early 2025, spanning decision-support and COA generation, wargaming and
simulation, information extraction and fine-tuning, as well as cybersecurity. The
review highlights the application field and the methodology used, and assesses
opportunities such as faster planning, better intelligence synthesis, training, and
cyber defence, as well as risks including hallucinations, escalation tendencies,
doctrine and corpus misalignment, and security and ethics gaps. The paper argues that
development
and deployment will hinge on domain-specific data and interoperable, secure archi-
tectures, and makes a case for allied, federated approaches to build military-grade
base models to reduce reliance on proprietary systems.
5.5.2 Methods and Data
The study conducts a structured literature scan across RAND and RUSI outputs and
academic sources from Scholar, with the query (Military OR Defense) AND (GenAI
OR LLM OR Generative Artificial Intelligence), filtered to 2022–2025. From an
initial pool of the top 200 results, inspected for relevance, 48 were preliminarily
retained, and 29 publications met the inclusion criteria of GenAI focus and military
relevance. Each paper was double-coded into seven categories: Survey, Review, Policy
Analysis, Application, Proposition, Overview, and Other. Application and proposition
papers received a deeper assessment for readiness, feasibility, and implementation
issues. As stated above, the article supplements the review with a state-of-the-art
primer to contextualize trends and a cross-cutting analysis of gaps and recommenda-
tions.
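The first screening step described above can be approximated as a keyword filter. The sketch below is an illustration only, not the procedure used in Publication V: the record fields (`title`, `abstract`, `year`), the substring matching, and the function names are assumptions, and the later relevance inspection and double-coding were manual steps that code of this kind does not replace.

```python
from typing import Dict, List

# Terms taken from the stated query; substring matching is a simplification
# (for example, it would miss the spelling "defence").
DOMAIN_TERMS = ("military", "defense")
GENAI_TERMS = ("genai", "llm", "generative artificial intelligence")


def matches_query(record: Dict[str, object]) -> bool:
    """Apply (Military OR Defense) AND (GenAI OR LLM OR ...) within 2022-2025."""
    text = f"{record['title']} {record['abstract']}".lower()
    in_window = 2022 <= int(record["year"]) <= 2025
    has_domain = any(term in text for term in DOMAIN_TERMS)
    has_genai = any(term in text for term in GENAI_TERMS)
    return in_window and has_domain and has_genai


def screen(records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the records that satisfy the query; order is preserved."""
    return [r for r in records if matches_query(r)]
```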
5.5.3 Results and contribution
The review shows that current applied research focuses on decision support,
domain information extraction and small-model fine-tuning, as well as selected
cyber tasks. Decision making and decision support include COA generation
as well as agentic settings simulating diplomatic discourse and strategic and
tactical scenarios. Smaller-scale approaches focus on operator-level systems for
unmanned assets and, e.g., intent detection.
In terms of risks and limits, LLM agents can over-escalate in simulated geopolitical
scenarios, and COA generators accelerate planning but can raise friendly casualties.
Studies rely on unclassified data, are simulator-bound, and are not integrated
end-to-end with actual C2 and ISR workflows. Likewise, security, ethics, and
explainability remain underrepresented. The trend suggests that practice gravitates
either to proprietary, general-purpose LLMs used via prompting as exploratory
surrogates or, in a parallel lane, to smaller open-weight models fine-tuned for
mission- or task-specific purposes. Cutting-edge advances such as unsupervised RL,
Titans, and long-context models are mostly commercial and not yet militarized at scale.
The paper curates and classifies recent GenAI-for-military literature into an ac-
tionable map, integrates a state-of-the-art primer to connect frontier techniques to
military constraints, and translates gaps into concrete guidance: build secure, in-
teroperable pipelines, operationalize governance and human-AI teaming, develop
mission-aligned benchmarks, and pursue coalition training of military base mod-
els via federated learning to overcome data-sharing limits. These elements form a
deployment-oriented agenda for making GenAI a trustworthy decision-support ca-
pability rather than an off-the-shelf curiosity.
5.5.4 Author’s contribution
The author collected and analyzed the data to build the knowledge base of the
research field, wrote the manuscript, edited and submitted the article, and addressed
the reviewer comments during revision. PhD Koski supervised the article and contributed
to the methodology, the co-analysis of the data, and the synthesis of findings.
5.6 Publication VI: The dilemma of AI reliability
5.6.1 Summary
Publication VI analyzes a core paradox in deploying AI for military use: to be
fielded, an AI system must be perceived as highly reliable, yet the very perception
of reliability erodes the ethical-legal demand for “meaningful human oversight.” The
author frames this as a Catch-22: either the human slows the system enough to negate
its benefits, or the human becomes a nominal gatekeeper who defers to the seem-
ingly reliable machine and only absorbs responsibility when things go wrong. The
argument is developed across autonomy configurations generally stated as in/on/off-
the-loop, decision-support as well as weapon systems, the limits of explainability for
modern ML, and the practical constraints of testing and generalization under scarce,
heterogeneous military data. Historical episodes such as Petrov’s false-alarm inter-
vention and Patriot fratricides illustrate asymmetries in risk tolerance for humans
versus machines. The paper proposes an ideological shift: to treat the human as a
support to an intelligent system, supplying context and missing information, rather
than as an all-knowing overseer, and to adopt a three-step model that accepts human
cognitive limits, aligns ideals with realistic improvements, and focuses on outcomes
over means.
5.6.2 Methods and Data
This is a concept-driven, argumentative essay grounded in policy (e.g., DoD Directive
3000.09), ethics literature, and illustrative cases rather than empirical experiments.
The method is analytic and synthetic. It defines the deployment context, including
autonomy levels, DSS, and weapons; assesses explainability and validation limits for
complex ML models, covering aids such as Grad-CAM and CoT, over- and underfitting
issues, and out-of-distribution risk under non-IID data; and reasons from historical
incidents and comparative risk tolerance, spanning civilian autonomy versus human
drivers, missile warning, and air-defence cases, to articulate the oversight paradox.
The proposal is then structured into a normative, three-step integration model that
reassigns the human role
from scrutinizer to context amplifier and reorients acceptance criteria toward mea-
surable improvements in performance and error-to-risk rates relative to the human
baseline.
5.6.3 Results and contribution
Perceived reliability is both necessary and corrosive to oversight. The more “re-
liable” a system appears, the more humans rationally defer, especially under time
pressure, creating a de facto responsibility gap. Explainability-first remedies scale
poorly for deep models and can undercut the very scalability advantages that mo-
tivate autonomy. TEVV limits in military contexts that include data scarcity, het-
erogeneity, and non-IID drift make flawless assurance infeasible. Acceptance must
hinge on comparative outcomes, not impossibly absolute guarantees. Risk tolerance
asymmetry means machines are judged against near-zero-error expectations, while
humans routinely err without equivalent scrutiny.
The publication makes an ethical and philosophical contribution by identifying
the “reliability-oversight paradox” in autonomous military systems and proposing a
solution to it. It clarifies the Catch-22 of AI reliability for military AI, which
unifies oversight, trust, and responsibility into a single deployment dilemma, and
argues that this dilemma requires a role reallocation that frames the human as an
information augmenter for the AI, resolving the oversight-to-performance tension
without abandoning ethical intent.
The three-step adoption model acknowledges human cognitive limits, calls for projecting
ideals into realistic, measurable baselines, and focuses evaluation on outcomes.
This is claimed to yield practicable acceptance criteria and policy-relevant guidance
that shifts governance from scrutinizing every decision toward ensuring interfaces
and processes that surface blind spots and feed the system timely, context-rich data
or showcase the absence of it. Together, these elements offer a path to integrate AI
that is more humane in result, even when less “human” in mechanism. In this new
model, the human operator is tasked not with second-guessing the AI system, but
with covering its inherent blind spots by providing and assessing wider context and
information the system lacks.
5.7 Conceptual framework
Derived from the original publications, the primary conceptual contribution of this
dissertation is the formulation and validation of the thesis that for military AI, data
capability is the critical enabler. This has been the case from the information security
point of view, as the classification and security of information have been a key mili-
tary aspect for, arguably, as long as military forces have existed. However, the point
of view has been limited to the particular information in a particular sample, and its
value has been assessed in isolation. For example, information regarding technical
capability of a weapon system is strictly classified, as it gives away the performance
metrics of the system. The wider context of the information, as machine-readable
data, has not been a priority nor a capability concern. At most, data has been viewed
as an information flow inwards, and AI as a technology has been seen as a way to
handle the increasing amounts of data. Simultaneously, operational data has been
ephemeral and its value has been measured with regard to the specific use case for
which it was obtained.
This work argues for a paradigm shift away from a narrow, algorithm-centric
view of AI development toward a holistic, system-centric perspective focused on
the entire learning loop that transforms the ephemeral use of data into a continuous
cycle of data acquisition, preparation, continuous improvement, and deployment of
operational AI models.
This framework, displayed in Figure 10 and substantiated across the analyses
of CV, RL, FL, and GenAI, indicates that the dominant constraint and simultane-
ously the greatest opportunity in military AI is the data ecosystem. The contribution
lies in defining this ecosystem not merely as a repository of information, but as an
active, operational capability that encompasses data governance and pipelines, feed-
back mechanisms, and interoperable infrastructure.
The data governance and pipelines are the processes and infrastructure required
to collect, process, and exploit high-quality data at scale. Data pipelines require
integrations to connect the collection of data to its preprocessing, preparation and
utilization. Data governance is also linked to accessibility, as a significant portion
of the military data is classified and accessible on a need-to-know basis. Hence, a
rigorous framework is required to enable the best tradeoff between data availability,
its use, and data security.
System design must incorporate built-in feedback mechanisms to capture human
operator inputs and feedback, such as labels, rewards, and preferences, to accumu-
late operationally relevant, enriched data that enables continual model improvement
without placing an unnecessary strain on already limited resources such as personnel.
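A built-in feedback mechanism of this kind could, for instance, store operator input as structured records alongside the model output it refers to. The sketch below is a hypothetical illustration, not a design from the original publications: the class names, field names, and validation rule are assumptions. It only shows how labels, rewards, and preferences might be captured in one append-only log during normal use.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeedbackRecord:
    """One piece of operator feedback tied to a specific model output."""
    sample_id: str                        # identifier of the assessed output
    model_version: str                    # model that produced the output
    label: Optional[str] = None           # operator-corrected class label
    reward: Optional[float] = None        # scalar rating of the output
    preferred_over: Optional[str] = None  # id of an alternative the operator rejected


class FeedbackLog:
    """Append-only store that accumulates feedback as a by-product of operation."""

    def __init__(self) -> None:
        self._records: List[FeedbackRecord] = []

    def capture(self, record: FeedbackRecord) -> None:
        """Store a record; reject records that carry no feedback signal at all."""
        if (record.label is None and record.reward is None
                and record.preferred_over is None):
            raise ValueError("feedback record carries no label, reward, or preference")
        self._records.append(record)

    def training_labels(self) -> Dict[str, str]:
        """Return sample_id -> label pairs usable for supervised retraining."""
        return {r.sample_id: r.label for r in self._records if r.label is not None}
```

Keeping the model version with every record is one way to let later training runs separate feedback on outdated models from feedback on current ones.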
Interoperable infrastructure refers to both digital and physical infrastructures.
The digital infrastructure includes common simulators, benchmarks, and tooling to
support training, testing, evaluation, and collaboration. The physical infrastructure
encompasses data centers, computation capacity, and secure operational networks
that enable proper connectivity. Crucially, the physical infrastructure also requires
robustness through distribution and federation to ensure operational capabilities dur-
ing conflict.
[Figure 10: diagram of the Data Ecosystem. Panel titles: Data Pipelines and Governance; Interoperable infrastructure (Digital, Physical); Operational Mechanisms; AI Research and Development (CV, NLP, RL, GenAI, FL, Others); Safety & Ethics. Other labels: Integrations, Ingest, Label, Process, Curate, Collect, Train, Deploy, Data Availability, Policy, Need to know, Security, Pipelines, Governance, Benchmarks, Simulators, Data centers, Networks, Connectivity, Maintenance, RLHF, Evaluation, Continuous improvement.]
Figure 10. The Data Ecosystem framework
The complete conceptual framework is, non-exhaustively, visualized in Figure 10,
aiming to capture the key points. The development, deployment and improvement
of military AI capabilities through research and development relies on the ecosys-
tem, which is structurally governed by data pipelines and policy frameworks. These
include the non-technical processes and policies as well as technical procedures and
solutions that first dictate what each process aims to achieve as well as the constraints
and requirements that are placed on it. These include, for example, the accessibility
and security policies. The interoperable infrastructure comprises digital and physi-
cal entities, where physical structure serves as the capacity for digital systems. The
digital capacities include the stated simulators, benchmarks, and connectivity proto-
cols, but also their own policies and guidelines on usage and development. Finally,
the operational mechanisms include the stated system requirements that aim to en-
hance data collection, preprocessing, and storing to create valuable data capabilities
in order to advance the AI research and development efforts.
Despite being conceptual, the physical and digital infrastructure of this ecosystem
must correspond to the military information system described in Subsection 3.2.5.
To avoid treating AI as an external add-on, the model deployment, data pipelines and
feedback mechanisms must be embedded directly into the DSS, C2 and ISR systems
that constitute the military’s existing information architecture.
Finally, while the emergence of AGI capabilities in AI is speculative at best and
not supported by the current broader understanding across the scientific field, a
great shift could occur if an RL-based paradigm could be instantiated to continuously
learn from (near-)real-world interactions. This was briefly discussed in
Section 4.8, quoting Richard Sutton’s claim that only search and learning, as
general-purpose methods, scale arbitrarily to arbitrarily complex problems. This
possible future course does not render Figure 10 obsolete, but merely changes the way
it would be viewed from the human perspective and operated by the ML algorithm or
model that could evolve either in silico or even when operational.
5.8 Methodological Framework for Synthesis
This thesis presents a methodological contribution by adapting and applying
CRISP-DM as a tool for synthesizing disparate military AI research. While CRISP-DM
is traditionally used for enterprise data mining projects, this work shows its utility
as an academic framework for connecting low-level technical findings to high-level
strategic and operational objectives.
By consistently applying the military-adapted CRISP-DM stages across
the different publications, this dissertation provides a structured and repeatable
methodology for assessing the operational relevance of technical AI research and
development, identifying systemic bottlenecks such as data availability and
infrastructure gaps, and, most importantly, translating theoretical and technical
algorithm or model performance into tangible, operational military capability.
6 Conclusion
This dissertation set out to examine how machine learning can be developed and
integrated within the military domain and to determine what truly constrains and
enables military AI capabilities. Across four research areas, namely computer vision,
reinforcement learning, federated learning, and generative AI, the analyses converge
on a single finding that the data capability, realized as the data ecosystem, is both the
dominant constraint and the greatest opportunity. For advancing military AI, specific
models and algorithms matter less than a robust data ecosystem where data itself is
treated, not as a by-product or an ephemeral feedstock, but as a first-class operational
resource.
Empirically and methodologically, the work contributes in three ways. First, it
synthesizes evidence from applied studies and literature surveys to show that all prac-
tical ML paradigms require mechanisms to collect, govern, and exploit high-quality
data at scale, under strict security and classification constraints. Second, the disser-
tation adapts CRISP-DM as a synthesis framework, connecting problem definition
to data understanding, preparation, modeling, evaluation, and deployment in military
settings. This common lens, adapted from industry, makes visible where
bottlenecks accumulate and which issues hinder the successful deployment of
ML in the military domain. The highlighted bottlenecks exist especially at three
points: the business understanding, which in the military sense can be understood as
doctrines and policies; the mere existence and availability of quality data; and the
data understanding. The business understanding that guides the goal setting
is consequently goal-focused: even ancient doctrines highlight speed and quality of
data processing, although in different wording. However, this understanding lacks
the expertise to identify the systemic limitations that hinder the full process of
advancing from data to deployment.
Likewise, the existence and availability of data as well as the related data
understanding are limited, which means that data is seen as an ephemeral tool for
decision-making unless it consists of detailed capability information, such as
intelligence reports and technical details of weapon systems. This prevents storing
high-quality data for preprocessing into valuable data products that could be used for
modeling purposes. The data understanding problem is two-fold. First, if the value of
certain data is not recognized, it will not be stored. Second, if the data is not
stored, the emergent understanding is never reached, because stored data can usually
be explored with, for example,
unsupervised methods to reveal patterns and value that is not trivially evident. For
the user, the value of data is indeed ephemeral, as the task is not the data itself but
what the data immediately enables, with regard to a certain desired end state.
These three are simultaneously the choke points where investments have the
highest leverage. By comparison, model architectures and algorithms are less of a
hindrance than these three aspects.
As a third contribution and a solution to the aforementioned issues, this
dissertation proposes and motivates an integrated Data Ecosystem framework, shown
in Figure 10, that combines governance and pipelines, operator-in-the-loop feedback
mechanisms, and interoperable infrastructure spanning digital (simulators, bench-
marks, tooling) and physical (compute, networks, secure facilities) layers.
6.1 Summary of Key Findings
As the hypothesis is that AI is, or inevitably becomes, a major military capability
that largely determines the future of warfare, the central counterargument of this the-
sis is that progress in military AI is fundamentally constrained by systemic issues
related to data availability, quality, governance, and infrastructure. This was con-
sistently shown across all research areas in the system-level investigation into the
applicability, impact and challenges of current AI methods and research areas.
Publication I, a CV study on sonar imagery, highlighted how even state-of-the-
art ML models fail or fall short when confronted with small, low-quality, and frag-
mented real-world sensor data that represents the operational reality, underscoring
the critical need for integrated data pipelines and a curated data repository. The anal-
yses of RL in Publication II and Publication III revealed that the field’s potential is
hampered by the scarcity of high-fidelity, interoperable simulators and the lack of
operational data, which are essential for training, validation, and real-world transfer.
Publication I and Publication II provided empirical evidence of the challenges in
solving even narrow problems with limited resources regarding data and supporting
systems, fulfilling the empirical research objective.
The investigation into GenAI in Publication V provided insight into the future
direction and challenges of this novel ML research area. It identified a critical
dependency on proprietary models that are misaligned with military requirements and
argued that progress hinges on developing military-specific foundation models, a goal
that is currently hindered by the lack of curated data and a collaborative
infrastructure. This partly meets the objective of providing insight into future
directions and challenges.
In response to these challenges, this dissertation puts forward two key solutions.
First, FL, introduced in Publication IV, was identified as a critical
enabling paradigm for secure, collaborative model development among allied nations,
substantially mitigating the tension between data sharing and security and
offering a possible solution to the challenges of non-IID data and user-specific
requirements. Second, the ethical analysis in Publication VI proposed a resolution to
the “reliability-oversight paradox” by reframing the human role from simple oversight
to one of active support, thereby creating a more effective and realistic model for
human-machine teaming within the operational learning loop and integrating ethical
considerations as a system-level factor into military AI capability development.
From the system-level, top-down perspective, the adaptation of the CRISP-DM
methodology to bridge AI research into tangible operational capability was, within the
limitations of this thesis, shown to be feasible and effective. Other frameworks can be
adapted in a similar manner, but CRISP-DM was selected primarily to bridge
theoretical research into practice in an industry-equivalent manner.
Overall, the objective of a system-level investigation into the applicability, im-
pact and challenges of developing military AI capabilities has been met alongside
other objectives.
6.2 Implications
The primary, tangible contribution is the conceptual framework of the data ecosys-
tem that synthesizes the findings into a holistic, actionable model for military AI
capability development. It aims to shift the focus from isolated ML models and AI
paradigms to the surrounding infrastructure, processes, and policies required to sus-
tain a continuous learning loop. As pointed out in Section 4.8, historical perspective
posits that in the long run, advancing computation capacity enables success with gen-
eral methods, given that there is sufficient high-quality data and, in some cases like
RL, an environment to search within. In this context, the robust data ecosystem that
continuously improves data capability is the core component in advancing military
AI. This does not render algorithmic innovations and novel model architectures ob-
solete in any way, but enables and strengthens the ability to adapt these in an efficient
manner.
The practical implication is that military AI programs should prioritize building
data capabilities, which include policies, processes, and infrastructure, so that
continual data acquisition, curation, secure sharing, and reuse are enabled across
missions and organizations. This includes systematic capture of feedback, such as
labels, rewards, and preferences, to support ongoing model improvement without placing
unnecessary strain on already limited personnel. Interoperability and robustness must
extend across digital standards and APIs as well as physical networks, with
distribution for resilience under contested conditions.
Ultimately, these implications extend across the military domain to the different
tasks and functions shown in Table 2, and even surface a critical ethical and
organizational point regarding the design of human-machine teams.
Table 2. Implications of a Data-Centric Approach for the Military
Domain | Implication
Doctrine and Policy | Military doctrine must recognize data as a strategic asset and a core capability component, on par with traditional platforms and weapon systems. Because commercial markets cannot supply the highly specific, classified data required for tactical operations, allied forces must organically generate this data capability through their own operations.
Acquisition and Procurement | New systems must be procured with “data readiness” as a core requirement to actively realize the proposed Data Ecosystem. Moving beyond passive data logging, procurement mandates must specify interoperable data pipelines, open architectures, and built-in interfaces that capture operator use and feedback (e.g., for RLHF). Enforcing these requirements and standards ensures that ephemeral operational information is systematically transformed into an active data capability for continuous AI model development and improvement.
Research and Development | R&D efforts should prioritize the creation of common, interoperable infrastructure, including shared benchmarks and simulation environments, to enable repeatable evaluation. Furthermore, multinational collaboration should be fostered through privacy-preserving paradigms like FL, advancing military-specific foundation models, complementing and reducing reliance on proprietary, commercial alternatives.
Human-Machine Teaming | The design of AI-enabled systems should move beyond the brittle concept of “human-on-the-loop” oversight and towards a model of human support, where operators are trained and equipped to augment and assess the AI system by covering its known limitations.
6.3 Limitations and Future Research
This dissertation, being a cumulative work, synthesizes findings from specific ML
subfields. While this provides a broad overview, it does not encompass every po-
tential military application of AI. The research was primarily focused on Western
military contexts, inevitably reflecting an authorial bias rooted in European, and es-
pecially Nordic, strategic and operational perspectives and realities. Further work is
needed to explore these concepts in other strategic and operational settings. It is also
noteworthy that the analyses are necessarily bounded by the availability of public
sources, small-scale experiments, and domain-specific constraints.
Building on the Data Ecosystem framework, future research should pursue several
key directions. Firstly, the development of military-specific, standardized benchmarks
for core military tasks, such as target recognition and COA generation, is critical
for objectively measuring progress and validating ML models and AI systems.
Secondly, future research should focus on architectures for federated operations:
robust FL architectures need to be developed and tested in operationally relevant,
multi-national exercise environments to address challenges of network latency,
security, personalization, knowledge distillation, as well as model and hardware
heterogeneity. Thirdly, further research is required to validate the proposed
“human-as-supporter” model, measuring the combined performance of human-AI teams
against conventional human-in-the-loop and human-on-the-loop approaches.
Finally, bridging the simulation-to-reality gap is a considerable research area also
outside the military, but a concerted effort is required to create high-fidelity,
validated simulation environments that can serve as reliable training and testing
grounds, not just for RL agents but also for TEVV purposes and, for example, for
validating COA generation in a quantified manner and for researching autonomous system
deployment cost-effectively.
In conclusion, the successful integration of state-of-the-art AI into military forces
will not be achieved through a technological silver bullet, especially one that is not
built from within. It requires a deliberate, systemic, and sustained effort to build
an ecosystem that can cultivate, process, and leverage data at the speed and scale
of modern conflict. The path to effective military AI runs through data capability:
governed, pipelined, fed by human feedback, and supported by interoperable infras-
tructure. Acquisition programs that build these foundations will be best positioned
to field trustworthy, resilient, and continually improving AI systems, closing the gap
between the promise of AI and its operational reality.
Declarations
Declaration of Funding and Conflicts of Interest
There are no conflicts of interest to declare, and this research received no funding
apart from Publication II, for which the statement is given in the paper itself.
Research ethical statement
The summary of this dissertation was written solely by the author, adhering to common
research ethics. AI was used in the making of this research in its obvious role as
the primary research interest, as displayed in the original publications, and as a
tool, as displayed in Chapter 2. Large, proprietary LLMs, Gemini and OpenAI GPT, were
used in formatting, spell-checking and cross-checking the main text. These models
were also used to format and check the bibliography and to search for research papers,
such as foundation sources for original ideas. Despite these tools, the research effort
and the written text are solely the work of the author.
Bibliography
[1] Friedrich Naumann Foundation for Freedom. Lethal autonomous weapons
systems: Challenges for regulation and the role of the European
Union. Technical report, Friedrich Naumann Foundation for Freedom,
2023. URL https://shop.freiheit.org/download/P2@953/
335163/Policy%20Paper%20LAWS-ENG-Final.pdf.
[2] Forrest E. Morgan, Benjamin Boudreaux, Andrew J. Lohn, Mark Ashby,
Christian Curriden, Kelly Klima, and Derek Grossman. Military appli-
cations of artificial intelligence: Ethical concerns in an uncertain world.
Technical Report RR-3139-1-AF, RAND Corporation, Santa Monica, CA,
2020. URL https://www.rand.org/pubs/research_reports/
RR3139-1.html.
[3] The Editors of Encyclopaedia Britannica. Bombe. Encyclopedia Britannica,
2025. URL https://www.britannica.com/topic/Bombe. Accessed 24 August 2025.
[4] Francis Harry Hinsley and Alan Stripp. Codebreakers: the inside story of
Bletchley Park. Oxford University Press, 2001.
[5] Alexander Jung. Machine learning: The basics, 2022. URL https://
arxiv.org/abs/1805.05052.
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT
Press, 2016. http://www.deeplearningbook.org.
[7] Nestor Maslej, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa
Parli, Njenga Kariuki, Emily Capstick, Anka Reuel, Erik Brynjolfsson, John
Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos
Niebles, Yoav Shoham, Russell Wald, Toby Walsh, Armin Hamrah, Lapo Santarlasci,
Julia Betts Lotufo, Alexandra Rome, Andrew Shi, and Sukrut Oak.
The AI Index 2025 Annual Report. Technical report, Institute for Human-
Centered AI, Stanford University, Stanford, CA, 2025.
[8] Bradley Martin, Danielle C. Tarraf, Thomas C. Whitmore, Jacob Deweese,
Cedric Kenney, Jon Schmid, and Paul Deluca. Advancing Autonomous Sys-
tems: An Analysis of Current and Future Technology for Unmanned Maritime
Vehicles. Technical Report RR-2751-NAVY, RAND Corporation, Santa Mon-
ica, CA, jan 2019. URL https://www.rand.org/pubs/research_
reports/RR2751.html.
[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all
you need. In Proceedings of the 31st International Conference on Neural
Information Processing Systems, NIPS’17, page 6000–6010, Red Hook,
NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964. URL
https://papers.nips.cc/paper_files/paper/2017/hash/
3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
[10] OpenAI. Introducing ChatGPT (research preview). OpenAI Blog, November
2022. URL https://openai.com/fi-FI/index/chatgpt/. Last accessed
2026-03-07.
[11] OpenAI. ChatGPT. https://chat.openai.com, 2023. Accessed: 2025-06-20.
[12] NATO. Emerging and disruptive technologies, 2024. URL https://
www.nato.int/cps/en/natohq/topics_184303.htm. Accessed
7th March 2026.
[13] Francis Bacon. Novum Organum Scientiarum. John Bill, London, 1620.
[14] René Descartes. Discours de la Méthode pour bien conduire sa raison, et
chercher la vérité dans les sciences. Jan Maire, Leiden, 1637.
[15] René Descartes. Key Philosophical Writings. Wordsworth Editions Limited,
1997. ISBN 9781853264702. Translated by Elizabeth S. Haldane and G.R.T.
Ross.
[16] Defense Acquisition University (DAU). AI Glossary for the DoD,
2025. URL https://www.dau.edu/sites/default/files/
Migrated/CopDocuments/DAU%20AI%20Glossary.pdf. Cited
20.7.2025.
[17] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach.
Pearson, 3rd edition, 2016. ISBN 9781292153964.
[18] Elaine Rich. Artificial Intelligence. McGraw-Hill series in artificial intelli-
gence, 1983. ISBN 0-07-052261-8.
[19] Arleen Salles, Kathinka Evers, and Michele Farisco. Anthropomorphism in
AI. AJOB Neuroscience, 11(2):88–95, 2020. doi: 10.1080/21507740.2020.
1740350. URL https://doi.org/10.1080/21507740.2020.
1740350. PMID: 32228388.
[20] Drew V. McDermott. Artificial Intelligence Meets Natural Stupidity. SIGART
Newsletter, 57:4–9, apr 1976. doi: 10.1145/1045339.1045340. URL https:
//dl.acm.org/doi/epdf/10.1145/1045339.1045340.
[21] Melanie Mitchell. Artificial Intelligence: A Guide for Thinking Humans. Far-
rar, Straus and Giroux, New York, 2019. ISBN 978-0374257835.
[22] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko
Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination
in natural language generation. ACM Comput. Surv., 55(12), March 2023.
ISSN 0360-0300. doi: 10.1145/3571730. URL https://doi.org/10.
1145/3571730.
[23] Muru Zhang, Ofir Press, William Merrill, Alisa Liu, and Noah A. Smith.
How language model hallucinations can snowball. In Ruslan Salakhutdi-
nov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan
Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st Interna-
tional Conference on Machine Learning, volume 235 of Proceedings of Ma-
chine Learning Research, pages 59670–59684. PMLR, 21–27 Jul 2024. URL
https://proceedings.mlr.press/v235/zhang24ay.html.
[24] European Union. Regulation (EU) 2024/1689 of the European Parliament
and of the Council of 13 June 2024 laying down harmonised rules on ar-
tificial intelligence (Artificial Intelligence Act) and amending certain Union
legislative acts. https://eur-lex.europa.eu/legal-content/
EN/TXT/?uri=CELEX:32024R1689, June 2024. OJ L, 13 June 2024.
[25] Philip M. Morse and George E. Kimball. Methods of Operations Research.
MIT Press, Cambridge, MA, 1951.
[26] Frederick S. Hillier and Gerald J. Lieberman. Introduction to Operations
Research. McGraw-Hill, 11th edition, 2021.
[27] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An
Introduction. The MIT Press, 2nd edition, 2018. ISBN 9780262039246.
[28] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of
Machine Learning. MIT Press, Cambridge, MA, 2nd edition, 2018. ISBN
9780262039406. URL https://cs.nyu.edu/~mohri/mlbook/.
[29] Arnold Wolfers. “National security” as an ambiguous symbol. Political
Science Quarterly, 67(4):481–502, dec 1952. doi: 10.2307/2145138. URL
https://doi.org/10.2307/2145138.
[30] Carl von Clausewitz. On War. Princeton University Press, 1984. Originally
published posthumously in 1832; this is the authoritative English edition.
[31] Max Weber. Politics as a vocation. In H. H. Gerth and C. Wright Mills, editors,
From Max Weber: Essays in Sociology. Oxford University Press, New York,
1946. Reprinted from Max Weber’s Essays in Sociology. Translated, edited,
and with an introduction by H. H. Gerth and C. Wright Mills.
[32] U.S. Joint Chiefs of Staff. Doctrine for the Armed Forces of the United States.
Technical Report JP 1, Joint Chiefs of Staff, Washington, DC, March 2013.
Incorporating Change 1, 12 July 2017.
[33] U.S. Joint Chiefs of Staff. Joint planning. Technical Report JP 5-
0, Joint Chiefs of Staff, Washington, DC, December 2020. URL
https://www.esd.whs.mil/Portals/54/Documents/FOID/
Reading%20Room/Joint_Staff/18-F-1152_JP_5-0_Joint_
Planning_2020.pdf.
[34] Headquarters, Department of the Army. The operations
process. Technical Report ADP 5-0, Department of the
Army, Washington, DC, July 2019. URL https://
rdl.train.army.mil/catalog-ws/view/100.ATSC/
E4166A5D-0FE7-4780-916A-A7E9B227147C-1337689957702/
adp5_0.pdf.
[35] Anthony Bellione. The heart of decision superiority: Evolve or
lose – why your next war may be won or lost in seconds. Joint
Air Power Competence Centre (JAPCC), Journal Edition 36, Oc-
tober 2023. URL https://www.japcc.org/articles/
the-heart-of-decision-superiority/. Author listed as
Col (ret.) Anthony Bellione, USAF. Accessed 2026-02-21.
[36] Sun Tzu. The Art of War. Arcturus Publishing Limited, London, England,
2014. ISBN 9781784042028.
[37] Aleksandr Vasilévič Suvorov. Suvorov’s Art of Victory. H. Charles-Lavauzelle,
Paris, 1899. URL http://catalogue.bnf.fr/ark:
/12148/cb30353002b. Digital preservation: ark:/12148/bpt6k86570m.
[38] Joseph Clark. Pentagon Official Lays Out DOD Vision
for AI, February 2024. https://www.defense.gov/
News/News-Stories/Article/article/3682355/
pentagon-official-lays-out-dod-vision-for-ai/. DOD
News. Accessed 17.10.2025.
[39] North Atlantic Treaty Organization. Summary of the
NATO Artificial Intelligence Strategy, October 2021.
URL https://www.nato.int/en/about-us/
official-texts-and-resources/official-texts/2021/
10/22/summary-of-the-nato-artificial-intelligence-strategy.
Accessed 2026-03-07.
[40] U.S. Department of Defense. Department of Defense Data, An-
alytics, and Artificial Intelligence Adoption Strategy. Techni-
cal report, U.S. Department of Defense, November 2023. URL
https://media.defense.gov/2023/Nov/02/2003333300/
-1/-1/1/DOD_DATA_ANALYTICS_AI_ADOPTION_STRATEGY.PDF.
[41] Koichiro Takagi. Is the PLA overestimating the potential of artificial
intelligence? Joint Force Quarterly, 116(4):71–78, 2025. URL https:
//digitalcommons.ndu.edu/joint-force-quarterly/
vol116/iss4/10. 1st Quarter 2025.
[42] Lauri Vasankari. Tekoäly ja automaatio tulevaisuuden laivastojoukoissa. Pro
gradu -tutkielma (master’s thesis), Maanpuolustuskorkeakoulu (National De-
fence University), Helsinki, Finland, 2022. URL https://www.doria.
fi/handle/10024/185612. Department of Military Technology (So-
tatekniikan laitos). Persistent identifier: URN:NBN:fi-fe2022080953410. Ac-
cessed 2026-02-21.
[43] Peter B. Checkland. Systems Thinking, Systems Practice. John Wiley & Sons,
Chichester, UK, 1981. Revised edition, 1999.
[44] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne
Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, and Guillaume
Lample. LLaMA: Open and efficient foundation language models. arXiv
preprint arXiv:2302.13971, 2023. URL https://arxiv.org/abs/
2302.13971.
[45] Mistral AI. Mistral 7B. arXiv preprint arXiv:2310.06825, 2023. URL
https://arxiv.org/abs/2310.06825.
[46] Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino
Vieillard, et al. Gemma 3 technical report. Technical Report
arXiv:2503.19786, Google DeepMind, March 2025.
[47] Shu-hsien Liao. Case-based decision support system: Architecture for
simulating military command and control. European Journal of Operational
Research, 123(3):558–567, 2000. ISSN 0377-2217. doi: https://doi.org/10.1016/
S0377-2217(99)00109-5. URL https://www.sciencedirect.com/
science/article/pii/S0377221799001095.
[48] J. Scrimgeour. Open surveys: An information system for the improve-
ment of international stability. Control Engineering Practice, 2(5):791–
802, 1994. ISSN 0967-0661. doi: https://doi.org/10.1016/0967-0661(94)
90344-1. URL https://www.sciencedirect.com/science/
article/pii/0967066194903441.
[49] Shashi D. Buluswar and Bruce A. Draper. Color machine vision for
autonomous vehicles. Engineering Applications of Artificial Intelligence,
11(2):245–256, 1998. ISSN 0952-1976. doi: https://doi.org/10.1016/
S0952-1976(97)00079-1. URL https://www.sciencedirect.com/
science/article/pii/S0952197697000791.
[50] Luiz Bortolan Neto, Michael Saleh, Vanessa Pickerd, George Yian-
nakopoulos, Zenka Mathys, and Warren Reid. Rapid mechani-
cal evaluation of quadrangular steel plates subjected to localised blast
loadings. International Journal of Impact Engineering, 137:103461,
2020. ISSN 0734-743X. doi: https://doi.org/10.1016/j.ijimpeng.
2019.103461. URL https://www.sciencedirect.com/science/
article/pii/S0734743X19301708.
[51] Mehdi Hosseinzadeh, Jawad Tanveer, Amir Masoud Rahmani, Khursheed
Aurangzeb, Efat Yousefpoor, Mohammad Sadegh Yousefpoor, Aso Dar-
wesh, Sang-Woong Lee, and Mahmood Fazlali. A Q-learning-based
smart clustering routing method in flying Ad Hoc networks. Journal
of King Saud University - Computer and Information Sciences, 36(1):
101894, 2024. ISSN 1319-1578. doi: https://doi.org/10.1016/j.jksuci.
2023.101894. URL https://www.sciencedirect.com/science/
article/pii/S1319157823004482.
[52] Rehan Akbani, Turgay Korkmaz, and G.V. Raju. EMLTrust: An enhanced
Machine Learning based Reputation System for MANETs. Ad Hoc Networks,
10(3):435–457, 2012. ISSN 1570-8705. doi: https://doi.org/10.1016/j.adhoc.
2011.08.003. URL https://www.sciencedirect.com/science/
article/pii/S1570870511001867.
[53] Manjit Kaur, Deepak Prashar, Leo Mrsic, and Arfat Ahmad Khan.
Machine learning-based routing protocol in flying ad hoc networks:
A review. Computers, Materials and Continua, 82(2):1615–1643,
2025. ISSN 1546-2218. doi: https://doi.org/10.32604/cmc.2025.
059043. URL https://www.sciencedirect.com/science/
article/pii/S1546221825001298.
[54] Miguel Ângelo Lellis Moreira, Guilherme Vinagre Pinto de Souza, Igor
Pinheiro de Araújo Costa, Wilson Tarantin Junior, Luiz Paulo Fávero,
Marcos dos Santos, and Carlos Francisco Simões Gomes. Defense
perception in the geopolitical scope: An exploratory study through
unsupervised machine learning. Procedia Computer Science, 221:689–
696, 2023. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2023.08.039. URL https://www.sciencedirect.com/science/
article/pii/S1877050923007974. Tenth International Conference
on Information Technology and Quantitative Management (ITQM 2023).
[55] Igor Pinheiro de Araújo Costa, Gabriel Custódio Rangel, Arthur
Pinheiro de Araújo Costa, Gabriel Pereira de Oliveira Capela, Luiz Paulo
Fávero, Carlos Francisco Simões Gomes, Marcos dos Santos, and Luiz
Frederico Horácio de Souza de Barros Teixeira. Multi-criteria
decision-making and machine learning techniques: A multidisciplinary
analysis of the world military scenario. Procedia Computer Science, 242:
184–191, 2024. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2024.08.263. URL https://www.sciencedirect.com/science/
article/pii/S1877050924019823. 11th International Conference on
Information Technology and Quantitative Management (ITQM 2024).
[56] J.R. James and C.J. Herget. Software tools for distributed intelli-
gent control systems. IFAC Proceedings Volumes, 24(10):87–90, 1991.
ISSN 1474-6670. doi: https://doi.org/10.1016/B978-0-08-041698-4.
50018-4. URL https://www.sciencedirect.com/science/
article/pii/B9780080416984500184. 3rd IFAC Workshop on Ar-
tificial Intelligence in Real-Time Control 1991, California, USA, 23-25
September 1991.
[57] Wen Jiang, Yihui Ren, and Yanping Wang. Improving anti-jamming
decision-making strategies for cognitive radar via multi-agent deep re-
inforcement learning. Digital Signal Processing, 135:103952, 2023.
ISSN 1051-2004. doi: https://doi.org/10.1016/j.dsp.2023.103952. URL
https://www.sciencedirect.com/science/article/pii/
S1051200423000477.
[58] Alexandra Zabala-López, Mario Linares-Vásquez, Sonia Haiduc, and Yezid
Donoso. A survey of data-centric technologies supporting decision-
making before deploying military assets. Defence Technology, 42:
226–246, 2024. ISSN 2214-9147. doi: https://doi.org/10.1016/j.dt.
2024.07.012. URL https://www.sciencedirect.com/science/
article/pii/S221491472400182X.
[59] Antonio A. Sánchez-Ruiz and Maximiliano Miranda. A machine learning
approach to predict the winner in StarCraft based on influence maps. Enter-
tainment Computing, 19:29–41, 2017. ISSN 1875-9521. doi: https://doi.org/
10.1016/j.entcom.2016.11.005. URL https://www.sciencedirect.
com/science/article/pii/S1875952116300647.
[60] Jay Liebowitz and Laura C. Davis. Sharing the solution: The need for
generic artificial intelligence decision support development tools in bat-
tle management. Computers & Industrial Engineering, 16(4):587–593,
1989. ISSN 0360-8352. doi: https://doi.org/10.1016/0360-8352(89)
90176-9. URL https://www.sciencedirect.com/science/
article/pii/0360835289901769.
[61] Shuangxi Liu, Zehuai Lin, Wei Huang, and Binbin Yan. Cur-
rent development and future prospects of multi-target assignment prob-
lem: A bibliometric analysis review. Defence Technology, 43:
44–59, 2025. ISSN 2214-9147. doi: https://doi.org/10.1016/j.dt.
2024.09.006. URL https://www.sciencedirect.com/science/
article/pii/S2214914724002228.
[62] Abu S.M. Masud, Paul Metcalf, and Don Hommertzheim. A
knowledge-based model management system for aircraft survivability
analysis. European Journal of Operational Research, 84(1):47–59,
1995. ISSN 0377-2217. doi: https://doi.org/10.1016/0377-2217(94)
00317-6. URL https://www.sciencedirect.com/science/
article/pii/0377221794003176. Decision Technology and Intelli-
gent Decision Support.
[63] Sven Nõmm and Adrian Venables. Towards generation of synthetic
data sets for hybrid conflict modelling. IFAC-PapersOnLine, 55(29):
25–30, 2022. ISSN 2405-8963. doi: https://doi.org/10.1016/j.ifacol.
2022.10.226. URL https://www.sciencedirect.com/science/
article/pii/S2405896322022510. 15th IFAC Symposium on Anal-
ysis, Design and Evaluation of Human Machine Systems HMS 2022.
[64] Jean Oh, Felipe Meneguzzi, and Katia Sycara. Chapter 11 - Probabilistic
Plan Recognition for Proactive Assistant Agents. In Gita Sukthankar, Christo-
pher Geib, Hung Hai Bui, David V. Pynadath, and Robert P. Goldman, ed-
itors, Plan, Activity, and Intent Recognition, pages 275–288. Morgan Kauf-
mann, Boston, 2014. ISBN 978-0-12-398532-3. doi: https://doi.org/10.1016/
B978-0-12-398532-3.00011-7. URL https://www.sciencedirect.
com/science/article/pii/B9780123985323000117.
[65] Jacques M. Perry, Raffaele Galliera, and Niranjan Suri. A Machine Learn-
ing Approach to the Determination of Value of Information to Operators
and Applications on the Tactical Edge. Procedia Computer Science, 205:
137–146, 2022. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2022.09.015. URL https://www.sciencedirect.com/science/
article/pii/S1877050922008808. 2022 International Conference
on Military Communication and Information Systems (ICMCIS).
[66] Dave Mechergui and Paramsothy Jayakumar. Efficient generation of accurate
mobility maps using machine learning algorithms. Journal of Terramechan-
ics, 88:53–63, 2020. ISSN 0022-4898. doi: https://doi.org/10.1016/j.jterra.
2019.12.002. URL https://www.sciencedirect.com/science/
article/pii/S0022489819301454.
[67] Matheus R.F. Mendonça, Heder S. Bernardino, and Raul Fonseca
Neto. Reinforcement learning with optimized reward function for
stealth applications. Entertainment Computing, 25:37–47, 2018. ISSN
1875-9521. doi: https://doi.org/10.1016/j.entcom.2017.12.003. URL
https://www.sciencedirect.com/science/article/pii/
S1875952117300587.
[68] Yan Xia, S.S. Iyengar, and N.E. Brener. An event driven integration
reasoning scheme for handling dynamic threats in an unstructured en-
vironment. Artificial Intelligence, 95(1):169–186, 1997. ISSN 0004-
3702. doi: https://doi.org/10.1016/S0004-3702(97)00035-0. URL
https://www.sciencedirect.com/science/article/pii/
S0004370297000350.
[69] Pamul Yadav and Shiho Kim. Chapter Four - OODA loop for learn-
ing open-world novelty problems. In Shiho Kim and Ganesh Chan-
dra Deka, editors, Artificial Intelligence and Machine Learning for
Open-world Novelty, volume 134 of Advances in Computers, pages
91–130. Elsevier, 2024. doi: https://doi.org/10.1016/bs.adcom.2023.
06.002. URL https://www.sciencedirect.com/science/
article/pii/S0065245823000451.
[70] David W. Aha. The omnipresence of case-based reasoning in sci-
ence and application. Knowledge-Based Systems, 11(5):261–273,
1998. ISSN 0950-7051. doi: https://doi.org/10.1016/S0950-7051(98)
00066-5. URL https://www.sciencedirect.com/science/
article/pii/S0950705198000665.
[71] Siyuan Zhao, Jiapeng Liu, Miłosz Kadziński, Xiuwu Liao, and Yao
Wang. A probabilistic preference learning approach for multiple crite-
ria ranking in dynamic decision context. European Journal of Opera-
tional Research, 2025. ISSN 0377-2217. doi: https://doi.org/10.1016/j.ejor.
2025.08.008. URL https://www.sciencedirect.com/science/
article/pii/S0377221725006241.
[72] Ayhan Altinors, Ferhat Yol, and Orhan Yaman. A sound based method for
fault detection with statistical feature extraction in UAV motors. Applied
Acoustics, 183:108325, 2021. ISSN 0003-682X. doi: https://doi.org/10.1016/
j.apacoust.2021.108325. URL https://www.sciencedirect.com/
science/article/pii/S0003682X21004199.
[73] Gracieth Cavalcanti Batista, Johnny Öberg, Osamu Saotome, Haroldo F. de
Campos Velho, Elcio Hideiti Shiguemori, and Ingemar Söderquist. Machine
learning algorithm partially reconfigured on FPGA for an image edge
detection system. Journal of Electronic Science and Technology, 22(2):
100248, 2024. ISSN 1674-862X. doi: https://doi.org/10.1016/j.jnlest.
2024.100248. URL https://www.sciencedirect.com/science/
article/pii/S1674862X24000168.
[74] Thierry D Fualdes and Claude J Barrouil. A common frame-
work for reasoning on uncertainty both at symbolic and numer-
ical levels. Future Generation Computer Systems, 9(4):339–347,
1993. ISSN 0167-739X. doi: https://doi.org/10.1016/0167-739X(93)
90036-O. URL https://www.sciencedirect.com/science/
article/pii/0167739X9390036O.
[75] John F. Gilmore. Military applications of expert systems. Future
Generation Computer Systems, 1(6):403–410, 1985. ISSN 0167-
739X. doi: https://doi.org/10.1016/0167-739X(85)90024-X. URL
https://www.sciencedirect.com/science/article/pii/
0167739X8590024X.
[76] R. Sutton and G.N. Roberts. Approaches to fuzzy autopilot design
optimization. IFAC Proceedings Volumes, 30(22):77–82, 1997. ISSN
1474-6670. doi: https://doi.org/10.1016/S1474-6670(17)46493-7. URL
https://www.sciencedirect.com/science/article/pii/
S1474667017464937. 4th IFAC Conference on Manoeuvring and Control
of Marine Craft (MCMC ’97), Brijuni, Croatia, 10-12 September.
[77] Amir Masoud Rahmani, Saqib Ali, Efat Yousefpoor, Mohammad Sadegh
Yousefpoor, Danial Javaheri, Pooia Lalbakhsh, Omed Hassan Ahmed, Mehdi
Hosseinzadeh, and Sang-Woong Lee. OLSR+: A new routing method based
on fuzzy logic in flying ad-hoc networks (FANETs). Vehicular Communi-
cations, 36:100489, 2022. ISSN 2214-2096. doi: https://doi.org/10.1016/
j.vehcom.2022.100489. URL https://www.sciencedirect.com/
science/article/pii/S2214209622000365.
[78] Wenyu Cai, Ziqiang Liu, Meiyan Zhang, and Chengcai Wang. Coopera-
tive artificial intelligence for underwater robotic swarm. Robotics and Au-
tonomous Systems, 164:104410, 2023. ISSN 0921-8890. doi: https://doi.org/
10.1016/j.robot.2023.104410. URL https://www.sciencedirect.
com/science/article/pii/S0921889023000490.
[79] Erhan Akbal, Ayhan Akbal, Sengul Dogan, and Turker Tuncer. An au-
tomated accurate sound-based amateur drone detection method based
on skinny pattern. Digital Signal Processing, 136:104012, 2023.
ISSN 1051-2004. doi: https://doi.org/10.1016/j.dsp.2023.104012. URL
https://www.sciencedirect.com/science/article/pii/
S1051200423001070.
[80] Jianan Wei, Ling Zhang, Junchao Yang, Molin Qin, Binyue Fan, Liu
Yang, and Shuya Cao. Machine learning–based six-channel dual-
peak photonic nose for identifying real organophosphorus nerve agents
and their simulants. Sensors and Actuators B: Chemical, 444:
138275, 2025. ISSN 0925-4005. doi: https://doi.org/10.1016/j.snb.
2025.138275. URL https://www.sciencedirect.com/science/
article/pii/S0925400525010512.
[81] Jinhong K. Guo, David Van Brackle, Nicolas LoFaso, and Martin O.
Hofmann. Extracting meaningful entities from human-generated tac-
tical reports. Procedia Computer Science, 61:72–79, 2015. ISSN
1877-0509. doi: https://doi.org/10.1016/j.procs.2015.09.153. URL
https://www.sciencedirect.com/science/article/pii/
S187705091502983X. Complex Adaptive Systems San Jose, CA
November 2-4, 2015.
[82] Xinjie Zhao and So Morikawa. Rapid assessment of large-scale
urban destruction in conflict zones using hypergraph-based visual-
structural machine learning. Journal of Engineering Research, 2024.
ISSN 2307-1877. doi: https://doi.org/10.1016/j.jer.2024.08.006. URL
https://www.sciencedirect.com/science/article/pii/
S2307187724002189.
[83] Mahdi Hashemi and Margeret Hall. Detecting and classifying on-
line dark visual propaganda. Image and Vision Computing, 89:95–
105, 2019. ISSN 0262-8856. doi: https://doi.org/10.1016/j.imavis.
2019.06.001. URL https://www.sciencedirect.com/science/
article/pii/S0262885619300848.
[84] Rabiye Kılıç, Nida Kumbasar, Emin Argun Oral, and Ibrahim Yucel
Ozbek. Drone classification using RF signal based spectral features.
Engineering Science and Technology, an International Journal, 28:
101028, 2022. ISSN 2215-0986. doi: https://doi.org/10.1016/j.jestch.
2021.06.008. URL https://www.sciencedirect.com/science/
article/pii/S2215098621001403.
[85] Hyun Kwon and Sanghyun Lee. Novel Rifle Number Recognition Based
on Improved YOLO in Military Environment. Computers, Materials and
Continua, 78(1):249–263, 2024. ISSN 1546-2218. doi: https://doi.org/10.
32604/cmc.2023.042466. URL https://www.sciencedirect.com/
science/article/pii/S1546221824001747.
[86] Riddhi Mehta and Dr. Ankit Shah. An Insight into Real Time Vehicle De-
tection and Classification Methods using ML/DL based Approach. Procedia
Computer Science, 235:598–605, 2024. ISSN 1877-0509. doi: https://doi.org/
10.1016/j.procs.2024.04.059. URL https://www.sciencedirect.
com/science/article/pii/S187705092400735X. International
Conference on Machine Learning and Data Engineering (ICMLDE 2023).
[87] Nedyalko Petrov, Ivan Jordanov, and Jon Roe. Radar emitter signals recog-
nition and classification with feedforward networks. Procedia Computer
Science, 22:1192–1200, 2013. ISSN 1877-0509. doi: https://doi.org/10.
1016/j.procs.2013.09.206. URL https://www.sciencedirect.com/
science/article/pii/S187705091300999X. 17th International
Conference in Knowledge Based and Intelligent Information and Engineer-
ing Systems - KES2013.
[88] William Baker, Steven Nixon, Jeffrey Banks, Karl Reichard, and Kaitlynn
Castelle. Degrader analysis for diagnostic and predictive capabilities: A
demonstration of progress in DoD CBM+ initiatives. Procedia Computer
Science, 168:257–264, 2020. ISSN 1877-0509. doi: https://doi.org/10.
1016/j.procs.2020.02.253. URL https://www.sciencedirect.com/
science/article/pii/S1877050920303926. Complex Adaptive
Systems, Malvern, Pennsylvania, November 13-15, 2019.
[89] Ram S. Mohril, Bhupendra S. Solanki, Makarand S. Kulkarni, and
Bhupesh K. Lad. Residual life prediction in the presence of hu-
man error using machine learning. IFAC-PapersOnLine, 53(3):119–
124, 2020. ISSN 2405-8963. doi: https://doi.org/10.1016/j.ifacol.
2020.11.019. URL https://www.sciencedirect.com/science/
article/pii/S2405896320301634. 4th IFAC Workshop on Ad-
vanced Maintenance Engineering, Services and Technologies - AMEST 2020.
[90] Antonio Candelieri, Raul Sormani, Gaia Arosio, Ilaria Giordani, and
Francesco Archetti. A Hyper-solution Framework for SVM Classification:
Improving Damage Detection on Helicopter Fuselage Panels. AASRI Proce-
dia, 4:31–36, 2013. ISSN 2212-6716. doi: https://doi.org/10.1016/j.aasri.
2013.10.006. URL https://www.sciencedirect.com/science/
article/pii/S2212671613000073. 2013 AASRI Conference on In-
telligent Systems and Control.
[91] Beibei Li, Bin Feng, and Li Chen. A graph network-based learn-
able simulator for spatial-temporal prediction of rigid projectile pen-
etration. International Journal of Impact Engineering, 195:105123,
2025. ISSN 0734-743X. doi: https://doi.org/10.1016/j.ijimpeng.
2024.105123. URL https://www.sciencedirect.com/science/
article/pii/S0734743X24002483.
[92] Donald B. Malkoff. A framework for real-time fault detection and diag-
nosis using temporal data. Artificial Intelligence in Engineering, 2(2):97–
111, 1987. ISSN 0954-1810. doi: https://doi.org/10.1016/0954-1810(87)
90144-0. URL https://www.sciencedirect.com/science/
article/pii/0954181087901440.
[93] Nikolaos Vasilikis, Rinze Geertsma, and Andrea Coraddu. A digital twin
approach for maritime carbon intensity evaluation accounting for opera-
tional and environmental uncertainty. Ocean Engineering, 288:115927,
November 2023. ISSN 0029-8018. doi: https://doi.org/10.1016/j.oceaneng.
2023.115927. URL https://www.sciencedirect.com/science/
article/pii/S0029801823023119.
[94] Petros Boutselis and Ken McNaught. Using Bayesian Networks to fore-
cast spares demand from equipment failures in a changing service lo-
gistics context. International Journal of Production Economics, 209:
325–333, 2019. ISSN 0925-5273. doi: https://doi.org/10.1016/j.ijpe.
2018.06.017. URL https://www.sciencedirect.com/science/
article/pii/S0925527318302615. The Proceedings of the 19th In-
ternational Symposium on Inventories.
[95] Jason Whelan, Abdulaziz Almehmadi, and Khalil El-Khatib. Arti-
ficial intelligence for intrusion detection systems in unmanned aerial
vehicles. Computers and Electrical Engineering, 99:107784, 2022.
ISSN 0045-7906. doi: https://doi.org/10.1016/j.compeleceng.2022.
107784. URL https://www.sciencedirect.com/science/
article/pii/S0045790622000842.
[96] Maulik Sojitra, Nilesh Kumar Jadav, Rajesh Gupta, Usha Patel, Janam
Patel, Sudeep Tanwar, Giovanni Pau, Fayez Alqahtani, and Amr Tolba.
Interplay of ML and blockchain for secure Internet of Military Vehicles
communication underlying 5G. Ad Hoc Networks, 178:103968,
2025. ISSN 1570-8705. doi: https://doi.org/10.1016/j.adhoc.2025.
103968. URL https://www.sciencedirect.com/science/
article/pii/S1570870525002161.
[97] Clara Maathuis and Kasper Cools. The Role of AI in Military Cyber Se-
curity: Data Insights and Evaluation Methods. Procedia Computer Sci-
ence, 254:191–200, 2025. ISSN 1877-0509. doi: https://doi.org/10.
1016/j.procs.2025.02.078. URL https://www.sciencedirect.com/
science/article/pii/S1877050925004284. International Con-
ference on Digital Sovereignty (ICDS).
[98] Bandar Almaslukh. Deep learning and entity embedding-based intrusion de-
tection model for wireless sensor networks. Computers, Materials and Con-
tinua, 69(1):1343–1360, 2021. ISSN 1546-2218. doi: https://doi.org/10.
32604/cmc.2021.017914. URL https://www.sciencedirect.com/
science/article/pii/S1546221821011401.
[99] Shahaboddin Shamshirband, Nor Badrul Anuar, Miss Laiha Mat Kiah, and
Ahmed Patel. An appraisal and design of a multi-agent system based
cooperative wireless intrusion detection computational intelligence tech-
nique. Engineering Applications of Artificial Intelligence, 26(9):2105–
2127, 2013. ISSN 0952-1976. doi: https://doi.org/10.1016/j.engappai.
2013.04.010. URL https://www.sciencedirect.com/science/
article/pii/S0952197613000766.
[100] Joseph C. Hoecherl, Matthew J. Robbins, Brett J. Borghetti, and Raymond R.
Hill. Partially autoregressive machine learning: Development and testing of
methods to predict United States Air Force retention. Computers & Industrial
Engineering, 171:108424, 2022. ISSN 0360-8352. doi: https://doi.org/10.
1016/j.cie.2022.108424. URL https://www.sciencedirect.com/
science/article/pii/S0360835222004612.
[101] Devin Wasilefsky, William N. Caballero, Chancellor Johnstone, Nathan
Gaw, and Phillip R. Jenkins. Responsible machine learning for United
States Air Force pilot candidate selection. Decision Support Systems,
180:114198, 2024. ISSN 0167-9236. doi: https://doi.org/10.1016/j.dss.
2024.114198. URL https://www.sciencedirect.com/science/
article/pii/S0167923624000319.
[102] Yuhan Zhang, Yishu Wei, Yanshan Wang, Yunyu Xiao, COL Ret.
Ronald K. Poropatich, Gretchen L. Haas, Yiye Zhang, Chunhua Weng,
Jinze Liu, Lisa A. Brenner, James M. Bjork, and Yifan Peng. Ma-
chine learning applications related to suicide in military and veterans:
A scoping literature review. Journal of Biomedical Informatics, 167:
104848, 2025. ISSN 1532-0464. doi: https://doi.org/10.1016/j.jbi.
2025.104848. URL https://www.sciencedirect.com/science/
article/pii/S1532046425000772.
[103] Richard M. Satava. Virtual reality and telepresence for military medicine.
Computers in Biology and Medicine, 25(2):229–236, 1995. ISSN
0010-4825. doi: https://doi.org/10.1016/0010-4825(94)00006-C. URL
https://www.sciencedirect.com/science/article/pii/
001048259400006C. Virtual Reality for Medicine.
[104] Nizam U. Ahamed, Kellen T. Krajewski, Camille C. Johnson, Adam J.
Sterczala, Julie P. Greeves, Sophie L. Wardle, Thomas J. O’Leary, Qi Mi,
Shawn D. Flanagan, Bradley C. Nindl, and Chris Connaboy. Using machine
learning and wearable inertial sensor data for the classification of fractal gait
patterns in women and men during load carriage. Procedia Computer Science,
185:282–291, 2021. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2021.05.030. URL https://www.sciencedirect.com/science/
article/pii/S1877050921011121.
[105] Aashay Gondalia, Dhruv Dixit, Shubham Parashar, Vijayanand Raghava, Ani-
mesh Sengupta, and Vergin Raja Sarobin. IoT-based Healthcare Monitor-
ing System for War Soldiers using Machine Learning. Procedia Computer
Science, 133:1005–1013, 2018. ISSN 1877-0509. doi: https://doi.org/10.
1016/j.procs.2018.07.075. URL https://www.sciencedirect.com/
science/article/pii/S1877050918310202. International Con-
ference on Robotics and Smart Manufacturing (RoSMa2018).
[106] Mustafa Canan, Andres Sousa-Poza, and Anthony Dean. Complex
adaptive behavior of hybrid teams. Procedia Computer Science, 114:
139–148, 2017. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2017.09.013. URL https://www.sciencedirect.com/science/
article/pii/S1877050917318070. Complex Adaptive Systems
Conference with Theme: Engineering Cyber Physical Systems, CAS Octo-
ber 30 – November 1, 2017, Chicago, Illinois, USA.
[107] T.B. Sheridan. On trusting C3I, particularly in SDI: When the
PIE meets the sky. IFAC Proceedings Volumes, 19(8):57–62, 1986.
ISSN 1474-6670. doi: https://doi.org/10.1016/B978-0-08-034915-2.
50016-0. URL https://www.sciencedirect.com/science/
article/pii/B9780080349152500160. IFAC Workshop on Contri-
butions of Technology to International Conflict Resolution, Cleveland, OH,
USA, 3-5 June 1986.
[108] T. Wittig and R. Onken. Knowledge based cockpit assistant for con-
trolled airspace flight operation. IFAC Proceedings Volumes, 25(9):195–
200, 1992. ISSN 1474-6670. doi: https://doi.org/10.1016/S1474-6670(17)
50192-5. URL https://www.sciencedirect.com/science/
article/pii/S1474667017501925. 5th IFAC Symposium on Analy-
sis, Design and Evaluation of Man-Machine Systems (MMS’92), The Hague,
The Netherlands, 9-11 June 1992.
[109] Maria Grazia De Giorgi and Marco Quarta. Hybrid multigene genetic
programming - artificial neural networks approach for dynamic perfor-
mance prediction of an aeroengine. Aerospace Science and Technology,
103:105902, 2020. ISSN 1270-9638. doi: https://doi.org/10.1016/j.ast.
2020.105902. URL https://www.sciencedirect.com/science/
article/pii/S1270963820305848.
[110] Rezoanul Hafiz Chandan, Nusrat Sharmin, Muhaimin Bin Munir, Abdur Raz-
zak, Tanvir Ahamad Naim, Tasneem Mubashshira, and Mokhlesur Rahman.
AI-based small arms firing skill evaluation system in the military domain. De-
fence Technology, 29:164–180, 2023. ISSN 2214-9147. doi: https://doi.org/
10.1016/j.dt.2023.02.024. URL https://www.sciencedirect.com/
science/article/pii/S221491472300051X.
[111] Lan Yang, Junqi Guo, Rongfang Bie, Anton Umek, and Anton Kos. Ma-
chine learning based accuracy prediction model for augmented biofeed-
back in precision shooting. Procedia Computer Science, 174:358–
363, 2020. ISSN 1877-0509. doi: https://doi.org/10.1016/j.procs.
2020.06.099. URL https://www.sciencedirect.com/science/
article/pii/S1877050920316197. 2019 International Conference
on Identification, Information and Knowledge in the Internet of Things.
[112] Philip Klahr. Artificial intelligence approaches to simulation. In D.J. Murray-
Smith, editor, UKSC 84, pages 87–92. Butterworth-Heinemann, 1984. ISBN
978-0-408-01504-2. doi: https://doi.org/10.1016/B978-0-408-01504-2.
50014-4. URL https://www.sciencedirect.com/science/
article/pii/B9780408015042500144.
[113] B.M. Knapp, A.R. Dudley, and J.S. Ryder. Modelling techniques for simu-
lation of submarine engagements. Mathematical and Computer Modelling,
12(8):1048–1049, 1989. ISSN 0895-7177. doi: https://doi.org/10.1016/
0895-7177(89)90219-7. URL https://www.sciencedirect.com/
science/article/pii/0895717789902197.
[114] Geethanjali Govindarajan, Gulam Nabi Alsath Mohammed, Abhishek Pre-
manand, and Kirubaveni Savarimuthu. Optimum design of a novel Ku-
band rasorber for RADAR warfare systems using ML neural network.
AEU - International Journal of Electronics and Communications, 185:
155453, 2024. ISSN 1434-8411. doi: https://doi.org/10.1016/j.aeue.
2024.155453. URL https://www.sciencedirect.com/science/
article/pii/S143484112400339X.
[115] Jinrui Li, Guohua Wu, and Ling Wang. A comprehensive survey
of weapon target assignment problem: Model, algorithm, and applica-
tion. Engineering Applications of Artificial Intelligence, 137:109212,
2024. ISSN 0952-1976. doi: https://doi.org/10.1016/j.engappai.2024.
109212. URL https://www.sciencedirect.com/science/
article/pii/S0952197624013708.
[116] Moayad Aloqaily, Ouns Bouachir, and Ismaeel Al Ridhawi. UAV-supported
communication: Current and prospective solutions. Vehicular Communi-
cations, 54:100923, 2025. ISSN 2214-2096. doi: https://doi.org/10.1016/
j.vehcom.2025.100923. URL https://www.sciencedirect.com/
science/article/pii/S2214209625000506.
[117] Xiaoyan Wang, Jingjing Yang, Zixiao Peng, Shunfang Wang, and
Ming Huang. Hilbert signal envelope-based multi-features methods
for GNSS spoofing detection. Computers & Security, 144:103959,
2024. ISSN 0167-4048. doi: https://doi.org/10.1016/j.cose.2024.
103959. URL https://www.sciencedirect.com/science/
article/pii/S0167404824002645.
[118] Harriet H. Kagiwada. Military modelling and computing: Where do
we go from here? Mathematical and Computer Modelling, 11:693–
698, 1988. ISSN 0895-7177. doi: https://doi.org/10.1016/0895-7177(88)
90582-1. URL https://www.sciencedirect.com/science/
article/pii/0895717788905821.
[119] North Atlantic Treaty Organization. Summary of
NATO’s revised Artificial Intelligence (AI) strategy, July
2024. URL https://www.nato.int/en/about-us/
official-texts-and-resources/official-texts/2024/
07/10/summary-of-natos-revised-artificial-intelligence-ai-strategy.
Accessed 2026-03-07.
[120] U.S. Department of Defense. Ethical Principles for Artificial Intelli-
gence. https://www.defense.gov/News/Releases/Release/
Article/2091996/, February 2020. Accessed 2026-03-07.
[121] National Security Commission on Artificial Intelligence. Final Report. Tech-
nical report, NSCAI, 2021. URL https://reports.nscai.gov/
final-report/.
[122] North Atlantic Treaty Organization. NATO DIANA Announces Companies
Chosen for the Next Phase of the Accelerator Programme. https://www.
nato.int/cps/en/natohq/news_228518.htm, 2024. NATO News
Release, 9 Oct 2024.
[123] U.S. Department of Defense. AUKUS Pillar II Milestones Hint at Future
Integrated Autonomous, Artificial Intelligence Operations. https://www.
defense.gov/News/Releases/Release/Article/3867890/,
August 2024.
[124] U.S. Department of Defense. DoD Directive 3000.09: Autonomy
in Weapon Systems. Technical report, Department of Defense,
January 2023. URL https://media.defense.gov/2023/
Jan/25/2003149928/-1/-1/0/DOD-DIRECTIVE-3000.
09-AUTONOMY-IN-WEAPON-SYSTEMS.PDF.
[125] International Committee of the Red Cross. Submission on Autonomous
Weapon Systems to the United Nations Secretary-General. https://www.
icrc.org/sites/default/files/wysiwyg/war-and-law/
icrc_submission_on_autonomous_weapons_to_unsg.pdf,
2024.
[126] UK Ministry of Defence. Defence Artificial Intelligence Strat-
egy. https://www.gov.uk/government/publications/
defence-artificial-intelligence-strategy, 2022.
[127] UK Ministry of Defence. JSP 936 V1.1: Dependable Artifi-
cial Intelligence (AI) in Defence–Part 1: Directive. Techni-
cal report, UK Ministry of Defence, November 2024. URL
https://assets.publishing.service.gov.uk/media/
6735fc89f6920bfb5abc7b62/JSP936_Part1.pdf.
[128] Edmund J. Burke, Kristen Gunness, Cortez A. Cooper III, and Mark
Cozad. People’s Liberation Army Operational Concepts. Technical Report
RR-A394-1, RAND Corporation, 2020. URL https://www.rand.org/
pubs/research_reports/RRA394-1.html.
[129] Heiko Borchert, Torben Schütz, and Joseph Verbovszky, editors. The
Very Long Game: 25 Case Studies on the Global State of Defense
AI. Contributions to Security and Defence Studies. Springer, 2024.
doi: 10.1007/978-3-031-58649-1. URL https://doi.org/10.1007/
978-3-031-58649-1.
[130] Peter Highnam. The Defense Advanced Research Projects Agency’s Ar-
tificial Intelligence Vision. AI Magazine, 41(2):83–85, 2020. doi: 10.
1609/aimag.v41i2.5301. URL https://doi.org/10.1609/aimag.
v41i2.5301.
[131] Allen Newell. Some problems of basic organization in problem-solving pro-
grams. Technical Report RM-3283-PR, RAND Corporation, Santa Mon-
ica, CA, 1962. URL https://www.rand.org/pubs/research_
memoranda/RM3283.html.
[132] A. Klinger. Natural language, linguistic processing, and speech understand-
ing: Recent research and future goals. Technical Report R-1377, RAND
Corporation, Santa Monica, CA, 1973. URL https://www.rand.org/
pubs/reports/R1377.html.
[133] Robert M. Kaplan. The mind system: A grammar-rule language.
Technical Report RM-6265/1-PR, RAND Corporation, Santa Monica,
CA, 1970. URL https://www.rand.org/pubs/research_
memoranda/RM6265z1.html.
[134] M. E. Maron. Artificial intelligence and brain mechanisms. Technical Report
RM-3522-PR, RAND Corporation, Santa Monica, CA, 1963. URL https:
//www.rand.org/pubs/research_memoranda/RM3522.html.
[135] Allan M. Din. Arms and Artificial Intelligence: Weapons and Arms Control
Applications of Advanced Computing. Oxford University Press for SIPRI,
Oxford, 1987. ISBN 0-19-829122-1.
[136] Carl G. Jacobsen. The Uncertain Course: New Weapons, Strategies and Mind-
sets. Oxford University Press for SIPRI, Oxford, 1987. ISBN 0-19-829115-9.
[137] Li Ang Zhang, Yusuf Ashpari, and Anthony Jacques. Understanding the
Limits of Artificial Intelligence for Warfighters: Volume 3, Predictive Maintenance.
Research Report RRA1722-3, RAND Corporation, Santa Monica, CA, January 2024.
URL https://www.rand.org/pubs/research_
reports/RRA1722-3.html.
[138] Joshua Steier, Erik Van Hegewald, Anthony Jacques, Gavin S. Hartnett, and
Lance Menthe. Understanding the limits of artificial intelligence for warfight-
ers: Volume 2, distributional shift in cybersecurity datasets. Research Report
RRA1722-2, RAND Corporation, Santa Monica, CA, January 2024. URL https:
//www.rand.org/pubs/research_reports/RRA1722-2.html.
[139] Theodora Ogden, Anna Knack, Mélusine Lebret, James Black, and
Vasilios Mavroudis. The Role of the Space Domain in the Russia–Ukraine
War: The Impact of Converging Space and AI Technologies.
External Publication EP-70408, RAND Corporation and RAND Europe,
February 2024. URL https://www.rand.org/pubs/external_
publications/EP70408.html.
[140] Vincent Boulanin. Mapping the innovation ecosystem driving the
advance of autonomy in weapon systems. SIPRI working paper,
Stockholm International Peace Research Institute, December 2016.
URL https://www.sipri.org/sites/default/files/
Mapping-innovation-ecosystem-driving-autonomy-in-weapon-systems.
pdf.
[141] Vincent Boulanin and Maaike Verbruggen. Mapping the devel-
opment of autonomy in weapon systems. SIPRI white paper,
Stockholm International Peace Research Institute, November 2017. URL
https://www.sipri.org/sites/default/files/2017-11/
siprireport_mapping_the_development_of_autonomy_in_
weapon_systems_1117_1.pdf.
[142] Vincent Boulanin, Netta Goussac, Laura Bruun, and Luke Richards. Re-
sponsible military use of artificial intelligence: Can the European Union
lead the way in developing best practice? SIPRI policy report, Stockholm
International Peace Research Institute, November 2020. URL https://www.
sipri.org/sites/default/files/2020-11/responsible_
military_use_of_artificial_intelligence.pdf.
[143] Vincent Boulanin, Kolja Brockmann, and Luke Richards. Re-
sponsible artificial intelligence research and innovation for interna-
tional peace and security. Policy report, Stockholm International
Peace Research Institute (SIPRI), November 2020. URL https:
//www.sipri.org/publications/2020/policy-reports/
responsible-artificial-intelligence-research-and-innovation-international-peace-and-security.
[144] Fei Su, Vladislav Chernavskikh, and Wilfred Wan. Advancing
Governance at the Nexus of Artificial Intelligence and Nuclear
Weapons. SIPRI insights on peace and security, Stockholm In-
ternational Peace Research Institute (SIPRI), Stockholm, March
2025. URL https://www.sipri.org/publications/
2025/sipri-insights-peace-and-security/
advancing-governance-nexus-artificial-intelligence-and-nuclear-weapons.
[145] Jon Schmid, Chad J. R. Ohlandt, and Shawn Cochran. Net technical assess-
ment: A methodology for assessing military technology competition. Tech-
nical Report RR-A1350-1, RAND Corporation, May 2024. URL https:
//www.rand.org/pubs/research_reports/RRA1350-1.html.
[146] Alexander Blanchard and Laura Bruun. Bias in military artificial intelligence.
SIPRI background paper, Stockholm International Peace Research Institute
(SIPRI), Stockholm, December 2024. URL https://www.sipri.
org/publications/2024/sipri-background-papers/
bias-military-artificial-intelligence.
[147] Paul K. Davis and Paul Bracken. Artificial Intelligence for Wargam-
ing and Modeling. Journal of Defense Modeling and Simulation: Ap-
plications, Methodology, Technology, 19(3):1–16, 2022. doi: 10.
1177/15485129211073126. URL https://www.rand.org/pubs/
external_publications/EP68860.html. RAND external publica-
tion EP-68860.
[148] Edward Geist, Aaron B. Frank, and Lance Menthe. Understanding the Limits
of Artificial Intelligence for Warfighters: Volume 4, Wargames. Research
Report RRA1722-4, RAND Corporation, Santa Monica, CA, January
2024. URL https://www.rand.org/pubs/research_reports/
RRA1722-4.html.
[149] Headquarters, Supreme Allied Commander Transformation (HQ
SACT). NATO Wargaming Handbook. NATO Allied Com-
mand Transformation, Norfolk, VA, USA, 2023. URL https:
//paxsims.wordpress.com/wp-content/uploads/2023/
09/nato-wargaming-handbook-202309.pdf. First version
publicly disclosed.
[150] James Black, Rebecca Lucas, John Kennedy, Megan Hughes, and Harper
Fine. Command and control in the future: Concept paper: Grappling with
complexity. Technical Report RR-A2476-1, RAND Corporation, January
2024. URL https://www.rand.org/pubs/research_reports/
RRA2476-1.html.
[151] David Schulker, Matthew Walsh, Avery Calkins, Monique Graham, Cheryl K.
Montemayor, Albert A. Robbert, Sean Robson, Claude M. Setodji, Joshua
Snoke, Joshua Williams, and Li Ang Zhang. Leveraging machine learning
to improve human resource management: Volume 1, key findings and recommendations
for policymakers. Research Report RR-A1745-1, RAND Corporation,
Santa Monica, CA, February 2024. URL https://www.rand.org/
pubs/research_reports/RRA1745-1.html.
[152] Irineo Cabreros, Joshua Snoke, Osonde A. Osoba, Inez Khan, and Marc N.
Elliott. Advancing Equitable Decisionmaking for the Department of Defense
Through Fairness in Machine Learning. Research Report RR-A1542-1,
RAND Corporation, June 2023. URL https://www.rand.org/pubs/
research_reports/RRA1542-1.html.
[153] Vincent Boulanin, editor. The Impact of Artificial Intelligence on Strate-
gic Stability and Nuclear Risk, Volume I, Euro-Atlantic perspectives.
Stockholm International Peace Research Institute, May 2019. URL
https://www.sipri.org/sites/default/files/2019-05/
sipri1905-ai-strategic-stability-nuclear-risk.pdf.
[154] Petr Topychkanov, editor. The Impact of Artificial Intelligence on
Strategic Stability and Nuclear Risk, Volume III, South Asian Perspectives.
Stockholm International Peace Research Institute, April 2020. URL
https://www.sipri.org/sites/default/files/2020-04/
impact_of_ai_on_strategic_stability_and_nuclear_
risk_vol_iii_topychkanov_1.pdf.
[155] Lora Saalman, editor. The Impact of Artificial Intelligence on Strate-
gic Stability and Nuclear Risk, Volume II, East Asian Perspectives.
Stockholm International Peace Research Institute, April 2020. URL
https://www.sipri.org/sites/default/files/2019-10/
the_impact_of_artificial_intelligence_on_strategic_
stability_and_nuclear_risk_volume_ii.pdf.
[156] Vincent Boulanin, Lora Saalman, Petr Topychkanov, Fei Su, and Moa Peldán
Carlsson. Artificial intelligence, strategic stability and nuclear risk. SIPRI
report, Stockholm International Peace Research Institute, June 2020. URL
https://www.sipri.org/sites/default/files/2020-06/
artificial_intelligence_strategic_stability_and_
nuclear_risk.pdf.
[157] Vladislav Chernavskikh. Nuclear weapons and artificial intelligence:
Technological promises and practical realities. SIPRI background paper,
Stockholm International Peace Research Institute, September 2024. URL
https://www.sipri.org/sites/default/files/2024-09/
bp_2409_ai-nuclear.pdf.
[158] Nivedita Raju and Wilfred Wan. Escalation risks at the space–nuclear
nexus. SIPRI insights on peace and security, Stockholm International
Peace Research Institute, February 2021. URL https:
//www.sipri.org/sites/default/files/2024-02/2402_
rpp_space-nuclear_nexus.pdf.
[159] Edward Geist and Andrew J. Lohn. How Might Artificial Intelligence Affect
the Risk of Nuclear War? Technical Report PE-296-RC, RAND Corporation,
April 2018. URL https://www.rand.org/pubs/perspectives/
PE296.html.
[160] Center for a New American Security. Paul Scharre wins Colby
Award for book “Army of None”. Press release, April 2019.
URL https://www.cnas.org/press/press-release/
paul-scharre-wins-colby-award-for-book-army-of-none.
Accessed 2026-03-07.
[161] Paul Scharre. Army of None: Autonomous Weapons and the Future of War.
W. W. Norton & Company, 2018. ISBN 9780393608984.
[162] Sam J. Tangredi and George V. Galdorisi. AI at War: How Big Data, Artifi-
cial Intelligence, and Machine Learning Are Changing Naval Warfare. Naval
Institute Press, 2021. ISBN 9781682476345.
[163] Christian Brose. The Kill Chain: Defending America in the Future of High-
Tech Warfare. Hachette Books, New York, NY, 2020.
[164] Jonathan Wong. Book review: “The Kill Chain: Defending America
in the Future of High-Tech Warfare”. RAND Commentary, July 2020.
URL https://www.rand.org/pubs/commentary/2020/07/
book-review-the-kill-chain-defending-america-in-the.
html. Accessed 2026-03-07.
[165] NDU Press. The Kill Chain: Defending America in the Fu-
ture of High-Tech Warfare. National Defense University Press
news / review, 2020. URL https://ndupress.ndu.edu/
Media/News/News-Article-View/Article/2541993/
the-kill-chain-defending-america-in-the-future-of-high-tech-warfare/.
Accessed 2026-03-07.
[166] Paul Scharre. Four Battlegrounds: Power in the Age of Artificial Intelligence.
W. W. Norton & Company, February 2023. ISBN 9780393866865. Hardcover
edition.
[167] Robert J. Bunker. Book review: Four battlegrounds: Power in the age
of artificial intelligence. Parameters Bookshelf (U.S. Army War College
Press), October 2023. URL https://press.armywarcollege.edu/
parameters_bookshelf/27/. Publication date: 2023-10-17. Accessed
2026-03-07.
[168] U.S. Department of Defense. Data, Analytics, and Artificial
Intelligence Adoption Strategy: Accelerating Decision Advan-
tage. Technical report, Department of Defense, June 2023. URL
https://media.defense.gov/2023/Nov/02/2003333300/
-1/-1/1/DOD_DATA_ANALYTICS_AI_ADOPTION_STRATEGY.PDF.
[169] Augusta Ada Lovelace. Sketch of the Analytical Engine Invented by Charles
Babbage, by L. F. Menabrea, with Notes by the Translator. Scientific Mem-
oirs, Selected from the Transactions of Foreign Academies of Science and
Learned Societies, 3:666–731, 1843. URL https://www.gutenberg.
org/ebooks/75107. Translation and notes by Ada Lovelace.
[170] George Boole. An Investigation of the Laws of Thought on Which are
Founded the Mathematical Theories of Logic and Probabilities. Walton
and Maberly, London, 1854. URL https://www.gutenberg.org/
files/15114/15114-pdf.pdf. Modern digital version published by
Project Gutenberg.
[171] Thomas Bayes. An essay towards solving a problem in the doctrine of
chances. Philosophical Transactions of the Royal Society of London, 53:370–
418, 1763. Edited and published posthumously by Richard Price.
[172] New World Encyclopedia contributors. Euclid, 2023. URL https://www.
newworldencyclopedia.org/entry/Euclid. Accessed: 2026-03-
07.
[173] Christopher M. Bishop and Hugh Bishop. Deep Learning: Foundations and
Concepts. Springer, 2024.
[174] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University
Press, 2 edition, 2013. doi: https://doi.org/10.1017/CBO9780511803161.
[175] Alan M. Turing. Computing machinery and intelligence. Mind, 59(236):433–
460, 1950. doi: 10.1093/mind/LIX.236.433.
[176] A. Newell and H. Simon. The logic theory machine–a complex information
processing system. IRE Transactions on Information Theory, 2(3):61–79,
1956. doi: 10.1109/TIT.1956.1056797. URL https://ieeexplore.
ieee.org/document/1056797.
[177] Thomas Haigh. How the AI boom went bust. Communications of the ACM,
January 2024. doi: 10.1145/3634901. URL https://cacm.acm.org/
opinion/how-the-ai-boom-went-bust/.
[178] Edward A. Feigenbaum, Bruce G. Buchanan, and Joshua Lederberg. On generality
and problem solving: A case study using the DENDRAL program. In
B. Meltzer and D. Michie, editors, Machine Intelligence 6, pages 165–190.
Edinburgh University Press, Edinburgh, 1971.
[179] Edward H. Shortliffe. Computer-Based Medical Consultations: MYCIN. Else-
vier, 1976. ISBN 978-0444569691. URL https://www.shortliffe.
net/Shortliffe-1976/MYCIN%20thesis%20Book.htm.
[180] Frank Rosenblatt. The perceptron: A probabilistic model for information
storage and organization in the brain. Psychological Review, 65(6):386–408,
1958. doi: 10.1037/h0042519.
[181] Marvin Minsky and Seymour Papert. Perceptrons: An Introduction to Compu-
tational Geometry. MIT Press, Cambridge, MA, 1969. ISBN 9780262343930.
URL https://direct.mit.edu/books/monograph/3132/
PerceptronsAn-Introduction-to-Computational.
[182] Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. A training
algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual
Workshop on Computational Learning Theory, COLT ’92, pages 144–152,
New York, NY, USA, 1992. Association for Computing Machinery. ISBN
089791497X. doi: 10.1145/130385.130401. URL https://doi.org/
10.1145/130385.130401.
[183] John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude E. Shan-
non. A Proposal for the Dartmouth Summer Research Project on Artificial
Intelligence. Technical report, Dartmouth College / Rockefeller Foundation,
August 1955. Proposal formalized term “Artificial Intelligence”; workshop
held summer 1956.
[184] DeepMind. Gemini: Google DeepMind’s multimodal AI model. https://
deepmind.google/technologies/gemini, 2023. Accessed: 2026-
03-08.
[185] Anthropic. Model card and evaluations for Claude models.
Technical report, Anthropic, July 2023. URL https:
//www-cdn.anthropic.com/files/4zrzovbb/website/
bd2a28d2535bfb0494cc8e2a3bf135d2e7523226.pdf. Claude 2
released July 2023. Accessed 2026-02-21.
[186] xAI. Grok by xAI. https://x.ai, 2023. Accessed: 2026-03-08.
[187] Financial Times. Meta’s AI chief says large language models will not reach
human intelligence. Financial Times, May 2024. URL https://www.ft.
com/content/23fab126-f1d3-4add-a457-207a25730ad9. Ac-
cessed 22 July 2025.
[188] Heise Online. Meta’s head of AI: Yann LeCun does not believe in the future
of generative AI, July 2025. https://www.heise.de/en/news/
Meta-s-head-of-AI-Yann-LeCun-does-not-believe-in-the-future-of-generative-AI-10276181.
html. Accessed 2026-03-07.
[189] Association for the Advancement of Artificial Intelligence (AAAI). AAAI
2025 Presidential Panel on the Future of AI Research. Presidential panel
report, Association for the Advancement of Artificial Intelligence, March
2025. URL https://aaai.org/wp-content/uploads/2025/
03/AAAI-2025-PresPanel-Report-Digital-3.7.25.pdf.
[190] Sam Altman. Reflections, January 2025. URL https://blog.
samaltman.com/reflections. Accessed 2026-03-07.
[191] Anca Dragan, Rohin Shah, Four Flynn, and Shane Legg. Taking a responsible
path to AGI, April 2025. URL https://deepmind.google/blog/
taking-a-responsible-path-to-agi/. Accessed 2026-03-07.
[192] Dario Amodei. Machines of Loving Grace: How AI Could Transform the
World for the Better, October 2024. URL https://www.darioamodei.
com/essay/machines-of-loving-grace. Accessed 2026-03-07.
[193] Parshin Shojaee, Iman Mirzadeh, Keivan Alizadeh, Maxwell Horton,
Samy Bengio, and Mehrdad Farajtabar. The illusion of thinking: Understand-
ing the strengths and limitations of reasoning models via the lens of prob-
lem complexity, 2025. URL https://ml-site.cdn-apple.com/
papers/the-illusion-of-thinking.pdf.
[194] Marina Mancoridis, Bec Weeks, Keyon Vafa, and Sendhil Mullainathan.
Potemkin Understanding in Large Language Models. In Proceedings of the
42nd International Conference on Machine Learning (ICML 2025), Vancou-
ver, Canada, July 2025. doi: 10.48550/arXiv.2506.21521. URL https:
//icml.cc/virtual/2025/poster/44050. Poster #E-2703.
[195] ARC Prize Foundation. ARC-AGI leaderboard. https://arcprize.org/
leaderboard, 2025. Accessed: 2025-06-20.
[196] Peter Chapman, Julian Clinton, Randy Kerber, Thomas Khabaza, Thomas
Reinartz, Colin Shearer, and Rüdiger Wirth. CRISP-DM 1.0: Step-by-step
data mining guide. Technical report, CRISP-DM Consortium (NCR,
Daimler-Chrysler, SPSS, OHRA), January 2000. Published under ESPRIT
project; non-proprietary data mining methodology.
[197] Flavius Vegetius Renatus. Epitome of Military Science. Number 16 in Trans-
lated Texts for Historians. Liverpool University Press, Liverpool, 1993. Trans-
lated with introduction and notes.
[198] Niccolò Machiavelli. The Prince. Dover Publications, 1992. Original work
published 1532; translated from Italian.
[199] Niccolò Machiavelli. The Art of War. University of Chicago Press, 2005. Originally
published in 1521; translated with introduction and notes by Christopher
Lynch.
[200] Nina Wilén and Lisa Strömbom. A versatile organisation: Mapping the military’s
core roles in a changing security environment. European Journal of
International Security, 7(1):18–37, 2022. doi: 10.1017/eis.2021.27.
[201] Jukka Anteroinen. Enhancing the Development of Military Capabilities by
a Systems Approach. PhD thesis, National Defence University, Finland,
Helsinki, 2013. Doctoral dissertation, Publication Series No. 33, Department
of Defence Technology.
[202] Max Weber. Economy and Society: An Outline of Interpretive Sociology.
University of California Press, 1978. Edited by Guenther Roth and Claus
Wittich; translated by Frank H. Knight, Ephraim Fischoff et al.
[203] U.S. Army. FM 3-90 Chapter 1: The Art of Tactics, 2013. URL https:
//rdl.train.army.mil/catalog-ws/view/100.ATSC/
17614720-DF1D-40BE-9123-F80680BF3974-1274406509298/
fm3_90.pdf. Approved for public release.
[204] Joint Chiefs of Staff. Joint Publication 3-0: Joint Operations. Joint Chiefs
of Staff, August 2011. URL https://edocs.nps.edu/dodpubs/topic/
jointpubs/JP3/JP3_0_110811.pdf. Accessed 2026-02-22.
[205] USAF College of Aerospace Doctrine, Research and Education (CADRE).
Three levels of war. Technical report, Air University Press, Maxwell
AFB, AL, 1997. URL https://faculty.cc.gatech.edu/
~tpilsch/INTA4803TP/Articles/Three%20Levels%20of%
20War%3DCADRE-excerpt.pdf. Excerpt from Air and Space Power
Mentoring Guide, Vol. 1; accessed 2026-02-22.
[206] Department of the Army. Field Manual 100-5: Operations. Headquarters,
Department of the Army, Washington, DC, August 1982. URL
https://cgsc.contentdm.oclc.org/digital/collection/
p4013coll9/id/976. Accessed 2026-02-22.
[207] Headquarters, Department of the Army. Field Manual (FM) 3-0: Operations.
Washington, DC, March 2025.
[208] U.S. Army Combined Arms Center, Center for Army Lessons
Learned. 23-07 (594) Military Decision-Making Process: Orga-
nizing and Conducting Planning, November 2023. URL https:
//api.army.mil/e2/c/downloads/2023/11/17/f7177a3c/
23-07-594-military-decision-making-process-nov-23-public.
pdf. Available via army.mil (public release).
[209] NATO Standardization Office. APP-28: Tactical Planning for Land Forces, b
edition, 2019. Includes continuous planning cycle from receipt of mission to
orders; rapid decision-making process outlined.
[210] John R Boyd et al. A discourse on winning and losing, volume 400. Air
University Press, Maxwell Air Force Base, AL, 2018.
[211] Grant Hammond. The mind of war: John Boyd and American security. Smith-
sonian Institution, 2001. ISBN 1-58834-178-X. Includes Boyd’s Appendix
“The OODA Loop” (orig. June 28, 1995 briefing).
[212] John Ferris. Netcentric warfare, C4ISR and information operations: Towards a
revolution in military intelligence? Intelligence & National Security, 19(2):
199–225, 2004. doi: 10.1080/0268452042000302967.
[213] Peter Checkland and Sue Holwell. Information, Systems and Information Sys-
tems. John Wiley & Sons, 1998.
[214] David S. Alberts, John J. Garstka, and Frederick P. Stein. Network Cen-
tric Warfare: Developing and Leveraging Information Superiority. Na-
tional Defense University Press, Washington, DC, 2 edition, 1999. ISBN
9781579060190. URL http://dodccrp.org/files/Alberts_
NCW.pdf. CCRP PDF; accessed 2026-02-22.
[215] Center for Development of Security Excellence. NATO in-
formation short: Student guide. Student Guide IFS0007, De-
fense Counterintelligence and Security Agency, November 2024.
URL https://www.cdse.edu/Portals/124/Documents/
student-guides/shorts/IFS0007-guide.pdf.
[216] Susmit Sarkar, Aishika Chakraborty, Aveek Saha, and Anushka Bannerjee.
Securing air-gapped systems. In Proceedings of the International Ethical
Hacking Conference 2019, Advances in Intelligent Systems and Computing.
Springer, 2020. doi: 10.1007/978-981-15-0361-0_18.
[217] Robert W. Shirey. Internet security glossary, version 2. IETF RFC
4949, 2007. URL https://datatracker.ietf.org/doc/html/
rfc4949. Defines “air gap” and related security terminology.
[218] National Institute of Standards and Technology (NIST). air gap. NIST
Computer Security Resource Center (CSRC) Glossary, 2007. URL https:
//csrc.nist.gov/glossary/term/air_gap.
[219] James O’Donnell. We saw a demo of the new AI system powering Anduril’s
vision for war. MIT Technology Review, December 2024. URL https:
//www.technologyreview.com/2024/12/10/1108354/
we-saw-a-demo-of-the-new-ai-system-powering-andurils-vision-for-war/.
Accessed 2026-03-06.
[220] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against
support vector machines. In Proceedings of the 29th International Conference
on Machine Learning (ICML-12), pages 1467–1474, 2012. URL https:
//doi.org/10.48550/arXiv.1206.6389.
[221] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D.
Tygar. Can machine learning be secure? In Proceedings of the 2006 ACM
Symposium on Information, Computer and Communications Security (ASI-
ACCS), pages 16–25, 2006. doi: 10.1145/1128817.1128824.
[222] Andrea De Martino. Introduction to Modern EW Systems,
chapter 1. Artech House, 2 edition, 2018. URL https:
//api.pageplace.de/preview/DT0400.9781630815158_
A35217969/preview-9781630815158_A35217969.pdf.
[223] Paul Hannen. Introduction to radar and electronic warfare. In Radar and
Electronic Warfare Principles for the Non-Specialist. The Institution of En-
gineering and Technology, Edison, 4 edition, 2014. ISBN 9781613530115.
URL https://doi.org/10.1049/SBRA502E.
[224] Nigel Walton. ‘Four-Closure’: How Amazon, Apple, Facebook & Google
are driving business model innovation. In 2012 International Conference on
Innovation Management and Technology Research. IEEE, May 2012. URL
https://ieeexplore.ieee.org/document/6236368.
[225] Ossi Ylijoki. Big Data – Towards Data-Driven Business. Doctoral dis-
sertation, Lappeenranta-Lahti University of Technology (LUT), Lappeen-
ranta, Finland, April 2019. URL https://urn.fi/URN:ISBN:
978-952-335-347-3. Acta Universitatis Lappeenrantaensis 845.
[226] Thomas H. Davenport and Rajeev Ronanki. Artificial intelligence for the real
world. Harvard Business Review, 96(1):108–116, 2018.
[227] Erik Brynjolfsson and Andrew McAfee. The Second Machine Age: Work,
Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton &
Company, New York, 2014.
[228] Michael Chui, James Manyika, and Mehdi Miremadi. Notes from the
AI frontier: Applications and value of deep learning. Technical report,
McKinsey Global Institute, 2018. URL https://www.mckinsey.
com/featured-insights/artificial-intelligence/
notes-from-the-ai-frontier-applications-and-value-of-deep-learning.
[229] Joshua S. Gans, Avi Goldfarb, and Ajay K. Agrawal. Theory Is All You Need:
AI, Human Cognition, and Causal Reasoning. Strategy Science, 9(4):356–
365, 2024. doi: 10.1287/stsc.2024.0189.
[230] Jacob Fraden. Handbook of Modern Sensors: Physics, Designs, and Applica-
tions. Springer, 5th edition, 2016.
[231] Jon S. Wilson, editor. Sensor Technology Handbook. Newnes, 2005. ISBN
978-0-7506-7729-5. doi: 10.1016/B978-0-7506-7729-5.X5040-X.
[232] Armada International. The Role of the Electromagnetic Spectrum in
Russian Surveillance, Offensive and Defensive Operations in Ukraine,
2024. https://www.armadainternational.com/2024/11/
the-role-of-the-electromagnetic-spectrum-in-russian-surveillance-offensive-and-defensive-operations-in-ukraine-electronic-warfare/.
Accessed: 2025-06-22.
[233] GlobalSecurity.org. The role of electromagnetic spectrum control in
warfare, 1990. https://www.globalsecurity.org/military/
library/report/1990/RSC.htm. Accessed: 2025-06-22.
[234] Yuntao Wang, Zhou Su, Shaolong Guo, Minghui Dai, Tom H. Luan, and Yil-
iang Liu. A survey on digital twins: Architecture, enabling technologies,
security and privacy, and future prospects. IEEE Internet of Things Jour-
nal, 10(17):14965–14987, 2023. doi: 10.1109/JIOT.2023.3263909. URL
https://ieeexplore.ieee.org/document/10090432.
[235] Sushil Jajodia, Paulo Shakarian, V.S. Subrahmanian, Vipin Swarup, and
Cliff Wang, editors. Cyber Warfare: Building the Scientific Foundation,
volume 56 of Advances in Information Security. Springer, 2015. doi:
10.1007/978-3-319-14039-1.
[236] Herbert Lin and Jaclyn Kerr. On cyber-enabled information warfare and in-
formation operations. In The Oxford Handbook of Cyber Security, chap-
ter 16, pages 251–272. Oxford University Press, 2021. doi: 10.1093/oxfordhb/
9780198800682.013.15.
[237] Vijay Srinivas Agneeswaran. Big-data – theoretical, engineering and analytics
perspective. In S. Srinivasan, editor, Big Data Analytics, volume 7678 of
Lecture Notes in Computer Science, pages 8–15. Springer, 2012. doi: 10.
1007/978-3-642-35542-4_2.
[238] Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of
relative frequencies of events to their probabilities. In Measures of Complexity:
Festschrift for Alexey Chervonenkis, pages 11–30. Springer, 2015.
[239] Vladimir N Vapnik. Statistical learning theory.
Springer, 2 edition, 1999. ISBN 0387987800. URL
https://statisticalsupportandresearch.
wordpress.com/wp-content/uploads/2017/05/
vladimir-vapnik-the-nature-of-statistical-learning-springer-2010.
pdf.
[240] Yaser S Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning
from data. AMLBook, 2012.
[241] Wassily Hoeffding. Probability inequalities for sums of bounded random variables.
Journal of the American Statistical Association, 58(301):13–30, 1963.
URL https://www.jstor.org/stable/2282952.
[242] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimiza-
tion. arXiv preprint arXiv:1412.6980, 2014. URL https://doi.org/
10.48550/arXiv.1412.6980.
[243] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-RMSprop: Divide the
gradient by a running average of its recent magnitude. COURSERA: Neural
networks for machine learning, 2012.
[244] Adrien-Marie Legendre. On least squares. University of York, Department of
Mathematics, Historical Statistics, 1959. URL https://www.york.ac.
uk/depts/maths/histstat/legendre.pdf. Reprinted from D. E.
Smith, A Source Book in Mathematics, Vol. II, pp. 576–579.
[245] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Sta-
tistical Learning: Data Mining, Inference, and Prediction. Springer Series in
Statistics. Springer, New York, NY, 2 edition, 2009. ISBN 978-0-387-84857-
0. doi: 10.1007/978-0-387-84858-7. URL https://link.springer.
com/book/10.1007/978-0-387-84858-7.
[246] Charu C. Aggarwal. Data Mining: The Textbook. Springer, 2015. ISBN
978-3-319-14141-1. doi: 10.1007/978-3-319-14142-8.
[247] J. B. MacQueen. Some methods for classification and analysis of multivariate
observations. In Lucien M. Le Cam and Jerzy Neyman, editors, Proceedings
of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,
Volume 1: Statistics, pages 281–297, Berkeley, CA, 1967. University
of California Press. URL https://www.cs.cmu.edu/~bhiksha/
courses/mlsp.fall2010/class14/macqueen.pdf.
[248] Franz Aurenhammer. Voronoi diagrams — a survey of a fundamental geo-
metric data structure. ACM Computing Surveys, 23(3):345–405, 1991. URL
https://dl.acm.org/doi/abs/10.1145/116873.116880.
[249] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information
Theory, 28(2):129–137, 1982. URL https://ieeexplore.
ieee.org/document/1056489.
[250] Karl Pearson. On lines and planes of closest fit to systems of points in space.
Philosophical Magazine and Journal of Science, 2(11):559–572, 1901. doi:
10.1080/14786440109462720.
[251] Harold Hotelling. Analysis of a complex of statistical variables into principal
components. Journal of Educational Psychology, 24(6):417–441, 1933. doi:
10.1037/h0071325. URL https://doi.org/10.1037/h0071325.
[252] C. J. Van Rijsbergen. Information Retrieval. Butterworth-Heinemann,
Newton, MA, USA, 2nd edition, 1979. URL https://openlib.org/
home/krichel/courses/lis618/readings/rijsbergen79_
infor_retriev.pdf.
[253] David Marvin Green, John A Swets, et al. Signal detection theory and psy-
chophysics, volume 1. Wiley New York, 1966.
[254] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters,
27(8):861–874, 2006.
[255] James A Hanley and Barbara J McNeil. The meaning and use of the area
under a receiver operating characteristic (ROC) curve. Radiology, 143(1):
29–36, 1982.
[256] David H. Hubel and Torsten N. Wiesel. Receptive fields of single neurones
in the cat’s striate cortex. The Journal of Physiology, 148(3):574–591, 1959.
doi: 10.1113/jphysiol.1959.sp006308.
[257] David H. Hubel and Torsten N. Wiesel. Receptive fields, binocular interaction
and functional architecture in the cat’s visual cortex. The Journal of Physiol-
ogy, 160(1):106–154, 1962. doi: 10.1113/jphysiol.1962.sp006837.
[258] Lawrence G. Roberts. Machine Perception of Three-Dimensional Solids. PhD
dissertation, Massachusetts Institute of Technology (MIT), Cambridge, MA,
1963. Available from MIT Libraries: http://hdl.handle.net/1721.1/11589.
[259] John Canny. A computational approach to edge detection. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698,
1986. doi: 10.1109/TPAMI.1986.4767851.
[260] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987,
2002. doi: 10.1109/TPAMI.2002.1017623. URL https://ieeexplore.
ieee.org/document/1017623.
[261] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision.
Cambridge University Press, second edition, 2004. ISBN 0521540518.
[262] David G. Lowe. Distinctive image features from scale-invariant keypoints.
International Journal of Computer Vision, 60(2):91–110, 2004. doi: 10.1023/
B:VISI.0000029664.99615.94. URL https://link.springer.com/
article/10.1023/B:VISI.0000029664.99615.94.
[263] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. SURF:
Speeded up robust features. Computer Vision and Image Understanding, 110
(3):346–359, 2008. doi: 10.1016/j.cviu.2007.09.014.
[264] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based
learning applied to document recognition. Proceedings of the IEEE,
86(11):2278–2324, 1998. doi: 10.1109/5.726791. URL https://
ieeexplore.ieee.org/document/726791.
[265] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep resid-
ual learning for image recognition. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
URL https://ieeexplore.ieee.org/document/7780459.
[266] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classifi-
cation with deep convolutional neural networks. In F. Pereira, C.J. Burges,
L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Pro-
cessing Systems, volume 25. Curran Associates, Inc., 2012. URL https:
//proceedings.neurips.cc/paper_files/paper/2012/
file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
[267] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh,
Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bern-
stein, Alexander C. Berg, and Fei-Fei Li. ImageNet large scale visual recog-
nition challenge. International Journal of Computer Vision, 115(3):211–252,
2015.
[268] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature
hierarchies for accurate object detection and semantic segmentation, 2014.
URL https://arxiv.org/abs/1311.2524.
[269] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You Only
Look Once: Unified, Real-Time Object Detection . In 2016 IEEE Confer-
ence on Computer Vision and Pattern Recognition (CVPR), pages 779–788,
Los Alamitos, CA, USA, June 2016. IEEE Computer Society. doi: 10.1109/
CVPR.2016.91. URL https://doi.ieeecomputersociety.org/
10.1109/CVPR.2016.91.
[270] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn,
Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer,
Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Im-
age is Worth 16x16 Words: Transformers for Image Recognition at Scale.
arXiv preprint arXiv:2010.11929, 2021. URL https://arxiv.org/
abs/2010.11929.
[271] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen
Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer
using shifted windows, 2021. URL https://arxiv.org/abs/2103.
14030.
[272] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc
Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco
Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual fea-
tures without supervision. arXiv preprint arXiv:2304.07193, 2023. URL
https://doi.org/10.48550/arXiv.2304.07193.
[273] Oriane Siméoni et al. DINOv3. arXiv preprint arXiv:2508.10104, 2025. URL
https://doi.org/10.48550/arXiv.2508.10104.
[274] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud A. A. Setio,
Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. van der Laak, Bram
van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical
image analysis. Medical Image Analysis, 42:60–88, 2017. doi: 10.1016/j.
media.2017.07.005.
[275] Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deep-
Face: Closing the gap to human-level performance in face verification. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
pages 1701–1708, 2014. doi: 10.1109/CVPR.2014.220. URL https:
//ieeexplore.ieee.org/document/6909616.
[276] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A
unified embedding for face recognition and clustering. In IEEE Confer-
ence on Computer Vision and Pattern Recognition (CVPR), pages 815–823,
2015. doi: 10.1109/CVPR.2015.7298682. URL https://ieeexplore.
ieee.org/document/7298682.
[277] Sharada P. Mohanty, David P. Hughes, and Marcel Salathé. Using deep learning
for image-based plant disease detection. Frontiers in Plant Science, 7:
1419, 2016. doi: 10.3389/fpls.2016.01419. URL https://pmc.ncbi.
nlm.nih.gov/articles/PMC5032846/.
[278] Andreas Kamilaris and Francesc X. Prenafeta-Boldú. Deep learning in agriculture:
A survey. Computers and Electronics in Agriculture, 147:70–90,
2018. doi: 10.1016/j.compag.2018.02.016.
[279] Georg Klein and David Murray. Parallel Tracking and Mapping for Small AR
Workspaces. In ISMAR, pages 225–234, 2007. doi: 10.1109/ISMAR.2007.
4538852.
[280] Steven M. LaValle. Virtual Reality. Cambridge University Press, 2023. URL
https://doi.org/10.1017/9781108182874.
[281] Christian Zimmermann and Thomas Brox. Learning to Estimate 3D Hand
Pose from Single RGB Images. In ICCV, pages 4903–4911, 2017. doi: 10.
1109/ICCV.2017.523.
[282] Tamas Czimmermann, Gastone Ciuti, Mario Milazzo, Marcello Chiurazzi,
Stefano Roccella, Calogero Maria Oddo, and Paolo Dario. Visual-based defect
detection and classification approaches for industrial applications—a survey.
Sensors, 20(5):1459, 2020. doi: 10.3390/s20051459.
[283] Ruoxu Ren, Terence Hung, and Kay Chen Tan. A generic deep-learning-
based approach for automated surface inspection. IEEE Transactions on Cy-
bernetics, 48(3):929–940, 2018. doi: 10.1109/TCYB.2017.2668395. URL
https://ieeexplore.ieee.org/document/7864335.
[284] Paul Bergmann, Michael Fauser, David Sattlegger, and Carsten Steger.
MVTec AD – A Comprehensive Dataset for Unsupervised Anomaly De-
tection in Industrial Inspection. In CVPR, pages 9592–9600, 2019. doi:
10.1109/CVPR.2019.00982.
[285] Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox.
PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation
in Cluttered Scenes. In Robotics: Science and Systems (RSS), 2018. URL
https://www.roboticsproceedings.org/rss14/p19.pdf.
[286] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu,
and Daan Wierstra. Matching Networks for One Shot Learn-
ing. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Gar-
nett, editors, Advances in Neural Information Processing Sys-
tems, volume 29. Curran Associates, Inc., 2016. URL https:
//proceedings.neurips.cc/paper_files/paper/2016/
file/90e1357833654983612fb05e3ec9148c-Paper.pdf.
[287] Jake Snell, Kevin Swersky, and Richard S. Zemel. Prototyp-
ical networks for few-shot learning. In NIPS, 2017. URL
https://papers.nips.cc/paper_files/paper/2017/hash/
cb8da6767461f2812ae4290eac7cbc42-Abstract.html.
[288] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-
learning for fast adaptation of deep networks. In ICML, 2017.
[289] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A
simple framework for contrastive learning of visual representations. In ICML,
2020.
[290] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momen-
tum contrast for unsupervised visual representation learning. In CVPR, 2020.
URL https://ieeexplore.ieee.org/document/9157636.
[291] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec,
Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo
Pires, Zhaohan Daniel Guo, Mohammad Azar, Bilal Piot, Koray
Kavukcuoglu, Rémi Munos, and Michal Valko. Bootstrap your own
latent: A new approach to self-supervised learning. NeurIPS, 2020.
URL https://proceedings.neurips.cc/paper/2020/hash/
f3ada80d5c4ee70142b17b8192b2958e-Abstract.html.
[292] Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba,
and Pieter Abbeel. Domain randomization for transferring deep neural net-
works from simulation to the real world. In IROS, 2017. URL https:
//ieeexplore.ieee.org/document/8202133.
[293] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate
Saenko, Alexei A. Efros, and Trevor Darrell. CyCADA: Cycle-consistent adversarial
domain adaptation. In ICML, 2018. URL https://proceedings.
mlr.press/v80/hoffman18a.
[294] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and
Harnessing Adversarial Examples. ICLR, 2015. URL https://arxiv.
org/abs/1412.6572.
[295] Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar,
Martin Wistuba, and Naigang Wang. Hardware-aware neural architecture
search: Survey and taxonomy. In Zhi-Hua Zhou, editor, Proceedings of
the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-
21, pages 4322–4329. International Joint Conferences on Artificial Intelli-
gence Organization, 8 2021. doi: 10.24963/ijcai.2021/592. URL https:
//doi.org/10.24963/ijcai.2021/592. Survey Track.
[296] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton.
Adaptive mixtures of local experts. Neural computation, 3(1):79–87, 1991.
URL https://doi.org/10.1162/neco.1991.3.1.79.
[297] Saeed Masoudnia and Reza Ebrahimpour. Mixture of experts: a liter-
ature survey. Artificial Intelligence Review, 42(2):275–293, 2014. doi:
10.1007/s10462-012-9338-y. URL https://link.springer.com/
article/10.1007/s10462-012-9338-y.
[298] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. doi:
10.1023/A:1010933404324.
[299] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine
learning, 20(3):273–297, 1995.
[300] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks
for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
URL https://arxiv.org/abs/1409.1556.
[301] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses
of Dimensionality. Wiley Series in Probability and Statistics. John Wiley &
Sons, 2nd edition, 2011. ISBN 978-0470604458. URL https://doi.
org/10.1002/9781118029176.
[302] Richard Bellman. Dynamic Programming. Princeton University Press, Prince-
ton, NJ, 1957. ISBN 9780691146683.
[303] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre,
George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda
Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep
neural networks and tree search. Nature, 529(7587):484–489, 2016. doi:
10.1038/nature16961. URL https://www.nature.com/articles/
nature16961.
[304] Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu,
Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo
Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent
reinforcement learning. In Proceedings of the International Conference
on Machine Learning (ICML), 2019. URL https://deepmind.com/
research/highlighted-research/alphastar.
[305] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree
search. In International conference on computers and games, pages 72–83.
Springer, 2006.
[306] Guillaume M. J.-B. Chaslot, Jaap-T. Saito, Bruno Bouzy, Jos W. H. M. Uiter-
wijk, and H. Jaap van den Herik. Monte-Carlo Strategies for Computer
Go. In Pierre-Yves Schobbens, Wim Vanhoof, and Glenn Schwanen, edi-
tors, Proceedings of the 18th BeNeLux Conference on Artificial Intelligence
(BNAIC’06), pages 83–90, Namur, Belgium, 2006. University of Namur.
[307] Levente Kocsis and Csaba Szepesvári. Bandit Based Monte-Carlo Planning.
In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Proceedings
of the 17th European Conference on Machine Learning (ECML
2006), volume 4212 of Lecture Notes in Computer Science, pages 282–293,
Berlin, Heidelberg, 2006. Springer-Verlag.
[308] Paul F Christiano, Jan Leike, Tom B Brown, Miljan Martic,
Shane Legg, and Dario Amodei. Deep reinforcement learning
from human preferences. In Advances in Neural Information Pro-
cessing Systems (NeurIPS), volume 30, 2017. URL https:
//proceedings.neurips.cc/paper_files/paper/2017/
file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf.
[309] Saurabh Arora and Prashant Doshi. A Survey of Inverse Reinforcement Learn-
ing: Challenges, Methods and Progress, 2020. URL https://arxiv.
org/abs/1806.06877.
[310] Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learn-
ing with double Q-learning. In Proceedings of the AAAI Conference on
Artificial Intelligence, AAAI’16, pages 2094–2100, 2016. URL https:
//doi.org/10.1609/aaai.v30i1.10295.
[311] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and
Oleg Klimov. Proximal Policy Optimization algorithms. arXiv preprint
arXiv:1707.06347, July 2017. URL https://arxiv.org/abs/1707.
06347.
[312] Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang,
Alexandre Bayen, and Yi Wu. The Surprising Effectiveness of
PPO in Cooperative Multi-Agent Games. In S. Koyejo, S. Mo-
hamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Ad-
vances in Neural Information Processing Systems, volume 35, pages
24611–24624. Curran Associates, Inc., 2022. URL https://
proceedings.neurips.cc/paper_files/paper/2022/file/
9c1535a02f0ce079433344e14d910597-Paper-Datasets_
and_Benchmarks.pdf.
[313] John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter
Abbeel. High-dimensional continuous control using generalized advantage
estimation. In Proceedings of the International Conference on Learning
Representations (ICLR), 2016. URL https://arxiv.org/abs/1506.
02438v6.
[314] Leland McInnes, John Healy, and James Melville. UMAP: Uniform
manifold approximation and projection for dimension reduction. arXiv
preprint arXiv:1802.03426, 2018. URL https://arxiv.org/abs/
1802.03426.
[315] Warren B. Powell. Unified Framework for Optimization under Uncertainty.
INFORMS TutORials in Operations Research, 2022. ISBN 9780984337897.
URL https://doi.org/10.1287/educ.2016.0149.
[316] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and
Blaise Agüera y Arcas. Communication-efficient learning of deep networks
from decentralized data. In Aarti Singh and Jerry Zhu, editors, Proceedings
of the 20th International Conference on Artificial Intelligence and
Statistics, volume 54 of Proceedings of Machine Learning Research, pages
1273–1282, Fort Lauderdale, FL, USA, April 2017. PMLR. URL https:
//proceedings.mlr.press/v54/mcmahan17a.html.
[317] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller.
Inverting gradients-how easy is it to break privacy in federated learning?
In Advances in Neural Information Processing Systems, volume 33, pages
16937–16947, 2020.
[318] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Mem-
bership inference attacks against machine learning models. In 2017 IEEE
Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017. URL
https://ieeexplore.ieee.org/document/7958568.
[319] Cynthia Dwork. Differential privacy. In International Colloquium on Au-
tomata, Languages, and Programming, pages 1–12. Springer, 2006.
[320] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Bren-
dan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth.
Practical secure aggregation for privacy-preserving machine learning. In Pro-
ceedings of the 2017 ACM SIGSAC Conference on Computer and Communi-
cations Security, pages 1175–1191, 2017.
[321] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho
Moriai. Privacy-preserving deep learning via additively homomorphic encryp-
tion. IEEE Transactions on Information Forensics and Security, 13(5):1333–
1345, 2018. URL https://ieeexplore.ieee.org/document/
8241854.
[322] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine
translation by jointly learning to align and translate. In International Confer-
ence on Learning Representations (ICLR), 2015. URL https://arxiv.
org/abs/1409.0473.
[323] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT:
Pre-training of deep bidirectional transformers for language understanding.
In Proceedings of the 2019 Conference of the North American Chapter of the
Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
[324] Philip Gage. A new algorithm for data compression. The C Users Jour-
nal archive, 12:23–38, 1994. URL https://api.semanticscholar.
org/CorpusID:59804030.
[325] Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine trans-
lation of rare words with subword units. In Proceedings of the 54th Annual
Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers), pages 1715–1725, 2016. URL https://aclanthology.org/
P16-1162.pdf.
[326] Taku Kudo and John Richardson. Sentencepiece: A simple and language inde-
pendent subword tokenizer and detokenizer for neural text processing. In Pro-
ceedings of the 2018 Conference on Empirical Methods in Natural Language
Processing: System Demonstrations, pages 66–71. Association for Computa-
tional Linguistics, 2018.
[327] Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh
Sharma. Explaining neural scaling laws. Proceedings of the National
Academy of Sciences, 121(27):e2311878121, 2024. doi: 10.1073/pnas.
2311878121. URL https://www.pnas.org/doi/abs/10.1073/
pnas.2311878121.
[328] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wain-
wright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina
Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke
Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Chris-
tiano, Jan Leike, and Ryan Lowe. Training language models to fol-
low instructions with human feedback. Advances in Neural Informa-
tion Processing Systems, 35:27730–27744, 2022. URL https://
proceedings.neurips.cc/paper_files/paper/2022/file/
b1efde53be364a73914f58805a001731-Paper-Conference.
pdf.
[329] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir
Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau
Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented
Generation for Knowledge-Intensive NLP Tasks. arXiv preprint
arXiv:2005.11401, 2021. URL https://arxiv.org/abs/2005.
11401.
[330] Anthropic. Claude Gov models for U.S. national secu-
rity customers. https://www.anthropic.com/news/
claude-gov-models-for-u-s-national-security-customers,
June 2025. Accessed: 2026-03-01.
[331] OpenAI. Introducing ChatGPT Gov. https://openai.com/
global-affairs/introducing-chatgpt-gov/, January 2025.
Accessed: 2026-03-01.
[332] Google Cloud. Introducing ‘Gemini for Government’: Support-
ing the U.S. government’s transformation with AI. https:
//cloud.google.com/blog/topics/public-sector/
introducing-gemini-for-government-supporting-the-us-governments-transformation-with-ai,
August 2025. Accessed: 2026-03-01.
[333] Meta. How Meta is supporting US national security
with AI. https://about.fb.com/news/2025/09/
meta-supporting-us-national-security-with-ai/,
September 2025. Accessed: 2026-03-01.
[334] Ministry of Defence, Finland. Government defence report. Technical
Report 2024:7, Publications of the Ministry of Defence (Finland), De-
cember 2024. URL https://julkaisut.valtioneuvosto.
fi/bitstream/handle/10024/166004/PLM_2024_7.pdf?
sequence=4&isAllowed=y. Published by the Ministry of Defence;
ISBN via URN:URN:ISBN:978-951-663-471-8.
[335] Filippo Santoni de Sio and Jeroen van den Hoven. Meaningful human control
over autonomous systems: A philosophical account. Frontiers in Robotics
and AI, 5:15, 2018. doi: 10.3389/frobt.2018.00015. URL https://www.
frontiersin.org/articles/10.3389/frobt.2018.00015.
[336] Advisory Council on International Affairs (AIV) and Advisory Com-
mittee on Issues of Public International Law (CAVV). Autonomous
weapon systems: The need for meaningful human control. Technical
Report Advisory Report No. 97, AIV/CAVV, The Hague, 2015. URL
https://www.advisorycommitteeinternationallaw.nl/
documents/2015/10/12/autonomous-weapon-systems.
[337] National Institute of Standards and Technology. Outline: Proposed Zero Draft
for a Standard on AI Testing, Evaluation, Verification, and Validation. Su-
perIntelligence - Robotics - Safety & Alignment, 2, September 2025. doi:
10.70777/si.v2i5.15513.
[338] Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasir-
cioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones,
Chris Hicks, Nicholas Carlini, Yarin Gal, and Robert Kirk. Poisoning Attacks
on LLMs Require a Near-constant Number of Poison Samples, 2025. URL
https://arxiv.org/abs/2510.07192.
[339] Elham Tabassi. Artificial Intelligence Risk Management Framework (AI RMF
1.0). Technical Report NIST AI 100-1, National Institute of Standards and
Technology, Gaithersburg, MD, 2023. URL https://tsapps.nist.
gov/publication/get_pdf.cfm?pub_id=936225. NIST Trust-
worthy and Responsible AI. Accessed: 2025-10-09.
[340] ISO/IEC. Information technology — Artificial intelligence — Guidance on
risk management. International Standard, February 2023. URL https://
www.iso.org/standard/77304.html. Edition 1. Accessed: 2025-
10-10.
[341] ISO/IEC. Information technology — Artificial intelligence — Management
system. International Standard, December 2023. URL https://www.
iso.org/standard/42001. Edition 1. Accessed: 2025-10-10.
[342] Office of the Under Secretary of Defense for Research and Engineer-
ing. DoD Instruction 5000.89: Test and Evaluation. Technical Re-
port DoDI 5000.89, U.S. Department of Defense, Washington, DC,
November 2020. URL https://www.esd.whs.mil/Portals/54/
Documents/DD/issuances/dodi/500089p.PDF. Accessed: 2026-
03-07.
[343] Office of the Under Secretary of Defense for Research and Engineering. DoD
Instruction 5000.98: Operational Test and Evaluation and Live Fire Test and
Evaluation. Technical Report DoDI 5000.98, U.S. Department of Defense,
Washington, DC, December 2024. URL https://www.esd.whs.mil/
Portals/54/Documents/DD/issuances/dodi/500098p.PDF.
Supersedes OT&E/LFT&E content previously in DoDI 5000.89. Accessed:
2026-03-07.
[344] Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning
Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh,
Mike Lewis, Luke Zettlemoyer, and Omer Levy. LIMA: Less Is More for
Alignment. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and
S. Levine, editors, Advances in Neural Information Processing Systems, vol-
ume 36, pages 55006–55021. Curran Associates, Inc., 2023. URL https://
proceedings.neurips.cc/paper_files/paper/2023/file/
ac662d74829e4407ce1d126477f4a03a-Paper-Conference.
pdf.
[345] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A. Efros.
Dataset distillation, 2020. URL https://arxiv.org/abs/1811.
10959.
[346] Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee,
and Sara Hooker. When Less is More: Investigating Data Pruning for Pre-
training LLMs at Scale, 2023. URL https://arxiv.org/abs/2309.
04564.
[347] Yu Gu, Jingjing Fu, Xiaodong Liu, Jeya Maria Jose Valanarasu, Noel CF
Codella, Reuben Tan, Qianchu Liu, Ying Jin, Sheng Zhang, Jinyu Wang, Rui
Wang, Lei Song, Guanghui Qin, Naoto Usuyama, Cliff Wong, Hao Cheng,
Hohin Lee, Praneeth Sanapathi, Sarah Hilado, Jiang Bian, Javier Alvarez-
Valle, Mu Wei, Khalil Malik, Jianfeng Gao, Eric Horvitz, Matthew P Lun-
gren, Hoifung Poon, and Paul Vozila. The illusion of readiness: Stress test-
ing large frontier models on multimodal medical benchmarks, 2025. URL
https://arxiv.org/abs/2509.18234.
[348] Richard Sutton. The bitter lesson, March 2019. URL http://www.
incompleteideas.net/IncIdeas/BitterLesson.html.
[349] Wayne P Hughes. Fleet Tactics and Coastal Combat. Naval Institute Press,
Annapolis, MD, 2000.
[350] Alfred Thayer Mahan. The Life of Nelson: The Embodiment of the Sea Power
of Great Britain. Little, Brown and Company, Boston, 1897.