Anomaly Detection and Prevention in IoT Using AI
Bilal, Osama (2025-07-31)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of the publication is:
https://urn.fi/URN:NBN:fi-fe2025080881625
Abstract
The rapid expansion of the Internet of Things (IoT) has led to environments rich in interconnected sensors, devices, and communication protocols across critical domains such as smart homes, healthcare, industrial automation, and energy systems. However, this interconnectivity introduces complex cybersecurity vulnerabilities and operational faults. Traditional rule-based and statistical anomaly detection techniques struggle with scalability, adaptability, and high false-positive rates in heterogeneous, voluminous IoT data. In response, this thesis proposes an AI-powered anomaly detection framework tailored for real-world, multidomain IoT datasets.
This study uses a Kaggle microgrid dataset of 5,000 entries with 27 features covering environmental conditions (e.g., temperature and humidity), energy metrics (consumption, generation, and storage), grid and network performance, and blockchain transactions. Two families of models were implemented and evaluated: unsupervised and semi-supervised detectors (Isolation Forest, Autoencoder, LSTM) and supervised classifiers (the Random Forest, XGBoost, and LightGBM ensembles, plus KNN), with the supervised models combined with SMOTE oversampling. All experiments were conducted in Google Colab using Python libraries such as scikit-learn and TensorFlow/Keras.
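To make the SMOTE step mentioned above concrete, the following is a minimal plain-Python sketch of the core idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. It is not the thesis code, which is not shown here; the function name `smote_sketch`, its parameters, and the sample points are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper illustrating the SMOTE idea; not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical two-feature anomaly samples (not taken from the thesis dataset)
anomalies = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.4)]
new_points = smote_sketch(anomalies, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test set keeps its original class balance.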
The unsupervised methods achieved modest performance: Isolation Forest reached 71.6% accuracy (F1 ≈ 0.129), the Autoencoder 73.0% (F1 ≈ 0.172), and the LSTM, which can be used in both supervised and unsupervised settings, 72.6% (F1 ≈ 0.160). In contrast, the supervised models excelled: Random Forest achieved 99.65% accuracy with near-perfect precision and recall (F1 ≈ 0.997), XGBoost and LightGBM both reached about 99.38% accuracy (F1 ≈ 0.994), and KNN scored 77.3% accuracy (F1 ≈ 0.801). The confusion matrix showed that Random Forest detected 719 true positives with zero false positives, underscoring its robustness for anomaly detection in IoT. Future work should explore hybrid architectures that combine deep learning with ensembles, real-time deployment on edge computing platforms, and adaptive learning mechanisms to handle concept drift. In summary, this thesis demonstrates that supervised ensemble techniques, particularly Random Forest, outperform deep and unsupervised methods in multidomain IoT anomaly detection on static datasets.
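The gap between accuracy and F1 reported for the unsupervised models is typical of imbalanced data, where correctly labelled normal samples dominate the accuracy figure. The sketch below computes the standard metrics from confusion-matrix counts. The counts passed to `classification_metrics` are hypothetical and are not the thesis's confusion matrices; only the 719 true positives and zero false positives come from the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a detector on an imbalanced test set of 1,000 samples:
# most normal samples are classified correctly, but most anomalies are missed.
weak = classification_metrics(tp=30, fp=100, fn=170, tn=700)

# Precision depends only on true and false positives, so the reported
# 719 true positives with zero false positives imply a precision of 1.0.
rf_precision = 719 / (719 + 0)
```

With these illustrative counts the accuracy is 0.73 while F1 stays below 0.2, which is why F1, precision, and recall are more informative than accuracy alone when anomalies are rare.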