Edge Computing with Embedded AI: Thermal Image Analysis
for Occupancy Estimation in Intelligent Buildings
Aly Metwaly
almetw@utu.fi
University of Turku
Turku, Finland
Jorge Peña Queralta
jopequ@utu.fi
University of Turku
Turku, Finland
Victor Kathan Sarker
vikasar@utu.fi
University of Turku
Turku, Finland
Tuan Nguyen Gia
tunggi@utu.fi
University of Turku
Turku, Finland
Omar Nasir
omar.nasir@helvar.com
Helvar Oy Ab
Espoo, Finland
Tomi Westerlund
tovewe@utu.fi
University of Turku
Turku, Finland
ABSTRACT
With the rise of the IoT, there has been a growing demand for peo-
ple counting and occupancy estimation in Intelligent buildings for
adapting their heating, ventilation and cooling systems. This can
have a significant impact on energy consumption at a global scale
as such systems consume about 40% of electricity and create about
36% of the CO2 emissions in Europe. Previous approaches to occu-
pancy estimation either utilize methods that do not ensure people’s
privacy when obtaining high accuracy estimations, such as RGB
cameras, or utilize thermal or radar sensors with lower accuracy.
Thermal vision for people detection has several advantages. It pro-
tects people’s privacy while being less affected by changes in the
environment. In addition, most of the previous image processing
approaches rely on streaming the data to the cloud to be analyzed.
However, with the development of the more distributed network
paradigms edge and fog computing, there has been a trend in mov-
ing computation towards the edge of the network. This process
of embedding intelligence into end-devices enables more efficient
energy consumption and network load distribution. In this work,
we present an embedded algorithm for room occupancy estimation
based on a thermal sensor with accuracy over the state-of-the-art.
We study the performance of a variety of deep learning models on
different embedded processors. We achieve a prediction accuracy of
98.9% for people counting estimation with minimal 2 KB RAM uti-
lization. Furthermore, the proposed algorithm has very low latency
achieving execution times under 14 ms.
CCS CONCEPTS
• Computer systems organization → Embedded software; •
Computingmethodologies→ Sceneunderstanding;Machine
learning algorithms; • Hardware → Sensors and actuators;
Digital signal processing; Sensor applications and deployments.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
INTESA ’19, October 13–18, 2019, NY, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
KEYWORDS
Edge Computing; IoT; Embedded Intelligence; Embedded AI; Ther-
mal Imaging; Intelligent Buildings;
ACM Reference Format:
Aly Metwaly, Jorge Peña Queralta, Victor Kathan Sarker, Tuan Nguyen Gia,
Omar Nasir, and Tomi Westerlund. 2019. Edge Computing with Embedded
AI: Thermal Image Analysis for Occupancy Estimation in Intelligent Build-
ings . In INTESA ’19: INTelligent Embedded Systems Architectures and Appli-
cations, Co-Located with ES WEEK 2019 - October 13–18, 2019, NY, USA. ACM,
New York, NY, USA, 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
In the Industry 4.0 era, cutting-edge technologies such as the IoT
and AI are emerging rapidly [7]. These technologies have the po-
tential to impact our daily lives through applications in smart cities,
smart homes or intelligent buildings [28]. Different industries are
adopting these technologies and transforming them into market
opportunities. One promising application is people counting and
occupancy estimation in buildings. The acquired information can
be utilized for more efficient planning and intelligent space man-
agement in smart workplaces. Furthermore, information about the
occupancy in buildings and individual rooms can have a signifi-
cant impact on energy consumption. Buildings are considered the
largest energy consumer in Europe using approximately 40% of
the total energy and creating about 36% of the total carbon dioxide
emissions [8]. Similarly, heating, ventilation, and air conditioning
(HVAC) systems in buildings were liable for 38.9% of the total en-
ergy consumption in 2017 in the USA [25]. By acquiring reliable
information on the occupancy of the buildings, energy consump-
tion can be drastically reduced if HVAC systems in the building are
adjusted automatically [13].
Within the IoT, there is a recent trend in more distributed net-
work architectures, in contrast with traditional cloud-centric com-
puting [20, 24]. Edge and fog computing paradigms involve moving
computational power and data analysis closer to where the data
originates [32, 33]. Combined with artificial intelligence algorithms
running at the local network level, this approach enables lower-
latency and reduced network load [16, 19, 29]. In this work, we
explore the case of embedded artificial intelligence, in which the
data analysis runs directly on the sensor node itself.
Previous works on building occupancy estimation or people
counting have used RGB cameras [9, 26], motion sensors [17, 18, 34],
INTESA ’19, October 13–18, 2019, NY, USA A. Metwaly et al.
and, more recently, thermal arrays [2–4, 11]. Motion sensors, such
as passive infrared (PIR) sensors, have the drawback of being in-
accurate as the number of people increases, as well as limited
range [13, 23]. RGB cameras are able to produce high accuracy
occupancy estimations, but require computationally intensive im-
age processing [12].
Thermal imaging is one of the most promising sensing technolo-
gies. It has been frequently used in smart-city applications [1]. The
advantages of the thermal cameras over the RGB cameras are that
they are not light-dependent and can work in dark environments.
However, thermal cameras with low and medium resolution usually
cannot recognize characteristics of the detected person. Therefore,
they are unable to identify features of the people in the scene.
In this paper, we propose novel solutions based on Deep Neu-
ral Networks (DNNs) to use images from generic, low-resolution,
thermal cameras to reliably detect occupancy and count number of
people with high accuracy. Compared to previous works, our mod-
els achieve higher occupancy prediction accuracy and enable faster
image processing than other micro-controller-based implementa-
tions. Our approach provides an almost error-free prediction in the
case of no occupancy and otherwise matches the number of people
in the room. For the tests included in the paper, we have trained
the model with a dataset where the occupancy ranged from 0 to 5
persons. However, the same model can be retrained with a more
varied dataset to provide a wider range of inference occupancy
estimation outputs.
The main contributions of this work are: (i) the design of multi-
ple deep learning models for estimating room occupancy based on
thermal images; (ii) the implementation of these models on Arm
Cortex M4 and M7 micro-controllers for real-time analysis of ther-
mal images; (iii) the analysis of the performance and impact on the
micro-controller computing resources of the proposed models; and
(iv) the comparison of our work with the state-of-the-art showing
an improved accuracy and reduced computation time.
The rest of the paper is organized as follow: Section 2 reviews
existing works in occupancy estimation. Section 3 introduces the
concept of embedded AI and describes the types of neural networks
utilized in this work. Section 4 describes the data acquisition pro-
cess, the hardware platforms utilized for testing and the evaluated
machine learning models. In Section 5, we demonstrate the superior
performance of our algorithm when compared to the state-of-the-
art and provide an overview of the models which produced the
best results. Finally, Section 6 concludes this work and outlines the
directions for future work.
2 RELATEDWORK
Oosterhout et al. introduced a head-detection system based on
stereo cameras for counting people from video streams [30]. The
method is robust and provides high accuracy ranging from 90-95 %
for different scenarios. In contrast, we rely on thermal cameras in
order to preserve people’s privacy and enable fast embedded image
processing. In addition, we are able to achieve higher accuracy.
Other early approaches which do preserve people’s privacy use PIR
sensors with some limitations.Wahl et al. presented an approach for
people counting for office environments [13] which use distributed
PIR sensors enhanced with algorithms to interpret the sensors’
information. They explored the performance of two people counting
algorithms on this experimental setup with different simulation
scenarios. Their approach required a larger number of sensors and
the accuracy decreased with an increasing number of people.
Beltran et al. presented a system for estimating occupancy [2]
called ThermoSense based on a thermal sensor array and a PIR
sensor. It can detect occupancy with an RMS error of approximately
0.35 persons. More recently, Gomez et al. developed a people count-
ing algorithm on thermal images-based on CNN [3]. The used CNNs
fit in less than 500 KB of memory and operated on Cortex-M4 MCU.
The CNN algorithm could provide an error-free detection accuracy
of 53.7% while using 308 KB of the MCU memory. The resolution
of the thermal sensors used is 80x60 which shows some features of
the people involved in the scene. The execution time for one image
is 63 seconds. In our work, we aim for a high error-free accuracy in
an office environment using a thermal sensor of 24x32 pixels which
cannot detect any features of the people.
Griffiths et al. used a thermal imager with 60x80 pixel resolu-
tion [11]. The algorithm used is based on the individuals’ height
differences for presence detection. Further, the algorithm detects
the movement direction of the individuals. Similarly, Tyndall et al.
proposed a low-pixel thermal imager system for occupancy estima-
tion [4] and used a classification algorithm. The system is based on
Thermosense [2] but is different from Thermosense in the choice of
the thermal sensor, positioning of the sensor and the classification
algorithm. In our work, we are able to achieve better real-time
occupancy estimation accuracy while embedding the AI models in
low power microprocessors.
A high accuracy method for estimating room occupancy with
thermal array sensors was proposed by Abedi et al. [21]. The au-
thors presented a real-time monitoring system which was only able
to detect the presence of people in a room giving a binary output.
They achieved an accuracy of over 99%. The authors rely on cloud
computing for image processing and their model is unable to es-
timate the exact number of people in the room. In our work, we
achieve a similar accuracy while estimating the number of people
and embed the algorithms so that it is not required to send raw
data to the cloud for processing.
3 EMBEDDED AI
With the increasing pervasiveness of the IoT in all aspects of our
daily lives, it is expected that billions of edge devices will be con-
nected to the internet in the near future. These devices will be
producing extremely large amounts of data. In the traditional cloud-
centric approach, all data acquired at the edge devices is sent to
the cloud to be crunched and processed. Then, the results of the
analysis and commands are sent back to the edge devices. As the
most important information resides on the data analysis results, the
process of sending raw data to the cloud can be avoided if part of
the computation is moved towards the edge of the network. Within
the edge and fog computing paradigms, embedded AI refers to
embedding artificial intelligence algorithms into low-power and
computationally-constrained devices. Jägare reflects on the benefits
of moving data analysis from cloud-centric architectures towards
embedded systems for given applications in a recent work [31].
These benefits include (i) reduced latency, increased reliability, and
Edge Computing with Embedded AI for Occupancy Estimation INTESA ’19, October 13–18, 2019, NY, USA
safety in time-critical applications; (ii) overall energy-efficiency and
reduced cost with a reduced impact to network traffic and cloud
server load; and (iii) enhanced privacy and security, with a lower
risk of raw data being exposed, and natural support for applications
where privacy is paramount and raw data cannot be shared.
In summary, applying AI at the edge instead of the cloud achieves
a more reliable low latency response. Also, it has the potential of
providing a better user experience with enhanced security and
privacy. However, applying AI algorithms on embedded devices
can present significant challenges. Embedded systems are resource-
constrained devices because of their low computational power, low
memory, and low power consumption requirements. In the rest
of this section, we overview the basic concepts for the neural net-
works that have been studied in this work. Each of these networks
has a different impact on system requirements (RAM, Flash) and
execution time.
3.0.1 Feedforward Neural Networks (FNNs). also known as Deep
FNNs are the basic deep learning models. It is called feedforward
because the information flow is only in the forward direction. In
other words, there is no feedback connection from the output that is
fed to the model [14]. FNNs form the basis of many other significant
neural networks such as the convolutional networks. In addition,
it is an essential step on the path to the recurrent networks [14].
FNNs is composed of different functions that are chained together.
Each function is called a layer and the overall length of the chain
is called the depth of the model. The training data shows only the
overall output of the whole network which specifying the output
of each layer, that’s why they are called the hidden layers. Each
of the hidden layers is vector-valued and their dimension deter-
mines the width of the model which is measured in the number of
neurons [14].
3.0.2 Convolutional Neural Networks (CNNs). Convolutional neu-
ral network (CNN) employs the convolution mathematical opera-
tion instead of general matrix multiplication in at least one of their
layers. The CNN is enhanced from the FNN by overcoming some of
the FNN disadvantages: sparse connectivity is used in CNNs to re-
duce the number of weights. On the other hand, Parameter sharing
is used to decrease the memory required for neural models. It also
reduces the complexity of the model at a given accuracy, which is
called the statistical efficiency [5]. A sliding window called kernel
is required to perform the convolution process. When convolution
is applied in machine learning, the input is usually a multidimen-
sional array and the kernel is usually a multidimensional array of
parameters (tensors) that are adjusted by the learning algorithm.
In the case of a 2D image, the input would be a frame matrix of
the number of pixels and the kernel would be a 2D convolution
sliding window [14]. Each layer in CNN has neurons arranged in 3
dimensions: width, height, and depth. The depth is the number of
channels (filters) for the layer.
3.0.3 Recurrent Neural Networks (RNNs). RNN is a special form of
the FNN with internal states and loops. The fundamental difference
is that the FNN neurons are not accessed twice whereas in RNN the
neurons can be accessed more than once through the loops in back-
propagation. This allows the information to persist in a time-series.
This feature makes RNNs used widely in speech recognition and
Table 1: STM32F401RE (F4) and STM32F722ZE (F7) specs.
STM32F401RE STM32F722ZE
Clock 84 MHz 216 MHz
Flash 512 KB 512 KB
SRAM 96 KB 256 KB
Pipeline Stages 3 6 (dual-issue)
Cache No 8 KB/I&D
I2C 3 3
Table 2: Distribution of samples in the training and test sets.
Dataset Labels0 1 2 3 4 5
Original Training 3540 196 229 201 74 125
Test 881 39 59 66 14 33
Augmented Training 3540 1568 1832 1608 592 1000
Test 881 312 472 528 112 264
(a) Original image (b) Zoomed
(c) Vertical Flip (d) Added Noise
Figure 1: Different types of data augmentation.
video processing [5]. The RNNs are one of the families that are used
for sequential data. It is specialized in processing sequential data in
time series. However, RNNs can be applied to 2-dimensional data
such as images which is the case in this work [14]. The look-back
of the RNN is the number of previous inputs that the network will
keep before it performs the back-propagation process. This is a
fundamental process that makes the RNN able to keep a time-series
of the inputs. Therefore, without back-propagation, each input to
the network is treated independently.
In this work, Gated Recurrent Unit (GRU) is used as a recurrent
neural network because the GRU has low complexity and high
performance in comparison to other variants of the RNNs [15].
INTESA ’19, October 13–18, 2019, NY, USA A. Metwaly et al.
0
100
200
Ex
ec
ut
io
n
Ti
m
e
(m
s) STM32 F4
STM32 F7
100
200
Ex
ec
ut
io
n
Ti
m
e
(m
s)
FN
N
L1
N
64
FN
N
L1
N
25
6
FN
N
L1
N
51
2
FN
N
L2
N
12
8
FN
N
L2
N
51
2
FN
N
L3
N
51
2
CN
N
K3
F6
L1
CN
N
K3
F6
L4
CN
N
K5
F6
L2
CN
N
K5
F6
L4
CN
N
K5
F8
L4
CN
N
K5
F1
0L
4
GR
UL
1N
8
GR
UL
1N
12
GR
UL
2N
8
GR
UL
2N
12
GR
UL
3N
8
GR
UL
3N
12
104
105
Fl
as
h
M
em
or
y
(B
yt
es
)
102
103
104
RA
M
M
em
or
y
(B
yt
es
)RAM
FLASH
Figure 2: Comparison of execution time, flash and RAM usage for the different models tested.
4 METHODOLOGY
In this work, cloud instances are used to train the models according
to the aforementioned needs and CPU instances are used to train
the GRUs model due to lower parallelism. The CPU instances run
on 4 Intel Xeon Scalable Processors (Cascade Lake) with a turbo
clock frequency of 3.6 GHz. Also, GPUs are used to train the FNN
and XNN models. GPU instances provide one NVIDIA Tesla K80
Accelerator which runs a pair of NVIDIA GK210 GPUs providing
a total of 2496 parallel processing cores. Also, the instance has 4
GPUs of Intel’s Broadwell microarchitecture running at 2.7 GHz.
The actual implementation of the embedded intelligence has
been carried out with two 32-bit MCUs from ST-Microelectronics.
We have used the STM32F401 and the STM32F722 from the Arm
Cortex M4 and M7 families respectively for running the DNNs as
these have sufficient resources. Two MCUs are used to provide
a more extensive evaluation and to overcome some of the limita-
tions that might occur. In the process, the deep learning algorithms
are applied first to the MCUs for realizing the proof of concept.
The features and available resources of the two MCUs used in the
experiments are listed in Table 1.
For applying the deep learning models, an expansion package
named X-CUBE-AI is used which helps in applying deep learning
algorithms and is capable of converting trained neural networks
and generating STM32-optimized library. In addition, the package
supports various deep learning frameworks such as Keras which is
used in our trained models [27].
4.1 Data Acquisition and Analysis
In this work, a fully calibrated 24x32 pixels FIR thermal sensor
array MLX90640 from Melexis is used. This is a medium resolu-
tion camera and therefore the images from it are not sufficient to
identify features which can help in revealing a person’s identity.
This conforms to our research requirements of ensuring the privacy
of individuals. Moreover, it has integrated sensors to measure the
supply voltage (VDD) and ambient temperature (Ta) of the chip.
The measurement outputs stored in the internal RAM are accessed
through the I2C interface [22]. In addition, there are two FOVs of
the thermal sensor array- 55x35 and 110x75 degrees of which the
wider one is used in our experiments. The output is a thermal image
where heat signatures are represented by the intensity of the colors.
In our experiments, the MLX90640 is installed in an indoor office
environment. For such a contained environment, it can be assumed
that people are constantly warmer than the ambient or room tem-
perature [6]. An additional RGB camera is set up to cross-check
the total number of people from the results of our experimental
setup. This serves as ground truth for model validation and bench-
marking. In this work, we have collected data for 2 days and 9 hours
resulting in a total of 5457 data samples from the thermal sensor.
The experiments involved zero to five people in the office room.
The experiments included one or more person(s) entering and exit-
ing the room sequentially and simultaneously. Moreover, people in
the room were sitting, standing or walking.
The total pool of collected samples is divided into 4365 (80%) for
the training-set and 1092 (20%) for the test-set. The training set was
further subdivided into training and validation sets, with a ratio
of 4:1. The case distribution of the training-set and the test-set are
shown in Table 2. Here, the case value refers to the ground-truth
of the number of people in the room.
4.2 Error Analysis
Hyper-parameter tuning for optimization is an important process
in ML which defines a set of optimal parameters for a learning
algorithm. These parameters are typically not adjustable or cannot
change during the training process. For example, in a DNN, the
number of layers and neurons are hyper-parameters.
Edge Computing with Embedded AI for Occupancy Estimation INTESA ’19, October 13–18, 2019, NY, USA
0 1 2 3 4 5
0
1
2
3
4
5
99.8 0.2 0.0 0.0 0.0 0.0
7.7 89.7 2.6 0.0 0.0 0.0
1.7 6.8 91.5 0.0 0.0 0.0
0.0 0.0 1.5 98.5 0.0 0.0
0.0 0.0 0.0 0.0 100.0 0.0
0.0 0.0 0.0 0.0 0.0 100.0
Predicted label
Tr
ue
la
be
l
FNN Confusion Matrix
0%
20%
40%
60%
80%
100%
Figure 3: Thermal sensor FNN_L1_N512 original data confu-
sion matrix.
A grid-search is an approach to choose the hyper-parameters
where all the possible combinations of hyper-parameters are tried
from a grid of parameters values. In this work, we adopted this
approach for tuning hyper-parameters. The model is trained on the
training subset and its performance is evaluated on the validation
subset using minimum squared errors (MSE) as the loss function.
For all training epochs, the model state with the lowest validation
error is selected as the best representative for a particular set of
hyper-parameters. The training itself is performed with an early
stopping manner in which the process is terminated if the change
in validation error is less than 0.1% for 10 consecutive epochs. The
data-set is preprocessed before being fed to the neural network by
centering the mean to 0 and scaling to unit variance. Moreover,
each layer is augmented with appropriate dropout and Adam is
used for gradient descent optimization with tuned learning rate
value [10]. After finding the best hyper-parameters, the best model
is tested with the test-set to measure its prediction accuracy.
5 EXPERIMENTATION AND RESULTS
In this section, the experimental results are presented and analyzed.
The purpose of these experiments is to evaluate the chosen MCUs
while applying the trained people counting algorithms. This is
achieved by running a set of different DNN models on the MCUs in
inference mode. That will be followed by an accuracy analysis for
the models that work on the MCUs. We utilize the original data-set
and an augmented data-set or training and testing the models.
The accuracy of the DNNs trained with the original data-set
was remarkably high. Consequently, we decided to make it more
challenging for the neural networks by augmenting the data with
more corner cases that are expected to be harder to process by
the algorithms. The data augmentations used are (i) cropping, (ii)
flipping upside down, (iii) flipping left to right, (iv) zooming out,
(v) adding random noise, (vi) rotating the image, and (vii) blurring
the image. A subset of the different data augmentation techniques
utilized is shown in Figure 1. The FNN & GRU models are labeled as
FNN_xL_yN where x is the depth and y is the width of the model.
The CNN models are denoted as CNN_Kx_Fy_Lz where x is the
kernel size, y the number of filters and z the number of layers.
The total number of samples for the thermal sensor data-set after
augmentation is 12709 the same division method mentioned earlier
for separating into training, validation and test sets is followed.
The augmented data-set samples are divided to 10140 (80%) for the
training-set and 2569 (20%) for the test-set. The distribution os the
augmented training and the test sets are shown in Table 2.
5.1 Results
The DNNs provide robust and accurate results with the thermal
sensor data-sets. The data quality is suitable to result in high ac-
curacy even with the relatively simpler FNNs. The algorithm was
able to learn when to detect a temperature signature as a human
or another heat source. In Table 3, a side by side comparison of
the best performing models is presented. As shown, there are three
different novel solutions for the thermal sensor. The solutions have
different resources requirements. This allows the possibility of tai-
loring the algorithm based on the resources available in the MCU.
The resources required are shown in Table 3.
The Flash and RAM requirements of the models, together with
the execution time for the different models, are presented in Fig-
ure 2. The three DNN models that have been used are different in
their structure and thus cannot be directly compared. However, we
compare them against the targeted application of people counting.
An example of the structural difference between network types
is that the number of neurons and layers are lower in the GRUs
than in the FNNs. This is mainly because the GRUs depends on the
look-back and do not need a high width or depth. In consequence,
similar MSEs were achieved with a lower number of layers and neu-
rons in GRUs. The Flash requirements for the networks is reported
after compression using the X-CUBE-AI package.
In terms of resource utilization, the CNNmodels have the highest
impact on processor resources. Because of the convolution layers,
CNNs require larger RAM usage and longer computation times. On
the opposite side, FNNs are the simplest models in terms of net-
work structure, and this has a direct relation regarding the memory
usage and computation time. GRUs are situated in a middle point.
Nonetheless, because of the lower number of neurons and layers in
GRU models, their RAM requirements are also lower.
The best accuracy obtained with each of the models is very simi-
lar, ranging from 97.27% to 98.90%. The best FNN model achieves
a prediction accuracy of 98.90%, which considerably improves the
state-of-the-art results in people counting from thermal images.
The confusion matrix illustrating the performance of this model is
shown in Figure 3. Only the work from Abedi et al. [21] achieves
higher accuracy. However, in their case, the authors only detect
whether the room is empty or not, with a binary output. Moreover,
in that work, the machine learning analysis runs on cloud servers.
Implementing the models in embedded processors enables a more
robust design with lower latency. A comparison with other previ-
ous works is summarized in Table 4. Within the works utilizing
thermal cameras and estimating the exact number of people in the
image, our prediction accuracy is over 15% better than the previous
work by Tyndall et al. [4]. We also achieve the best accuracy within
embedded AI algorithms for any type of thermal or PIR sensor.
INTESA ’19, October 13–18, 2019, NY, USA A. Metwaly et al.
Table 3: Summary of prediction performance and system requirements for the best model of each network type.
Pred. Acc. F4 Exec. Time F7 Exec. Time Alloc. Flash Alloc. RAM Test MSE Valid. MSE
FNN_L1_N512 98.90% 44.141 ms 13.269 ms 196.07 KB 2.01 KB 0.0137 0.004
CNN_K3_F8_L3 98.26% 77.075 ms 18.435 ms 8.46 KB 22.13 KB 0.0174 0.039
GRU_L1_N12 97.27% 60.7 ms 20.097 ms 109.88 KB 0.055 KB 0.030 0.017
Table 4: Comparison of system setup, data analysis technique and results of our method with the state-of-the-art.
Sensor Placement Output Platform Processing Accuracy
Beltran et al. [2] PIR+Thermal Ceiling Numbered Tmote Sky Custom NA
Gomez et al. [3] Thermal Wall Numbered Cortex M4 CNN 53.7%
Tyndall et al. [4] PIR+Thermal Ceiling Numbered Arduino K* algorithm 82.56%
Abedi et al. [21] Radar+Thermal Ceiling Binary Cloud CNN 99%
Zappi et al. [23] PIRs Wall Numbered (0-3) GT60 MCU Custom 89%
Ours Thermal Ceiling Numbered (0-5) STM32F FNN 98.90%
6 CONCLUSION AND FUTUREWORK
Knowing the number of people can help manage resources in smart
buildings and places where automation can improve management
and dramatically reduce the total consumption of electricity hence
effectively decreasing greenhouse emissions. In this paper, we pre-
sented a novel solution for people counting with high prediction
accuracy. The proposed algorithms have low computational, power
and memory requirements making those suitable for resource-
constrained devices used in IoT-based applications. It is observed
that the thermal imaging technique is promising for counting peo-
ple and more effective than other approaches such as the ones based
on RGB cameras. Among the tested algorithms, FNN_L1_N512 re-
sulted in the highest accuracy of 98.90%. The two MCUs are able to
run the FNN_L1_N512 algorithm in inference mode. The algorithm
utilized 4% of CPU processing cycles on the STM32F401 MCU while
using 37% of its flash memory and less than 2.1% of total avail-
able RAM. In our experiments, two other models (CNN_K3_F8_L3
and GRU_L1_N12) resulted in similar prediction accuracy, offering
various choices for the flash memory and RAM and hence can be
tailored according to the available resources of the MCU. In this
work, we have focused on novel solutions of embedded AI enhanced
thermal sensor for counting people. In future work, we will extend
the dataset to include more cases, as well as study the impact of the
camera location and distance to subjects on the prediction accuracy.
REFERENCES
[1] A. Anjomshoaa et al. 2018. City scanner: Building and scheduling a mobile
sensing platform for smart city services. IEEE Internet of Things Journal (2018).
[2] A. Beltran et al. 2013. Thermosense: Occupancy thermal based sensing for hvac
control. In ACM BuildSys Workshop. ACM.
[3] A. Gomez et al. 2018. Thermal image-based CNN’s for ultra-low power people
recognition. In ACM International Conference on Computing Frontiers. ACM.
[4] A. Tyndall et al. 2016. Occupancy Estimation Using a Low-Pixel Count Thermal
Imager. IEEE Sensors Journal (2016).
[5] B. Moons et al. 2018. Embedded Deep Learning: Algorithms, Architectures and
Circuits for Always-on Neural Network Processing (1st ed.). Springer.
[6] B. Thomas et al. 2016. Thermal Imaging Systems for Real-Time Applications in
Smart Cities. Aalborg Universitet (2016).
[7] C. J. Bartodziej. 2017. The concept industry 4.0. In The Concept Industry 4.0.
Springer, 27–50.
[8] European Commission. 2002. European union directive on the energy perfor-
mance of buildings (EPBD). European Commission, Tech. Rep. 2002/91/EC (2002).
[9] D. B. Yang et al. 2003. Counting people in crowds with a real-time network
of simple image sensors. In Proceedings Ninth IEEE International Conference on
Computer Vision. 122–129 vol.1.
[10] D. P. Kingma et al. 2014. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980 (2014).
[11] E. Griffiths et al. 2018. Privacy-preserving Image Processing with Binocular
Thermal Cameras. 1, 4 (2018).
[12] F. Jazizadeh et al. 2018. Personalized thermal comfort inference using RGB video
images for distributed HVAC control. Applied Energy (2018).
[13] F. Wahl et al. 2012. A Distributed PIR-based Approach for Estimating People
Count in Office Environments. 15th IEEE CSE and 10th IEEE/IFIP EUC, 640–647.
[14] I. Goodfellow et al. 2016. Deep Learning. MIT Press.
[15] J. Chung et al. 2014. Empirical Evaluation of Gated Recurrent Neural Networks
on Sequence Modeling. CoRR (2014), 1–9.
[16] J. Peña Queralta et al. 2019. Edge-AI in LoRabased healthcare monitoring: A case
study on fall detection system with LSTM Recurrent Neural Networks. In 2019
42nd International Conference on Telecommunications, Signal Processing (TSP).
[17] J. Yun et al. 2014. Human movement detection and identification using pyroelec-
tric infrared sensors. Sensors (Switzerland) 14 (2014).
[18] K. Hashimotoet al. 1997. People count system using multi-sensing application.
In Transducers 97.
[19] L. Qingqing et al. 2019. Edge Computing for Mobile Robots: Multi-Robot Feature-
Based Lidar Odometry with FPGAs. In 12th ICMU, IEEE.
[20] L. Qingqing et al. 2019. Visual Odometry Offloading in Internet of Vehicles with
Compression at the Edge of the Network. In 12th ICMU, IEEE.
[21] M. Abedi et al. 2019. Deep-learning for Occupancy Detection Using Doppler Radar
and Infrared Thermal Array Sensors. In ISARC, Vol. 36. IAARC Publications.
[22] Melexis. [n.d.]. MLX90640 32x24 IR array. Datasheet.
[23] P. Zappi et al. 2007. Enhancing the spatial resolution of presence detection in a
PIR based wireless surveillance network. 295 – 300.
[24] R. Mahmud, et al. 2018. Fog computing: A taxonomy, survey and future directions.
In Internet of everything. Springer, 103–130.
[25] S. Koebrich et al. 2017. 2017 Renewable Energy Data Book Including Data and
Trends for Energy Storage and Electric Vehicles Acknowledgments. (2017), 142.
[26] S. Lu et al. 2018. Dynamic HVAC Operations with Real-time Vision-based Occu-
pant Recognition System. In 2018 ASHRAE Winter Conference, Chicago.
[27] STM. 2019. User manual Getting started with X-CUBE-AI Expansion Package
for Artificial Intelligence ( AI ). January (2019), 1–62.
[28] T. K. L. Hui et al. 2017. Major requirements for building Smart Homes in Smart
Cities based on Internet of Things technologies. FGCS (2017).
[29] T. Nguyen Gia et al. 2019. Edge AI in Smart Farming IoT: CNNs at the Edge and
Fog Computing with LoRa. In 2019 IEEE AFRICON.
[30] T. V. Oosterhout et al. 2011. Head Detection in Stereo Data for People Counting
and Segmentation. 2003 (2011), 620–625.
[31] U. Jägare. 2019. Embedded Machine Learning Design FD Arm Special Edition. John
Wiley & Sons, Inc. 30 pages.
[32] V. K. Sarker et al. 2019. Offloading SLAM for Indoor Mobile Robots with Edge-
Fog-Cloud Computing. In ICASERT.
[33] V. K. Sarker et al. 2019. A Survey on LoRa for IoT: Integrating Edge Computing.
In Int. Workshop on Smart Living with IoT, Cloud and Edge Computing.
[34] Y. Agarwal et al. 2010. Occupancy-driven EnergyManagement for Smart Building
Automation. In ACM BuildSys ’10 Workshop. ACM, 1–6.