Supporting web programming assignment 
assessment with test automation and RPA 
 
 
 
 
 
 
Software Engineering 
Master’s Degree Programme in Information and Communication Technology 
Department of Computing, Faculty of Technology 
Master of Science in Technology Thesis  
 
 
Author: 
Tomi Salomaa 
 
Supervisors: 
MSc (Tech.) Sampsa Rauti (UTU) 
MSc Jari-Matti Mäkelä (UTU) 
 
 
 
 
October 2022 
 
The originality of this thesis has been checked in accordance with the University of Turku quality 
assurance system using the Turnitin Originality Check service. 
Master of Science in Technology Thesis 
Department of Computing, Faculty of Technology 
University of Turku 
 
Subject: Software Engineering  
Programme: Master’s Degree Programme in Information and Communication Technology 
Author: Tomi Salomaa 
Title: Supporting web programming assignment assessment with test automation and RPA 
Number of pages: 85 pages, 15 appendix pages 
Date: October 2022 
 
Automated software solutions to support and assist in assessment of student implemented applications 
are not a rarity, but often need to be custom engineered to fit a specific learning environment or a 
course. When such a system can be fielded in use properly, it has a tremendous potential to lighten the 
workload of course personnel by automating the repetitive manual tasks and testing student 
submissions against assignment requirements. Additionally, these support systems are often able to 
shorten the feedback loop which is seen to have a direct impact on student learning. 
In this thesis test automation and robotic process automation are researched to discover how they can 
be used to support web programming assignment assessment. The background on software testing, 
automation and feedback related pedagogy are researched mainly by the methods of literature review 
and expert interview. A third methodology – design science – is then applied for the purpose of 
verifying and extending the learnt theory in an empirical manner. A research artifact is created in the 
form of a prototype capable of supporting in assessment tasks. Performance of the prototype is 
measured by recording set execution metrics while assessing anonymized case study student 
submissions from a web development course arranged by University of Turku: DTEK2040 Web and 
Mobile Programming. 
Thesis concludes that to support assessment through test automation is to focus on unit and system 
level testing of functionalities while assuming the exact implementation at code level cannot be fully 
known. Suggestion is made that relying on assignment descriptions as basis for test design is not 
enough, but rather requirements engineering should be done together with course personnel to take 
advantage of their experience in what sort of errors are to be tolerated in student submissions. Thesis 
also concludes that automation can perform interaction with student submissions, file manipulation, 
record keeping and tracking tasks at a satisfactory level. The potential to shorten the feedback loop 
and summarizing quantitative feedback for the student is recognized, however, to build an automated 
system to identify, gather and summarize formative, pedagogically more valuable feedback was noted 
to be out of scope for this thesis and suggested as future work to possibly extend the prototype with. 
 
Keywords: automation, testing, RPA, robot framework, web application, assessment 
 Diplomityö 
Tietotekniikan laitos, Teknillinen tiedekunta 
Turun yliopisto 
 
Oppiaine: Ohjelmistotekniikka  
Tutkinto-ohjelma: Tieto- ja viestintätekniikan tutkinto-ohjelma (DI) 
Tekijä: Tomi Salomaa 
Otsikko: Supporting web programming assignment assessment with test automation and RPA 
Sivumäärä: 85 sivua, 15 liitesivua 
Päivämäärä: Lokakuu 2022 
 
Automatisoidut ohjelmistoratkaisut, jotka tukevat ja avustavat opiskelijoiden toteuttamien sovellusten 
arvioinnissa, eivät ole harvinaisia, mutta ne useimmiten joudutaan rakentamaan tiettyyn 
oppimisympäristöön tai opintosisältöön sopiviksi. Tällaiset järjestelmät omaavat kuitenkin valtavan 
potentiaalin keventää kurssihenkilöstön työtaakkaa automatisoimalla toistuvia manuaalisia työtehtäviä 
ja automaatiotestaamalla opiskelijoiden palauttamia tuotoksia asetettuja tehtävävaatimuksia vastaan. 
Järjestelmät johtavat varsin usein myös opiskelijan näkökulmasta nopeampaan palautesykliin, jolla 
kyetään todeta olevan suora vaikutus oppimiseen. 
Tässä opinnäytetyössä tutkitaan testiautomaatiota sekä robottiprosessiautomaatiota pyrkimyksenä 
selvittää kuinka näitä teknologioita voitaisiin hyödyntää tukemaan web-ohjelmointitehtävien 
arviointia. Ohjelmistotestauksen, automaation ja palautteen pedagogiikan taustoja tutkitaan pääasiassa 
kirjallisuuskatsauksen ja asiantuntijahaastattelun menetelmin. Lisäksi sovelletaan kolmatta 
metodologiaa, suunnittelutiedettä, jonka tavoitteena on vahvistaa teoriaosuuden havaintoja sekä pyrkiä 
empiirisesti laajentamaan niitä. Suunnittelutieteen kautta tutkimusartifaktina syntyy prototyyppi, jonka 
suorituskykyä ja hyötyjä mitataan keräämällä dataa hyödyntäen aitoja, anonymisoituja 
opiskelijapalautuksia Turun yliopiston järjestämän DTEK2040: Web and Mobile Programming -
kurssin tiimoilta. 
Opinnäytetyön johtopäätöksenä on, että arvioinnin tukeminen testiautomaation avulla on keskittymistä 
yksikkö- ja järjestelmätason toiminnallisuuksien testaukseen. Testaukseen on liitettävä myös oletus, 
että arvioitavan kohteen tarkkaa toteutusta kooditasolla ei voida täysin tuntea. Tehtäväkuvausten 
käyttö testitapausten suunnittelun perustana todetaan riittämättömäksi, ja vaatimussuunnittelu 
ehdotetaan tehtävän yhdessä kurssin henkilökunnan kanssa, jotta heidän kokemuksiaan voidaan 
hyödyntää yleisimpien opiskelijapalutuksissa ilmenevien virhetapausten kartoittamiseksi sekä 
testitapausten tarkkuuden ja arvioinnin jyrkkyyden säätämiseksi. Prosessiautomaation osalta todetaan, 
että automaatio kykenee suorittamaan vuorovaikutusta opiskelijoiden palautusten, tiedostojen 
käsittelyä, kirjanpito- ja seurantatehtäviä tyydyttävällä tasolla. Mahdollisuus palautesilmukan 
lyhentämiseen ja summaavan palautteen yhteenvetoon opiskelijalle tunnustetaan myös empiirisesti. 
Laadullisen, pedagogisesti arvokkaamman palautteen kokoaminen ja jalostaminen todettiin tämän 
opinnäytetyön mittakaavassa liian suureksi projektiksi ja sen empiiristä toteutusta ehdotettiin yhtenä 
mahdollisena jatkotutkimusaiheena. 
 
Asiasanat: automaatio, ohjelmistotestaus, automaatiotestaus, RPA, robot framework, verkkosovellus, 
arviointi 
Table of contents 
1 Introduction ........................................................................................................ 1 
1.1 Background ........................................................................................................... 1 
1.2 Problem statement and research questions ....................................................... 2 
1.3 Scope and delimitations ....................................................................................... 2 
1.4 Research methods and sources .......................................................................... 3 
1.5 Structure of the thesis .......................................................................................... 5 
2 Testing web applications .................................................................................. 7 
2.1 Objectives of software testing.............................................................................. 7 
2.2 Testing levels ........................................................................................................ 9 
2.2.1 Unit and integration testing .............................................................................................. 9 
2.2.2 System and acceptance testing ..................................................................................... 12 
2.3 Testing methods and techniques ........................................................................13 
2.3.1 Static testing .................................................................................................................. 13 
2.3.2 Dynamic testing ............................................................................................................. 15 
2.3.3 Black box techniques ..................................................................................................... 16 
2.3.4 White box techniques .................................................................................................... 20 
2.4 Test design and development .............................................................................22 
2.5 Challenges of web application testing ................................................................23 
2.6 Foundation for the first main research question ...............................................25 
2.6.1 Q1.1: Which testing levels should be focused on?........................................................ 26 
2.6.2 Q1.2: Which testing techniques are applicable for testing student submissions? ........ 26 
2.6.3 Q1.3: How to turn an assignment briefing into test cases? ........................................... 28 
3 Test automation and RPA ............................................................................... 29 
3.1 Differentiating between RPA and test automation .............................................29 
3.2 Use of automation in web application testing ....................................................32 
3.3 Automated formulating of feedback from an assignment solution ..................34 
3.4 Foundation for the second main research question ..........................................39 
3.4.1 Q2.1: What manual work related to assessing and feedback is there to automate? .... 39 
3.4.2 Q2.2: What kind of feedback should be gathered from the student solutions to 
assignments?................................................................................................................................. 42 
4 Combining test automation and RPA to assess assignments ..................... 45 
4.1 Formulating a design ...........................................................................................45 
4.1.1 General guidelines and automation targets ................................................................... 45 
4.1.2 Guidelines for supporting feedback ............................................................................... 47 
4.1.3 Assessment guidelines for the exercises ...................................................................... 48 
4.2 Generating a design .............................................................................................49 
4.2.1 Due diligence ................................................................................................................. 50 
4.2.2 Risk identification ........................................................................................................... 52 
4.2.3 Bot creation and dry run ................................................................................................ 53 
5 Implementation and results ............................................................................ 55 
5.1 Architecture overview ..........................................................................................55 
5.2 Environment .........................................................................................................56 
5.2.1 Development environment ............................................................................................. 56 
5.2.2 Software ......................................................................................................................... 57 
5.3 Pipeline .................................................................................................................58 
5.3.1 Logical structure and shell scripts ................................................................................. 58 
5.3.2 Robot scripts .................................................................................................................. 62 
5.4 Test cases .............................................................................................................66 
5.4.1 Part 0 – Basics of web applications ............................................................................... 68 
5.4.2 Part 1 – React and JavaScript ....................................................................................... 70 
5.4.3 Part 2 – Communication with server.............................................................................. 72 
5.4.4 Part 3 – Web application with database ........................................................................ 74 
5.5 Analysing and evaluating the design ..................................................................76 
5.5.1 Meeting the set requirements ........................................................................................ 76 
5.5.2 Quality of the test cases ................................................................................................ 78 
5.5.3 Answering the main research questions ....................................................................... 81 
5.5.4 Suggestions and further potential .................................................................................. 82 
6 Conclusions and future work .......................................................................... 84 
References 
Appendix A: DTEK2040 assessment process automation potential 
Appendix B: General requirements for DTEK2040 automated assessment 
system 
Appendix C: Prototype directory and file structure tree 
Appendix D: Pipeline execution times 
Appendix E: Example summary template 
Appendix F: Common keywords 
Appendix G: Custom library 
 
  
List of figures 
Figure 1: An example state transition diagram of a web site. .............................................................. 19 
Figure 2: Suggested implementation steps to take for creating new automation systems. ................. 40 
Figure 3: Robot Framework architecture by Robot Framework Foundation. ....................................... 50 
Figure 4: Pipe-and-filters implemented in prototype solution. .............................................................. 55 
Figure 5: Shell script logical layers. ...................................................................................................... 59 
Figure 6: Example of a test case card as a document. ……………………………………………………67 
Figure C1: Prototype directory and file structure tree. …………………………………………………...C-1 
Figure E1: Example representing summary template with results from ex2. ………………….………E-1 
 
List of tables 
Table 1: Queries used for searching scientific reference material for the thesis. ................................... 3 
Table 2: Eggert’s four stages of engineering design process. ................................................................ 4 
Table 3: Areas of evaluation and assessment where static testing is often applied. ........................... 14 
Table 4: Myers’ heuristics for identifying equivalence classes. ............................................................ 17 
Table 5: Valid and invalid inputs extracted from a specification by following the Myers’ heuristics. .... 17 
Table 6: A decision table based on a typical login page elements. ...................................................... 19 
Table 7: Sequential implementation levels of automation. ................................................................... 30 
Table 8: Automation potential and identified risk factors of assignment assessment in DTEK2040.... 31 
Table 9: Issues to consider when gathering, constructing and providing feedback. ............................ 43 
Table 10: Steps and tasks outline for robot implementation. ................................................................ 53 
Table 11: Development workstation specifications.. ............................................................................. 56 
Table 12: Packages installed on top of base image. ............................................................................ 57 
Table 13: Static automation tests implemented for DTEK2040 exercise 0. ......................................... 69 
Table 14: Dynamic automation tests implemented for DTEK2040 exercise 0. .................................... 70 
Table 15: Static automation tests implemented for DTEK2040 exercise 1. ......................................... 71 
Table 16: Dynamic automation tests implemented for DTEK2040 exercise 1. .................................... 71 
Table 17: Static automation tests implemented for DTEK2040 exercise 2. ......................................... 73 
Table 18: Dynamic automation tests implemented for DTEK2040 exercise 2. .................................... 73 
Table 19: Dynamic automation tests implemented for DTEK2040 exercise 3. .................................... 75 
Table 20: Average execution times per submission. ............................................................................ 77 
Table 21: Comparison of average prototype results versus manual results. ....................................... 79 
Table A1: Automation potential within steps extracted from the manual assessment process. .......... A-1 
Table B1: General system requirements built based on theory sections. ........................................... B-1 
Table D1: Pipeline execution times in seconds. ..................................................................................D-1 
  
List of codes 
Code 1: Example Robot Framework script contents. ........................................................................... 62 
Code 2: Example of executing the support_tasks.robot script with declared global variables. ............ 63 
Code F1: Common keywords. ...………………………………………………………..……………...…...F-1 
Code G1: Custom library. ………..……………………………………………………………...................G-1 
 
 
Abbreviations 
API Application programming interface 
GUI Graphical user interface 
ISTQB International Software Testing Qualifications Board 
JSON JavaScript Object Notation 
RF Robot Framework 
RPA Robotic process automation 
SPA Single page application 
UI User interface
1 
 
1 Introduction 
1.1 Background 
It is perhaps fair to say that learning to program is challenging. While learning the related theory is 
one part of it, one could argue that hands-on practice by coding solutions to different kinds of 
problems is extremely important. Due to practice being invaluable for learning programming, 
programming courses are often designed to be very practical, containing multiple programming 
assignments to allow students to learn through repetition of the principles or concepts being taught. 
The hands-on, learning-by-doing approach has been present in teaching of programming from the 
very beginning. From the very early on this task-and-assignment-based way of teaching 
programming has also sparked interest in being able to at least partly automate the work related to 
assessing student solutions. Descriptions of using automation to support grading of programming 
submissions are available from as early on as the year 1960 when Hollingsworth presented the 
automated grading system [17]. Hollingsworth used the automated grader during programming 
courses held at the Rensselaer polytechnic institute, a private research university in USA. 
The motives for using automation were very similar to what can be found mentioned in reviews and 
surveys written on the subject matter today [1; 10; 21; 29; 33]: increasing class sizes, extensive 
workload related to assessing and time required to manually perform the assessment process. 
Additionally, the research results often list many more reasons to use automation over purely 
manual approaches such as the consistency and accuracy of assessment as well as removal of 
unintended biases, as noted by Romli & al. [33, pp. 1186], for example. 
The research and overall interest for automated solutions in this field has been notable and as such a 
lot of advancements have also been made since the Hollingsworth’s automated assignment grader. 
For example, in the year 2018 Keuning et al. were able to identify over a hundred [21, pp. 11-12] 
tools that can automatically assess and generate feedback for programming exercises. The same 
review also indicated that many of these tools are often custom built for a purpose, such as a 
specific programming course or a thesis, or aim to support the teaching and assessment of very 
basic principles of programming. From this perspective this thesis does not seem to be unique, 
however, the need for pursuing a customized solution seems reasonable and is hard to avoid as long 
as programming courses offer unique content that needs to be taken into account. 
 
2 
 
1.2 Problem statement and research questions 
Automation is a possible avenue of approach to support an instructor in the assessment of student 
assignments. Implementing test automation and robotic process automation (RPA) together may be 
able to check for the basic functionalities of a student’s proposed solution to an assignment, but to 
also collect and summarize individual and solution specific feedback. Aside from the hands-on 
assessment work, automation has the potential to help save workhours spent in the so-called 
business processes of the overall process of grading students on any course. 
This thesis aims to develop a prototype system, while relying on open-source and free-to-use tools, 
that combines test automation and RPA for the purposes of supporting programming assignment 
assessment work. The concept seeks to integrate with already existing manual workflows and 
platforms for DTEK2040 Web and Mobile Programming, which is described as intermediate level 
studies and arranged by the university of Turku as part of the Bachelor of Science (technology) 
studies in information and communication technology. 
The resulting prototype serves to present potential viability of automation in the areas of automation 
testing and RPA for programming courses while being specifically developed with DTEK2040 web 
application related exercises in mind. This study also researches aspects related to programming 
assignment feedback and as a part of the results presents ideas how these aspects could be later 
incorporated into an automated solution to possibly enhance student learning. 
Thesis presents, and through the course of this study answers, the following research questions: 
Q1: How to support the assessment of web application programming assignments with test 
automation? 
Q1.1: Which testing levels should be focused on?  
Q1.2: Which testing techniques are applicable for testing student solutions to assignments? 
Q1.3: How to turn an assignment briefing into test cases? 
Q2: How to support the assessment and feedback process of assignment assessing with RPA? 
Q2.1: What manual work related to assessing and feedback is there to automate? 
Q2.2: What kind of feedback should be gathered from the student solutions to assignments? 
1.3 Scope and delimitations 
Thesis studies software testing, test automation and RPA, which are observed within the context of 
applying the methods and technologies to support programming assignment assessment. To further 
3 
 
narrow the scope, programming assignments have been limited to web application programming 
assignments as described in the course contents of DTEK2040. 
As this study proposes to construct a prototype to empirically verify the theoretical research efforts 
and their viability, but to also provide limited automation functionality for student submission 
assessment and scoring, the use of DTEK2040 as a case study target applies its own limitations to 
the scope for developing the prototype. The technological choices and assignment requirements that 
are in use within DTEK2040 must also be considered during development. While these hard 
delimitations most certainly guide the research, consideration of the generalizability of the final 
solution and results of this research are carried along throughout the study. 
The prototype will use a tool called Robot Framework (RF). RF is a “generic open-source 
automation framework” [32] that originated at Nokia but has since become a framework maintained 
by a registered association called Robot Framework. Today the framework is widely used for 
testing and robotic automation in the software industry and beyond. [31] The choice of this 
framework has been affected by personal familiarity with the tool but also the extensive range of 
available libraries and programming language support of the framework which can be seen to help 
generalize the achieved solution in the future. 
1.4 Research methods and sources 
Three main research methods are used in conducting this study: literature review, expert interview 
and design science. The primary platforms used for searching academic articles were ACM Digital 
Library, Google Scholar and IEEE Xplore. Identical search queries were used in each one. The 
queries were constructed from a base query that was supplemented with a correlating topic specific 
support-query to find results. E.g., to find web application testing articles the base query would be 
extended with the “Testing” support query. The queries are presented in Table 1. 
Table 1: Queries used for searching scientific reference material for the thesis. 
General topic Query string Query type 
Web application (web OR website OR “web application” OR React OR Angular 
OR Vue OR JavaScript OR HTML OR DOM OR Python OR 
Java) 
Base 
Testing AND (test OR testing OR "dynamic analysis" OR "static 
analysis") 
Support 
Automation AND ((automation AND (test OR testing)) OR "robotic process 
automation" OR RPA) 
Support 
Assessment, feedback AND (pedagogy OR learning OR course OR (programming 
AND assignment) OR assessing OR grading AND feedback) 
Support 
4 
 
 
Additional inclusion criteria for academic sources were that the article should be peer-reviewed and 
published in the year 2005 or later. Other relevant book sources are extracted from the references 
found from academic articles. Aside from articles and books, web-based sources from well-known 
and respected authorities such as the ISTQB are included as their material is often referenced in the 
industry. Expert interviewing is used to supplement the findings from literature review, especially 
to gain the perspective relevant to the focus determined by DTEK2040.  
The used methods of literature review and expert interview are also tied with design science. Design 
science is applied in the empirical part of this study. There the theory gathered prior is used to 
develop the design science artifact, a functional prototype, by following engineering design process. 
The process that is being followed is described by Eggert to contain four stages [11, pp. 6-8] as 
represented in Table 2. 
Table 2: Eggert’s four stages of engineering design process. 
Stage Activities Goal 
Formulation Gather information such as requirements, 
performance targets, constraints and 
considerations. 
To understand the problem and to start 
preparing a plan for its solution. 
Generating Synthesize or generate alternative designs to 
satisfy the gathered expectations. 
To produce alternative design 
candidates for later analysis and 
evaluation. 
Analysis Predict the performance and/or behaviour of the 
design candidates. 
To deduce whether a candidate design 
satisfies previously set constraints.   
Evaluation Compare design candidates based on their 
predicted performance by using criteria gathered 
in the formulation stage. 
To decide the best design alternative 
to be implemented in practice. 
 
While the described process is linear in the sense that it should be followed from formulation to 
evaluation in order, the generating and analysis stages together form a possible redesign iteration 
within the process. If a design candidate does not satisfy the constraints set during the formulation 
stage, it may be taken back into the generating stage for alterations and then reanalysed. 
In this thesis the formulation stage is covered with Sections 2 and 3 which provide the necessary 
background information to understand the problem. These sections also partly give basis to design 
specifications that are based on theory gathered from literature and expert interview. The rest of the 
stages will be covered within the contents of Section 4 before finally presenting the results achieved 
with the decidedly best design implementation in Section 5. 
5 
 
1.5 Structure of the thesis 
This thesis approaches developing the prototype to answer the research problem through four 
sections covering relevant theory and implementation. The first two sections following this 
introduction lay the foundation for testing, test automation and RPA. For these sections the most 
relevant sources come from literature and past research conducted on the subject. The latter two 
sections aim to build upon this theory and implement a combined solution as well as present the 
detailed results. For these sections literature and documentation sources are enhanced with the 
results of conducted expert interview. 
The theory starts with Section 2. Here the required background information on testing of web 
applications will be provided. The aim is to cover the basis of and seek answers to sub-questions of 
Q1 through the following sub-sections: 
2.1. describes the traditional objectives set for testing and how these could be considered when 
assessing solutions for programming exercises. 
2.2. provides an answer for Q1.1. by considering the most usual way of dividing testing efforts 
based on their focus and highlighting the most meaningful levels of testing in the problem 
context. 
2.3. provides an answer to Q1.2. by considering and comparing testing techniques, considering 
the special aspects of course exercises such as their potentially transforming nature. 
2.4. provides an answer to Q1.3. by listing what should be considered when building a test case 
and how these notes should be incorporated with exercise assignments. 
2.5. describes the most common challenges related to web application testing and how these 
should be considered when building the test cases or even the assignments. 
The theory continues in Section 3. In this section the goal is to research and further define the 
differences between test automation and RPA. This section also researches the importance, nature 
of relevant feedback and the related principals to be considered when extracting feedback topics 
from student solutions to exercises. Section 3 seeks to answer the sub-questions of Q2: 
3.1. describes the differences between test automation and RPA. Justifies their use for specific 
tasks and purposes in the assessment process and provides principles for implementing 
6 
 
automation. Partly answers Q2.1. by exploring what to consider when implementing RPA 
and what could be the potential targets in DTEK2040 assessment process. 
3.2. describes the motives and requirements for using automation in web application testing. 
Completes the answer to Q2.1. by gathering principal ideas and focus areas to consider in 
terms of test automation and assessment for automation testing. 
3.3. researches and describes the feedback that can be seen useful to advance the learning of 
programming. Considers what should and what should not be automated as well as answers 
Q2.2. by proposing a list of observations to be gathered and formulated into feedback from 
student solutions to exercises. 
After these theory sections a model solution will be proposed. Section 4 describes this model with 
its open-source tools and libraries used as well as the overall functionality. The detailed 
implementation is then presented in Section 5 along with the results. Results are gathered from 
testing the thesis artifact with anonymized student solutions received as case study material from 
past instances of DTEK2040, and once analysed will be used to present the collected answers to 
research questions Q1 and Q2. 
7 
 
2 Testing web applications 
 
2.1 Objectives of software testing 
In the software industry, software testing is a quality assurance technique implemented throughout 
the software development lifecycle and applied widely to verify and validate different aspects of the 
end-product. This means that in the industry practices testing is often incorporated from the very 
early stages onwards to assure the quality by evaluating work-products such as requirements, 
specifications, program design as well as source code. [18; 23] As a process, testing can be said to 
contribute towards bettering reliability and the overall quality of the program under test by verifying 
and validating the various aspects of said product before it is brough available to end-users. 
According to sources such as Myers & al. the main goal of software testing as a process is to “find 
as many of the errors as possible” [23, pp. 6] to achieve the goal of increasing quality. 
Apart from finding errors - and in this case finding errors from the perspective of assignment 
assessment taskers - testing is often credited with many other objectives as well. For example, 
ISTQB brings up aspects such as (a) building confidence to the quality of the product and the 
development work and (b) providing a safety net for the developers to do their work as being 
relevant goals for software testing. From assessment support perspective this could perhaps 
translate into boosting the student’s morale by shortening the feedback loop through the help of 
automation testing. Ultimately the objectives set for software testing may vary depending on the 
overall context and details such as the test level that the testing efforts are being focused on. [19] 
This can also mean that by focusing to the pedagogical aspects extractable from testing results, 
software testing may very well lend itself to enhancing teaching efforts and student learning instead 
of only seeking errors in the system under test. 
Though, it could be seen that quality as a central value remains even when applying software testing 
as a support for programming assignment assessment activities. The motivation behind testing for 
product quality can be seen to differ from an industrial context: testing activities are not there to 
necessarily assure an instructor or lecturer of the student’s work quality but rather to support the 
pedagogical process. Thus, testing and the testing objectives need to be justified through 
pedagogical standpoints [18] and be able address the need for feedback of the learner who is also 
the developer of the work product under test. 
8 
 
Feedback itself can be formed through formative and summative assessment of assignment. From 
the formative point-of-view, quality as an objective means focusing on aspects that provide the 
learner opportunities to improve one’s knowledge and skill that are relevant and within the scope of 
the studies the programming assignment relates to. On the other hand, testing objectives derived 
from the summative assessment point-of-view should provide the basis for making judgements 
about student achievements and progress, i.e. grading the student work product. [7] 
Many studies propose and support the view that assessing for functionality is the most common 
approach when assessing programming assignments [1; 18; 29; 33]. Typically testing objectives for 
functional testing are gathered from basis consisting of given requirements and specifications such 
as business requirements, user stories, use cases or specific functional requirements [19]. The end-
goal is to verify and validate that the system does what it is supposed to do. In the cases of typical 
programming assignments, functional requirements often boil down to the given assignment 
description and sub-task descriptions. 
Software testing is understandably an integral part of web application development. Same reasons 
of bettering the overall quality of a software product apply to web applications as to any other 
software product. From the testing perspective web applications have a lot of common ground with 
traditional desktop applications when it comes to testing functionality, configuration and 
compatibility aspects of the product. They also present some unique issues that need to be taken 
into consideration. Some of the issues and considerations to note are underlined by Arom & Sinha 
in their review on techniques, tools and state of the art of web application testing. The aspects they 
raise in their research are: (1) performance requirements deriving from large user population, (2) 
state change related faults, (3) web browser related compatibility issues on top of operating system 
related compatibility ones, (4) multiple potential error occurrence points within a typical multi-
tiered web application architecture and (5) the dynamic nature of software components being 
rendered at runtime based on user input as well as server response. [3] 
Most of the theory and observations presented above about the objectives of software testing are 
also further supported by the results of the expert interview conducted with the personnel teaching 
and assessing student submissions for DTEK2040 [22]. From the results of this interview, one could 
analyse that at least the following notes are related to testing objectives: 
• Finding errors from submissions is a core task; gained results are a basis for scoring and 
forming feedback. 
9 
 
• Assessing for functionality is the most common approach as the assignments often are very 
clearly formed to contain a set of specific functionality requirements; correct 
implementation reflects the student’s understanding of the corresponding subject. 
• Regarding course assignments dealing with React applications, state change related faults 
are among the somewhat recurring types of faults between different course iterations that 
should be looked for. 
Regarding the assessment of non-functionalities, it was mentioned that if such aspects are 
considered then they should be derivable from the task assignment as clear requirements. Assessing 
the visual quality or structure of the code was mentioned to be often too difficult due to the 
subjective nature of such assessment. Answers provided also gave basis for understanding that 
many non-functional aspects were not necessary to be assessed given the scope and focus of 
DTEK2040. The reasoning behind very scoped assessing of functionalities was that the 
functionalities present in any given student submission is affected heavily whether an aspect was 
considered in the course material and the student should be able to present understanding of it. 
Nevertheless, the theory and observations backed up by the interviewees are interesting and should 
be considered when also thinking about which testing levels are relevant and should be focused on.  
2.2 Testing levels 
Software testing can be scoped to target certain abstractions levels. A common way of representing 
the testing levels and tying them to software development specifications is to present both in a so-
called V-model where unit testing tests for unit specifications, integration testing tests for 
subsystem design, system testing tests for system specifications and acceptance testing tests for 
business needs and constraints. Names given to these testing levels may vary, with the most 
commonly variation being at the unit level. Sometimes this most atomic level of testing is referred 
to as module [4, pp. 104; 23, pp. 85] or component [21] level as well, but they all essentially have 
the same meaning. 
2.2.1 Unit and integration testing 
Unit testing focuses on software components that can be tested and verified separately in isolation 
from the rest of the software. Units can be thought to form the backbone of all functionalities of a 
software product and testing at unit level is considered important because defects at this level may 
be difficult to identify later when the whole software system is being considered. [4] Testing at this 
10 
 
level is a widely accepted practice in the industry and frameworks for testing are readily available 
almost regardless of the programming language or technology of choice [7]. 
In the most practical sense, unit testing is the process of testing entities of software such as 
functions. Main purpose is to catch local defects in that entity at the algorithm level. This also 
means testing activities themselves without exception need to rely on accessing the individual unit 
at a code level to be able to perform any sort of meaningful validation or verification. It is also why 
unit testing is often preferred to be performed by the developers themselves as they develop the 
units [7; 19] instead of an outsider tester. 
Most typical defects detected at unit level are incorrectly implemented functionalities due to 
incorrectly coded logic or incorrect data flows. While the testing objectives for this level are almost 
exclusively related to functional testing, the main challenge at unit testing level lies with the test 
cases. Depending on the test coverage type and exhaustiveness, a single unit may require a plethora 
of written test cases since defects may need to be considered from the perspectives of execution 
paths that would not necessarily be obvious when considered from a business logic perspective of 
the system. [4; 23, pp. 85-111] 
Unit level also poses certain challenges when observed within the context of this study: unit content 
variance. Basic programming assignments such as the ones requiring the student to write a method 
that takes certain parameters as input and then produces a required output are straightforward to 
test. Verifying that the method has been written as instructed should be easy in such cases but for 
more intermediate assignments the approach may not be as strictly set. As an extension, this also 
means that writing unit tests beforehand to use for supporting assessment activities is challenging 
because the student may quite freely approach the assignment when it comes to creating units. As it 
stands with software product development, the already mentioned habit of developers writing their 
unit tests perhaps springs from this. Often the one writing the unit logic is also the best person to 
match the unit test to that logic; writing these tests without knowing the exact behaviour on 
structure of the unit is quite a challenge. 
In some programming courses relying on testing as a supportive element to assignment assessment 
the challenges of unit testing are partly taken into consideration by injecting hard requirements into 
the assignments. These hard requirements may include for example forcing the student to include a 
method named certain way into the program or, in a more web related context, giving forms and 
elements within the form certain identifiers by defining id attributes to be used. These requirements 
make the student implementation more testable as certain attribute values can be expected and used 
11 
 
with pre-created unit tests that are run against the submission.1 The challenges related to testing 
individual algorithms at a code level are not entirely confined to unit level, but they do lessen as we 
proceed to higher-level testing. 
From unit testing level the next step upwards is integration testing. Modern web applications are 
complex and, especially with service-oriented architecture solutions for example, integrate many 
components and data sources. This is also why integration testing is often another key testing level 
when web applications are being developed. The testing activities performed at this level can be 
done to verify that the interaction points of the web application work as intended and that the data 
flow between individual units and even from database interfaces is in a valid and required form. 
[34] Because the focus at this level is on the interactions and interfaces between separate units and 
as such it also often presents additional requirements for the test environment in the form of stubs 
and drivers. 
While integration testing is a step up from the unit level, it still requires knowledge about the 
structure of the program and the interacting units to be carried out thoroughly. This is also why the 
usability of integration testing as a supportive test level for assessing support can be seen to suffer 
from many of the challenges that are also present at the unit testing level. However, integration 
testing does include certain traits that make it a little less inclined to requiring exact knowledge of 
separate units in terms of testing goals. For example, we may still write a test for an interface if we 
know two units should communicate by transferring data in JSON-format to test that format is 
being complied to, even though we would not know beforehand how the units themselves are going 
to behave internally. Though, even if integration testing may focus largely on testing interfaces, 
given the same approach of using hard requirements in assignments integration testing could be 
seen as even more of a valid focus when assessing certain web application programming 
assignments and parts of larger tasks, such as those concerning and dealing with APIs and database 
queries in general. 
With DTEK2040 it is mentioned that the challenges brought up here regarding the use of unit and 
integration testing certainly exist and would most likely prove a challenge to automated testing in 
the current state of the course assignments. However, an additional note was also made that of 
course it is possible to modify the currently existing assignments and the tasks that are given so to 
 
1 Such techniques are assumable observable with Web Software Development course, for example. The course in 
question has at times very strict asserts in its automation tests but on the other hand allows for automating majority of 
the assessment work on part of web programming tasks. The course (https://wsd.cs.aalto.fi/) is arranged by Aalto 
university. 
12 
 
better be able to support the testing and automated assessment at these levels of testing. [22] This is 
an important note in the sense that testing at these levels does not necessarily need to be a one-way 
street. While the test object has the functional requirements that need to be fulfilled, it can also be 
built in such a way that it readily supports specific testing activities. An example of this with web 
applications would be the already mentioned requirement of using a certain id attribute with a 
specific element so that it is more predictably accessible and available for web automation test 
engines. 
2.2.2 System and acceptance testing 
It is worthwhile to note that system testing is a very large area of interest even from a study 
perspective [13] and it can be described or understood in many ways [23, pp. 119-131]. Though, a 
common notion is that system testing is the first level to consider also the non-functional 
requirements set for the system under test. Once the functionalities have been verified through unit 
and integration testing, system testing can be performed in the system environment with proper unit 
integrations and interactions. The focus is often on the end-to-end tasks and validation of business 
behaviours. 
While test types are usually not too tightly boxed to certain levels of testing [19], system level can 
still be considered the level to perform testing activities related to stress testing, usability testing, 
security testing and configuration testing due to the nature of system testing [23]. Because system 
testing can also be mainly concerned with the non-functionalities of the program, the tests 
themselves also tend to follow a testing methodology that is concerned whether the observed 
outputs to specific inputs are equal to the expected outcome that is based on set requirements. This 
sort of approach makes many of the system tests less or entirely separated from knowledge of the 
inner workings of the code that will be executed when the actual test is performed. 
Acceptance testing is a lot like system testing in this manner, however, the point-of-view for testing 
activities is traditionally from the end-user perspective and the goal to accept the system for 
production. In this sense the testing done at his level can be more confined compared to the system 
testing, focusing almost entirely on testing if the software meets the defined business requirements 
and workflows rather than individual functionalities or non-functional aspects of the system. 
Additionally to the business-focus the most notable difference to system testing is the absolute 
exclusion of the inner machinations of the program under test. [6]  
13 
 
From the perspective of this study, both the system test and acceptance test levels provide 
opportunity to be the focus levels for testing activities. Uncoupling test cases from the need to know 
specifically how the software under test has been built has potential to make the tests more reusable 
and does not unnecessarily limit the student from creating differing solutions that still meet the 
requirements from assignment point-of-view. System test and acceptance levels are also able to 
consider any non-functional requirements that might be part of a programming assignment, such as 
requirements for usability or security. 
It is, however, important to note that for DTEK2040 assignments system testing is not considered to 
be able to fully cover the assessment and feedback requirements from pedagogical point-of-view 
[22]. It is mentioned that the coverage of testing should at least be extended to consider the course 
material thoroughly enough instead of just verifying functional outcomes. For example, the returned 
student assignment might seemingly be able to produce expected, correct behaviour when the 
system under test is system tested, this does not always mean that the implementation follows the 
provided study material. In such cases deeper scrutiny should be able to uncover that, for example, 
the application is creating unwanted side-effects or performing against the core principles of a SPA, 
which in turn could be considered as a defect from assessment perspective. The need for testing to 
cover the pedagogical aspects as well places interesting challenges in terms of deciding what kinds 
of testing methods and techniques to use to achieve meaningful support role for assessing student 
submissions. 
2.3 Testing methods and techniques  
2.3.1 Static testing 
Software testing methods can be divided into two approaches: static and dynamic. Static testing is 
more generally also referred to as static analysis and it is traditionally performed manually by 
examining a work product. In quality engineering static testing allows for quality assurance to 
participate in the software development very early on as static analysis can be performed ideally as 
early as the first requirements are being formed for the software being developed. From testing and 
quality assurance perspective utilizing static testing is also extremely beneficial as often the defects 
found early in the software development life cycle are not only cheaper but also simpler to fix than 
the ones waiting to be discovered as failures during compilation or runtime of a program.  
Static testing, also often referred to as static analysis, is also a well adopted approach when it comes 
to automated grading and assessment of programming exercise solutions [1; 14; 17]. The approach 
14 
 
and the related methods and techniques are also a growing research trend in the automation 
assessment field as observed by Paiva et al. in their review of automated assessment in computer 
science education [27]. One explanation offered by Paiva et al. in their review for the growing 
interest is that static analysis allows for a more human-like grading and feedback while also being 
more consistent in grading and feedback quality due to the automation. Other observations that 
support the use of static approach are perhaps more regarding its practicality: static analysis is often 
less demanding to perform overall as it does not require large-scale test suite scripting, setting up an 
environment and then executing the program to assess it. Due to this the approach is also said to 
provide some additional security aspects but also allows for assessing solutions that are only partly 
functional or unable to be executed successfully. [14] 
Typical areas of an assignment solution to be evaluated and assessed through static means are for 
example (1) coding style, (2) programming errors, (3) software metrics, (4) design and (5) special 
features [1]. These are presented by Ala-Mutka in her survey of automated assessment approaches 
and their contents are partly represented in Table 3 to provide more insight into the individual areas. 
Table 3: Areas of evaluation and assessment where static testing is often applied. 
Static testing objectives 
Category Example objectives for analysis 
Coding style Syntax 
Structural deficiencies 
Unused variables 
Language standards and best practices for readability 
Maintainability 
Programming errors Dead code 
Redundancy 
Logical errors 
Anti-patterns 
Software metrics Application size 
Lines of code 
Complexity 
Design Structural similarity 
Design patterns 
Special features Keywords 
Regular expressions 
Plagiarism 
 
Gupta has also researched the use of static analysis for source code assessment purposes [14]. He 
observes that when static analysis is performed on source code, the process should start by 
generating an intermediate representation of the work product and then standardizing the 
representation to reduce diversity. The forms themselves could be characters and strings, abstract 
15 
 
syntax tree or graphs, for example. After this the static analysis can be carried out in so many 
techniques, but often in terms of testing the technique is to compare the work product to an example 
model and assess metrics such as similarity between them. 
2.3.2 Dynamic testing 
Static testing is often complimented with dynamic testing which involves testing the work product 
through executing the program to probe for failures and find defects. It is also worth mentioning 
that while one of the benefits of static testing was the fact that it allows for testing products that 
might not even be able to execute, in terms of programming assignments producing a solution that 
is able to be executed could be considered a desired minimum requirement, especially on an 
intermediate level course. 
In Section 2.1. of this thesis it was also noted that when considering the objectives of software 
testing from the perspective of existing automated assessment tools, assessing assignments for 
functionality appears to be the most common approach. Testing for specific required functionalities 
is also often easier and more straight-forward to test by interacting with the program rather than 
attempting to analyse the flows in a static manner. Interacting also reveals the true behaviour of the 
program and in that sense dynamic testing is crucial for the non-functionalities, or the business 
logic, as well and therefore dynamic testing is an important and integral approach to consider when 
creating a solution for automatic assessment of programming assignments. 
Dynamic testing is the approach that is usually conducted when test design and various testing 
techniques are being considered. Whereas static testing mostly takes the form of some sort of a 
review process with or without automated tools, dynamic testing could be said to culminate in the 
action of executing a designed test case. This is to be done by providing the system under test a 
specific input and observing if the output meets set expectations. [15; 23] 
Testing and test design techniques within the dynamic testing approach are commonly divided into 
two categories. The division can be described to be based on whether knowledge of the internal 
structure of the system under test is required or not [15; 23; 33]: black box techniques that are 
designed based on system specifications and models, and white box techniques that are based on 
internal structure of the system and its components i.e., on the code. While this divide seems to be 
widely accepted in the industry, the techniques might not always turn out to be only black and 
white; often a test design technique might be a combination of the two and as such usually referred 
to as a grey box technique. Categorization into experience-based techniques [15, pp. 81] is also 
16 
 
sometimes used, however, in this study the theoretical categorization will only be done into black 
and white box techniques on the basis mentioned before while also accepting that techniques may 
be used jointly between categories to perform more thorough testing to achieve the desired testing 
objectives. 
2.3.3 Black box techniques 
Black box techniques are called as such because they are by nature data-driven and rely on input / 
output outcomes to produce test results [23, pp. 8-10]. The inner workings of the system are not 
visible or even of interest from a testing perspective and as such it is often imagined that the system 
itself is a metaphorical black box that only takes input and produces output without a view to what 
is happening precisely during this process. Sometimes these techniques are also referred to as 
specification-based techniques [15, pp. 82] or functional testing [25] techniques. 
Black box testing is currently the more relied on category out of the two when it comes to existing 
automated assessment tools [33, pp. 1187]. A plethora of black box testing technique variations 
exist [33] but the most frequently agreed upon techniques of black box testing based on seminal 
works [15; 23] and software testing related studies [25] are: (1) equivalence partitioning, (2) 
boundary value analysis, (3) cause-effect techniques, (4) all pairs or pairwise testing and (5) error 
guessing. 
The first technique, equivalence partitioning, aims to minimize the total number of test cases by 
partitioning the input domain into equivalence classes where representatives under a specific class 
can be expected to produce the same output when used as an input. Identifying these equivalences 
begins by identifying input conditions from the defined software specifications after which the 
conditions can be partitioned into groups. Myers proposes a few heuristics [23, pp. 51-52] 
represented in Table 4 [pp. 17] for identifying equivalence classes. 
  
17 
 
Table 4: Myers’ heuristics for identifying equivalence classes. 
 Number of equivalence classes to be identified 
Identified input condition Valid Invalid 
A range of values 1 2 
Specified number of values 1 2 
A set of input values + 
a reason to expect each set being 
handled differently by the system 
1 for each set 1 for each set 
A “must-be” situation 1 1 
 
To provide a simple, but concrete, example of applying the technique in web application context we 
can consider the following specification for a text input element: The username input field must 
only allow for strings that consists of alphabet characters. The input string can only be longer than 
or equal to 5 characters, but no longer than 12 characters. From this specification we would be 
able to deduce the following contents presented in Table 5. 
 
Table 5: Valid and invalid inputs extracted from a specification by following the Myers’ heuristics. 
Input condition Valid input Invalid input 
Input string must 
consist of alphabet 
characters only 
for each character in 
toLowerCase(input): 
  character ∊ {a, … , z} 
for any character in string.toLowerCase: 
  character ∉ {a, … , z} 
Input string is between 
[5 - 12] characters in 
length.  
4 < length(input) < 13 length(input) <= 4 length(input) >= 12 
 
The example also demonstrates the technique can narrow down the amount testing required to 
verify functionality through assuming that every combination of alphabetical characters is valid if it 
fits the length requirement and is vice-versa invalid if even one of the characters is included is non-
alphabet or does not fit the length requirement range. 
  
18 
 
Boundary value analysis as a technique builds on top of equivalence partitioning. As a technique it 
exploits the knowledge that the edges - or boundary values - of equivalence classes are usually 
where errors causing defects are more often discovered [15; 23]. The most notable difference to the 
previous technique is that not all elements within a class are equally representative of that said 
equivalence. Instead, the edge values of a class are taken as representative elements. To tie this with 
the example already used, with boundary value analysis we would only perform our test cases with 
input strings 5 and 12 characters in length instead of assuming that string of any length between 5 
and 12 would do to test the functional validity. 
This technique may also be applied in different variations. Both bottom and upper boundaries may 
be tested, and the boundaries may be tested for only valid or for only invalid values. The most 
thorough way of applying boundary value analysis is to perform a so called three-point analysis 
where tests are targeted and expected results validated for exact boundary value, value directly 
above and value directly below. 
The technique understandably may create more tests for a given function that basic equivalence 
partitioning, but it is also considered to be more able in catching errors. It’s also worth to note that 
boundary values are not always present and as such the use of boundary value analysis is not always 
an option even if equivalence class technique can be used: for example, any classification of non-
ordinal objects are rarely potential targets for boundary value analysis. 
Equivalence partitioning and boundary value analysis are good techniques for limiting the amount 
of test cases in specific situations where the input data or output results can be expected to act in 
parts equally and as such can be also classified. However, this can also be undesirable; sometimes it 
might be needed to explore potential input combinations to search for errors. Cause-effect 
techniques in this study refer to a set of techniques that aim to accomplish the exploration of 
possible input combinations and the resulting state transitions and outputs. 
Combinations can be explored meticulously through graphing techniques such as cause-effect 
graphing [23, pp. 61-80] but in practice - especially if the system under test is relatively simple - the 
mapping is done by collecting a set of conditions and expected outcomes into a decision table such 
as the one in table below. Table 6 [pp. 19] represents a minimalistic example of a decision table 
created based on a login page of a web application. 
  
19 
 
Table 6: A decision table based on a typical login page elements. 
Condition Rule 1 Rule 2 Rule 3 Rule 4 
Username correct (True / False) F F T T 
Password correct (True / False) F T T F 
Expected output     
Redirect to “/home/myaccount” F F T F 
 
State transition testing and use case testing [15, pp. 91-96] are also considered to belong into the 
category of cause-effect techniques and will be treated as such in this study. Transitions and use 
cases are useful from the perspective of web application testing in a sense that page transitions and 
redirects offer natural basis for designing state transition diagrams. The state transition graphs are 
then converted into state tables, as in the example shown in Figure 1, which are then used to 
produce test cases. 
 
 
Figure 1: An example state transition diagram of a web site. 
 
Cause-effect techniques are advantageous in a sense that they can focus, visualize, and make clear 
of the expected system behaviour at a very high level which provides useful basis for testing what 
most likely matters to the end-user. While these techniques may be useful for a system with limited 
20 
 
states and complexity, such as the authentication-based state transfer illustration above, using them 
may quickly become tedious for illustrating and mapping larger systems unless the process can be 
aided with automated tools. 
Software testing is not all graphs and tables, however. Often experience and intuition of the tester 
plays a great role in hunting for errors and that is also why it is not too rare to find certain 
techniques labelled under the label of “experience-based”, as mentioned at the end of Section 2.3.2. 
One such black box technique is error guessing, which foregoes the afore mentioned other 
techniques to simply create test cases for errors that are deemed probable in the given context based 
on experience and intuition [15, pp. 118-119; 23, pp. 80-81] of either the tester or the collective 
consisting of testers and other stakeholders. 
While such a technique may feel unreliable it is also important to realise that none of the techniques 
are mutually exclusive but rather complementary of one another when applicable. Error guessing 
can be especially potent technique for creating test cases from the perspective of programming 
assignment assessment since the lecturers and assistants involved with any given course may have 
experience regarding hundreds if not thousands of assessed assignments throughout the years the 
programming course has been taught. 
2.3.4 White box techniques 
Dynamic white box techniques are concerned with logic coverage: the basis for these techniques 
comes from the internal structure and the paths, statements, decisions, or conditions that present 
themselves within the source code. According to a recent review [27], white box testing has been 
used for marking an assignment solution source code on runtime, but it appears not to be fielded in 
any serious way to test the functionality of student assignments. 
The nature of dynamic white box testing is most likely the reason it is also not very widely used by 
automated programming assignment tools. To make the techniques useful and create a set of 
predetermined set of tests to assess the internal structure of a student solution, one would have to 
know how the solution will be coded by the learner. However, dynamic white box techniques may 
have some limited use if hard requirements for certain elements exist in the programming 
assignments: testing-wise it is then possible to expect certain methods, variables, or elements to 
exist in the code structures and test cases for functionalities taking advantage of these hard-
requirements can be created. In general, white box techniques can be divided into statement, 
decision, and condition coverage techniques [15, pp. 97-116; 23, pp. 42-49]. Flow charts and 
21 
 
control graphs formed from the code structure often prove themselves as helpful mediums to create 
test cases and assess required coverage. 
Approaching the coverage through statements is called statement testing. Statement testing aims 
for a full coverage of every executable statement within the code that is under test, which makes it 
somewhat usable in verifying that the code can execute as it should but is otherwise proposed to be 
not very meaningful as a lone testing technique. Therefore, statement coverage is often overtaken 
with the more useful technique of decision coverage. 
Decision testing technique in many cases fulfils statement coverage as an in-built feature. Decision 
testing aims to hit every possible path or branch within the code logic at least once which means 
that 100% decision coverage should also gain 100% statement coverage unless the program is such 
that there are, for example, no decisions or multiple entry points to the program or its subroutines 
exist. 
While decision coverage can be considered stronger than statement coverage, it is not always 
enough either. Decision testing in its purest form is good for decisions that continue into two 
possible paths - e.g., true or false - but require additional cases to be created for handling decisions 
with more than two possible decisions, for example switch -statements. [23, pp. 43-46] To tackle 
such issues, white box technique of condition coverage may be used. 
Perfect condition coverage consists of enough test cases to test every possible outcome of every 
decision at least once. While this technique again is a step up from decision testing in terms of 
meaningful coverage, condition coverage fails frequently in reality to truly achieve the goal of 
testing every possible outcome within the code structure by simply hitting every possible statement 
condition once. This is because certain condition combinations - especially with multiple condition 
statements - often cause situations where certain condition combinations have satisfied the 
requirement of testing each condition of that statement once, but some paths have been left explored 
afterwards since they were not reached with these condition combinations. 
To really achieve full coverage of all statements, decisions, and conditions of the program, one 
needs to apply the multiple-condition coverage. The approach of this technique is to create test 
cases enough so that for each executable decision all possible condition combination outcomes and 
all program and subroutine entry points are tested at least once. 
The reality of these techniques is, however, that often techniques such as multiple-condition 
coverage are not able to reach 100% coverage simply due to the number of resources it would take 
22 
 
to create the required amount of test cases. Condition combinations, for example, can easily become 
so numerous that testing all of them is not feasible unless critical for assuring certain non-functional 
requirements such as security related ones are met. 
2.4 Test design and development 
According to Myers et al. the key issue to consider when designing test cases is: “What subset of all 
possible test cases has the highest probability of detecting the most errors?” [23, pp. 41] In regard to 
automated assessment of assignments this piece of wisdom would most likely need to be 
transformed into such a perspective that the issue is to not necessarily detect most errors but detect 
the most relevant errors to form feedback to support the growth of learner’s skills and knowledge as 
well as verify to which degree the student solution manages to meet the assignment requirements. 
Nevertheless, for test cases to be effective, efficient and drive their purpose, they need to be 
designed. The design process brings together all the aspects that have been discussed so far: test 
objectives, analysis, consideration of testing levels and choices regarding proper approaches and 
techniques to be made. The process of design starts with identifying the test conditions, continues to 
specify test cases and finally specifying the test procedures. [15] 
Identifying the test conditions means mapping out what characteristics of software should be 
checked and verified by testing. These can and should be gathered from the software specifications 
such as requirements and other related work products. Conditions may of course vary depending on 
the context and scope of testing: conditions to be found from unit level are rarely equal to 
conditions to be found on system level, for example. Here the static analysis methods are also useful 
as deploying them is often the way to gain required insight for identifying conditions of the 
software under test. 
Once the conditions to test are clear, a test case to carry out these conditions can be created. In 
many cases already existing work products may be of use here as well because work products such 
as user stories for example may be able to provide structure for the test case to follow, especially 
with the higher-level test cases. As for designing the test cases themselves, prior described white 
box and black box methods can be followed.  
Finally, once a test case has been designed it can be built into a test procedure to be executed as to 
verify the expected outputs from identified conditions. The overall process of test development is 
quite simple when arranged into these three main steps, however, the process needs to be gone 
23 
 
through a volume of times to produce enough test cases for any meaningful amount of test 
coverage. 
Some strategies exist to heuristically approach the decision of which techniques to field for creating 
test cases and suites. One such heuristic is “The Strategy” [23, pp. 82] that dictates the following 
when applied to what we know of testing techniques already: 
1. For combination of input conditions, cause-effect technique should be used first. 
2. Boundary value analysis should always be used. 
3. The above techniques should then be supplemented by identifying valid and invalid 
equivalence classes for both the input and output values. 
4. If enough experience, supplement techniques in 1. -3. with error-guessing. 
Finally, examine the program logic and deduct if white box techniques are required to reach the 
desired coverage; apply decision coverage, condition coverage, combination of both or multiple-
condition coverage as required to satisfy the set coverage criteria. 
The concept of coverage when developing tests is an important one. The aim of test coverage is to 
quantitively assess the extent and quality of testing [19, pp. 80]. The meaning of coverage needs to 
be defined, though, before any percentages of coverage can be attempted to achieve; for structure-
based testing conditions and statements may prove to be relevant metrics to measure test coverage 
but usually the more meaningful coverage metrics may be the number of requirements verified. 
2.5 Challenges of web application testing 
Due to their nature web applications present certain challenges to testing if compared to testing of 
so-called traditional software. Web applications are often mentioned to be considered as distributed 
systems built with various architectural choices. Some typical characteristics for such applications 
mentioned, for example, by Di Lucca and Fasolino are: (1) concurrent accessibility by many users, 
(2) varied execution environments, (3) systems often consist of components that may vary in their 
nature and even technology and (4) ability to create software components at run time. [8, pp. 220] 
These characteristics are mentioned to inherently place certain testing requirements for commercial 
web applications in terms of non-functional aspects such as performance, availability, and security 
testing. Testing for functionalities is also affected by the large variance of components and separate 
services involved in a web application: test environments may often need to be set up to consider 
24 
 
multiple different technology choices and dataflows. As web applications often include both server-
side logic and client-side logic, test environments need to take this into consideration when testing 
at system, integration or even at unit level: sometimes to test even the smallest component of an 
application, input or interaction with a server may be required. [8] However, ideally real backend 
interaction should not always be relied on especially at the lowest levels of testing as then the nature 
of testing itself changes and the focus changes if end-to-end services are employed. 
Di Lucca and Fasolino also note that the client-server nature of web applications also often means 
that points of failure are plenty. This poses certain technical challenges when conducting tests at 
system or acceptance levels. [8] In complex and market level products pinpointing failures may not 
always be easy as issues may rise from client-side or server-side code interpretation, compatibility 
issues or from relied on backend services for variety of reasons. Common elements to note from 
server-side layers when testing web applications is the persistent data storage and API integrations. 
With more complex systems there is often also the need to consider server side load balancing, but 
in the scope of this study more relevant might be the contextual JavaScript generation which to a 
degree applies to both sides in modern web applications: JavaScript may be offered to the client 
quite dynamically from the server depending on the context of use but it also means that the client-
side in-browser application renders different content depending on the context. 
Many of the challenges involved in testing of web applications can be approached and explained 
from the aspects of observability and controllability. These aspects can be defined as follows: 
• Observability: “How easy it is to observe the behavior of a program in terms of its outputs, 
effects on the environment, and other hardware and software components.” [2, Section 3.1, 
Definition 3.11] 
• Controllability: “How easy it is to provide a program with the needed inputs, in terms of 
values, operations, and behaviors.” [2, Section 3.1, Definition 3.12] 
In essence, observability dictates the difficulty of determining test results. With web applications 
and their multi-tiered architectures, true results of tests conducted at a higher level are rarely fully 
available and visible from the UI which in turn results to lower general observability of testing. 
Same can be said of controllability: rarely all web application testing can be executed by simply 
feeding inputs through a single source such as application UI; providing test values may require 
manipulation of URLs to feed parameters or manipulation of client storage solutions such as local 
25 
 
storage, session storage or cookies to properly execute a test case. Such things result into lower 
controllability and are not uncommon for testing web applications. 
Of course, the challenges mentioned above are not only brought up in studies by the authors cited 
but also presented in very foundational works such as The Art of Software Testing by Myers. He 
proposes that to tackle the testing challenges of internet applications one needs to first and foremost 
understand the system under test at the very component level. This proposal is further clarified to 
include having documented knowledge available and understanding the expected behaviour of 
functionalities and performance of the website. [23, pp. 193-200] 
Myers also outlines a strategy that relies on categorizing internet applications into three-tiered 
client-server applications: (1) presentation layer where the user interface is provided, (2) business 
layer that models processes such as authentication or transaction, and (3) data layer which considers 
the data application uses or is collected from the user [23, pp. 201]. These layers are encouraged to 
be tested independently to be able to narrow down and identify defects and their sources during 
testing; skipping the layered approach and conducting overarching end-to-end system tests instead 
may not tell where a defect springs from. 
Not all the challenges mentioned in this sub-section of course necessarily apply to their fullest when 
supporting the assessment work of assignments with software testing activities, especially the non-
functional challenges related to concurrent users and web application availability. However, even 
assignment tasks often include interaction between client and server and thus present these 
challenges at least to a degree. Handling such aspects is also identifiable to be one source of for 
errors in assignment submissions [22] and as such taking these into consideration during testing is 
relevant even for this purpose. 
2.6 Foundation for the first main research question 
Throughout this main Section there has been an overarching goal to cover the general theory behind 
software testing while also narrowing it down to the context of this study by taking into 
consideration how it would be applicable to DTEK2040. With the combined approach to covering 
the theoretical background there was also an agenda to gain answers for the sub-questions related to 
the first of the main research questions. 
In the very basic sense software testing was determined to be about identifying defects, getting rid 
of the errors causing these identified defects and through that process raising the overall quality of a 
software product. While quality of the product is one thing, testing was also determined to build 
26 
 
confidence for those working on the product by providing them a safety net and possibly allowing 
for a more stress-free product development. 
Software testing for programming assignment assessment purposes was mentioned to not pursue the 
quality aspects but to rather catch deviations from the assignment requirements and thus provide a 
supportive tool or a method for forming both formative and summative assessment of assignments. 
2.6.1 Q1.1: Which testing levels should be focused on? 
The first sub-question deals with the abstraction levels of testing and in essence is asking where the 
testing effort should be directed to best serve assessment purposes. Based on the foundational 
theory and the contents of the expert interview describing the details of assignments in DTEK2040, 
the most useful levels of testing would be unit testing and system testing. 
Reasoning behind the focus on unit level testing is that the assessment in DTEK2040 is for a large 
part mentioned to consist of a set of clear functionalities. Assessment is then carried out so that a 
working functionality scores the student a point and the contrary results in missing a point from the 
total available. In many cases this will mean checking the existence and behaviour of an individual 
component that the student should have managed to create following the course material examples. 
Focusing on system level testing is valid from multiple perspectives as well. For example, based on 
the expert interview it was clear that whether an assignment requires the student to create a React 
SPA or even just a single static html web page, the very basic expectation is that the student 
submitted work product should be able to compile and run or interpret properly. Another reason to 
focus system testing is that when assignments start dealing with the challenging issues of web 
applications, such as state transitions and data flows across multi-tier architecture, these can 
naturally be covered in system testing by implementing end-to-end tests or user interface tests. 
Overall, the choice to focus on unit level and system level testing also provide further guidance to 
choosing testing methods and techniques. This brings us to the next sub-question Q1.2. 
2.6.2 Q1.2: Which testing techniques are applicable for testing student submissions? 
As for testing methods in general, it was mentioned that a rather clear division into static and 
dynamic testing can be made. Static testing deals with test objects without requiring execution of 
any system-under-test code and was also noted to be a trending methodology when it comes to 
automated programming assignment assessment solutions. One technique to field static testing for 
assessment purposes is to simply perform model-based comparison of student solutions to model 
27 
 
solutions, given there is not much expected or allowed deviation in the submitted work products. 
For static assessment of source code, it was noted that a general technique is also to first transform 
the code into an intermediate representation such as characters, strings, abstract syntax tree or 
graphs to reduce the potential diversity and then perform the assessment against set metrics. 
Static testing was described to be often complimented with dynamic testing that involves executing 
the system-under-test code to test it. Dynamic testing was also the approach of choice when dealing 
with functional testing, which also raises its importance for assessment purposes given the testing 
levels that were placed into focus when answering Q1.1. Techniques within dynamic testing were 
further categorized into black box and white box techniques based on the required knowledge about 
the inner workings of the system under test. White box techniques were not considered to be very 
useful for dynamic testing and traditionally were not favoured in automated programming 
assignment assessment tools either. 
From the black box techniques covered during this Section, most suitable ones for web application 
assignment assessment purposes in DTEK2040 are perhaps the cause-effect techniques and the 
error guessing technique. Cause-effect techniques were noted to be effective in covering aspects and 
functionalities related to state transitions and expected outputs from multiple input combinations. 
These were also the aspects identified as being natural for web applications in general and very 
suitable candidates to be tested in system level testing. Error guessing, on the other hand, as an 
experience-based black box technique can transform the current knowledge of the course personnel 
into test cases that target the most-likely sources of errors in specific assignments. Therein also lies 
the challenge: designing and implementing experience-based test cases would most likely require 
very close collaboration with the course personnel who have the necessary expertise to say what 
kind of errors often appear in student works. This kind of collaboration within the scope of this 
thesis is not necessarily possible, given the scope, constraints on time and other resources. 
White box techniques, though already mentioned to be less likely to be used, could still be 
considered for unit tests depending on how accurately the course material is expected to be 
followed in terms of the student solution to an assignment. From the interview it was gathered that 
the current material for DTEK2040 strongly guides the students to craft their submissions certain 
way, but some deviations or algorithmic level leeway is still possible, which at least complicates 
building very rigid white box technique-based tests for assessment. Thus, these are not considered 
to be very applicable either. 
28 
 
2.6.3 Q1.3: How to turn an assignment briefing into test cases? 
In terms of test design and constructing test cases it was deduced that in general the goal would be 
design test cases so that they are able to maximize the probability of capturing errors. For 
assessment purposes it was also noted that this goal would be best understood slightly differently: 
the goal is to capture specific implementation errors based on tasks within the given assignment 
rather than attempting to cover all types of errors from the system under test. 
A strategy was proposed in Section 2.4. which relied in starting with cause-effect techniques to map 
out a combination of test inputs and then proceed to supplement these techniques with boundary 
value analysis, applying equivalence classes and possibly perform additional error-guessing. 
Finally, white box techniques were suggested to be used as needed if the black box techniques 
cannot cover the program logic with desired coverage. 
The proposed strategy seems to be for the most part appropriate and can be followed within the 
scope of this study to design and build test cases for automated assignment assessment support. 
Black box techniques, and especially cause-effect techniques, were already mentioned to be suitable 
techniques through the answers to sub-questions Q1.1. and Q1.2. However, techniques such as 
boundary value analysis may not always be required when assessing submissions since the 
functional requirements may not be so detailed that they set clear boundaries; though, they could be 
in the future as the course contents are updated or the proposed solution of this study is possibly 
applied to other kinds of programming assignments. 
From coverage perspective the test cases should cover the required functionalities of any given 
assignment. The coverage should be enough to provide basis for at least deciding if the required 
functionalities are “pass” or “fail” to support straight-forward scoring. Coverage should also be 
considered and designed so that it would allow for spotting errors that do not necessarily result in a 
failed functionality but could be considered a qualitative or non-functional error when observed 
within the context of course material. Covering such cases is also the more challenging part in terms 
of design and where the experience-based techniques can prove valuable to provide insight as to 
what should be tested.  
29 
 
3 Test automation and RPA 
 
3.1 Differentiating between RPA and test automation 
One way to describe test automation is to say that it “is the task of creating mechanically 
interpretable representation of a manual test case.” [35] Automated cases may be programmed with 
a programming or a scripting language and many languages, such as java and python for example, 
have evolved extensive support in the form of libraries to make test automation relatively easy and 
straight forward to implement. Frameworks built for automation exist as well; some market 
themselves as geared towards test automation [32] while others consider themselves more focused 
on RPA in general [31]. 
But how does RPA and test automation truly differ? Considering both can - and often are - executed 
with same tools and technologies, one might argue that the differentiation is at times more 
philosophical and deals with the context automation is being fielded and aimed to be used. RPA can 
be described as technology that aims to mimic human behaviour to achieve benefits such as reduced 
labour costs, increased productivity, reduced error rates [9]. Perhaps therefore RPA is often tightly 
coupled with the mental image of being a tool to automate general business processes by tackling 
many, traditionally human executed, mundane, and transactional tasks involved within said 
processes. 
Classical automation works best when the process to be automated has explicit rules that can be 
followed. Leaning on this, Doguc for example mentions that best-suited processes to automate with 
RPA have (1) high transaction volume, (2) are highly standardized, (3) have well-defined implicit 
logic and (4) are mature, meaning that an automated solution will be usable into the future rather 
than becoming obsolete due to changes in the process structure or functions. [9] Jha et al. propose to 
[20] implement automation by following sequential levels that start from performing due diligence 
and end up with execution and maintenance. Interpretation of contents involved with these levels 
are shown in Table 7 [pp. 30]. 
  
30 
 
Table 7: Sequential implementation levels of automation. 
Levels of RPA implementation 
Level Contains 
Due diligence Deciding tools that are a good match for the project. 
Investigating automation viability of processes and 
determining the return on investment. 
Technical feasibility assessment with a proof-of-
concept. 
Risk identification Deciding whether a process is a preferred 
candidate for automation. Identifying 
stability 
repetitiveness 
level of organization / standardization 
Bot creation and dry run Identifying the steps and tasks that are to be 
automated and robotized. 
Performing a “smoke test” for the automated 
process to prove the steps can be carried through 
and the process itself is correct. 
Execution and maintenance Deploying the bot for execution. 
Maintaining dynamic parts of the process. 
 
Just like Doguc, Jha et al. also underlines the importance of choosing steady, repetitive and highly 
organized processes as candidates for automation. They also mention these aspects as basis for risk 
identification, as can be seen from the description for the corresponding level. It is also worth 
noting that the process itself is unlike a waterfall: if bot creation and dry run, for example, fail to 
produce the expected successful outcomes for that level, it is entirely advisable to step back and 
perform risk identification or even due diligence levels again to further analyse the system under 
automation. 
While automation is mentioned to offer concrete gains even as a short-term solution, it contains a 
challenge in that it is closely tied to the aspect of identifying potential use cases. Rarely any process 
is forever unchanging or without any dynamic parts and thus any automation solution that is to be 
built will also require maintenance to keep producing benefits as a long-term solution. Integrating 
artificial intelligence with RPA is considered as a potential supportive factor in the future, but as of 
now automated systems with artificial intelligence have not proven cognitive enough to 
meaningfully remove this challenge. [9; 20] RPA is also still seen to have somewhat limited use in a 
sense that fully implemented end-to-end automation solutions can be considered unrealistic in terms 
of resources required to build them. [20, pp. 256] 
31 
 
From RPA perspective and based on the results of expert interview, the assignment assessment 
process of DTEK2040 certainly offers potential steps to be automated. To describe the general 
process executed in a concise manner, following steps can be identified from expert interview [22]: 
1) Student returns the solution for an assignment on course Moodle workspace. 
2) The assessor downloads the student submission from Moodle. 
3) The assessor extracts the downloaded submission to access the assignment files. 
4) The extracted submissions are either run dynamically or opened with relevant tools such as 
VSCode. 
5) Assess the submission both dynamically and statically against the task requirements. Form 
feedback based on observed errors. 
6) Enter and upload the assessment results and feedback for the student to Moodle. 
7) Clean up tasks and organize the assessed works to prevent mix ups when assessing other 
submissions for the same assignment in the future. 
To further attempt to analyse the actual automation potential, each step can be transformed into a 
more high-level description and be identified to involve specific tasks from the assessor point of 
view. These tasks have been extracted from the interview and presented in Appendix A 
correspondingly while also attempting to present the related automation potential and risk based on 
theoretical background. A concise collection of overall estimated automation potential and risks for 
DTEK2040 are represented in Table 8. 
Table 8: Automation potential and identified risk factors of assignment assessment in DTEK2040. 
Automating the DTEK2040 assessment process 
Potential Risks 
a. Manual tasks related to navigating and fetching 
submissions from the learning platform (i.e. 
Moodle) are straight-forward to automate and 
can save work time in a compounding manner. 
 
b. Automating manual tasks such as recursive 
extracting of submission files, book-keeping 
and collecting scores, summarizing feedback 
forms can save hours of manual work 
throughout the course. 
a. Automated solution to interact with the learning 
platform and the course workspace requires 
maintenance; even the simplest modifications 
to a website element may break the robot by 
affecting, for example, the navigation logic. 
 
b. The learning platform may not allow for robotic 
interaction or use CAPTCHA and other means - 
such as request limits - to hinder the use of 
RPA. 
(to be continued) 
32 
 
 
Table 8 (continues) 
c. Technology such as Docker can be in-built to 
the automation solution to provide a stable and 
secure environment for dynamic assessment. 
 
d. Automated static assessment can be more 
consistent and less reliant on assessor’s 
experience than manually performed. 
 
e. The tasks required to perform manual labour in 
an organized manner, such as arranging 
submissions in local directories based on their 
assessed / not assessed status, can be cut out 
from the process. 
c. Scripting course and assignment content 
specific static analysis logic for robot 
assessment purposes may prove to be 
challenging and not worth the return on 
investment. 
 
d. Automation itself may not be able to formulate 
in-depth, qualitative feedback from 
submissions. 
 
e. Introducing RPA and test automation to the 
assessment process requires related skills from 
the personnel to maintain the solution as 
course contents are updated. 
 
f. Incorporating automation may require adding to 
or reformatting of assignment instructions. 
 
As shown in the table contents, DTEK2040 has a lot of identified automation potential related to 
even small individual manual tasks from simple website navigation to performing file manipulation 
and looking for errors in a source code file. However, there are also a lot of risks that most likely 
will stand to make a fully automated end-to-end solution a challenging task and not feasible in 
terms of return on investment. 
3.2 Use of automation in web application testing 
Amman and Offutt describe test automation as “The use of software to control the execution of 
tests, the comparison of actual outcomes to predicted outcomes, the setting up of test preconditions, 
and other test control and test reporting functions.” [2, Section 3, Definition 3.9] They consider 
automated testing to be necessary for efficient and frequent testing but also mention that the task of 
automating may often prove challenging in the case of software with low controllability or 
observability. Some studies claim that the advantages to be achieved from using test automation 
include saved resources in terms of time and effort spent in making testing more efficient, improved 
accuracy and discovery of defects compared to manual efforts, increased test coverage and 
repeatability [12; 36]. 
As for web application testing, the related challenges and the overall process of coming up with a 
test design has already been covered as those are aspects that are applicable to software testing in 
general. The automation of web application testing brings along some nuances as the principles of 
33 
 
automation are combined to the art of testing. Some of these details are related to details such as 
what kinds of tests should be automated, what tools should be used to do so and what kind of basis 
for testing should be available to start implementing test automation. 
Considering automation from the perspective of software and testing levels, automating unit and 
integration tests is common work during software development. In terms of responsibilities 
involved in testing activities, unit and integration level testing is usually attributed to developers as 
part of the development tasks whereas designated testers often focus on system level testing and 
beyond. These testing tasks may include functional as well as specialised non-functional tests such 
as security or performance testing. 
At system level, attractive targets for test automation can be observed to follow the general 
principles for potential automation targets where the most likely candidates for automated tests are 
the ones that require a lot of data handling, are performed constantly and regularly, or require 
extreme precision during the test execution. Such tests are for example regression, end-to-end, 
performance, security, load, stress, and many of the usability tests [30]. User interface testing is an 
example type of testing that can be involved in many of the tests just mentioned and thus can have a 
fairly large role especially when it comes to testing web applications. [12; 30; 35] Going by the 
layered tiers presented by Myers, user interface testing would fall under the presentation layer and 
have three main test areas [23, pp. 203-205]: 
• Content: testing the human-interface element, accuracy of the information presented to the 
end-user and features affecting the user experience. 
• Architecture: testing navigational and structural errors such as broken links, missing pages 
or false redirects. 
• Environment: testing aspects such as browsers and operating system configuration effects to 
web application and its functionalities. 
All these test areas are likely to contain candidate cases for test automation but Myers himself 
proposes to at least migrate architecture tests into regression tests, which itself is already mentioned 
as one of the automation test examples earlier based on other cited sources. 
There is a plethora of technology and environment specific automation tools available for 
conducting software testing. While commercial tools of course exist, many of the industry favoured 
tools are in fact open source and readily available. Among these tools the so called XUnit 
34 
 
frameworks are mentioned to be the most used ones. These frameworks provide the means to write 
test cases in a supported programming language - such as JUnit for Java or HtmlUnit for HTML - 
so that the tests can be implemented with oracles to determine whether a given test passes or fails as 
it is executed against the system under test. [28] 
Another category of testing tools is the capture and replay tools which are in fact able to combine 
manual testing and automation testing to some extent by recording the manual actions to be 
performed automatically and repeated later as required [28]. However, even though tools such 
Selenium can be assigned to this category, they also provide the means to simply script actions 
rather than requiring the manual recording of a test case to turn it into an automated one. 
Main tool to be used within the context of this study was mentioned to be the RF, which would 
perhaps fit into a third category not labelled among the two already presented by Polo et al. RF is a 
more generic test automation framework that can be used to create and execute automated tests by 
extending different libraries meant for specific purposes [32]. Some of the libraries that are 
available are created and based on tried-and-true web application testing technologies such as the 
Selenium based SeleniumLibrary or Playwright based BrowserLibrary. These example libraries 
often provide means for acceptance test driven automation testing and as such they are most 
suitable for testing that can be conducted through the web application user interface. Many other 
libraries are of course available that are better suited for cases that should be conducted by the 
means of API testing, for example. As with any web application automation tool, the potential and 
suitability of different libraries are to be considered before beginning test automation 
implementation tasks. 
3.3 Automated formulating of feedback from an assignment solution 
Hattie and Timperley describe feedback as information that is provided regarding aspects of 
performance or understanding of a given subject. Many different entities and sources can act as 
providers of such information but the feedback itself should in any case aim towards improvement 
of teaching and learning. [16] Paiva et al. on the other add to this notion by adding that assessment 
already in itself acts as feedback not only for the student but for the teacher also: the one learning 
will be kept aware of their success in reaching set learning goals and the teacher will be informed 
about the ongoing learning process in general [27]. 
Nicol and Macfarlane-Dick seek to discover principles of effective feedback in their study 
regarding formative assessment and self-regulated learning [24]. As the short description of this 
35 
 
study would suggest, the viewpoint taken towards what is good and effective feedback is that it 
should also help the students grow and guide their own learning in the future as well, not only in the 
framework of a specific study session or a single course. The study or its results are not per se about 
assessment of programming studies or problem-based learning, but the outcomes contain 
observations that seem quite general in terms of what could be considered as effective feedback.  
The principles presented in the study by the authors can be categorized so that they have either a 
cognitive, motivational or behavioural rationale behind them. The seven principles are [24]: 
1. Helps clarify what good performance is. 
2. Facilitates the development of self-assessment in learning. 
3. Delivers high quality information to students about their learning. 
4. Encourages teacher and peer dialogue around learning. 
5. Encourages positive motivational beliefs and self-esteem. 
6. Provides opportunities to close the gap between current and desired performance. 
7. Provides information to teachers that can be used to help shape teaching. 
Principles 1 - 4 are presented by the authors from a very cognitive standpoint. The first principle is 
rationalized through the potential mismatch existing between the teacher’s and the student’s 
concepts of (a) what are the goals for learning, (b) what are the criteria for evaluating the learning 
process and (c) what are the expected standards. This mismatch is through existing research seen to 
negatively impact the student’s ability to process received external feedback in a constructive 
manner and thus any feedback should aim to align the understanding of these concepts between the 
teacher and the student if it appears to be necessary. [24, pp. 206-207] The second principle in a 
way extends from the first one by noting that external feedback should also allow the student to 
develop their ability to individually judge one’s own product against set standards and criteria, 
provided they are also clear for the student. [24, pp. 207-208] 
The third principle of delivering high quality information is an interesting one in a sense that it 
considers more than just content. This principle considers high quality, effective feedback as one 
that is: (1) provided in a timely manner, (2) focused on strengths, weaknesses and corrective advice, 
and (3) contains both praise and constructive criticism. [24, pp. 208-210] Especially the timing as a 
key-component is something that can sometimes be observed missing from the external feedback 
being provided in courses with large amounts of participants and similarly large number of 
assignments to be assessed. 
36 
 
The fourth principle deals with the cognitive aspect of external feedback in quite a general manner. 
The principle attempts to underline that even though feedback may be of high quality and follow the 
rest of the principles uncovered prior, it can still be misunderstood by the student receiving the 
feedback. If feedback is misunderstood, it is often also at least partly ignored by the student. Thus, 
it is suggested that feedback should also incorporate an opportunity for engaging the teacher into 
discussion about the feedback to catch and potentially clear up any confusion or concern regarding 
the received feedback. [24, pp. 210-211] 
The fifth principle is the first one that is clearly approached through the lens of motivation, as the 
short descriptive name also indicates. The core rationale behind this principle is that high-stake 
assessing, such as one-time assignments or traditional exams, is often found out to negatively 
impact the motivation of a student and should thus be avoided as the only channel of feedback. [24, 
pp. 211-212] Nicol and Macfarlane-Dick refer to existing research to show that such assessing 
usually leads to the students focusing on performance in a very metric-focused manner rather than 
attempting to master the concepts and achieve a more sustainable learning process. Relying purely 
on grading or marks as the form of feedback from assessments was also mentioned to have a 
negative motivational effect. Accompanying such feedback with comments was mentioned not to 
improve the impact since the numerical mark or grade was usually focused on and the 
supplementing commentary ignored by the students. 
The authors suggest that comments alone without grades are the general superior format of external 
feedback in terms of encouraging positive motivation and self-esteem. Additionally, multiple 
assignments and tasks with low-stake assessment should be favoured. Many smaller tasks carry the 
benefit for the student in terms of providing the opportunity to receive more external feedback in 
concise pieces. Automated testing with incorporated feedback is also explicitly mentioned as a 
potential approach to help pursue this principle. [24, pp. 212] 
The sixth and the seventh principle take a behavioural approach to the feedback process. To provide 
an opportunity to essentially catch up to the learning expectations: the student needs to have an 
opportunity to resubmit an assignment or at least repeat the learning cycle based on the contents of 
a received external feedback to put the feedback in use. If resubmissions cannot be offered or the 
assignments in a course are such that feedback received from an assignment will not directly carry 
over to the next assignment, then it would be beneficial to provide the students feedback while any 
assignment is a work-in-progress. [24, pp. 213-214] The final principle on the other hand raises the 
observation that any feedback that is provided to the students should be based on such assessment 
37 
 
or data that can be used to frequently deduce the learning level and the understanding of course 
contents. This in turn should provide opportunities to improve the teaching, course contents and the 
overall learning process of the given course. 
When these principles are reflected to DTEK2040, it would seem like they are partly followed. For 
example, the first principle could be considered to manifest itself through the detailed assignment 
tasks which very illustratively and straightforward manner describe the features that are expected to 
be implemented as part of an assignment [37]. Additionally, during the interview feedback was a 
theme to be discussed and during this part to specific mention was made about any excessive need 
to ever provide the students with clarification about taskers; on the contrary, it was in many 
situations noted that the course material is quite explicit in what is being required from the student’s 
solution [22]. 
On the other hand, based on the observations of feedback samples from the DTEK2040 Moodle-
platform and the interview with the course instructors, the feedback models and the process would 
seem to partly go against the principles number three, four and five. The third principle is in a sense 
being broken against since timely feedback is not necessarily possible. However, this is not 
necessarily even because of the feedback workload or for the fact that an instructor would not have 
the time to provide feedback: because DTEK2040 allows the student to complete the content and 
continue onward at one’s own pace, feedback from the previous assignment may not be available 
when the student continues to the next one. Thus, there is a chance that some feedback will not be 
affecting the next solution. 
Fourth and fifth principles are also something that are not actively being carried out in the feedback 
process of DTEK2040. Regarding the encouragement of teacher and peer dialogue, no mentions 
arose during the interview related to this and providing the assignment specific feedback on the 
course’s workspace in Moodle does not seem to integrate or provide means for the student to 
engage the feedback provider into a discussion. This sort of exchange could of course currently 
happen directly through email if the student decides to do so. With regards to the fifth principle of 
encouraging positive motivational beliefs and self-esteem, DTEK2040 could be mentioned not to 
strictly follow this principle due to the provided feedback being very grade centric. While it is true 
that the feedback includes commentary if a student has not managed to score the perfect grade from 
an assignment, the combination of comments and scoring was mentioned to lean towards negatively 
affecting the motivation rather than being any different from purely grade-based feedback. 
38 
 
The principles gathered by Nicol and Macfarlane-Dick are, as mentioned, regarding feedback in a 
more general manner while taking a certain point-of-view to consider the effectiveness in terms of 
also developing self-regulated learning of the student. Keuning et al. in their systematic literature 
review then again present five feedback types regarding automated feedback for programming 
exercises specifically [21]. The feedback types are: 
1. Knowledge about task constraints 
2. Knowledge about concepts 
3. Knowledge about mistakes 
4. Knowledge about how to proceed 
5. Knowledge about meta-cognition 
These feedback types by Keuning et al. have a lot of common surfaces to the seven principles of 
Nicol and Macfarlane-Dick. For example, the first type consists of components such as 
requirements of the task and general processing rules of the assignment which could be considered 
as clarifying the goals, criteria and expectations. [21] 
Paiva et al. also consider the feedback types presented by Keuning et al. in their own review of 
automated assessment in computer science education [27]. In their study Paiva et al. deduce that in 
fact many of the modern automated solutions only extensively cover the third feedback type. 
Concretely the third type is told to include information about the test cases that the assessed code 
has failed, technical errors, solution related errors and issues related to quality aspects such as style 
and performance. 
Paiva et al. mention that the knowledge about mistakes partly does tie in with knowledge about how 
to proceed. This fourth feedback type is also seen as rarity in today’s automated assessment 
solutions, however, there are some advances regarding this. Some automated assessment tools can 
produce personalized feedback and offer guidance by recommending corrections to the tested 
source code to fix bugs or suggesting more optimal solutions even if the provided solution would be 
fundamentally correct. [27, pp. 1:15] 
The first two types are traditionally not covered by automated assessment tools today in any 
meaningful way since they are seen more akin to matters of configuration or manual labour tasks by 
the instructors or exercise authors. It is also mentioned that the fifth feedback type, which aims to 
check whether the student understand why an answer is or is not correct, is also not commonly 
automated. In many cases the open-ended nature of solutions to most programming assignments 
39 
 
usually makes automating such feedback as knowledge about meta-cognition challenging. [27, pp. 
1:15] 
To reflect the previous theory and feedback types to DTEK2040 practices, it can be mentioned that 
the types are at least very applicable to the course contents as they are relevant in scope. While the 
assessment and feedback process of DTEK2040 is currently executed manually, the focus areas 
regarding feedback types are still noticeably the same: majority of the feedback provided to the 
students are knowledge about the mistakes, since the feedback is heavily reliant on using the 
assignment required functionalities as a basis; mistakes related to these implementations transform 
into feedback. Other types of feedback, such as knowledge how to proceed, seem to require activity 
from the student if this sort of feedback is wanted: if the student encounters bugs that impede 
progressing or finishing the assignment, feedback for these must be asked, for example, during 
voluntary workshop sessions arranged throughout the course. 
3.4 Foundation for the second main research question 
3.4.1 Q2.1: What manual work related to assessing and feedback is there to automate? 
In the Section 3.1. three considerable primary components were brought up to answer sub-question 
Q2.1. These components are (1) the principals to use for assessing the suitability of a process from 
automation perspective, (2) the levels - or steps - to take when implementing RPA from ground up 
to a process and (3) the analysis of DTEK2040 assessment process. 
The principals that largely determine the suitability of a process could concisely summarized into 
four points. These four points were also underlined to apply for automation in general; the 
principals could be followed when considering which business processes to automate with RPA or 
which test cases are most likely to provide the best return for investment. The four cornerstones 
mention that a potential candidate process should: 
• have high transaction volume; 
• be highly standardized; 
• have well-defined implicit logic; 
• be mature and preferably not dynamic in terms of future changes. 
40 
 
The second aspect to consider in implementation is to do it logically, following certain steps to 
build the automation system into a solution from ground-up. These steps are represented in the 
Figure 2: 
 
Figure 2: Suggested implementation steps to take for creating new automation systems. 
 
Some targets within the assessment process of DTEK2040 were identified through analysis and 
presented in a Table 8 [pp. 31]. From RPA point-of-view, the identified targets included manual 
tasks to be executed on the Moodle platform for fetching student material and possibly uploading 
feedback and assessment results as well as file manipulation tasks related to handling student 
submissions in different formats and performing book-keeping of grades and feedback as the 
automated assessments are executed. 
In the introduction of this thesis, it was also mentioned that the answer to this sub-question would 
be formed from the collective results of both the Section 3.1 and Section 3.2, the latter sub-section 
providing insight into use of automation in web application testing. Within Section 3.2 it was 
brought to attention that the principles of automating a process such as a manual test case are 
largely the same ones observed as general guiding principles for automation. It was also again 
confirmed that system level testing would perhaps be the best focus area for automated testing in 
terms of assessing web application programming assignments. 
41 
 
A more concrete suggestion extractable from the observations about automating web application 
testing was the division of test cases into three main test areas: 
a) Content testing: focuses on the human-interface element, accuracy of the presented 
information and the features affecting user experience. 
b) Architecture testing: focuses on testing for navigational and structural errors. 
c) Environment testing: focuses on browser and system configuration effects to web 
application and its functionalities. 
In the context of DTEK2040 the areas could perhaps be organized into order of importance 
followingly: 1) architecture testing, 2) content testing and 3) environment testing. From the expert 
interview and the course contents it would be fair to deduce that the assignment taskers themselves 
are highly focused on assessing functional aspects, which tend to be related to manipulating the 
application UI or state transfer functionalities of the application. It is also arguable the architecture 
testing contains tests targeting database interfaces and data validation. 
Environment testing, on the other hand, is not seen as a focus-area in the context of DTEK2040; 
assignments do not explicitly detail environmental requirements such as the need to function on 
multiple browsers. The only environment related issues that would perhaps rise during the 
assignments and could be considered from test automation perspective are related to deploying 
some of the assignments to be executed on a cloud platform. 
All in all, to answer the question what manual work related to assessing and feedback is there to 
automate, it can be said that the assessment process of DTEK2040 includes automation targets of 
opportunity for both RPA and software application tests. From RPA perspective the tasks have been 
analysed and presented to an extent include course workspace, file manipulation and book-keeping 
related tasks that seem to repeat with almost each individual student submission. While the tasks 
themselves may not be too time-consuming individually, it has been established by the expert 
interview that the cumulative benefits of automating such labour my end up saving a considerable 
number of manual workhours per instance of DTEK2040. The opportunities of automation testing 
were also determined to exist but mostly at the architecture and content testing realms for 
performing functional system testing. To support the assessment process these automated tests 
should target the verification and validation of assignment requirements. 
42 
 
While feedback will be considered in more detail with the next sub-question Q2.2, it can be 
mentioned that separation between traditional RPA and test automation can be made in terms of 
their benefit for collecting and providing feedback: while test automation can provide the basis for 
determining the content of feedback, RPA is able to support with the labour related to formulating 
and delivering that content to the student by, for example, gathering a concise summary of feedback 
based on test suite results and automating the process of reporting assessment results for the student 
to the course workspace. 
3.4.2 Q2.2: What kind of feedback should be gathered from the student solutions to 
assignments? 
The importance and the guiding principles of feedback were explored during Section 3.3. Within 
this section it was found that in general feedback should allow for student growth in self-regulation, 
clear confusions regarding the expected learning outcomes and performance details, as well as 
encourage and enhance the student's motivation to uphold effective learning process. 
From the source material it was observable that the general guiding principles for effective, self-
regulation enhancing feedback have plenty of common surface with the main feedback types 
currently seen to exist - and to some degree implemented - within today’s automated assessment 
and feedback solutions for programming exercises. Based on the review sources of automated 
feedback generation for programming exercises it was additionally notable that automation most 
often produces the feedback based on results of automated functional testing and the result is 
inclined to be more summative form of feedback rather than qualitative verbal commenting. 
Modern automated assessment solutions have been noted to increasingly show interest in advancing 
the ability to provide “knowledge about how to proceed” type of feedback. The main challenge with 
other than functionalities targeting feedback for automation had traditionally been rooted in the 
open-endedness of programming solutions; it has been difficult to formulate personalized feedback 
for non-functional issues in an automated fashion since, for example, model-based evaluation can 
be hard to implement when there are no strict expectations for a detailed submission. 
To answer the sub-question Q2.2, it is perhaps proper at first to introduce Table 9 [pp. 43] and 
attempt to present certain aspects to look for within an assignment submission. Afterwards the focus 
can be shifted on other details such as issues related to delivering the feedback properly. 
  
43 
 
Table 9: Issues to consider when gathering, constructing and providing feedback. 
Points of interest from feedback perspective 
Issue to note from submission Rationale Based on 
Lack of understanding the 
learning goals 
Guide the student towards 
mastering the relevant concepts 
within the course’s scope. 
Principle #1, 
Knowledge about task 
constraints, 
Knowledge about concepts 
Lack of understanding the 
evaluation criteria 
Reduce the potential negative 
effects of mismatched expectations 
to student’s motivation and 
receptiveness to feedback. 
Principle #1, 
Knowledge about task 
constraints 
Strong understanding of taught 
concepts 
Acknowledging strengths is part of 
high-quality feedback. Enhances 
learning motivation. 
Principle #3, 
Knowledge about concepts 
Weak performance of taught 
concepts 
Giving constructive feedback and 
corrective advice regarding 
observed weak performance 
provides the student learning 
opportunities. 
Principle #3, 
Knowledge about concepts, 
Knowledge about how to 
proceed 
Failures to implement a required 
feature 
The assessment should be 
transparent in a sense that it 
provides the student exact 
knowledge about performed 
mistakes to focus on overcoming 
these issues and learn the 
concepts. 
Knowledge about mistakes 
Failures that prevent functional 
assessing or issues related to 
quality aspects 
Some requirements may be implicit, 
such as the expectation that a 
submitted web application should 
be successfully deployable. 
Student should also receive 
feedback on technical errors and 
quality aspects of their products, 
i.e., coding style or solution style. 
These will help the student grow 
relevant skills even outside the 
course scope. 
Knowledge about mistakes 
 
As for the delivery of feedback, a few important observations were extractable from the research 
material: (1) effective feedback should favour comments over grades, (2) feedback process should 
happen in multiple cycles throughout the course instead of fewer high-stake formats, (3) high 
quality feedback should be provided even if the solution is correct, (4) providing the opportunity for 
the student to response to the given feedback increases engagement, decreases the chances of 
feedback being ignored due to misunderstandings and increases the positive impact and, (5) 
feedback should also act as data for potentially improving the learning process of the course as well. 
44 
 
While the sub-question what kind of feedback should be gathered from the student solutions to 
assignments could perhaps be presented an answer to with Table 9 [pp. 43], it is also worth noting 
that automation may be able to support feedback providing in many other ways than simply just 
assisting in gathering and noting feedback-worthy issues. During the exploration of related theory, 
it was quite implicitly mentioned also that automation is considered as a tool to possibly shorten the 
feedback loop: this in turn would mean more impactful – properly timed – feedback. Automation is 
also able to provide the opportunity for the student to self-regulate and independently subject his or 
her products for assessment, as it seems to be the case already with many modern programming 
courses today2. Some of these solutions of course go beyond simple automation, also requiring 
additional capabilities from the study platform itself.  
 
2 This can been observed to apply to courses where programming basics are studied with certain languages as well as 
intermediate courses that still contain, at least partly, assignments that are less freeform. This notion can also be 
considered from the perspective of how many fully or almost fully automated programming courses are being offered 
today as MOOCs or independently studyable online courses. Examples of such would include Ohjelmointi C-kielellä 
(https://fitech.io/en/studies/ohjelmointi-c-kielella/) offered by LUT university,  Functional programming courses 
(https://fitech.io/en/studies/functional-programming-1/ and https://fitech.io/en/studies/functional-programming-2/) 
offered by TUNI and even the earlier mentioned web development course by Aalto university. 
45 
 
4 Combining test automation and RPA to assess assignments 
4.1 Formulating a design 
As it has already been established, Eggert’s four stages of engineering design process [Table 2, pp. 
4] guide the creation of an automated solution to be used for supporting the assessment efforts of 
DTEK2040 student submissions. Understanding of the problem – and aspects of potential solutions 
– has already been gathered by covering the relevant theory in Section 2 and Section 3. Based on 
the presented theory some constraints and considerations have been presented while formulating the 
answers to the sub-questions respective to their sections, but two very important aspects of Eggert’s 
first stage remain yet to be established clearly: requirements and performance targets. 
Requirements can be divided into various categories. However, while formulating the design for the 
proposed automated support system, three guiding categories have been identified through literature 
review: (1) general guidelines and automation targets, (2) guidelines for supporting feedback and 
(3) assessment guidelines for the exercises. Guiding principles that fit under the first two categories 
are gathered based on the domain theory presented throughout this thesis and the expert interview. 
Metrics for assessing the success of meeting these criteria are also somewhat qualitative and thus 
the assessment of the achieved end- product is also based on the opinions gathered from the 
DTEK2040 personnel. 
Principles fitting the third category on the other hand are solely extractable from the course material 
and assignment tasks. While the scoring and the guidelines used for assessment are not always 
explicitly available from the material, anonymized student assignments and their results will be 
used as metrics for the quality of automated tests. The created test cases will also be iteratively 
developed by requesting feedback of the test cases and details to tasks presented in the course 
material from the course personnel. 
4.1.1 General guidelines and automation targets 
General automation guidelines for the prototype are crafted by combining the description of RPA 
from Section 3 together with the identified potential for automation within the DTEK2040 
assessment process [Table 8, pp. 31; Appendix A]. However, not all potential will be 
implementable as dictated by the scope and delimitations of this thesis. For example, the solution 
will not emphasize automating the interfacing with Moodle, but it may present observations of the 
usability of such platform from the view of automation processes. 
46 
 
Based on the source material – and especially the expert interview – five general guidelines for the 
prototype system are identified: 
1) System must save workhours; 
2) System must be maintainable; 
3) Human must be kept in the loop; 
4) Automation flow begins after the submissions have been manually fetched; 
5) Automation flow covers (a) assessment preparation, (b) numerical assessment, (c) feedback 
discovery and (d) summary reporting of assessment and feedback results. 
The first guideline quite simply aims for time resource benefit stemming from organizing, testing 
and various other now manually performed tasks such as score and feedback book-keeping. The 
metrics used to determine whether this guideline is followed is quite simply the average time spent 
on each student and submission per exercise throughout the automation pipeline. 
The second guideline is born from a couple of relevant points regarding the nature of DTEK2040 
and programming courses in general: (1) As the course evolves and its exercise contents also 
possibly change as time goes on, the assessment support system needs to be modifiable so that the 
test cases can be updated to stay relevant, for example. (2) Intent at keeping the system as 
generalizable as possible has also been established in the beginning of this thesis for the system to 
potentially lend itself to other courses outside of DTEK2040 as well. Metric for measuring 
fulfilment of this guideline is admittedly somewhat subjective but has been determined to be 
documentation coverage: system architecture, main functionalities, all the test cases and any custom 
libraries should be documented so that the system may be adopted with relative ease. 
The third guideline is also two-fold. The need for human in the loop has been identified due to 
accepting the fact that this sort of an automated system will most likely not be 100% flawless after 
its initial deployment and once fielded it will most likely stir new ideas for iterative improvements. 
However, since the system performed assessment will potentially have real impact on aspects such 
as an individual student’s overall study performance, it will be sensible to not hand-over the whole 
assessment chain to automation without a human checking and either accepting or modifying the 
results. On the other hand, keeping the course personnel in the loop will also provide visibility to 
the inner workings of the system and stress the documenting of the reasoning and traceability 
47 
 
behind individual assessments as a part of the end summary. In this sense the third guideline will 
also be able to build trust towards the system. 
The last two guidelines are quite simply guidelines for which steps from the current identifiable 
assessment process should be covered with automation. To meter the success of following these 
guidelines, the process presented within Appendix A will be used to determine successful fulfilment 
of steps 3 – 6 mentioned within that appendix. 
Certain requirements conforming to these guidelines are identified from available source materials 
that are tied to metrics to gauge the success of each individual requirement. Metrics are also 
accompanied by target values that the system aims to fulfil to determine acceptable implementation 
of any given requirement. General requirements and their proposed metrics and target values are 
concisely presented in Table B1 [Appendix B] contained within thesis appendices. 
4.1.2 Guidelines for supporting feedback 
Guidelines for feedback support are mainly created from the basis of Section 3.3. Expert interview 
provided very little in the form of requirements for formulation of feedback but did, however, 
provide feedback models in the form of current operating procedures. The four guidelines identified 
for feedback support are: 
1) System must be able to formulate numerical feedback and aid with qualitative feedback; 
2) Feedback must be discovered from positive cases as well as negative cases; 
3) Feedback discovered must be categorizable into one or more feedback types described by 
Keuning et al.; 
4) Feedback discovered must follow one or more of the seven principles of feedback by Nicol 
and Macfarlane-Dick 
The first guideline for feedback support is justified by the expert interview where it was mentioned 
that a half-automated solution that would, for example, annotate or tag qualitative feedback issues 
for further inspection by the course personnel would already be a meaningful step towards 
automating this part of the process. The second guideline closely ties into this by practically guiding 
the discovery of feedback points and test case designs to cover more than just the happy paths 
within the exercises. 
48 
 
Metrics for all the above guidelines are tied to the availability of both numerical feedback and tags 
for qualitative feedback issues marked for further inspection. Since the amount, tone, and practical 
discovery of feedback is respective to each individual student submission, hard targets for amounts 
or types of feedback to discover are in general challenging to determine. However, the following 
targets have been determined to provide some guidance and criteria to potentially consider when 
implementing automation regarding feedback: 
- Final assessment summary contains a score for each exercise subtask. Summary will 
also always highlight qualitative notes from source or gather such points into a student 
specific document. 
- Deduction in numerical score must be traceable to test case and reason included in a 
student specific document. Submission should always result in numerical and verbal 
feedback, even if submission is “perfect”. 
- Implemented system logic for qualitative feedback tagging can be justified by referring 
to feedback types / principles and the justification is documented in the code, 
architecture, or instructional materials. 
Discussion and analysis regarding how these guidelines could be implemented in an automated 
assessment system prototype will be presented later during Section 5 and Section 6. 
4.1.3 Assessment guidelines for the exercises 
The proposed assessment guidelines aim to detail what can be commonly assumed from subjects 
under test and provide common guidelines for the test basis and test case design. These 
requirements are formulated by reading into and analysing the taskers of all DTEK2040 exercises 
within the web materials [38]. Guidelines for assessing exercises are: 
1) Each assignment is assessed as if it was the complete submission for the given assignment; 
2) Non-functional aspects mentioned in assignment specific course material must be 
considered when forming test cases for assignment; 
3) Automated assessment must provide visibility to the argument behind the numerical score 
and maintain traceability between individual parts of an assignment and the final scoring. 
The first requirement is corresponding to the current manual processes. While the exercise material 
contains taskers for individual steps, these steps lead up to a complete solution that is then 
49 
 
submitted and eventually assessed. RF does provide means to run individual test cases from a suite 
by implementing run tags to each case and in this sense it would be possible to create a system that 
supports also assessment of exercise partials. The metric for this guideline is the step coverage of an 
exercise with test cases. Target for coverage is 100%, meaning each individual functionality or a 
separate task within an exercise must be covered with a test case. 
Detailing the metrics and targets for the second guideline is the most challenging of the three. Non-
functionalities themselves are extractable from the exercise specific web materials but hard targets – 
such as testing for the use of session storage when creating a single page application, for example – 
will be defined within the test suites themselves in the form of test cases and corresponding 
documentation. 
The final guideline is partly overlapping with other guidelines from previous subsections, Section 
4.1. One such example is keeping the human in the loop or the requirements related to feedback 
discovery. A soft metric for this assessment guideline is the amount of traceable test case results 
and the target for traceability is that each test case and its result should be tied to (a) exercise, (b) 
task within exercise and (c) in the case of point deduction / remark the reason for said action. 
Besides the soft metric, more important is to hold a human in the gatekeeping role for any results 
produced by the prototype at this stage of development. Thus, traceability needs to be available for 
the course personnel and – if requested – possibly for the student who did the submission. 
4.2 Generating a design 
In Section 3 the suggested implementation levels to consider when creating new automation 
systems were mentioned to be (1) due diligence, (2) risk identification, (3) bot creation and dry run, 
and finally (4) execution and maintenance. Detailing the general contents for the first three levels 
here is an attempt to complete Eggert’s second stage of engineering design process and generate a 
design that also considers the requirements set forth during the previous Section 4.1. The fourth 
level dealing with execution and maintenance will be omitted as the contents within will be later 
presented in the form of system documentation and execution related instructions corresponding to 
the actual implementation. 
When gathering the information on stages for Eggert’s proposed implementation of engineering 
design process it was mentioned that this specific step should have a goal of producing alternative 
design candidates for later analysis and evaluation [Table 2, pp. 4]. However, within the context of 
this study only one proposed design will be generated in this Section. Potential alternative 
50 
 
implementations and design choices will be part of the critical discussion presented in the final 
Section 6 of this thesis. 
4.2.1 Due diligence 
Due diligence for the design has been largely performed by delimiting the technology choices 
within Section 1.3, analysing the flow of the current manual assessment process and then analysing 
automation viability of the process [Table 8, pp. 31; Appendix A]. Based on the sequential 
implementation levels of automation [Table 7, pp. 30], due diligence should on top of these also 
include determining the return on investment and a technical feasibility assessment with a proof-of-
concept. Determining the return on investment nor the performing of a technical feasibility study 
are not included in this sub-section but rather covered by the descriptions and results reviewed 
within Section 5. 
As for the tools to be used to create an automated solution for assessment support, one major choice 
that was established early on in this thesis: RF for test automation and RPA purposes. The 
framework is by its maintainers described to be Python-based and keyword-driven automation 
framework for acceptance testing and robotic process automation (RPA). RF at its core does the 
work of running the tests and tasks that are written but interactions with the target under automation 
are handled by functionalities from various imported libraries. The representation of RF architecture 
as detailed by Robot Framework foundation is shown in Figure 3. [32] 
 
 
Figure 3: Robot Framework architecture by Robot Framework Foundation. Source: [32]. 
 
At a high-level the acceptance and keyword-driven nature of RF allows for clear, natural-language 
like presentation of automated tests and RPA tasks while still allowing for precisely coded 
51 
 
functionalities to be written and executed behind the said keywords. Also, because the framework is 
an open-source solution it is well maintained, transparent and has plenty of available libraries 
created by the community for a multitude of use cases. From generalization perspective, RF offers a 
low-code automation framework that is relatively easy to learn and can be widely applied to static 
and dynamic manipulation of different application types from traditional desktop applications to 
mobile and web applications. The framework is also compatible with all major operating systems 
and supports integration with Java and .NET based platforms by providing support to Jython and 
IronPython respectively. [32] 
Maintainability is an aspect that could perhaps be described to cover concepts such as modularity, 
understandability, reusability and refactorability. Considering the framework from this perspective, 
the open-source tool may provide both opportunities and challenges. As an open-source framework, 
the framework does provide plenty of visibility and opportunity for the afore mentioned concepts. 
Keyword oriented nature also allows to easily understand and follow the process flows even without 
diving too deeply into the source codes or imported library methods if the solutions are coded with 
maintainability in mind. However, depending on the number of libraries used or custom libraries 
written, the amount of content to study to effectively maintain the created solution may quickly turn 
out to be quite excessive if libraries are incorporated haphazardly. From this perspective each used 
library within the created automation solution will have to go through an evaluation whether it is 
truly worth using or is it just an extra burden in terms of maintainability. 
Aside from the framework another technological choice is to support the use of Docker3, allowing 
the automation can be executed in an isolated environment, thus reducing the need for setting up 
dedicated workstation environments for running the automation process. Containerization will also 
produce additional security benefits as student submissions can be dynamically manipulated in an 
isolated container instance, which potentially reduces the effects of malicious intent if there would 
happen to be any, for example, infested URLs or files within a submission package.  
Within this due diligence process it is noted that the current or possibly future course personnel of 
DTEK2040 do not necessarily have prior deep experience of the framework, Python or Docker. It 
is, however, reasonable to expect that the personnel have a grasp of programming - or perhaps even 
are experienced programmers - and that they are technically inclined in general. As such, it is 
 
3 Docker is a service that packages an application into a virtual container so it can be executed on various operating 
systems of choice. It is also able to offer flexibility in terms of where an application could be hosted; containers can be 
hosted on-premises or in cloud. Docker containers are technically hosted by software that is called Docker Engine, but 
the service and the underlying engine are, depending on use-purpose, available for free from https://www.docker.com/. 
52 
 
accepted that maintaining the created automation solution will require some familiarization with the 
applied technologies to maintain the solution. 
4.2.2 Risk identification 
The risk identification level of RPA implementation was determined to consist of identifying three 
aspects from business processes to determine the overall suitability for automation: (1) stability, (2) 
repetitiveness and (3) level of organization or standardization. To an extent this risk identification 
has been done throughout this thesis and condensed in the form of tables [Table 7, pp 30; Table 8, 
pp. 31]. 
The current process of manually assessing student submissions has been deemed stable, as in 
largely unchanging and routine-like, based on the conducted expert interview. As such the steps 
mentioned in general requirements GR-5 and GR-6 [Table B1, B-1] can be viewed as good 
candidates for automation. It is recognized that outside of these steps there may be potential for less 
stability as student study platforms or administrative tools can change or evolve, which may or may 
not cause changes in the future. The automation solution will attempt to keep this aspect in mind by 
assuming as stable of a starting and ending context for the requirement specified steps. 
Repetitiveness is something that the solution views as an inbuilt quality within the context of 
assessing student works in a programming course: for mandatory courses even the basic business 
process needs to be repeated hundreds of times to assess each individual submission. Of course, 
additional repetitiveness is created from the number of exercises in the DTEK2040 course 
curriculum which effectively creates its own multiplier when each student will be submitting 
multiple assignments within the span of the course. Based on the interview any given exercise may 
receive 100 – 120 student submissions each: in Spring 2022 for example this meant an estimated 
total of 500 - 6004 student submission to assess and provide feedback on throughout the span of the 
course. 
The level of organization of the business process is perhaps the most controversial aspect of this 
level from the three. The very basic business process of receiving a submission, assessing it, 
providing feedback, and registering a score is notably very standardized. From a test automation 
 
4 DTEK2040 has assignments for five different study parts: Exercise 0, Exercise 1, Exercise 2, Exercise 3 and Exercise 
4. Student numbers are based on estimates extracted from the expert interview and the total submissions number is 
assuming each student would return a complete set of assignments for each exercise. In reality the submission numbers 
might even be higher due to the course allowing for submissions of separate sub-tasks as well before the final complete 
submission. 
53 
 
perspective, however, the exercises themselves are in a way very softly standardized. The course 
material is crafted so that it guides the student towards certain kinds of solutions which for most of 
the so-called perfect submissions means that they are expected to be quite identical regardless of the 
student. This in turn means that test cases created based on these submissions will be able to test 
and assess the majority. On the other hand, the soft guidance means that there is also potential for 
alternative ways for the student to reach an acceptable solution and in these cases the student should 
not be punished for doing so. Such an alternative way can from testing perspective be seen as an 
edge case of sorts which is not necessarily though of when developing test cases that assess a given 
exercise. 
All in all, the current process is seen as a good candidate for automation, but it is also identified to 
contain risks in terms of test automation coverage and standardization within task descriptions. 
Thus, it is accepted that using the most recent anonymized student submissions material to craft test 
automation cases can still leave room for development in the future if edge cases from student 
submission in future course iterations are found. To further mitigate this risk, the automated 
solution will have to create sufficient documentation of each assessment process to provide the 
system maintainers enough information to act upon. This will also be considered when 
implementing general requirements GR-2 and GR-3 [Appendix B]. 
4.2.3 Bot creation and dry run 
The overall process for assessment has been identified from expert interview and the steps 
represented within Appendix A. The steps and tasks to automate have been further dictated in 
general requirements for the automated system by the requirements GR-4 and GR-5. To break these 
steps down further, the overall robot flow is represented in Table 10. 
Table 10: Steps and tasks outline for robot implementation. 
Prototype development steps and tasks 
Step Tasks 
1) Process starts from a state where the system is 
provided with packages of each individual student 
submission for an exercise 
a) Verify directory structure  
b) Verify content is provided 
2) Robot will prepare the submitted files for general 
assessment 
a) Create student specific directories to track 
submissions 
b) Extract submission package  
3) Create summary template a) Create summary sheet template to keep track of 
scores and feedback issues 
 (to be continued) 
54 
 
Table 10 (continues)  
4) Robot will perform the static assessment a) Search for static page files / web app source files 
b) Execute test cases that assert from source code 
c) Update summary sheet as required based on 
test results 
d) Save test log into student directory 
5) Robot will prepare the submission for dynamic 
assessment 
a) Open a static page in a browser instance or run 
web app and open it into a browser instance 
6) Robot will perform dynamic assessment a) Execute test cases that assert from static page / 
web app 
b) Update summary sheet as required based on 
test results 
c) Save test log into student directory 
7) Robot will create a summary containing test 
results, numerical score, and feedback issues 
a) Calculate final score of the exercise for each 
student submission 
b) Collect additional feedback from student specific 
test logs and attach content to summary 
c) Collect and organize summary, individual logs 
and other artifacts such as screenshots into a 
review directory 
8) Robot will output the summary files and artifacts 
for human review 
a) Output the review directory 
 
The step tasks consisting of scripts and test cases will be analysed in detail within Section 5 based 
on the actual implementation. 
The smoke testing – or dry running – of the robot will be performed in a two-tiered fashion: Firstly, 
the overall pipeline consisting of scripts that handle general tasks such as executing the RF files will 
be ran with a goal of determining that each required Robot file can be found and the environment 
dependencies are installed as required. Secondly, during the pipeline dry run the Robot files are 
executed with RF --dryrun option which causes library keywords to not execute but verifies that 
everything is syntactically correct and all imports within the Robot files can be resolved. 
55 
 
5 Implementation and results 
5.1 Architecture overview 
Solution applies the pipe-and-filter architecture pattern which is a popular choice, for example, 
among workflow engines and scientific computation systems that need to process large streams of 
data.  The pattern in general is based on building separate parts that process input from downstream 
to produce output to upstream for then to be used by the other parts present in the system. These 
filters often have very specified tasks which also adds to the ease of reasoning about the overall 
behaviour of the system, which in turn can be seen as a positive side in terms of learnability and 
maintainability.  Architecturally the pattern is also said to support aspects such as reusability, 
flexibility scaling and parallelization. [5] Applying this pattern to the prototype is represented by 
Figure 4. 
 
Figure 4: Pipe-and-filters implemented in prototype solution. 
 
The pipeline has been structured from eight main steps – or filters – that contain separate tasks 
extracted from the current manual working process of assessing student submissions of DTEK2040. 
These steps handle and manipulate different data as well as execute automation scripts to produce 
the final output in the form of assessment artifacts: the numerical results gathered by running 
56 
 
submitted material against automated test cases as well as general feedback that has been formed by 
different tools and methods. 
5.2 Environment 
5.2.1 Development environment 
To lay a baseline for the results to be presented later in the analysis section, it is necessary to 
mention specifications of the workstation that was used for development of the solution as well as 
gathering of result data. These specifications are displayed in Table 11. 
Table 11: Development workstation specifications. 
Development workstation specifications 
Component Manufacturer Model Additional information 
Motherboard MSI MSI Z170A GAMING 
M7 
MS-7976 
CPU Intel i5-6600K CPU @ 3.50GHz 
(4 CPUs), ~3.5GHz 
Memory Micron Technology DDR4 SDRAM Clock rate 1200MHz; 
4 * 4GB modules, 16GB 
in total 
GPU NVIDIA GeForce GTX 1070 Graphics clock 
1506MHz 
Processor clock 
1683MHz 
Memory 8GB GDDR5 
 
From performance point of view the GPU is not deemed as a crucial component for the overall 
results in terms of such results as execution time. In general, even those automation tests that 
dynamically test through GUI of a web application are not heavily GPU dependant since tests 
ideally are run in a headless mode. Instead, most important specifications are assumed to be 
processing power (CPU) and memory. 
The workstation used Windows 10 Home 64-bit (10.0, Build 19043) as its operating system, but the 
development and test run executions were done on Windows Subsystem for Linux. While no 
recorded comparisons were made in the scope of this study between the prototype performance 
when executed on Windows versus when executed on a WSL2 Linux distro, it is worth mentioning 
that the WSL2 seemed to give significant performance boost in terms of execution time. 
57 
 
5.2.2 Software 
The prototype was developed and tested on Debian GNU/Linux 11 Bullseye. Two aspects 
contributed to choosing this Linux distro for development OS. Firstly, it was the most readily 
available Debian distro for WSL2. Secondly, it allowed for debugging 3-slim-bullseye5 based 
Docker image.  3-slim-bullseye as the base image was chosen due to it providing the required 
Python versions and a good support for GUI based automation testing while keeping the Docker 
container size relatively small. Python included with 3-slim-bullseye at the time of development 
was Python version 3.10.7. 
The 3-slim-bullseye base image did not appear to contain absolutely everything that was required 
for development, a multitude of additional packages needed to be installed. These additional 
packages were mainly tied to RF libraries and their dependencies while some were simply tools that 
were required to perform certain bash scripted automated tasks. Packages mentioned in Table 12 
were deemed necessary for the implemented prototype to function. 
Table 12: Packages installed on top of base image. 
Required additional packages by the prototype during development 
Name Apt(*) / pip(**) / npm(***) 
package name 
Installed 
version 
Type / additional info 
Node.js nodejs (*) 18.7.0 JavaScript runtime environment 
Npm Installed along nodejs 8.18.0 Software registry, package manager and 
installer 
create-react-app create-react-app (***) 5.0.1 Package for creating react apps 
json-server json-server (***) 0.17.0 Node module for REST JSON services 
MongoDB mongodb-org (*) 6.0.1 Document database 
netcat netcat (*) 1.217-3 TCP / UDP utility 
lxml lxml (**) 4.9.1 Beautiful Soup dependancy 
html5lib html5lib (**) 1.1 Beautiful Soup dependancy 
Beautiful Soup beautifulsoup4 (**) 4.11.1 Python library; web scraping 
Robot 
Framework 
robotframework (**) 5.0.1 Python based automation framework 
Browser library robotframework-browser (**) 14.1.0 Robot Framework library; Playwright 
based browser / web GUI automation 
Requests library robotframework-requests (**) 0.9.3 Robot Framework library; HTTP api 
testing 
Excel library robotframework-excellib (**) 0.0.2 Robot Framework library; Excel file 
manipulation 
 
5 https://github.com/docker-library/python/blob/56d9977bf9a2e92882e71256dd288c8482233688/3.10/slim-
bullseye/Dockerfile 
58 
 
 
From packages mentioned in Table 12 [pp. 58], Netcat is mostly utilised to check Node backend 
services and Mongo database availability as required and pace the pipeline execution. Node.js and 
npm are also fundamentally required due to DTEK2040 relying in these tools; they are a natural 
choice to use in the dynamic testing of submitted assignments. Same logic applies to the decision of 
integrating create-react-app, json-server packages and the use of Mongo database. 
Beautiful Soup as a library specialises in web scraping and is partly relied upon in static testing. The 
library allows for pulling data from HTML content in a relatively straightforward manner by 
parsing source code into a parse tree. Lxml and html5lib are parsers that can be used by Beautiful 
Soup and they were chosen because they seemed to be the most popular choices based on 
supporting documentation and resources available. 
RF has a dual purpose in both automation utility and test automation. Excel library is used for 
supportive tasks related to record keeping and formulation of final score summary files, but the 
other two framework libraries are at the core of providing tools for dynamic testing. Browser library 
allows for interacting with the apps through GUI by providing Playwright engine6 based support for 
rendering engines and a ready set of keywords to be used after establishing a browser session. 
Requests library on the other hand provides a set of keywords for http requests which in turn allows 
for API testing alongside the GUI testing of web application functionalities. 
All in all, the final prototype version can be taken into use in any of the three ways: (1) by creating 
a Docker container using the included Dockerfile; (2) on a Windows workstation that supports 
WSL2 by installing Bullseye Debian distro and the required packages; (3) by installing and 
executing on a Linux workstation, preferably one with the same distro as the one used for the 
Docker image for this solution. 
5.3 Pipeline 
5.3.1 Logical structure and shell scripts 
The automation implementation consists of 17 separate shell scripts that are written in bash. Each 
script has their own role inside the architectural representation shown earlier in Figure 4 (pp. 55). 
 
6 Playwright is an open-source web testing and automation framework available at 
https://github.com/microsoft/playwright. It supports Chromium, Firefox and WebKit which means it is able to be used 
for automation testing web applications on all of the most popular browsers. 
59 
 
The scripts creating the overall system are divided into three separate logical layers, which aims to 
enhance the learnability and maintainability of the system. Separation is represented in Figure 5. 
 
Figure 5: Shell script logical layers. 
 
The layers as represented in Figure 5 above are: 
1) Pipeline orchestration: Responsible for maintaining global variables, checking the execution 
status of each main step as the pipeline is running and maintaining situational awareness of 
the overall pipeline execution. 
2) Filters: Contains main steps of the pipeline, each with their own processing responsibilities. 
3) Support: Scripts that execute certain processes, such as database manipulation or robot 
scripts, by the request of main step scripts. 
The logical layering is designed and implemented so that scripts belonging to a certain layer should 
only execute and provide input to other scripts belonging to the layer above but only provide output 
results to those below. This aims to provide easy-to-grasp concept for interactions between separate 
parts of the system. 
60 
 
In practice the first layer was implemented so that one bash script, the run_pipeline.sh, handles the 
triggering of all the primary step scripts. The decision to continue or terminate the pipeline is made 
based on the exit status received from each step; exit status of zero allows the pipeline to go forward 
while non-zero status will cause the pipeline to terminate. The main orchestrating script is also 
responsible for declaring and inserting values to global variables that it exports to be used by the 
step scripts. Examples of these variables are certain directory paths, tags to decide which tests 
should be executed as well as bookkeeping values such as exercise specific maximum score values. 
The second layer houses all the main steps of the pipeline and they are, as mentioned above, 
triggered in order by the orchestrating first layer script. Each step and their corresponding step 
script has its own process responsibility to complete, and they execute third layer support scripts as 
needed to complete the tasks demanded by their processes. 
In the first step the step_1.sh script is built to verify required files, directory structure and that any 
assessment data zip-packages are available to be processed for assessment purposes. The main 
reason for verifying the prototype system directory structure is to make sure the critical custom 
libraries, common resources or directories required by the applied tools – such as Mongo database – 
will not be a source of failure in case the pipeline continues executing. Some directories the script 
itself can create in case they are not found during the verification process but others will cause 
termination of the pipeline since they are expected to hold commonly used resources for the 
automation to function properly. For example, if in step 1 the directory containing automation test 
scripts is determined to not exist, it will cause the pipeline to terminate. The system files and 
directory structure are depicted in Appendix C. The appendix also described actions taken by the 
script depending on each file and folder state at the time of verification. 
If file and directory verification is successful in step one, the step exits with status zero and pipeline 
can proceed to the second step. The second step with its step_2.sh script is responsible for preparing 
the student submission contents. In practice this means extracting the zip package contents to a 
corresponding test subjects directory to be available from that path for later process steps. 
After extracting the submission contents to their respective test subject directories, the script finally 
verifies that the number of test subject directories is corresponding to the number of zip packages 
originally available inside the submission data directory. Overall successful completion leads to 
triggering the third step where step_3.sh executes the first robot scripts inside the pipeline: 
summary_template.robot and update_score.robot. The accomplishment of these robots fulfils the 
responsibility of creating initial summary template excel sheet with exercise specific test 
61 
 
identifications and student submission identifications as seen in Appendix E which depicts the 
template created for DTEK2040 Part 0 exercises. 
Fourth step inside the pipeline performs duties related to static testing of submissions. In practice 
step_4.sh first executes the static testing robot corresponding to the exercise currently being 
assessed and afterwards updates the summary template with score results for the student. This two-
part process is repeated for each student submission that needs to be assessed within the current 
pipeline run. 
After static testing, fifth and sixth steps of the pipeline are both related to dynamic testing of the 
submissions. Firstly, step_5.sh has the role of handling preparations required for dynamic testing. 
For the prototype developed during this study, the step handles tasks such as initial creation of 
system-under-test react application for exercises one, two and three as well as the setting up of 
Mongo database for exercise three. 
Once the services set up by step five are verifiably responding, then step six will use support scripts 
respective to each exercise that trigger the dynamic testing and score updating robots for each 
student submission. Support scripts are heavily utilized during step_6.sh execution because this 
allows for exercise specific variable inputs for the corresponding dynamic testing robots; for 
example, the DTEK2040 first exercise dealing with html tasks has no need for react server or 
Mongo database related variables to synchronizing the test efforts with backend services whereas 
the third exercise will require both. Thus, the use of support scripts here allows for not declaring 
irrelevant variables for robots executing dynamic tests. 
Finally, steps seven and eight have responsibilities related to wrapping up the pipeline execution 
and whatever artifacts have been created during the testing. During seventh step step_7.sh performs 
the calculation of total score for each student. This is done by using a pre-determined max score for 
the exercise being assessed and then triggering the update_score.robot to update the assessment 
summary template with student’s total score achieved over all the static and dynamic tests. Once 
step seven is complete, step_8.sh will perform pipeline teardown activities by killing active react 
and Mongo processes and perform tasks related to collecting artifacts as necessary. 
The supporting scripts contained within the logical boundary of the third layer were already slightly 
described in conjugation with step six. Aside from the step_6_ex#_support.sh scripts mentioned 
already, the other support scripts are very limited and specific in nature. These scripts have been 
62 
 
devoted to certain tool use, namely npm and Mongo, and they trigger actions such as starting a react 
app or connecting to Mongo test database and then waiting for these services to answer. 
5.3.2 Robot scripts 
RF scripts are files ending with .robot or .txt extensions that are constructed from four sections: (1) 
settings, (2) variables, (3) test cases or tasks and (4) keywords as shown in Code 1. 
 
Code 1: Example Robot Framework script contents. 
*** Settings *** 
Library     String 
Library     .${/}MyLibrary.py 
Variables   .${/}common_variables.py 
Resource    .${/}common_keywords.resource 
 
*** Variables *** 
${HELLO}  Hello World 
 
*** Test Cases *** 
First Example Test Case 
  [Tags]  tc1  full 
  Log Hello World From Test Case  1 
 
Second Example Test Case 
  [Tags]  tc2  full 
  Log Hello World From Test Case  2 
 
*** Keywords *** 
Log Hello World From Test Case 
  [Arguments]  ${test_case_number} 
  Log  ${HELLO} from test case ${test_case_number} 
 
Settings section is used to import external libraries and resources such as custom keyword sources 
or collections of custom determined variables detailed in an external file. Settings section also 
allows for declaring suite setup and teardown commands if needed. The robot scripts developed for 
the prototype take advantage of in the whole scope mentioned earlier and the robot files containing 
test cases perform setup and teardown processes as well as to avoid cases failing due to 
environment, or other than system-under-test related, reasons. 
While scripts can import variable collections from external sources, there is also the option to 
declare variables within the variables section. In both cases the variables declared will be treated as 
global variables. RF also supports a third way of providing global variables for the script by 
63 
 
declaring them through a command line interface when executing a robot script. An example of 
providing a global variable this way can be seen in Code 2 where the variables are declared with -v 
option. Alternatively, --variable can be used. 
 
Code 2: Example of executing the support_tasks.robot script with declared global variables. 
robot \ 
  -i "$et" \ 
  -d $RESULTS_DIR/"$ASSESSMENT_EX"/step_6/ex2_support_step/"$(basename "$sut")"/ \ 
  -v STUDENT_ID:"$(basename "$sut")" \ 
  -v EX_NUM:"$ASSESSMENT_EX" \ 
  -v DYNA_DIR:"$REACT_PROJ_DIR" \ 
  -v GLOBAL_ROBO_VARIABLES_DIR:"$GLOBAL_ROBO_VARIABLES_DIR" \ 
  -v RESOURCES_DIR:"$RESOURCES_DIR" \ 
  -v REPORTS_DIR:"$REPORTS_DIR" \ 
  -v TEST_SUBJECT_DIR:"$sut" \ 
  -v LIBRARIES_DIR:"$LIBRARIES_DIR" \ 
  $TASKS_DIR/support_tasks.robot 
 
The depicted way of providing variables is also very much utilized by the prototype as it proves to 
be a simple way to provide, for example, directory paths and tags for the robot scripts to use. 
Whichever way is used to declare a variable, it will in all the aforementioned cases be treated as a 
global variable and thus they are usable everywhere within the script. 
The test cases section contains the very core of any robot script. Test cases section can also be 
called the tasks section since this is only a semantical way of differing the purpose of a script in RF; 
if the script is more RPA than test automation in nature, the section is often titled “*** Tasks ***” 
instead of “*** Test Cases ***” but it will function all the same. For the purposes of describing 
robot scripts henceforth test cases and tasks can in this context be considered synonymous.  
In the test case section, a set of keywords can be declared under any given test case which will then 
be executed in order of appearance. Syntactically the test case name is declared first and the 
keywords that form the test logic are indented with a minimum of two spaces below it, as seen in 
Code 1 [pp. 62]. Test cases can be assigned multiple tags, which quite simply allows for creating 
collections. When the robot script is executed from command line it may be declared a tag option 
value and only tests corresponding to that tag will then be executed. Again, to take advantage of 
Code 1 as an example, one could perform a command robot -i full script-one.robot to execute both 
test cases or alternatively use -i with value tc1 or tc2 to only run one or the other. Tags are also 
relied on in the prototype to create collections on both static and dynamic tests for any given 
64 
 
exercise set. This is because some of the DTEK2040 exercises are divided between two different 
web applications in which case they must also be tested with separate sets of tests. 
Keywords used in test cases can come from multiple sources. They may be keywords from basic 
libraries that are included with RF installation, keywords from imported libraries, keywords from 
imported external keyword resources or keywords that are declared within the script itself inside the 
keywords section. While keywords have the potential to make the script contents easier to follow, 
they can also be a source of confusion from maintainability perspective. Initially it is very hard to 
know where an externally declared keyword is coming from unless you have prior knowledge of the 
imported libraries. 
The prototype relies on 11 separate robot scripts which are divided categorically into automation 
tests and RPA tasks. Eight of these scripts are test automation in nature while three serve the 
purposes of RPA. Robot scripts are utilized by both layer two and three pipeline scripts and as such 
they are not additionally categorizable with the same logic as previously described pipeline shell 
scripts. Robot scripts that are responsible for test automation are organized into static and dynamic 
test robots and one of each exists for every exercise set of DTEK2040 as per the scope of this study. 
The static testing robot scripts, named as ex#-static-tests.robot respective of each exercise set, 
contain the test cases which only interact with the submitted source codes to assert certain 
expectations of the source code content. For example, one of the assignment tasks given to students 
in exercise set 0 that deals with html basics is to include a table within their static web page. Such a 
task is tested in a static manner by ex0-static-tests.robot in practice with two atomic test cases: first 
parsing all table elements from submitted raw source code and then (1) verifying that at least one 
table element exists and (2) verifying at least one whole table element is built according to the 
hierarchy rules of permitted content for such an element7. This sort of atomic division of tests in 
general allows for partially scoring tasks such as the one mentioned in the example, which in turn 
allows rewarding the student for partial successes. In principle the same practice is followed in 
static testing of all the web programming exercises of DTEK2040. 
The dynamic testing robots, named in the prototype following a convention of ex#-dynamic-
tests.robot, interface with the submissions through either graphical user interface on an established 
browser session or through API requests. Interacting through graphical user interface is a stable 
method among all the exercises in terms of dynamic testing methods. Interaction is for the large part 
 
7 Rules for allowed elements and their hierarchical relations have been gathered from https://developer.mozilla.org/en-
US/docs/Web/HTML/Element/table (16/10/2022). 
65 
 
done by utilizing the Browser library available for RF since it offers practically all the required 
functionalities out-of-the-box with its available keywords. API testing serve more of a purpose for 
the latter two exercise sets of DTEK2040 where tasks and requirements for details such as API 
paths and requests are extractable from tasks and course material. It should also be mentioned that 
while RF offers the core functionalities for dynamic testing, to meaningfully execute these robots 
require the pipeline to setup services such as React servers and / or Mongo databases locally to 
create the backends and synchronize their availability in relation to test execution. This part is 
handled by the layer two and three shell scripts as described earlier. 
Aside from the eight test automation robots, the prototype also contains three more robots that serve 
roles more devoted to RPA tasks, as was briefly mentioned earlier. These robots are 
summary_template.robot, update_score.robot and support_tasks.robot. The first one, as the name 
suggests, is responsible for initially creating and preparing a master summary template for each 
pipeline execution run. Tasks include removing the old exercise specific template, if any is 
available, and creating a fresh new one to be filled with submission ids of those students who are 
part of the current run. 
Record keeping on the template is handled by update_score.robot once the template has been 
created. The tasks it performs are divided logically and by script tags to pre-filling submission 
names into a prepared template, score keeping for identifications already existing in the template 
and calculating the total score from the current state of the template. Observing chronologically, the 
first task of filling a freshly created template with the ids of those submission included in the 
current pipeline execution is done so that for this purpose the robot uses the names of the 
submission zip packages, i.e. the name of the package will be used in the template and any test 
results will be tied to it. Once the ids have been filled into the template, this robot is responsible for 
updating the score state after every execution of a static or dynamic test set. It does so by parsing 
and extracting test results from the result logs of the test robots and marking these into the template 
for score keeping purposes. Once all the testing during the pipeline is done, the robot calculates a 
total score for each submission based on the amount of passed tests and exercise specific max score 
value that can be declared as a global variable by the pipeline user. 
The third RPA robot, support_tasks.robot, performs preparative tasks to submissions contents as 
they come up for static or dynamic testing. These responsibilities include tasks such as copying test 
automation resources and submission content into system-under-test project directory before 
dynamic testing so that submission content can be used to setup and execute React backend to test 
66 
 
the subject against. For the prototype this robot handles certain tasks to harmonize certain parts of 
submission source code. For example, to test database interactions in DTEK2040 exercise number 
three, this robot replaces the original Mongo server connection strings with a test database 
connection string to use the one created during pipeline execution and test each submission against 
an equal database. 
5.4 Test cases 
Test cases implemented in the prototype were extracted from the programming exercise 
descriptions of DTEK2040. In practice the requirements engineering part of automated test cases 
were done quite lightly, relying only on the understanding of the researcher. Even though some 
specifics were additionally discussed with the course personnel, requirements engineering was not a 
focus-point in this study and as such no iterative methods of refining test cases was done, for 
example. However, 43 individual test cases were created and documented as test case cards with 
individual identification designations that would tie the case to a specific exercise set, such as the 
one example shown in Figure 6 [pp. 67]. 
67 
 
 
Figure 6: Example of a test case card as a document. 
 
In this example the test case was derived from part 0 sub-exercise 0.1 HTML where the course 
material asks the student to design a HTML page with “at least an anchor (link), a table, a list, an 
image and a form with some input elements. – You can leave the “action” attribute of the form 
undefined.” [38] Similarly cases were also crafted for testing the implementation of other elements 
mentioned in this sub-exercise as well as the other exercises covering parts 0 – 3 of DTEK2040. 
As it was briefly discussed during Section 2.4 regarding test design and development, the goal 
while designing these tests was not to detect the most errors with the subject under test but rather to 
detect the most relevant errors in terms of meeting the learning goals. 
In terms of the prototype, following task descriptions often meant the tests were designed to verify 
quite straightforwardly any mentioned functionalities while also relying to part specific example 
68 
 
material to gain an understanding of what was the expected style of execution for any given 
functionality. 
5.4.1 Part 0 – Basics of web applications 
DTEK2040 part 0 included in total five task sections, of which two dealt with coding tasks. The two 
coding sections of part 0 focused on HTML and CSS respectively. In general, the student was asked 
to design a web page that would fulfil a given set of requirements and then additionally style it 
using CSS in a way that would again conform to given styling rules. 
Test case designs for these tasks favoured the static method of testing. One major factor that 
contributed to the decision of designing static tests was that dynamic testing appeared to be too 
forgiving: once the student submitted source code was handled by a HTML parser to be displayed 
in a browser, many of the errors in the raw source code were in fact corrected. For example, a web 
page that was scraped for elements through a browser session would always pass verification of 
correct HTML anatomy since a parser would add any missing elements. The same observation also 
applied to table and list anatomy: child elements could be left undeclared and parsers would appear 
to fix the hierarchy when source code was viewed through a dynamic session. 
Out of the 17 automated tests for part 0 exercises, 14 fall under the static testing methods. These 
tests were written from almost a unit level testing perspective, and they seek to verify the presence 
and implementation of specific HTML elements and CSS stylings as required and guided by the 
course material or explicitly linked external sources. Due to the already mentioned issues in using 
HTML parsers when scraping student submission source codes, many of the test cases could not 
take advantage of Beautiful Soup to perform required source code scraping with readily available 
tools. Instead, regular expressions were heavily utilized throughout static tests because RF can 
support the usage through libraries such as BuiltIn and String. 
Part 0 is unique amongst the exercises in a sense that it had plenty of explicit references to an 
external resource: MDN Web Docs8. Thus, test assertions were formed by cross-referencing the 
course study material for Part 0, the explicitly linked resources and requirements given by exercise 
tasks. Basis are presented alongside each static test on the next page in Table 13 [pp. 69]. 
  
 
8 MDN Web Docs is a repository that contains a lot of documentation regarding web standards as well as developer 
guides. Available at https://developer.mozilla.org/en-US/ (16/10/2022). 
69 
 
Table 13: Static automation tests implemented for DTEK2040 exercise 0. 
Exercise 0 static test cases 
ID Test case name Basis From Course Material 
E0-T1-1 Verify Html Anatomy Part 0 study material, “Traditional web application”, 
example source code. 
Part 0 exercises, Section 0.1 HTML: explicitly 
provides a link to Mozilla HTML tutorial that 
describes the anatomy of an HTML document. 
E0-T1-2 Page Contains A Valid Anchor Element Part 0 exercises, Section 0.1 HTML: requires that 
the page must contain an anchor (link) and explicitly 
links to <a> element on mdn web docs. Documents 
mention a hyperlink is created in combination with 
<a> and its href attribute. 
E0-T1-3-1 Page Contains A Table Element Part 0 exercises, Section 0.1 HTML: requires that 
the page must contain a table and explicitly links to 
<table> element on mdn web docs. 
E0-T1-3-2 Page Contains A Table Element With 
Proper Hierarchy 
Mdn web docs reference clearly dictates permitted 
content for <table> and element hierarchy. 
E0-T1-4-1 Page Contains A List Element Part 0 exercises, Section 0.1 HTML: requires that 
the page must contain a list and explicitly links to 
<ul> element on mdn web docs. 
E0-T1-4-2 Page Contains A Valid List Element Mdn web docs reference dictates permitted content 
and that at least one <li> must be included as a 
child. 
E0-T2-1 CSS File Exists And It Is Referred In The 
Index Html File 
Part 0 study material, “CSS”, teaches the method of 
separating CSS from HTML and referencing it in 
<head>. 
Part 0 exercises, Section 0.2 CSS: Explicit link to 
Mozilla’s CSS tutorial, which also teaches the same 
method. 
E0-T2-2 Hover Style Is Defined Part 0 exercises, Section 0.2 CSS: requires a style 
rule for anchor element on hover action. 
E0-T2-3-1 List Style Is Defined Part 0 exercises, Section 0.2 CSS: requires a style 
rule for lists. 
E0-T2-3-2 Table Style Is Defined Part 0 exercises, Section 0.2 CSS: requires a style 
rule for tables. 
E0-T2-5-1 Id Selector Is Used Part 0 exercises, Section 0.2 CSS: requires the use 
of id selector to define styles. 
E0-T2-5-2 Class Selector Is Used Part 0 exercises, Section 0.2 CSS: requires the use 
of class selector to define styles. 
E0-T2-6-1 Element Position Options Are Used Part 0 exercises, Section 0.2 CSS: asks to 
experiment with ways to position elements. 
E0-T2-6-2 Element Sizing Options Are Used Part 0 exercises, Section 0.2 CSS: asks to 
experiment with ways to adjust the size of 
elements. 
 
Dynamic testing was applied in test cases that would still be able to successfully fulfil their 
verification goal even without the mentioned prettification of source code by a HTML parser. 
70 
 
Essentially the dynamic tests for Part 0 exercise could have been created as static tests as well using 
the regular expression approach, but from development perspective dynamic versions were not only 
more straightforward but also faster to create. This was due to the utilization of Browser library and 
the ability to, for example, integrate asserts to keywords that would also interact with the web page.  
All in all, three dynamic cases were identified, and they are briefly introduced with their basis in 
Table 14. 
Table 14: Dynamic automation tests implemented for DTEK2040 exercise 0. 
Exercise 0 dynamic test cases 
ID Test case name Basis From Course Material 
E0-T1-5-1 Page Contains An Image Element Part 0 exercises, Section 0.1 HTML: requires that 
the page must contain an image and explicitly links 
to <img> element on mdn web docs. 
E0-T1-5-2 Page Contains A Valid Image Element Mdn web docs dictate that src attribute is required.  
E0-T1-6 Page Contains A Form With Input 
Elements 
Part 0 exercises, Section 0.1 HTML: dictates that 
the page must contain a form and explicitly links to 
<img> element on mdn web docs. 
 
Decision to use dynamic testing in these cases was additionally affected by the perception that 
doing so would in fact increase maintainability. Utilization of library offered keywords without 
extensive regular expression variables means that if something in tasks or course material changes, 
there are less variables such as regular expressions that would need to be re-engineered to fix any 
broken tests. 
5.4.2 Part 1 – React and JavaScript 
Part 1 exercises introduce the student to React and JavaScript. Exercise consisted of 10 sub-tasks 
that were divided between the student developing two projects: (1) Courses application and (2) 
Feedback application. With Part 1 the test cases were numerically designed to separate into static 
and dynamic tests quite evenly. However, since the tasks regarding Courses application had plenty 
of code examples available in the exercise material, this reinforced the decision to favour static 
testing for these tasks. 
As it was noted in Section 2.3 of this study static testing of source code should always start with 
reducing diversity. In this case, it seemed reasonable to hypothesize that such abundance of 
examples would naturally lead reduced diversity in student submitted source codes for these tasks, 
which would work well for static assessment of objectives such as code structure. Though, 
71 
 
Feedback application also had a few code structure related requirements extractable from the tasks, 
as can be seen in Table 15, which presents the collection of static test cases. 
Table 15: Static automation tests implemented for DTEK2040 exercise 1. 
Exercise 1 static test cases 
ID Test case name Basis From Course Material 
E1-T1-1 Course App Consists Of Three Main 
Components 
Part 1 exercises, Section 1.1 Refactoring 
components and using props: requires the 
application to consist of Header, Contents and Total 
components. 
E1-T1-2 Course App Contents Consist Of Part 
Components 
Part 1 exercises, Section 1.2 More components: 
requires the Contents component to consist of three 
Part components. 
E1-T1-3 Course App Data In Objects Part 1 exercises, Section 1.4 Objects in an array: 
provides an example code for variable data and 
requires the application to be modified accordingly. 
E1-T1-4 Course App Props Are Passed Directly Part 1 exercises, Section 1.4 Objects in an array: 
requires that objects should not be passed between 
components but rather be passed directly using an 
array. Provides an explanative example code. 
E1-T2-3 Feedback App Contains Several 
Components 
Part 1 exercises, Section 1.8 Feedback app, part 3: 
Requires the application to contain at least Button, 
Statistics and Statistic components. 
 
As might be expected from the prior description of static cases covering most of the Courses 
application tasks, the opposite is true for dynamic test cases. While Courses has only one dynamic 
test case dedicated to it to test a baseline requirement for any web application, bulk of the dynamic 
tests cover the Feedback application functionalities. Designing the tests in Table 16 leaned towards 
black box approach and applied Myers’ heuristics [pp. 17-18] to identify relevant equivalence 
classes and extract inputs from task descriptions. Part 1 exercises offered the chance to apply state 
transition testing with test cases E1-T2-2 and E1-T2-4. 
Table 16: Dynamic automation tests implemented for DTEK2040 exercise 1. 
Exercise 1 dynamic tests 
ID Test case name Basis From Course Material 
E1-T1-5 Browsing The Course App Expert interview, discussion regarding baseline 
requirements: application must fulfil the basic 
requirement of compiling in case of a react app and 
it must open in browser. 
E1-T2-1 Feedback App Contains Three Buttons Part 1 exercises, Section 1.6 Feedback app, part 1: 
requires the application to include three buttons. 
E1-T2-2 Feedback App Statistics Are Hidden 
When Empty 
Part 1 exercises, Section 1.9 Feedback app, part 4: 
requires no statistics are initially displayed. 
(to be continued) 
72 
 
Table 16 (continues) 
E1-T2-4 Feedback App Displays Statistics Part 1 exercises, Section 1.7 Feedback app, part 2: 
requires the application to display statistics based 
on the given feedback (button choices). 
E1-T2-5 Feedback App Statistics Contents Are In 
HTML Table 
Part 1 exercises, Section 1.10 Feedback app, part 
5: requires the statistics to be shown as an HTML 
table. 
 
Overall, both test case types were quite easily implemented with available RF libraries and did not 
require extensive efforts to craft custom keywords to achieve test goals. The static test cases in Part 
1 exercises again had to rely a lot on utilization of regular expressions while still managing to stay 
in relatively low numbers of lines of code per test case. This is most likely due to the reliance on 
low diversity of source code for submitted test objects. Also, most of the assert operations in static 
test cases were possible to be implemented by mainly relying on libraries Builtin and String, which 
were stable sources of keywords in Part 0 static tests as well. This allowed the amount of custom 
coded keywords to stay low, reducing the total lines of code for the tests as well.  
5.4.3 Part 2 – Communication with server 
Part 2 extended the study of topics covered during Part 1, but also introduced web application 
interaction with server. Part 2 exercises covered in total 10 sub-tasks that were again split between 
two different projects: (1) continuing the Courses application from Part 1 and (2) Phonebook-front 
application that also included the interaction between frontend and server. 
As for the test cases for Part 2, static and dynamic tests were not as disproportionately divided 
between applications as they perhaps were with previous exercise, which was probably due to both 
projects in Part 2 having tasks focusing on functional aspects rather than code form or content 
details. While assessing content structure was still implemented as a goal for majority of static tests 
in Part 2, as a method it also usable for generally fulfilling the set requirement of verifying 
requirements set upon certain files external to the core application itself in the Phonebook-front 
project. Static tests for part 2 are presented in Table 17 [pp. 73]. 
  
73 
 
Table 17: Static automation tests implemented for DTEK2040 exercise 2. 
Exercise 2 static tests 
ID Test case name Basis From Course Material 
E2-T1-1 Content Is Structured Correctly Part 2 exercises, Section 2.1 Course contents: 
gives a required component structure of the 
application. 
E2-T1-3 Module Separation Is Implemented Part 2 exercises, Section 2.3 Separate module: 
requires the component Course to be implemented 
as a separate module that is imported by App. 
E2-T2-2 App Is Divided Into Several Components Part 2 exercises, Section 2.7 Telephone directory, 
part 4: requires the application to be refactored and 
separating a minimum of two components from the 
root component. 
E2-T2-4 Initial App State Is Stored Into File Part 2 exercises, Section 2.8 Telephone directory, 
part 5: requires that initial entries are stored into 
db.json file and file placed into root directory of the 
project. Json format of entries is given as an 
example. 
 
As can be seen in Table 17, some of basis for static tests in Part 2 regarding contents and structure 
are already familiar from Part 1. Designing these test cases thus followed same principles as earlier. 
However, initiating these static tests for automation proved to be a lot more challenging than 
previously due to the file and directory structure varying a lot between student submissions. 
Dynamic test cases presented in Table 18 relied on a database state especially crafted so that the 
content could be predicted and counted on for test assertions and only the submitted applications 
interactions with the server would be under test. 
Table 18: Dynamic automation tests implemented for DTEK2040 exercise 2. 
Exercise 2 dynamic tests 
ID Test case name Basis From Course Material 
E2-T1-2 Exercises Total Number Is Calculated 
And Displayed 
Part 2 exercises, Section 2.2 Number of exercises: 
requires the total number of exercises be added to 
the application and displayed as shown in an 
example image that is provided. 
E2-T2-1 Prevent Adding Already Existing Name 
UI 
Part 2 exercises, Section 2.5 Telephone directory, 
part 2: requires that application prevents adding the 
proposed entry if the name exists. 
E2-T2-3-1 Contact Can Be Deleted UI Part 2 exercises, Section 2.10 Telephone directory, 
part 7: requires that an entry can be removed with a 
button attached to each directory and provides an 
example image. 
(to be continued) 
  
74 
 
Table 18 (continues) 
E2-T2-3-2 Contact Can Be Deleted API Part 2 exercises, Section 2.10 Telephone directory, 
part 7: requires that an entry is deleted from the 
server with an HTTP DELETE request to 
localhost:3001/persons/:id 
E2-T2-5 Initial State Is Fetched From Server Part 2 exercises, Section 2.8 Telephone directory, 
part 5: requires that application to return entries as 
a list with HTTP GET request to 
localhost:3001/persons. 
Part 2 exercises, Section 2.9 Telephone directory, 
part 6: requires that application state is 
synchronized with server state. 
 
Part 2 exercises offered chances to further apply black box techniques in terms of designing tests 
presented in Table 18. E2-T2-1 and E2-T2-3-1 as well as E2-T2-3-2 were prime candidates for 
cause-effect graphing while mapping rule sets with expected outputs in a decision table. Testing the 
interactions between frontend and server also presented the opportunity to apply condition coverage 
dynamic white box technique while designing E2-T2-1 and E2-T2-3-2. Though, it must be said that 
using them in this context perhaps would technically fall under so the called grey box technique as 
it cannot be known for certain in detail what the student submissions implementation of API 
interactions or UI design is, which in turn may cause the test cases to be unreliable if assumptions 
of submission diversity are wrong. 
5.4.4 Part 3 – Web application with database 
Part 3 built on top of the prior parts by introducing basics of document databases and switching 
from interacting between frontend and a local JSON-server to interacting between frontend and 
Mongo database. Part 3 exercises contained 12 sub-tasks in total which also included a single task 
about deploying an application to a cloud application platform called Heroku. However, verifying 
the cloud application deployment had to be excluded from the scope of test automation for this 
prototype due to so many case study materials containing either old or invalid Heroku application 
addresses, no address at all or assumably an old or invalid Mongo backend connection for their 
cloud application. 
Unlike other exercises of DTEK2040, a decision was made to approach test case design for Part 3 
purely from dynamic testing approach. The main reason for this decision was that the exercise tasks 
set very few strict guidelines or code-level detailed hard expectations for student implementations. 
In other words, while Part 3 obviously has quite specific requirements for application 
functionalities, it still leaves the student room to manoeuvre with implementation details and as 
75 
 
such designing static tests to compliment verification of required functionalities was seen as too 
much of a challenge in terms of return on invested time resources. Thus, all the required 
functionalities are tested by the seven dynamic test cases presented on the next page in Table 19. 
Table 19: Dynamic automation tests implemented for DTEK2040 exercise 3. 
Exercise 3 dynamic tests 
ID Test case name Basis From Course Material 
E3-T1-1 Json Array Is Returned From A Localhost 
Server 
Part 3 exercises, Section 3.1 Phone directory 
backend, part 1: requires that entries are returned 
from localhost:3001/api/persons as a json array. 
Provides an example image of entries returned in 
json format. 
Part 3 exercises, Section 3.10 Phone directory and 
database, part 1: requires that the entries are 
fetched from database. 
E3-T1-2-1 Single Id Request Successful Part 3 exercises, Section 3.2 Phone directory 
backend, part 2: requires that a single entry is 
returned from localhost:3001/api/persons/:id. 
E3-T1-2-2 Single Id Request Unsuccessful Part 3 exercises, Section 3.2 Phone directory 
backend, part 2: requires that if id is not in directory, 
must respond with appropriate status code. 
Part 3 study material, “Fetching a single resource”, 
mentions the use of status code 404 in such cases 
as an example. 
E3-T1-3 Contact Can Be Deleted Using DELETE Part 3 exercises, Section 3.3 Phone directory 
backend, part 3: requires that a HTTP DELETE 
request to localhost:3001/api/persons/:id removes 
the entry with given id. 
Part 3 exercises, Section 3.12 Phone directory and 
database, part 3: requires that removal must also 
affect database contents. 
E3-T1-4 Contact Can Be Added Using POST Part 3 exercises, Section 3.3 Phone directory 
backend, part 3: requires that a new entry can be 
added with HTTP POST request to 
localhost:3001/api/persons. 
Part 3 study material presents HTTP POST request 
examples that are sent with content data in json 
format. 
E3-T1-5-1 Entry Adding Error Handling UI Part 3 exercises, Section 3.5 Phone directory 
backend, part 5: requires that adding a new entry 
should not be allowed if a) name or number is 
missing or b) name already exists in directory. 
Provides an example error response. 
E3-T1-5-2 Entry Adding Error Handling API Part 3 exercises, Section 3.5 Phone directory 
backend, part 5: requires that adding a new entry 
should not be allowed if a) name or number is 
missing or b) name already exists in directory. 
Provides an example error response. 
 
76 
 
Test designing was approached like in Part 2 with emphasis on black box techniques. To be able to 
trust and make certain expectations regarding the database connection and contents of the document 
database, a decision was made to use a local Mongo database as the test backend. For this to work 
the preparation process leading to dynamic testing needed to parse and replace proper database 
connection strings into each submission currently coming up as a system-under-test. The 
preparation process also needed to include verification of other minute details such as which ports 
were defined for frontend and backend applications. 
Preparative tasks required prior to executing these test cases rely on certain heuristics to attempt and 
locate parts of the submitted material that needs to be edited to get all the connections working. 
During the design and development process it was often noticeable that discovering such a heuristic 
to reliably work for every submission currently provided for case study purposes would be a 
challenge. Due to this the prototype can function with certain kinds of submission directory and file 
structures but cannot guarantee to work with all the possible variations. 
5.5 Analysing and evaluating the design 
5.5.1 Meeting the set requirements 
Earlier in this study a few requirements for automation were crafted. These requirements are 
presented in detail within Appendix B, but the targets of these requirements were: 
• GR-1: Processing takes less than 5 minutes per submission. 
• GR-2: System contains documented and explanative general architectural overview; test 
case descriptions and custom keywords are 100% commented; custom library methods are 
100% commented. 
• GR-3: Course personnel can access the final summary document and change individual 
scores if needed; summary provides reasoning for individual task scores. 
• GR-4: System can successfully cover step number 3 of Appendix A: Prepare the fetched 
submission for assessment. 
• GR-5: System can successfully cover steps number 3 – 6 of Appendix A from technical 
perspective. 
77 
 
The metric used to observe whether the prototype would meet GR-1 was the time it took in seconds 
to execute the whole pipeline from start to finish. Total time of a single execution from start to 
finish was determined by integrating the measurement into the pipeline: when pipeline started 
execution, time current time was captured with command date -u +%s9 and before the final exit 0 
command date -u +%s was used again and the difference between end time and start time 
calculated and logged by appending the result into an external log file. Static and dynamic test 
steps, the total execution time for both steps four and six, were recorded the same way to gather 
results of test set execution times as well. 
Each exercise set was executed 10 times and the execution times logged as described. More detailed 
results of these runs are available in Appendix D, but on average the execution times were observed 
to be as represented in Table 20. 
Table 20: Average execution times per submission. 
 Average execution time in seconds per submission 
 Static test step Dynamic test step Other processes Pipeline total 
Exercise 0 3,10 26,36 1,36 30,82 
Exercise 1 1,77 30,51 1,81 34,09 
Exercise 2 8,51 43,06 2,12 53,69 
Exercise 3 No tests 35,06 2,20 37,26 
 
 
Based on the observed execution times, the prototype was able to clearly fulfil GR-1 [Appendix B]. 
Exercise 2 was observed to have the longest total running time on all measured categories with a 
total execution still managing to stay below one minute when averaged for a single submission. 
It must be considered that when averaging times for a single submission of a batch run that includes 
multiple submissions, some of the pipeline steps are repeated for each while some are performed 
only once. Overall performance benefits from multiple submission batch runs as then steps 1, 2, 3, 
5, 7 and 8 are only executed once in total and steps 4 and 6 for each submission. Thus, there is 
relatively less time spent on other-than-testing processes for each submission. 
The average times as presented in Table 20 can also only be used for certainty to provide evidence 
that there is potential to perform assessment, and especially functional assessment, of student 
submissions in a considerably faster paced manner than the suggested 5 minutes on average per 
submission estimate raised in expert interview, not including the gathering and providing of 
 
9 Command results into a value presenting seconds (%s) since 1970-01-01 00:00:00 UTC. 
78 
 
qualitative feedback to the student. This claim ties into the quality of the currently implemented 
prototype test cases, which will be analysed later in a sub-section of their own. 
The documentation requirements of GR-2 [Appendix B] are fulfilled with the materials produced 
through this study. The general architecture overview is included as contents of Section 5 with any 
relevant supporting material. As for the comment requirements for source codes, the prototype 
project is available online10 with custom keywords and custom library content commented to 100% 
coverage. Common keywords resource and the custom library are also added as Appendix F and 
Appendix G of this thesis. Coverage to measure was selected to be method description coverage 
where a custom keyword or a method included inside a custom library was determined to be 
covered when comments described (1) what does the keyword / method do, (2) what does it require 
as input and (3) what does it produce as output. 
GR-3 is fulfilled by the pipeline managing to produce a summary file that includes PASS / FAIL 
results of each submission at the level of individual test cases. Test cases themselves aim to test 
individual features or aspects of features so that the results can be used as guidance for feedback. 
Additionally for dynamic tests, whenever necessary, the system gathers screenshots that are also 
included as final artifacts. These screenshots of student submitted applications in action can be used 
to verify if the system has seemingly made a just call in passing or failing the subject-under-test for 
any given dynamic test. 
GR-4 and GR-5 are fulfilled as described in the prior sub-sections of Section 5, especially the sub-
section 5.3 where individual steps of the pipeline are recognizably corresponding to the respective 
steps mentioned in Appendix A. However, it is arguable that the prototype receives a pass with 
comments in terms of these general requirements. Since the prototype is built to assume certain 
kinds of submissions it is not able to cover for all the possible variety among submissions. 
5.5.2 Quality of the test cases 
As mentioned during this study, test cases were designed by extracting requirements from the 
course assignments and study materials and then developed so that the provided case study 
materials could be used as test subjects. Due to the diversity of submissions in various aspects, 
some of the provided case study material had to be rejected because attempting to cover every 
 
10 Prototype repository is available on UTU GitLab (https://gitlab.utu.fi/tesalo/salomaa-thesis), GitLab 
(https://gitlab.com/tomisalomaa/salomaa-utu-thesis-prototype) and GitHub (https://github.com/tomisalomaa/salomaa-
utu-thesis-prototype). GitLab repositories have a working example of the CI-implementation. Main branches are frozen 
to present the state of the implementation described in this thesis. 
79 
 
possible student approach to any given task simply was not realistic. Realisation of this during the 
development process also spoke for the notion discovered from background theory that 
standardization of the target process and its aspects is a precondition for successful RPA 
implementation. 
To gain insight into the usability of test cases as they are, result artifacts from 10 pipeline 
executions per each exercise were compared to manual results provided with the case study data. 
Subjects that received no points at all for a given exercise during the 10 pipeline executions were 
discarded from the comparison since the failure to gain any points at all was in these cases observed 
to be the result of an unorthodox submission that deviated from the standard assumed. In total, 
results for seven submissions of exercise 3 had to be removed from the comparison. Reasons for 
removal included 
a) three cases of missing component files; 
b) two cases of unorthodox directory structure causing the prototype not to be able to find 
and/or repair submission contents to run the application; 
c) one case of using JSX syntax extension instead of JS; 
d) one case of implementing material UI library resulting in application crash due to pipeline 
not supporting this. 
 
Table 21: Comparison of average prototype results versus manual results. 
    
Points available 
Proto versus manual average result 
difference 
Test set Submissions Course Prototype Points received Percentage points 
Exercise 0 96 5 2 0,13 6% 
Exercise 1 104 10 10 -3,40 -34% 
Exercise 2 68 10 10 -1,28 -13% 
Exercise 3 47 12 12 -2,57 -21% 
 
Based on the measured comparisons presented in Table 21, the Exercise 0 test set managed to score 
quite close to the manually scored results, granting on average 0,13 points per submission more 
than the provided manual results. Exercise 0 of DTEK2040 consists of two coding tasks and three 
80 
 
non-coding tasks which causes the total points in manual results to receive a maximum value of 5. 
As the prototype only considers and tests coding tasks, the maximum point was set to value of 2.  
The case study data did not break down the manually given total score per subject and this must be 
considered when attempting to analyse these results; unless a student has received a perfect score 
from manual assessment, it would be impossible to tell for certainty how successful were the 
submitted coding tasks. If all the subjects scoring less than the maximum from manual assessment 
were removed to be sure comparison is done against subjects that received full score from coding 
tasks, and then the results compared again, the average points received from prototype assessment 
of Exercise 0 test set is -0,15 points or -8% percentage points. This would then be in line with rest 
of the test sets, as they commonly resulted into worse total scores on average than their manual 
comparisons. 
From the score results alone, it seems like dynamic tests contribute to greater difference between 
manual and current prototype assessment scores. This is most likely since the automated tests cover 
one type of a solution whereas manually assessing the submissions allows for granting points for a 
variety of ways to solve the given task. For example, the course assignments dictate API paths to 
use in exercises 2 and 3 and these have been used precisely in the design and development of 
automated tests. If a submission then happens to use request paths such as /persons/:id instead of 
/api/persons/:id this would always result in a failed results as the automated tests would fail their 
API requests whereas a human manually assessing the submission might be inclined to not consider 
this a problem as long as the functionality is correct. 
Static tests seemed to also be a bit less reliable for exercises 2 and 3 since source code contents had 
more variance in component and overall variable naming. While the course material provided some 
examples in the assignments of these exercises, the actual functionalities are of course achievable 
without using the exact component content structure and naming as in the examples. Currently the 
prototype, however, is not able to cover all such cases and it might fail a test case for a submission 
simply because a regular expression phrase used in a static test does not cover the exact form a 
student has used for a variable in his or her source code, for example. 
All in all, the current prototype test cases are very strict in terms of interpretating the course 
assignments and the individual tasks contained within. However, the assignments themselves are 
not at times standardized or detailed enough in terms of dictating strict implementation, which 
leaves students some room for manoeuvring when it comes to their source code. In fact, the 
assignments seem to drive student implementations towards a development style one might describe 
81 
 
as example-driven development. From a pedagogical standpoint allowing the student to express 
oneself and even perhaps attempt alternative approaches is not necessarily a bad thing, especially 
for the latter exercises, but it also creates considerable challenges for automation. 
5.5.3 Answering the main research questions 
The first research question presented at the beginning of this thesis was Q1: How to support the 
assessment of web application programming assignments with test automation? To build upon the 
foundation already formed by answering the related supporting research questions in Section 2.6, by 
also integrating the results and data gathered from prototype executing, the following answer to Q1 
is collected: 
1) Course assignments must be designed or adapted to support the idea of using test automation 
to gain significant benefits; building test automation on top of an already existing course 
material is a possibility but the existing assignments may allow for too much submission 
variance in terms of submission content or the process of returning said submissions. 
2) In general, the test design strategy of starting with cause-effect mapping of potential test 
input combinations and proceeding with supplementary boundary value analysis and 
equivalence classes was verified empirically to work well. Further enhancing automated test 
sets through experience-based error-guessing technique is seen as a viable continuation for 
this strategy. This strategy should be preceded by gathering the test basis together with the 
relevant personnel who are familiar with the requirements set for assessing submissions 
during the manual process to ensure test assertions are in line with what is currently being 
tested; extracting requirements from assignment descriptions alone may lead to very strict 
asserting where tests are failing even if the desired functionality exists. 
3) The focus of automated testing can be kept in functional testing, which naturally lends itself 
to applying dynamic testing approaches with black box techniques while developing test 
cases. While the functional testing can be further supported with complimentary tests taking 
the static testing approach, this focus can also be seen to free up manual testing efforts and 
allow the course personnel to focus more on other quality aspects of the submission. 
The second research was Q2: How to support the assessment and feedback process of assignment 
assessing with RPA? This question was also based by answering its respective sub-questions in 
Section 3.4. The full answer, however, could be collected as follows: 
82 
 
1) First step is to identify the overall process that is currently being followed to assess the 
assignments. This will allow for discovering automation potential within the process and 
determine overall whether the process is a good candidate for RPA; the automation 
candidate should ideally have high transaction volumes, be highly standardized, have well-
defined implicit logic and be mature enough so that the process itself will not be 
dynamically changing as the automation is being implemented. 
2) Within the context of this thesis RPA was observed to be successful in performing file 
manipulation tasks in terms of assessment record keeping and tracking, manipulating the 
student submissions file and directory structures to perform test preparations, verifying 
submission contents, and summarizing quantitative feedback. 
3) RPA potential for discovering and presenting qualitative feedback was not empirically 
proven within the scope of this thesis, but it was theoretically researched and discussed. As 
an aspect it is something that could be integrated by the means of RPA to support the 
feedback process while also offering potential to integrate novel technologies to make it 
happen. 
4) RPA has big potential to shorten the feedback loop if the course assignments are built to 
allow RPA support. For example, by integrating RPA together with test automation into a 
study platform such as the Web and Mobile Programming website, the technology would 
perhaps allow for partial submissions to be reviewed and assessed instantly while also 
providing feedback to the student as he or she is working on the assignment solution. 
Overall, the collected answers for Q1 and Q2 are written through what additional observations were 
made through analysing the results gathered from the design, development and testing process of 
the prototype. As the empirical observations did not contradict the answers and the discussion 
related to the respective sub-questions, it is noteworthy to mention that the details discovered in 
those answers are also relevant even if they were not repeated here. 
5.5.4 Suggestions and further potential 
As it was demonstrated, in general the pipeline and automation is able to meet the requirements set 
for the prototype in the context of this study. The pipeline itself, however, can surely be further 
optimised in terms of performance by introducing parallelization, for example. The system could 
perhaps also be further extended to cover more ground by integrating it to either Moodle 
workspace, which is currently used as a platform for students to submit their assignment works and 
83 
 
course personnel to mark assessments and provide feedback, or the DTEK2040 course website. 
Integration with the course website could even provide opportunities to automate assessment of 
partial assignment submissions and thus allow the students to receive feedback while they are 
implementing their full assignment submission. 
The static and dynamic test cases themselves are also an obvious area of potential improvement, as 
can be seen from the average results displayed earlier. The fact that presenting their potential and 
possibility to integrate them into a pipeline was at the centre of design and development work rather 
than the usability also means that they cannot be suggested to be taken into use without further 
development work. One way to make the tests more usable and able to serve the needs of 
DTEK2040 – or any other course for that matter – would be to first focus on the requirements 
engineering of these tests together with course personnel and then start iteratively improving them. 
Preferably course assignments and their content would also be modified so that the assignment 
contents could possibly be further standardized or detailed from test automation perspective given it 
does not degrade respective assignment’s pedagogical worth. 
As far as pedagogy is concerned, the ability to help provide the student formative feedback is an 
area to improve the prototype in. Attempting to integrate the theory discussed in Section 3.3 could 
even provide interesting venues to integrate novel technologies from the field of artificial 
intelligence to discover and formulate qualitative feedback points from student submitted materials. 
Other, perhaps a more realistic first step, would be to integrate static analysis methods and tools 
such as so-called “linters”11 to flag programming errors and smells from submitted source codes and 
then further transform these into more constructive feedback to be added as part of the summary 
artifacts at the end of the pipeline. 
 
11 A linter is often described as a static analysis tool that specifically analyses code quolity by discovering redundant 
code, inefficient pattens, anti-patterns, smells and much more. These tools are commonly found in many IDEs and often 
integrated within CI pipelines in professional software development. Using linters could prove useful in discovering 
points for qualitative feedback, and many such tools exist for JavaScript. Some examples include ESLint 
(https://eslint.org/) , JSLint (https://www.jslint.com/) and Standard JS (https://standardjs.com/#javascript-style-guide-
linter-and-formatter). 
84 
 
6 Conclusions and future work 
This thesis studied test automation, RPA and the principles of valuable feedback in the context of 
automated assessment tools for programming assignments. The main research methods applied 
during the study were literature review, expert interview and design science. Thesis verified the test 
automation and RPA research empirically by using available anonymized student submissions from 
the course DTEK2040: Web and Mobile Programming 2022 as case study material. As a design 
science artifact, a prototype capable of performing programming assignment submission assessment 
within the boundaries of set requirements was developed. 
In total, the thesis answered to two main research questions and five supporting research questions. 
The first question, Q1: How to support the assessment of web application programming 
assignments with test automation, was answered by first researching which testing levels and 
techniques would be most suitable in general to test programming assignment submissions. 
Afterwards research was done on test design and development to implement the prototype 
automation test cases. The study discovered that unit tests and system tests were the most popular 
levels to focus on due to the heavy emphasis on functional testing in other similar systems. This was 
further supported by the expert interview, based on which it appeared to be the case with the 
currently manually performed assessment work as well. Tied to the focus on functional testing, 
research also suggested that dynamic testing performed with black box techniques would be the 
most likely approach to designing test cases while static testing would assume the more minor role 
of complimenting them. 
The second question, Q2: How to support the assessment and feedback process of assignment 
assessing with RPA, was answered by first discovering what manual labor related to the assessment 
and feedback process there potentially is to automate and then extending the research to the 
pedagogical side of formulating feedback for programming assignments. To answer this question 
the study found out that the current processes of DTEK2040 contain elements such as (1) 
interacting with the learning platforms and workspaces, (2) file manipulation in terms of assessment 
record keeping and tracking, (3) manual testing of student submitted applications for structure, (4) 
functionalities and content, (5) gathering and representing quantitative feedback, (6) noting issues 
within submission for qualitative feedback and (7) maintaining feedback traceability were all 
potential targets for automation. As for the quality of the feedback itself, it was discussed that 
automation would be able to improve aspects such as shortening the feedback cycle and providing 
feedback even for the so called perfect submissions, which currently might be left with little 
85 
 
attention. A challenge for automation, however, was noted in formulating formative feedback that 
would appropriately target student’s knowledge of concepts, made mistakes, expansion possibilities 
of current skill sets, capabilities of self-assessment and understanding of what is being expected of 
him or her i.e., knowledge about task constraints. All these feedback aspects were determined to be 
important when it came to effectively supporting student’s growth in programming. 
The main research questions were verified, and answers expanded upon, by empirically designing 
and implementing a prototype that integrated RPA and test automation. The prototype was able to 
process batches of student submissions continuously and reliably for each exercise set in timespans 
averaging less than a minute per submission. A total execution of the automated pipeline consisted 
of record keeping RPA tasks, preparative tasks for the static and dynamic tests, automation tests and 
finally summarization of results. Unfortunately, the prototype was not able to extensively integrate 
the pedagogical aspects of feedback formulation as it only focused on providing score summaries 
and highlighting failed functionality implementations within a given subject-under-test by including 
relevant parts of test logs into the final summary file. The analysis of the prototype did, however, 
provide insight into how more quality feedback could be integrated into the system as future 
suggestions. Overall, the most notable expansion to the answers already presented was that in order 
to effectively support assignment assessment with test automation and RPA, the use of automation 
needs to be considered already when designing course assignments; building automated solutions 
on top of already existing case study proved that while the course contents lead students to certain 
kinds of programming implementations, the submission content variance still proved to be a 
challenge for the prototype. 
For future work two main aspects related to the created prototype were identified: (1) improving the 
test cases and (2) integrating a more extensive feedback ability. The first suggestion is to perform 
requirements engineering and iterative development processes with the personnel of DTEK2040, or 
any other course wishing to apply the prototype, to convert manual test cases into automated ones 
and take advantage of experiences in assessing submissions of that course for error-guessing tests. 
The second suggestion is that research be done onto possibilities of integrating static analysis tools 
into the current prototype to gather and provide more code quality feedback from submissions. This 
aspect most likely offers novel research potential in terms of attempting to integrate AI solutions for 
feedback formulation. Aside from improvements to an assessment support system, a third 
suggestion could also be proposed from another perspective: research on how a programming 
course is able to support and be ready to integrate automation tools for better learning quality. 
 
 
References 
[1] Ala-Mutka, K. M. (2005). A survey of automated assessment approaches for programming 
assignments. Computer Science Education, 15(2), 83-102. 
[2] Amman, P., & Offutt, J. (2017). Introduction to Software Testing (2nd edition ed.). Cambridge 
University Press. 
[3] Arora, A., & Sinha, M. (2012). Web Application Testing: A review on Techniques, Tools and 
State of Art. International Journal of Scientific & Engineering Research, 3(2). 
[4] Baresi, L., & Pezzè, M. (2006). An introduction to Software Testing. Electronic Notes in 
Theoretical Computer Science, 148, 89-111. 
[5] Bass, L., Clements, P., & Kazman, R. (2013). Software Architecture in Practice (3rd ed.). 
Addison-Wesley Professional. 
[6] Bruns, A., Kornstädt, A., & Wichmann, D. (2009). Web Application Tests with Selenium. IEEE 
Software, 26(5). 
[7] Daka, E., & Fraser, G. (2014). A Survey on Unit Testing Practices and problems. 2014 IEEE 
25th International Symposium on Software Reliability Engineering. IEEE. 
[8] Di Lucca, G. A., & Fasolino, A. R. (2006). Web Application Testing. In E. &. Mendes, Web 
Engineering (pp. 219–260). Springer. doi:https://doi.org/10.1007/3-540-28218-1_7 
[9] Doguc, O. (2020). Robot Process Automation (RPA) and Its Future. In U. Hacioglu, Handbook 
of Research on Strategic Fit and Design in Business Ecosystems (pp. 469-492). IGI Global. 
doi:https://doi.org/10.4018/978-1-7998-1125-1.ch021 
[10] Douce, C., Livingstone, D., & Orwell, J. (2005). Automatic test-based assessment of 
programming: A review. Journal on Educational Resources in Computing, 5(3), 4. 
[11] Eggert, R. J. (2010). Engineering Design (Second edition ed.). Meridian, Idaho: High Peak 
Press. 
[12] Garousi, V., & Elberzhager, F. (2017). Test Automation: Not Just for Test Execution. IEEE 
Software, 34, 90-96. doi:10.1109/MS.2017.34 
[13] Garousi, V., & Mäntylä, M. V. (2016). A systematic literature review of literature reviews in 
software testing. Information and Software Technology, 80, 195-216. 
[14] Gupta, S., & Dubey, S. K. (2012). Automatic Assessment of Programming assignment. 
Computer Science & Engineering An International Journal, 2(1). 
[15] Hambling, B., Samaroo, A., & Morgan, P. (2010). Software Testing - An ISTQB-ISEB 
Foundation Guide (2nd edition ed.). 
[16] Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 
77(1), 81-112. doi:10.3102/003465430298487 
[17] Hollingsworth, J. (1960). Automatic graders for programming classes. Communications of the 
ACM, 3(10), 528–529. 
 
 
[18] Ihantola, P., Ahoniemi, T., Karavirta, V., & Seppälä, O. (2010). Review of recent systems for 
automatic assessment of programming assignments. Koli Calling '10: Proceedings of the 10th 
Koli Calling International Conference on Computing Education Research, (pp. 86-93). 
[19] International Software Testing Qualifications Board. (2021). Certified Tester Foundation Level 
(CTFL) Syllabus. Version 2018 v3.1.1. 
[20] Jha, N., Prashar, D., & Nagpal, A. (2021). Combining Artificial Intelligence with Robotic 
Process Automation - An Intelligent Automation Approach. In K. R. Ahmed, A. E. Hassanien, 
& (eds), Deep Learning and Big Data for Intelligent Transportation. Studies in Computational 
Intelligence, vol 945. Springer, Cham. doi:https://doi.org/10.1007/978-3-030-65661-4_12 
[21] Keuning, H., Jeuring, J., & Heeren, B. (2018). A systematic literature review of automated 
feedback generation for programming exercises. 19, 1–43. 
[22] Lehto, J., & Papalitsas, J. (2022, April 20). Expert interview. (T. Salomaa, Interviewer) 
[23] Myers, G. J., Badgett, T., & Sandler, C. (2012). Art of Software Testing (3rd Edition) .  John 
Wiley & Sons. 
[24] Nicola, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: 
a model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 
199-218. doi:10.1080/03075070600572090 
[25] Nidhra, S., & Dondeti, J. (2012). Black box and white box testing techniques - A literature 
review. International Journal of Embedded Systems and Applications (IJESA), 2(2). 
[26] Ouadoud, M., Nejjari, A., Chkouri, M. Y., & El-Kadiri, K. E. (2018). Learning management 
system and the underlying learning theories. Proceedings of the 2nd Mediterranean Symposium 
on Smart City Applications (pp. 732–744). Springer. 
[27] Paiva, J. C., & Leal, J. P. (2022). Automated Assessment in Computer Science Education: A 
State-of-the-Art Review. ACM Transactions on Computing Education. 
[28] Polo, M., Reales, M., & Ebert, C. (2013). Test Automation. Software, 30, 84-89. 
[29] Rahman, K. A., & Nordin, M. J. (2007). A review on the static analysis approach in the 
automated programming assessment systems. National conference on programming.  
[30] Ricca, F., & Stocco, A. (2021). Web Test Automation: Insights from the Grey Literature. 
SOFSEM 2021: Theory and Practice of Computer Science: 47th International Conference on 
Current Trends in Theory and Practice of Computer Science, (pp. 472–485). 
doi:https://doi.org/10.1007/978-3-030-67731-2_35 
[31] Robocorp. (2022). Basic concepts of Robot Framework. Retrieved from 
https://robocorp.com/docs/languages-and-frameworks/robot-framework/basics 
[32] Robot Framework ry. (2022). Robot Framework User Guide. Retrieved from 
https://robotframework.org/robotframework/latest/RobotFrameworkUserGuide.html 
[33] Romli, R., Sulaiman, S., & Zamli, K. Z. (2010). Automatic programming assessment and test 
data generation: a review on its approaches. Int. Symp. in Information Technology, 1186-1192. 
 
 
[34] Stepien, B., & Peyton, L. (2009). Integration Testing of Web Applications and Databases Using 
TTCN-3. Innovation in an Open World, 4th International Conference 2009. Ottawa, Canada: 
MCETECH . 
[35] Thummalapenta, S., Sinha, S., Singhania, N., & Chandra, S. (2012). Automating test 
automation. 2012 34th International Conference on Software Engineering (ICSE), 881-891. 
doi:10.1109/ICSE.2012.6227131 
[36] Umar, M., & Chen, Z. (2019). A Study of Automated Software Testing: Automation Tools and 
Frameworks. International Journal of Computer Science Engineering (IJCSE), 8, 217-225. 
doi:https://doi.org/10.5281/zenodo.3924795 
[37] University of Turku. (2022). Study Guide 2020-2022: DTEK2040 Web and Mobile 
Programming, 5 ECTS. Retrieved from 
https://opas.peppi.utu.fi/en/course/DTEK2040/8780?period=2020-2022 
[38] University of Turku. (2022). Web and mobile programming 2022. Retrieved from 
https://tech.utugit.fi/education/webprog/web-material/ 
 
A-1 
 
Appendix A: DTEK2040 assessment process automation potential 
Table A1: Automation potential within steps extracted from the manual assessment process. 
 Step 
Manual tasks for 
assessor 
Automation potential Automation risks 
1 Submit a solution None   
2 
Fetch a 
submission  
1) Log into learning 
platform 
2) Navigate to course 
workspace 
3) Navigate to 
assignments and 
choose submission(s) 
to download 
4) Assign the 
submission to yourself 
Automation of tasks such 
as logging in, navigating 
and fetching content are 
among the most common 
website related RPA. The 
whole chain can be 
automated. 
- The platform, in this case 
Moodle, may experience 
updates that break the 
robotic process; simple 
modifications to website 
element attributes such as 
id may cause navigation 
failures. 
- The platform may not 
allow RPA interaction, i.e. it 
attempts to block actions 
with the use of CAPTCHA 
or setting limits to requests. 
 
3 
Prepare the 
fetched 
submission for 
assessment 
1a) Extract the 
compressed 
submission package, 
and/or 
1b) Fetch submission 
from GitLab based on 
the link provided in a 
text file. 
2) Set up environment 
and run the dynamic 
parts of the 
submission 
3) Open-source files 
with proper tools for 
static analysis  
- Recursive extraction of 
compressed packages is 
automatable. 
- Environmental set up 
can be performed with 
the help of technology 
such as Docker, which in 
fact increases security of 
assessing dynamic parts 
of the submission. 
- Automated static 
assessment requires less 
tools from preparation 
standpoint 
- While the submissions are 
instructed to be returned in 
a certain way, there may 
still be variance between 
the contents and structure 
of different submissions for 
same assignment. 
- Relying on container 
technology also requires 
maintenance and 
knowledge of said 
technology. 
 
4 
Perform static 
assessment 
1) Assess how the 
source code 
corresponds to the 
course subject 
contents 
2) Search source 
code for known “anti-
patterns” that 
commonly present 
themselves 
3) Keep notes for 
scoring and feedback 
- Automation can make 
static analysis and 
assessment more 
consistent and less 
reliant on assessor’s 
experience. 
- Experience-based 
observations of anti-
patterns per assignment 
may prove to be easily 
automated for notating / 
flagging. 
- If any assignment 
requirements are clearly 
statically verifiable, they 
are also simple to score 
and note for feedback. 
- Scripting the rules for 
more intermediate and 
advanced level static 
analysis may prove to be 
challenging and not 
necessarily be a good 
return on investment. 
- Providing in-depth 
feedback in fully automated 
manner may prove to be 
difficult without advanced 
features such as 
introduction of AI solutions. 
(to be continued) 
A-2 
 
Table A1 (continues) 
5 
Perform dynamic 
assessment 
1) Open / navigate to 
submission page / site 
2) Test and assess 
required functionalities 
3) Keep notes for 
scoring and feedback 
- The assignments often 
lead to certain 
expectable solutions. 
- Functional assignment 
requirements can be 
verified with test 
automation. 
- If any assignment 
requirements are clearly 
dynamically verifiable, 
they are also simple to 
score and note for 
feedback. 
- To truly achieve consistent 
behaviour from automated 
test cases, additional 
instruction for assignments 
may be needed. 
- Automated tests need to 
be maintained as course 
contents, such as 
assignments or technology 
stacks, are updated. 
6 Form feedback 
1) Gather a total score 
based on static and 
dynamic assessment 
2) Provide written 
feedback based on 
errors and missed 
points 
- This step contains 
administrative tasks such 
as forming a summary of 
results from previous 
steps; this is what RPA is 
generally considered to 
target most often in a 
business sense as well. 
- The summarized contents 
may be likely to require 
manual validation before 
continuing forward from 
here. 
7 
Post results to 
learning platform 
1) Make sure the just 
assessed submission 
is organized and filed 
as assessed 
2) Mark score for the 
student submission in 
learning platform 
3) Provide written 
feedback (if any) in 
learning platform 
- Organizing the 
assessed workloads is 
something that helps 
manual working; 
automation may get rid of 
this task all-together or 
take a more archiving 
approach to it. 
- Automating website 
interactions to fill in 
submission scores and 
feedback from a source 
is quite straight-forward. 
- See platform related risks 
from step number two. 
 
B-1 
 
Appendix B: General requirements for DTEK2040 automated 
assessment system 
Table B1: General system requirements built based on theory sections. 
General requirements for DTEK2040 automated assessment system 
ID Requirement Metric 
GR-1 System must save human workhours 
in assessment and related 
bookkeeping tasks 
Time spent per student submission. With batch runs the total 
time may simply be divided by the number of processed 
submissions. 
Target: Processing takes less than 5 minutes per submission 
GR-2 System must be maintainable by 
course personnel 
Documentation and instruction coverage. 
Targets:  
1) System contains documented and explanative general 
architectural overview; 
2) test case descriptions and custom keywords are 100% 
commented; 
3) custom library methods are 100% commented. 
GR-3 Human must be kept in the 
assessment loop 
Assessment transparency and result modifiability. 
Targets:  
1) Course personnel can access the final summary document 
and change individual scores if needed; 
2) summary provides reasoning for individual task scores. 
GR-4 Automation flow begins from the work 
step when packaged submissions 
have been fetched from course 
platform 
Coverage of assessment process steps identified for 
DTEK2040. 
Target: System can successfully cover step number 3: 
Prepare the fetched submission for assessment. 
GR-5 Automation flow covers preparation, 
numerical assessment, discovering 
issues for feedback and generation of 
summary report from the processed 
submissions 
Coverage of assessment process steps identified for 
DTEK2040. 
Target: System can successfully cover steps number 3 – 6 
from technical perspective. 
 
C-1 
 
Appendix C: Prototype directory and file structure tree 
 
Figure C1: Prototype system file and directory structure. 
 
The image represents files and directories found within the prototype system as they are by default. 
When verification of internal structure by the prototype is executed during pipeline step 1, the 
missing files or directories will have a certain effect to the continuation of the pipeline. Colour 
codes for files and directories in this image indicate how the system will act if the respective file or 
directory is not found when pipeline is executing the verification during step 1.  
D-1 
 
Appendix D: Pipeline execution times 
Table D1: Pipeline execution times in seconds. 
  Execution iteration   
 
 
1 2 3 4 5 6 7 8 9 10 average average per submission 
E
x
e
rc
is
e
 0
 total 2974 2907 2884 2894 2903 3002 3005 3004 3005 3005 2958,3 30,82 
static 336 280 269 271 274 309 309 309 309 309 297,5 3,10 
dynamic 2498 2494 2487 2491 2501 2566 2567 2565 2568 2567 2530,4 26,36 
other processes 140 133 128 132 128 127 129 130 128 129 130,4 1,36 
E
x
e
rc
is
e
 1
 total 3520 3540 3578 3558 3558 3521 3529 3554 3536 3561 3545,5 34,09 
static 194 182 189 185 182 181 181 182 181 182 183,9 1,77 
dynamic 3127 3172 3195 3178 3191 3154 3164 3185 3170 3194 3173 30,51 
other processes 199 186 194 195 185 186 184 187 185 185 188,6 1,81 
E
x
e
rc
is
e
 2
 total 3658 3645 3568 3575 3651 3722 3616 3727 3706 3641 3650,9 53,69 
static 583 584 574 580 583 577 580 570 567 589 578,7 8,51 
dynamic 2917 2915 2850 2853 2922 3003 2896 3016 2999 2911 2928,2 43,06 
other processes 158 146 144 142 146 142 140 141 140 141 144 2,12 
E
x
e
rc
is
e
 3
 
total 1894 1961 1893 1925 1962 1973 2143 2080 2136 2155 2012,2 37,26 
dynamic 1769 1844 1776 1803 1844 1857 2026 1962 2018 2036 1893,5 35,06 
other processes 125 117 117 122 118 116 117 118 118 119 118,7 2,20 
 
Table displays recorded execution times for all four exercises. Each exercise pipeline was executed ten times. Recording was done so that the orchestrating 
script run_pipeline.sh saved a running time value at the beginning and at the end of each step and before fully completing the pipeline, calculated the (1) total 
execution time, (2) static tests execution time and (3) dynamic tests execution time. These values were logged and appended into a log file automatically. 
Submission numbers used to calculate average per submission are: 96 for exercise 0, 104 for exercise 1, 68 for exercise 2 and 54 for exercise 3.
E-1 
 
Appendix E: Example summary template 
 
Figure E1: Example representing summary template with results from ex2. 
F-1 
 
Appendix F: Common keywords 
Code F1: common_keywords.resource. 
# Common keywords contain keywords that are used or have potential to be used 
# by multiple robot files now or in the future development. 
 
*** Settings *** 
Library  ..${/}libraries${/}MyLibrary.py 
Variables  ..${/}variables${/}common_variables.py 
 
*** Keywords *** 
# Opens the summary report file using common variables. 
# Using the keyword does not require inputs but the values for report path and 
# summary file name in common_variables.py need to respect the project structure. 
# File needs to be found for the keyword to PASS. 
# @output: summary file is opened with id 'doc01' 
Open Report File 
  Open Excel Document  ${REL_REPORTS_PATH}${/}${SUMMARY_FILE_NAME}  doc01 
 
# Iterates through a list of files and compares their line count. 
# @input:   ${files}        a list of file paths 
# @returns: ${result_file}  file path of the file containing most lines 
Get File With Most Lines 
  [Arguments]  ${files} 
  ${line_count}  Set Variable  ${0} 
  ${result_file}  Set Variable  ${files}[0] 
  FOR  ${file}  IN  @{files} 
    ${file_content}  Get File  ${file} 
    ${new_line_count}  Get Line Count  ${file_content} 
    IF  ${line_count} < ${new_line_count} 
      ${line_count}  Set Variable  ${new_line_count} 
      ${result_file}  Set Variable  ${file} 
    END 
  END 
  [Return]  ${result_file} 
 
# Searches a directory by using a keyword defined in MyLibrary. 
# The search returns all .html file paths from that directory. 
# From the returned list of file paths, the html file with most lines is chosen. 
# @input:   ${submission_dir}     directory path containing the wanted html file 
# @returns: ${html_file}        file path of the html file containing most lines 
Search Local HTML Main Page Location 
  [Arguments]  ${submission_dir} 
  ${html_file_locations}  Search File With Extension  ${submission_dir}  html 
  ${html_file}  Get File With Most Lines  ${html_file_locations} 
  [Return]  ${html_file} 
 
# Respects the current design of results summary sheet and inserts the given 
submission id 
# into the first column of the template. 
# @input: ${student_row}    row number of the submission in summary sheet, 
#         ${id}             id given to the submission 
# @output: writes id into the summary sheet cell at column=1, row=given 
(to be continued) 
F-2 
 
Code F1 (continues) 
Insert Submission Id To Results Summary Sheet 
  [Arguments]  ${student_row}  ${id} 
  Write Excel Cell  row_num=${student_row}  col_num=1  value=${id} 
 
# Searches in a given directory for files with a given extension that contain given 
keywords. 
# All the declared keywords must be found from the contents of a file in order to 
determine that 
# the file be returned. If a even a single word is not found from contents, the 
file will be discarded. 
# Keywords are treated as patterns. This means, for example, that given an input 
keyword 'light', 
# a match is found from contents including 'lightning', 'blight', 'lighter', ... 
# Patterns are also treated case insensitively. 
# @input:     ${parent_search_dir}    directory path to search files from 
#             ${required_keywords}    a list of keywords to be pattern matched 
within file contents 
#             ${file_extension}       extension to determine what files should be 
considered while searching, 
#                                     example: js html 
# @returns:   ${result_files}         a list of files that have positive matches 
from each and every input keyword 
Find Files With Content Containing Keywords 
  [Arguments]  ${parent_search_dir}  ${required_keywords}  ${file_extension} 
  ${keywords_amount}  Get Length  ${required_keywords} 
  @{result_files}  Create List 
  ${files}  Search File With Extension  ${parent_search_dir}  ${file_extension} 
  FOR  ${file}  IN  @{files} 
    ${total_found}  Set Variable  ${0} 
    ${file_contents}  Get File  ${file} 
    FOR  ${word}  IN  @{required_keywords} 
      ${matches_found}  Get Regexp Matches  ${file_contents}  (?i:.*?${word}*.*?) 
      ${amount_found}  Get Length  ${matches_found} 
      IF  ${amount_found} == ${0} 
        BREAK 
      ELSE 
        ${total_found}  Evaluate  ${total_found}+1 
      END 
      IF  ${total_found} == ${keywords_amount} 
        Append To List  ${result_files}  ${file} 
      END 
    END 
  END 
  [Return]  ${result_files} 
 
# Uses 'Get Modified Time' from OperatingSystem library to compare which of the 
input files 
# is most recently modified. Keyword is used with 'epoch' option which means the 
modification 
# time of a file is returned in seconds after the UNIX epoch. 
# Use of this keyword required the robot file to import the OperatingSystem 
library. 
# @input:     ${list_of_files} a list of file paths to compare 
# @returns:   ${recent_file} file path of the most recently modified file 
(to be continued) 
F-3 
 
Code F1 (continues) 
Determine Most Recently Modified 
  [Arguments]  ${list_of_files} 
  ${base_time}  Set Variable  ${0} 
  ${recent_file}  Set Variable  None 
  FOR  ${file}  IN  @{list_of_files} 
    ${time}  Get Modified Time  ${file}  epoch 
    IF  ${time} > ${base_time} 
      ${base_time}  Set Variable  ${time} 
      ${recent_file}  Set Variable  ${file} 
    END 
  END 
  [Return]  ${recent_file} 
 
# See 'Find Files With Content Containing Keywords' and 
# 'Determine Most Recently Modified' keywords above. 
# Uses the results of afore mentioned keyword as input for the latter. 
# @input:   ${location}             directory path to search files from 
#           ${keywords}             a list of keywords to be pattern matched within 
file contents 
#           ${file_extension}       extension to determine what files should be 
considered while searching, 
#                                   example: js html 
# @returns:   ${most_recent_file}   file path of the most recently modified file 
Find Most Recent File Based On Keywords 
  [Arguments]  ${location}  ${keywords}  ${file_extension} 
  ${files}  Find Files With Content Containing 
Keywords  ${location}  ${keywords}  ${file_extension} 
  ${most_recent_file}  Determine Most Recently Modified  ${files} 
  [Return]  ${most_recent_file} 
 
# See 'Find Files With Content Containing Keywords' and 
# 'Determine Most Recently Modified' keywords above. 
# Uses the results of afore mentioned keyword as input for the latter. 
# Further filters the results by only considering files that fit the name 
requirement. 
# @input:   ${location}             directory path to search files from 
#           ${keywords}             a list of keywords to be pattern matched within 
file contents 
#           ${name}                 expected file name without an extension 
#           ${file_extension}       extension to determine what files should be 
considered while searching, 
#                                   example: js html 
# @returns:   ${most_recent_file}   file path of the most recently modified file 
Find Most Recent File Based On Keywords And Name 
  [Arguments]  ${location}  ${keywords}  ${name}  ${file_extension} 
  ${all_files}  Find Files With Content Containing 
Keywords  ${location}  ${keywords}  ${file_extension} 
  @{files}  Create List 
  FOR  ${file}  IN  @{all_files} 
    ${directory}  ${file_name}  Split String From Right  ${file}  /  1 
    ${file_name}  Convert To Lower Case  ${file_name} 
    IF  '${file_name}' == '${name}.${file_extension}' 
      Append To List  ${files}  ${file} 
     
(to be continued) 
F-4 
 
Code F1 (continues) 
    END 
  END 
  ${most_recent_file}  Determine Most Recently Modified  ${files} 
  [Return]  ${most_recent_file} 
 
# The keyword takes parent element:child element key:value pairs as input and goes 
through 
# each key to verify that child elements are as dictated by web standards. 
# If at any point a child element is found to be incorrectly under a certain parent 
element, 
# the keyword determines FAIL status and acts as a test case would when keyword 
fails. 
# This keyword can be used together with 'Run Keyword And Return Status' to catch 
the status 
# as a variable without failing a task / test. 
# @input:   ${parent_child_dict}    a dictionary where keys are parent elements and 
values the child elements found under that parent 
# @output:  PASS / FAIL status 
Verify Table Element Hierarchy 
  [Arguments]  ${parent_child_dict} 
  FOR  ${relations}  IN  &{parent_child_dict} 
    IF  '${relations}[0]' == '<table>' 
      Should Not Contain  ${relations}[1]  <table> 
    ELSE IF  '${relations}[0]' == '<caption>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tr>  <th> 
 <td>  <tfoot> 
    ELSE IF  '${relations}[0]' == '<colgroup>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tr>  <th> 
 <td>  <tfoot> 
    ELSE IF  '${relations}[0]' == '<thead>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <td>  <th> 
 <tfoot> 
      Should Contain  ${relations}[1]  <tr> 
    ELSE IF  '${relations}[0]' == '<tbody>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tfoot> 
    ELSE IF  '${relations}[0]' == '<tr>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody> 
    ELSE IF  '${relations}[0]' == '<th>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tr> 
    ELSE IF  '${relations}[0]' == '<td>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tr> 
    ELSE IF  '${relations}[0]' == '<tfoot>' 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <td>  <th> 
 <tfoot> 
      Should Contain  ${relations}[1]  <tr> 
 
(to be continued) 
F-5 
 
Code F1 (continues) 
    ELSE 
      Should Not Contain 
Any  ${relations}[1]  <table>  <caption>  <colgroup>  <thead>  <tbody>  <tr>  <th> 
 <td>  <tfoot> 
    END 
  END 
 
# Uses regular expression to verify that a table element does not contain other 
than hierarchically 
# correct child elements. The keyword can take a list of found table elements. This 
keyword does not produce 
# a PASS / FAIL status but rather the number of verifiably correct tables. 
# @input:     ${table_elements}       a list of table elements 
# @output:    ${proper_table_amount}  number of correctly constructed table 
elements 
Verify Table Elements 
  [Arguments]  ${table_elements} 
  ${table_elem_regex}  Set 
Variable  table|caption|colgroup|thead|tbody|tr|th|td|tfoot 
  ${proper_table_amount}  Set Variable  ${0} 
  FOR  ${table}  IN  @{table_elements} 
    @{elements}  Split String  ${table}  ${SPACE} 
    FOR  ${element}  IN  @{elements} 
      ${is_table_element}  Run Keyword And Return Status 
      ...  Should Match Regexp  ${element}  ${table_elem_regex} 
      IF  '${is_table_element}' != 'PASS' 
        Remove Values From List  ${elements}  ${element} 
      END 
    END 
    &{relations}  Parent Child Relations From List  ${elements} 
    ${verification_result}  Run Keyword And Return Status 
    ...  Verify Table Element Hierarchy  ${relations} 
    IF  ${verification_result} 
      ${proper_table_amount}  Evaluate  ${proper_table_amount} + 1 
    END 
  END 
  [Return]  ${proper_table_amount} 
G-1 
 
Appendix G: Custom library 
Code G1: MyLibrary.py. 
 
# Library to support various parsing tasks. 
# Each method is usable as a keyword when imported in a Robot file. 
# Robot framework logic: 
# - Method name translates into a keyword when underscores are left out. 
#   For example: def prepare_soup() in this library ==> 'Prepare Soup' in robot 
file 
# - When used as robot keywords after library is imported, 
#   keywords require input values unless a default value is declared for an input 
variable in this library 
# - Return in method means the keyword will return content when used 
# 
# Do note that whenever html parser is used on source content, a lot of syntactic 
errors present in the 
# raw source code will be corrected by the html parses in the process. If the 
purpose is to receive 
# raw content, regular expressions are recommended. 
 
# coding=utf-8 
import os 
import re 
import fnmatch 
import shutil 
import errno 
from bs4 import BeautifulSoup, Doctype 
 
class MyLibrary: 
    # While this will be available as a keyword when imported, 
    # this is not meant to be used as such. 
    # Used as a support method within this class to parse content 
    # with Beautiful Soup. 
    def prepare_soup(self, src_file, parser): 
        ''' 
        :param src_file: file path 
        :param parser: parser to use for parsing content from src_file; current 
prototype includes support for html5lib and lxml 
        ''' 
        with open(src_file) as src: 
            soup = BeautifulSoup(src, parser) 
        return soup 
 
    # Searches a given path contents for files matching a given file extension. 
    # The path given as input is a starting point, will also search from all sub-
directories. 
    def search_file_with_extension(self, path, file_extension): 
        ''' 
        Returns all file locations found with specified extension 
        from the given path. 
        :param path: Path that will be searched within. Must end with '/'. 
        :param file_extension: The extension (type) of the file being searched for. 
            Must in form without '.' i.e. 'html' or 'css', 
                     (to be continued) 
G-2 
 
Code G1 (continues) 
NOT '.html' or '.css'. 
        ''' 
        file_extension = str.lower('*.' + file_extension) 
        file_locations = [] 
        file_location = '' 
        for root, dirs, files in os.walk(path, topdown=True): 
            for name in files: 
                if fnmatch.fnmatch(str.lower(name), file_extension): 
                    file_location = os.path.join(root, name) 
                    file_locations.append(file_location) 
        return file_locations 
 
    # Parses the source contents for all html elements containing the attribute 
'id'. 
    # Found elements form a bs4 result set (a list) of bs4 tags. 
    def find_all_ids_from_html(self, src, parser='html5lib'): 
        ''' 
        :param src: file path 
        :param parser: html5lib by default; if provided as input, 
                        make sure another parser library is supported. 
        ''' 
        soup = self.prepare_soup(src, parser) 
        found_ids = soup.find_all(id=True) 
        return found_ids 
 
    # Searches for specific element from the source contents. 
    # Search ends as soon as the very first matching instance is found. 
    def find_element_from_html(self, src, elem, parser='html5lib'): 
        ''' 
        :param src: file path 
        :param parser: html5lib by default; if provided as input, 
                        make sure another parser library is supported. 
        ''' 
        soup = self.prepare_soup(src, parser) 
        found_element = soup.find(elem) 
        return found_element 
     
    # See find_element_from_html(); this one does the same but 
    # returns all found elements as a list of bs4 tags. 
    def find_elements_from_html(self, src, elem, parser='html5lib'): 
        ''' 
        :param src: file path 
        :param parser: html5lib by default; if provided as input, 
                        make sure another parser library is supported. 
        ''' 
        soup = self.prepare_soup(src, parser) 
        found_elements = soup.find_all(elem) 
        return found_elements 
     
    # Finds elements that have a specific attribute. Attribute values are of no 
concern. 
    # Returns all matching elements as tags. 
    def find_elements_with_attribute(self, src, elem_tag, attr, parser='html5lib'): 
         
(to be continued) 
G-3 
 
Code G1 (continues) 
''' 
        :param src: file path 
        :param elem_tag: html element tag to search for, i.e. a, table, li, ul... 
        :param attr: attribute the elements should contain, i.e. name, id, class... 
        :param parser: html5lib by default; if provided as input, 
                        make sure another parser library is supported. 
        ''' 
        soup = self.prepare_soup(src, parser) 
        found_elements = soup.find_all(elem_tag, {attr:True}) 
        return found_elements 
     
    # Searches for and lists all child elements of a given bs4 tag. 
    def find_immediate_child_elements(self, src): 
        ''' 
        :param src: a bs4 html element tag 
        ''' 
        children = [child for child in src if child.name != None] 
        return children 
 
    # Searches for elements that have a given class. 
    def find_elements_by_class(self, src, elem, cls, parser='html5lib'): 
        ''' 
        :param src: file path 
        :param elem: html element tag to look for 
        :param cls: class that element should contain 
        ''' 
        soup = self.prepare_soup(src, parser) 
        elems = soup.select(f'{elem}.{cls}') 
        return elems 
 
    # Finds elements from source contents. Does not fix the raw content as html 
parsers do 
    # due to not using Beautiful Soup and html parsers. 
    # Returns a multi-line string containing child elements from all the found 
matches. 
    def find_elements_from_raw_source(self, src, elem): 
        ''' 
        :param src: file path 
        :param elem: element to look for; case is ignored 
        ''' 
        element_results = [] 
        with open(src) as src_open: 
            source_raw = src_open.read() 
        regex = rf'(<{elem}.*?>.*?<\/{elem}>)' 
        elems = re.findall(regex, source_raw, re.IGNORECASE | re.DOTALL) 
        regex_clean_elements = r'<([^\/]\s*[a-zA-Z]*).*?>' 
        regex_clean_content = r'>(.*?)<' 
        for elem in elems: 
            elem = re.sub(regex_clean_elements, r'<\1>', elem, flags=re.IGNORECASE 
| re.DOTALL) 
            elem = re.sub(r'<\s*([aA]).*?>', r'<\1>', elem, flags=re.IGNORECASE    
| re.DOTALL) 
            elem = re.sub(regex_clean_content, '> <', elem, flags=re.IGNORECASE    
| re.DOTALL) 
(to be continued) 
G-4 
 
Code G1 (continues) 
element_results.append(elem) 
        return element_results 
 
    # Expects a list of elements contained within html ul/ol/menu element. 
    # Forms a dictionary based on input list contents so that 
    # each key in the dictionary has child elements as values. 
    def parent_child_relations_from_list(self, str_list): 
        ''' 
        :param str_list:    a list of elements contained within a single html ul / 
ol / menu element; 
                            elements must be in string format. 
                            A proper input can be got from 
find_elements_from_raw_source(), 
                            for example. 
        ''' 
        length = len(str_list) 
        parent_child_dict = {} 
        met_parents = [] 
        for i in range(length): 
            if str_list[i][1] != '/': 
                if i-1 >= 0: 
                    if met_parents[0] in parent_child_dict: 
                        parent_child_dict[met_parents[0]].append(str_list[i]) 
                    else: 
                        parent_child_dict[met_parents[0]] = [str_list[i]] 
                met_parents.insert(0, str_list[i]) 
            else: 
                met_parents.pop(0) 
        return parent_child_dict 
 
    # Copies source directory contents recursively to destination. 
    def copy_directory_contents(self, src, dst): 
        ''' 
        :param src:    source directory path 
        :param dst:    destination path 
        ''' 
        try: 
            if os.path.exists(dst): 
                shutil.rmtree(dst) 
            shutil.copytree(src, dst) 
        except OSError as exc: 
            if exc.errno in (errno.ENOTDIR, errno.EINVAL): 
                shutil.copy(src, dst) 
            else: 
                raise