Modern Data Mining for Software Engineer, A Machine Learning PaaS Review
Loponen, Marko (2020-11-18)
This publication is subject to copyright regulations. The work may be read and printed for personal use. Use for commercial purposes is prohibited.
Open access
The permanent address of this publication is:
https://urn.fi/URN:NBN:fi-fe2020113098802
Abstract
Using data mining methods to extract information from data has proven valuable for individuals and society. Advances in technology have made it possible to use complex data mining methods in different applications and systems to achieve these valuable results. However, data-driven projects pose challenges that can affect people either directly or indirectly. Vast amounts of data are collected and processed frequently to enable the functionality of many modern applications. Cloud-based platforms have been developed to aid in the development and maintenance of data-driven projects. The field of Information Technology (IT) and data-driven projects have become complex, and they require additional attention compared to standard software development.
In this thesis, a literature review is conducted to study existing industry methods and practices, define the terms used, and describe the relevant data mining process models. We analyze the industry to identify the factors affecting the evolution of tools and platforms, and the roles of project members. Furthermore, a hands-on review of typical machine learning Platforms-as-a-Service (PaaS) is carried out with an example case, and heuristics are created to aid in choosing a machine learning platform. The results of this thesis provide knowledge and understanding for software developers and project managers who take part in these data-driven projects without in-depth knowledge of data science.
In this study, we found that developing data-driven applications requires a valid process model or methodology, precise roles, and versatile tools or platforms. Each of these elements affects the others in some way. We noticed that traditional data mining process models are insufficient for modern agile software development. Nevertheless, they can provide valuable insight into how to handle data correctly. Cloud-based platforms support these data-driven projects by enabling the development of complicated machine learning projects without the expertise of either a data scientist or a software developer. The platforms are versatile and easy to use. However, developing functionality and predictive models that the developer does not understand can be seen as bad practice and may cause harm in the future.