Hybrid Automated Machine Learning System for Big Data

Ighoroje, Lamogha (2018) Hybrid Automated Machine Learning System for Big Data. Doctoral thesis, University of Huddersfield.

Abstract

Many machine learning (ML) models and algorithms exist, and in designing classification systems it is often a challenge to find and select the best performing ML algorithm(s) for a dataset in a short period of time. Often, one must learn thoroughly about the structure and content of the dataset, decide whether to use a supervised, semi-supervised or unsupervised learning strategy, and then investigate, select or design, via trial and error, a classification or clustering algorithm that would work most accurately for that specific dataset. This can be a time-consuming and tedious process. Additionally, a classification algorithm may perform worse on a given dataset than a clustering algorithm. Meta-learning (learning to learn) and automatic ML (autoML) are data mining-based formalisms for modelling evolving conventional ML functions and toolkit systems. The concept of modelling a decision tree-based combination of both formalisms as a hybrid-autoML toolkit extends that of traditional complex autoML systems.

In hybrid-autoML, single or multiple predictive models are built by combining a three-layered decision learning architecture for automatic learning mode and model selection, engaging formalisms for selecting from a variety of supervised or unsupervised ML algorithms and generic meta information obtained from varying multi-datasets. The work presented in this thesis aims to study, conceptualize, design and develop this hybrid-autoML toolkit by extending, in the simplest form, some existing methodologies for the model training aspect of autoML systems. The theoretical and experimental development focuses on the extension of autoWeka and the use of existing meta-learning, algorithm selection and decision tree concepts. It addresses the issue of efficient ML mode (supervised or unsupervised) and model selection for varying multi-datasets, learning methods representations of practical alternative use cases and structuring of layered decision ML unfolding, and algorithms for constructing the unfolding. The implementation aims to develop tools for hybrid-autoML based model visualization or evaluation, use case simulations and analysis on single or multi-varying datasets. An open source tool called hybrid-autoML has been developed to support these functionalities. Hybrid-autoML provides a user-friendly graphical interface that facilitates entry of single or multi-varying datasets, supports automatic learning mode or strategy selection and automatic model selection on single or multi-varying datasets, supports predictive testing, and allows the automatic visualization and use of a set of analytical tools for model evaluation. It is highly extensible and saves a lot of time.
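The layered selection described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the meta-features, the 0.9 and 1000 thresholds, and the candidate algorithm names are all hypothetical placeholders, chosen only to show how a first layer might gather generic meta information, a second layer might choose the learning mode, and a third layer might choose a model.

```python
from collections import Counter


def meta_features(rows, labels=None):
    """Layer 1 (assumed): collect generic meta information about a dataset."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    known = [label for label in (labels or []) if label is not None]
    return {
        "n_rows": n_rows,
        "n_cols": n_cols,
        "labelled_fraction": len(known) / n_rows if n_rows else 0.0,
        "n_classes": len(Counter(known)),
    }


def select_mode(meta):
    """Layer 2 (assumed): pick the learning mode from the meta information."""
    # Hypothetical rule: mostly labelled data with at least two classes
    # is treated as a supervised problem, anything else as unsupervised.
    if meta["labelled_fraction"] >= 0.9 and meta["n_classes"] >= 2:
        return "supervised"
    return "unsupervised"


def select_model(meta, mode):
    """Layer 3 (assumed): pick a candidate algorithm for the chosen mode."""
    if mode == "unsupervised":
        return "k-means"
    # Hypothetical size threshold, for illustration only.
    return "decision tree" if meta["n_rows"] < 1000 else "random forest"


def hybrid_select(rows, labels=None):
    """Run the three layers in sequence and return (mode, model)."""
    meta = meta_features(rows, labels)
    mode = select_mode(meta)
    return mode, select_model(meta, mode)
```

For example, calling `hybrid_select` on a small, fully labelled dataset takes the supervised branch, while the same rows without labels take the unsupervised branch. A real system would replace these hand-written rules with decisions learned from meta information across many datasets.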

Information

URI:

https://eprints.hud.ac.uk/id/eprint/35048
