Baadel, Said (2019) A Machine Learning Clustering Technique for Autism Screening and Other Applications. Doctoral thesis, University of Huddersfield.

Clustering is one of the more challenging machine learning techniques because of its unsupervised nature. Many clustering algorithms constrain each object to a single cluster. Overlapping partitioning methods based on K-means relax this constraint and allow objects to belong to more than one cluster, so that they better fit hidden structures in the data. However, outliers in a dataset can significantly distort the mean distance of the data objects to their respective clusters. Most researchers address this problem by simply removing the outliers. This can be problematic in applications such as autism screening, fraud detection, and the detection of cybersecurity attacks, among others, where the outliers may themselves be the cases of interest.
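The following small Python sketch shows how outliers affect the mean. It uses toy data chosen only for illustration. It computes a cluster's centroid and the average distance of the cluster's points to that centroid, first without and then with one distant point. The function names `centroid` and `mean_distance` are hypothetical helpers, not code from the thesis.

```python
import math


def centroid(points):
    """Component-wise mean of equal-length coordinate tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def mean_distance(points, center):
    """Average Euclidean distance from each point to center."""
    return sum(math.dist(p, center) for p in points) / len(points)


inliers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
outlier = (20.0, 20.0)  # toy outlier, far from the other points

clean = centroid(inliers)                # (1.5, 1.5)
skewed = centroid(inliers + [outlier])   # (5.2, 5.2)

print(clean, mean_distance(inliers, clean))
print(skewed, mean_distance(inliers, skewed))
```

Adding a single distant point moves the centroid from (1.5, 1.5) to (5.2, 5.2). The average distance of the four original points to their centroid then rises from about 0.71 to about 5.26.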

This thesis proposes an alternative solution to this problem. Instead of discarding outliers, it captures them and stores them on the fly in a new cluster. The new algorithm is named Outlier-based Multi-Cluster Overlapping K-Means Extension (OMCOKE). It addresses an issue largely ignored in prior work on overlapping clustering. This benefits various stakeholders, because these outliers can have real-life significance. The proposed solution has been evaluated on an important behavioural science problem, the screening of autistic traits. The aims were to improve the detection of autism spectrum disorder (ASD) traits and to reduce feature redundancy. In Chapter 5, OMCOKE was integrated as the learning algorithm of a semi-supervised machine learning (ML) framework called Clustering based Autistic Trait Classification (CATC). In experiments on real autism screening datasets, OMCOKE identified potential autism cases from the similarity of their traits rather than from the conventional scoring functions used by ASD screening tools. The empirical results obtained by OMCOKE on datasets involving children, adolescents, and adults were also compared with results produced by common ML techniques. These comparisons showed that our semi-supervised framework produces models with higher predictive accuracy, sensitivity, and specificity than other intelligent classification approaches such as Artificial Neural Network (ANN), Random Forest, Random Trees, and Rule Induction. Diagnosticians and other stakeholders involved in ASD screening can use these models, which also highlight the most influential features. The chapters in this thesis have been disseminated, or are under review, in various reputable journals and refereed conference proceedings.
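The abstract does not give OMCOKE's update rules. The Python sketch below is a rough, hypothetical illustration of the general idea only, not the thesis's algorithm. It runs a k-means-style loop with two assumed rules:

- A point joins every cluster whose centroid lies within `overlap_ratio` times its nearest-centroid distance. This gives the overlap.
- A point whose nearest-centroid distance exceeds `outlier_factor` times the median nearest distance goes into a separate outlier cluster (label `-1`) instead of being discarded. It does not contribute to any centroid update.

The names `cluster`, `assign`, `update`, `OUTLIER`, `overlap_ratio`, and `outlier_factor` are assumptions made for this sketch.

```python
import math
from statistics import median

OUTLIER = -1  # hypothetical label for the extra cluster that stores outliers


def assign(points, centroids, overlap_ratio, outlier_factor):
    """Return a set of cluster labels for each point (hypothetical rules)."""
    nearest = [min(math.dist(p, c) for c in centroids) for p in points]
    cutoff = outlier_factor * median(nearest)  # median is robust to the outliers
    labels = []
    for p, d_min in zip(points, nearest):
        if d_min > cutoff:
            labels.append({OUTLIER})  # keep the point, but in the outlier cluster
        else:
            # Overlap: join every cluster almost as close as the nearest one.
            labels.append({j for j, c in enumerate(centroids)
                           if math.dist(p, c) <= overlap_ratio * d_min})
    return labels


def update(points, labels, centroids):
    """Recompute each centroid from its members; outliers never contribute."""
    new = []
    for j, c in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if j in lab]
        if members:
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
        else:
            new.append(c)  # leave an empty cluster's centroid where it was
    return new


def cluster(points, centroids, overlap_ratio=1.2, outlier_factor=5.0, max_iter=20):
    """k-means-style loop with overlapping assignment and an outlier cluster."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids, overlap_ratio, outlier_factor)
        new = update(points, labels, centroids)
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, assign(points, centroids, overlap_ratio, outlier_factor)


points = [(0, 0), (0, 1), (1, 0), (1, 1),   # group near the origin
          (5, 5), (5, 6), (6, 5), (6, 6),   # second group
          (3, 3),                           # between the two groups
          (30, 30)]                         # far from everything
centroids, labels = cluster(points, [(0, 0), (6, 6)])
print(centroids)             # [(1.0, 1.0), (5.0, 5.0)]
print(labels[8], labels[9])  # {0, 1} {-1}
```

With these toy points, `(3, 3)` is equidistant from both centroids and ends up in both clusters. Meanwhile, `(30, 30)` is kept in the outlier cluster and excluded from the centroid updates, so it cannot drag either mean towards it.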

FINAL THESIS - Baadel.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
