Saad, Elmak (2018) Optimizing E-Management Using Web Data Mining. Doctoral thesis, University of Huddersfield.

Today, one of the biggest challenges that E-management systems face is the explosive growth of operating data and to use this data to enhance services. Web usage mining has emerged as an important technique to provide useful management information from user's Web data. One of the areas where such information is needed is the Web-based academic digital libraries. A digital library (D-library) is an information resource system to store resources in digital format and provide access to users through the network. Academic libraries offer a huge amount of information resources, these information resources overwhelm students and makes it difficult for them to access to relevant information. Proposed solutions to alleviate this issue emphasize the need to build Web recommender systems that make it possible to offer each student with a list of resources that they would be interested in. Collaborative filtering is the most successful technique used to offer recommendations to users. Collaborative filtering provides recommendations according to the user relevance feedback that tells the system their preferences. Most recent work on D-library recommender systems uses explicit feedback.

Explicit feedback requires students to rate resources which make the recommendation process not realistic because few students are willing to provide their interests explicitly. Thus, collaborative filtering suffers from “data sparsity” problem. In response to this problem, the study proposed a Web usage mining framework to alleviate the sparsity problem. The framework incorporates clustering mining technique and usage data in the recommendation process. Students perform different actions on D-library, in this study five different actions are identified, including printing, downloading, bookmarking, reading, and viewing the abstract. These actions provide the system with large quantities of implicit feedback data. The proposed framework also utilizes clustering data mining approach to reduce the sparsity problem. Furthermore, generating recommendations based on clusters produce better results because students belonging to the same cluster usually have similar interests.

The proposed framework is divided into two main components: off-line and online components. The off-line component is comprised of two stages: data pre-processing and the derivation of student clusters. The online component is comprised of two stages: building student's profile and generating recommendations. The second stage consists of three steps, in the first step the target student profile is classified to the closest cluster profile using the cosine similarity measure. In the second phase, the Pearson correlation coefficient method is used to select the most similar students to the target student from the chosen cluster to serve as a source of prediction. Finally, a top-list of resources is presented. Using the Book-Crossing dataset the effectiveness of the proposed framework was evaluated based on sparsity level, and Mean Absolute Error (MAE) regarding accuracy. The proposed framework reduced the sparsity level between (0.07% and 26.71%) in the sub-matrices, whereas the sparsity level is between 99.79% and 78.81% using the proposed framework, and 99.86% (for the original matrix) before applying the proposed framework. The experimental results indicated that by using the proposed framework the performance is as much as 13.12% better than clustering-only explicit feedback data, and 21.14% better than the standard K Nearest Neighbours method. The overall results show that the proposed framework can alleviate the Sparsity problem resulting in improving the accuracy of the recommendations.

Elmak Saad FINAL THESIS.PDF - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Download (7MB) | Preview


Downloads per month over past year

Add to AnyAdd to TwitterAdd to FacebookAdd to LinkedinAdd to PinterestAdd to Email