Thabtah, Fadi, Cowling, Peter and Hammoud, Suhel (2006) Improving rule sorting predictive accuracy and training time in associative classification. Expert Systems With Applications, 31 (2). pp. 414-426. ISSN 0957-4174

Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising approach called associative classification, which applies association rule mining to classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method of the well-known Apriori algorithm to discover rules, and they require multiple passes over the database. Furthermore, they find frequent items in one phase and generate rules in a separate phase, which consumes more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of a vertical format representation and uses an efficient technique for discovering frequent items: it recursively intersects the frequent items of size n to find potential frequent items of size n+1. Moreover, rule ranking plays an important role in classification, and the majority of current associative classifiers, such as CBA and CMAR, select rules mainly according to their confidence levels. MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. Experiments on 20 different data sets indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
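The abstract names two mechanisms: counting support by intersecting tid-lists in a vertical data layout, and ranking rules with tie-breakers beyond confidence. The Python sketch below illustrates both on a hypothetical toy data set. It is not the authors' MCAR implementation. The function names, the toy data, the thresholds, and the exact tie-breaking order (confidence, then support, then fewer antecedent items, then more frequent class) are assumptions made for illustration, since the abstract does not list the constraints.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy training data: (attribute values, class label) per row.
ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "play"),
]


def vertical_layout(rows):
    """Map each single (attribute, value) item to its tid-list (set of row ids)."""
    tids = {}
    for rid, (attrs, _label) in enumerate(rows):
        for item in attrs.items():
            tids.setdefault(frozenset([item]), set()).add(rid)
    return tids


def frequent_itemsets(rows, min_count):
    """Level-wise mining: intersect tid-lists of size-n itemsets to get size n+1."""
    level = {k: v for k, v in vertical_layout(rows).items() if len(v) >= min_count}
    result = dict(level)
    while level:
        nxt = {}
        for (x, tx), (y, ty) in combinations(level.items(), 2):
            cand = x | y
            # Join only itemsets that differ by one item and never repeat an attribute.
            if len(cand) != len(x) + 1 or len({a for a, _ in cand}) != len(cand):
                continue
            tids = tx & ty  # support of cand = size of the tid-list intersection
            if len(tids) >= min_count:
                nxt[cand] = tids
        result.update(nxt)
        level = nxt
    return result


def class_rules(rows, min_count, min_conf):
    """Build (itemset, label, support, confidence) rules from frequent itemsets."""
    class_tids = {}
    for rid, (_attrs, label) in enumerate(rows):
        class_tids.setdefault(label, set()).add(rid)
    rules = []
    for itemset, tids in frequent_itemsets(rows, min_count).items():
        for label, ctids in class_tids.items():
            support = len(tids & ctids)  # rows matching the itemset and the class
            if support >= min_count and support / len(tids) >= min_conf:
                rules.append((itemset, label, support, support / len(tids)))
    class_freq = Counter(label for _attrs, label in rows)
    return rules, class_freq


def rank(rules, class_freq):
    """Assumed order: confidence, support, fewer items, more frequent class.
    sorted() is stable, so rules tied on every key keep their generation order."""
    return sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0]), -class_freq[r[1]]))


def classify(ranked, attrs, default):
    """Predict with the highest-ranked rule whose antecedent the row satisfies."""
    items = set(attrs.items())
    for itemset, label, _support, _conf in ranked:
        if itemset <= items:
            return label
    return default


if __name__ == "__main__":
    rules, class_freq = class_rules(ROWS, min_count=2, min_conf=0.6)
    for itemset, label, support, conf in rank(rules, class_freq):
        print(sorted(itemset), "->", label, support, round(conf, 2))
```

In this sketch, rules that remain tied on every key keep the order in which they were generated, and that order follows the order of the training rows. This gives one simple way to see how shuffling the training data, as the abstract reports, could change which rule a classifier prefers.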

