Improving rule sorting predictive accuracy and training time in associative classification

Thabtah, Fadi, Cowling, Peter and Hammoud, Suhel (2006) Improving rule sorting predictive accuracy and training time in associative classification. Expert Systems With Applications, 31 (2). pp. 414-426. ISSN 0957-4174

Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising new approach called associative classification, which mainly uses association rule mining for classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method of the well-known Apriori algorithm to discover rules, which requires multiple passes over the database. Furthermore, they find frequent items in one phase and generate the rules in a separate phase, consuming more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of a vertical format representation and uses an efficient technique for discovering frequent items based on recursively intersecting the frequent items of size n to find potential frequent items of size n+1. Moreover, rule ranking plays an important role in classification, yet the majority of current associative classifiers, such as CBA and CMAR, select rules mainly according to their confidence levels. MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. Experiments on 20 different data sets indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency when compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
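The two mechanisms described above, discovering frequent items by intersecting row-id lists in a vertical layout and ranking rules with extra tie-breakers, can be illustrated with a minimal Python sketch. It is not MCAR as published: the toy data, the helper names (`vertical_layout`, `mine_rules`, `rank`, `predict`) and the simplification of pruning on itemset support rather than rule support are assumptions made for illustration. The tie-breaking order used here (confidence, then support, then fewer conditions, then the class that is more frequent in the training data) is likewise an assumed example of "more tie-breaking constraints"; the abstract does not list MCAR's exact constraints.

```python
from collections import Counter
from itertools import combinations


def vertical_layout(rows):
    """Vertical format: map each (attribute, value) item to the set of row ids containing it."""
    tids = {}
    for rid, row in enumerate(rows):
        for attr, val in row.items():
            tids.setdefault((attr, val), set()).add(rid)
    return tids


def mine_rules(rows, labels, min_count, min_conf):
    """Hypothetical level-wise miner: size n+1 itemsets come from intersecting size-n row-id sets.

    Returns rules as (itemset, class, support_count, confidence) tuples.
    Simplification (assumed): itemsets are pruned on their own support, not on rule support.
    """
    level = {frozenset([item]): t for item, t in vertical_layout(rows).items()
             if len(t) >= min_count}
    rules = []
    while level:
        # Turn each frequent itemset into a rule predicting its majority class.
        for itemset, t in level.items():
            cls, count = Counter(labels[r] for r in t).most_common(1)[0]
            conf = count / len(t)
            if count >= min_count and conf >= min_conf:
                rules.append((itemset, cls, count, conf))
        nxt = {}
        for (a, ta), (b, tb) in combinations(level.items(), 2):
            union = a | b
            attrs = {attr for attr, _ in union}
            # Join only itemsets that differ by one item and use each attribute at most once.
            if len(union) != len(a) + 1 or len(attrs) != len(union) or union in nxt:
                continue
            t = ta & tb  # rows containing every item of the union; no rescan of the data
            if len(t) >= min_count:
                nxt[union] = t
        level = nxt
    return rules


def rank(rules, labels):
    """Sort best-first; each key only breaks ties left by the keys before it (assumed order)."""
    class_freq = Counter(labels)
    return sorted(rules, key=lambda r: (
        -r[3],              # higher confidence first
        -r[2],              # then higher support
        len(r[0]),          # then fewer conditions in the antecedent
        -class_freq[r[1]],  # then the class that is more frequent in the training data
    ))  # sorted() is stable: any remaining ties keep generation order


def predict(ranked, row, default):
    """Return the class of the first (highest-ranked) rule whose conditions all hold."""
    for itemset, cls, _, _ in ranked:
        if all(row.get(attr) == val for attr, val in itemset):
            return cls
    return default


if __name__ == "__main__":
    # Hypothetical toy training data.
    rows = [
        {"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "yes"},
        {"outlook": "rain", "windy": "no"},
        {"outlook": "sunny", "windy": "no"},
    ]
    labels = ["play", "stay", "stay", "play", "play"]
    ranked = rank(mine_rules(rows, labels, min_count=2, min_conf=0.6), labels)
    for itemset, cls, count, conf in ranked:
        print(sorted(itemset), "->", cls, count, round(conf, 2))
```

On the toy data, the three rules with confidence 1.0 are separated by support and then by antecedent length, so `windy=no -> play` (support 3) ranks first and the two-condition rule `outlook=sunny, windy=no -> play` ranks after `windy=yes -> stay`. Because `sorted` is stable, rules still tied after every key keep the order in which they were generated, and in this sketch that order follows the order of rows and items in the training data; this is where shuffling the data could still change the classifier, and each extra tie-breaker narrows that room.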

Item Type: Article
Additional Information: UoA 23 (Computer Science and Informatics) Copyright © 2007 Elsevier B.V.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Schools: School of Computing and Engineering
References:
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th international conference on very large data bases (pp. 487–499).
Ali, K., Manganaris, S., & Srikant, R. (1997). Partial classification using association rules. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the third international conference on knowledge discovery and data mining (pp. 115–118).
Antonie, M., & Zaïane, O. (2004). An associative classifier based on positive and negative rules. Proceedings of the 9th ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 64–69). Paris, France.
Antonie, M., Zaïane, O., & Coman, A. (2003). Associative classifiers for medical images. Lecture notes in artificial intelligence 2797, mining multimedia and complex data (pp. 68–83). New York: Springer.
Baralis, E., & Torino, P. (2002). A lazy approach to pruning classification rules. Proceedings of the 2002 IEEE ICDM’02 (p. 35).
Brin, S., Motwani, R., Ullman, J., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 265–276).
CBA (1998).
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man–Machine Studies, 27(4), 349–370.
Cohen, W. (1995). Fast effective rule induction. Proceedings of the 12th international conference on machine learning (pp. 115–123). Los Altos, CA: Morgan Kaufmann.
Fayyad, U., & Irani, K. (1993). Multi-interval discretisation of continuous-valued attributes for classification learning. Proceedings of IJCAI (pp. 1022–1027).
Frank, E., & Witten, I. (1998). Generating accurate rule sets without global optimisation. Proceedings of the fifteenth international conference on machine learning (pp. 144–151). Madison, WI: Morgan Kaufmann.
Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review, 13(1), 3–54.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 1–12). Dallas, TX, May.
Huange, J., Zhao, J., & Xie, Y. (1997). Source classification using pole method of AR model. IEEE ICASSP, 1, 567–570.
Li, W. (2001). Classification based on multiple association rules. MSc thesis, Simon Fraser University, April.
Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. Proceedings of ICDM’01 (pp. 369–376). San Jose, CA.
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. Proceedings of KDD (pp. 80–86). New York, NY.
Liu, B., Ma, Y., & Wong, C.-K. (2001). Classification using association rules: Weakness and enhancements. In V. Kumar (Ed.), Data mining for scientific applications.
Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases. Irvine, CA: University of California.
Park, J., Chen, M., & Yu, P. (1995). An effective hash-based algorithm for mining association rules. Proceedings of the ACM SIGMOD (pp. 175–186). San Jose, CA.
Pazzani, M., Mani, S., & Shankle, W. (1993). Beyond concise and colourful: Learning intelligible rules. Proceedings of KDD (pp. 235–238). Menlo Park, CA: AAAI Press.
Quinlan, J. (1987). Generating production rules from decision trees. Proceedings of the 10th international joint conference on artificial intelligence (pp. 304–307). Los Altos, CA: Morgan Kaufmann.
Quinlan, J. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Savasere, A., Omiecinski, E., & Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. Proceedings of the 21st conference on very large databases, VLDB ’95 (pp. 432–444). Zurich, Switzerland, September 1995.
Shafer, J., Agrawal, R., & Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of the 22nd international conference on very large data bases (pp. 544–555). Bombay, India, September 1996.
Wang, K., He, Y., & Cheung, D. (2001). Mining confidence rules without support requirements. Proceedings of the tenth international conference on information and knowledge management (pp. 89–96). Atlanta, GA.
WEKA (2000). Data mining software in Java: Weka.
Witten, I., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco, CA: Morgan Kaufmann.
Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. Proceedings of SDM (pp. 369–376). San Francisco, CA.
Zaki, M. (1999). Parallel and distributed association mining: A survey. IEEE Concurrency, special issue on parallel mechanisms for data mining, 7(4), 14–25, December 1999.
Zaki, M., & Gouda, K. (2003). Fast vertical mining using diffsets. Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 326–335). Washington, DC.
Zaki, M., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. Proceedings of the third KDD conference (pp. 283–286).
Depositing User: Sara Taylor
Date Deposited: 02 Jul 2007
Last Modified: 28 Aug 2021 14:18

University of Huddersfield, Queensgate, Huddersfield, HD1 3DH. All rights reserved ©