Improving rule sorting predictive accuracy and training time in associative classification

Thabtah, Fadi Abdeljaber, Cowling, Peter and Hammoud, Suhel (2006) Improving rule sorting predictive accuracy and training time in associative classification. Expert Systems With Applications, 31 (2). pp. 414-426. ISSN 0957-4174

Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising new approach called associative classification, which applies association rule mining to classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method of the well-known Apriori algorithm to discover rules, and require multiple passes over the database. Furthermore, they find frequent items in one phase and generate the rules in a separate phase, consuming more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of a vertical format representation and uses an efficient technique for discovering frequent items based on recursively intersecting the frequent items of size n to find potential frequent items of size n+1. Moreover, since rule ranking plays an important role in classification and the majority of current associative classifiers, such as CBA and CMAR, select rules mainly in terms of their confidence levels, MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. After experimentation with 20 different data sets, the results indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency when compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
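The abstract names two mechanisms: counting support by intersecting tid-lists held in a vertical layout (so items of size n+1 are found from items of size n without rescanning the data), and ranking rules with extra tie-breakers beyond confidence. The Python sketch that follows illustrates both on a toy data set. It is not the authors' MCAR implementation: the function names (`vertical_layout`, `frequent_itemsets`, `generate_rules`, `rank_rules`, `predict`), the toy data, the thresholds, and the exact tie-breaking chain (confidence, then support, then shorter antecedent, then the more frequent class) are assumptions chosen for illustration. The abstract itself only states that MCAR adds tie-breaking constraints to the confidence-based selection used by CBA and CMAR. Rule pruning is omitted.

```python
# Hypothetical sketch of vertical mining + ranked class rules; not the MCAR reference code.
from fractions import Fraction
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset   # set of (attribute, value) items
    label: str              # predicted class
    support: int            # rows matching the antecedent AND the label
    confidence: Fraction    # support / rows matching the antecedent


def vertical_layout(rows):
    """Map each single item (attribute, value) to its tid-list (set of row ids)."""
    tidlists = {}
    for tid, (items, _label) in enumerate(rows):
        for item in items:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    return tidlists


def frequent_itemsets(rows, min_count):
    """Level-wise search: each size-(n+1) tid-list is the intersection of two size-n tid-lists."""
    level = {s: t for s, t in vertical_layout(rows).items() if len(t) >= min_count}
    found = dict(level)
    while level:
        keys = list(level)
        nxt = {}
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                union = a | b
                # Join only pairs differing in one item, never two values of the same attribute.
                if len(union) != len(a) + 1 or len({attr for attr, _ in union}) != len(union):
                    continue
                if union in nxt:
                    continue
                tids = level[a] & level[b]  # support counted by intersection, no data rescan
                if len(tids) >= min_count:
                    nxt[union] = tids
        found.update(nxt)
        level = nxt
    return found


def generate_rules(rows, min_count, min_conf):
    """Turn frequent itemsets into class rules by intersecting with each class's tid-list."""
    class_tids = {}
    for tid, (_items, label) in enumerate(rows):
        class_tids.setdefault(label, set()).add(tid)
    rules = []
    for itemset, tids in frequent_itemsets(rows, min_count).items():
        for label, ctids in class_tids.items():
            support = len(tids & ctids)
            confidence = Fraction(support, len(tids))
            if support >= min_count and confidence >= min_conf:
                rules.append(Rule(itemset, label, support, confidence))
    class_freq = {label: len(t) for label, t in class_tids.items()}
    return rules, class_freq


def rank_rules(rules, class_freq):
    """Assumed order: higher confidence, higher support, shorter antecedent, more frequent class.

    sorted() is stable, so any ties left after these keys keep generation order here.
    """
    return sorted(rules, key=lambda r: (-r.confidence, -r.support,
                                        len(r.antecedent), -class_freq[r.label]))


def predict(ranked, default_class, items):
    """Return the class of the first ranked rule whose antecedent the case satisfies."""
    for rule in ranked:
        if rule.antecedent <= items:
            return rule.label
    return default_class


# Toy training data (hypothetical): each row is (set of items, class label).
rows = [
    (frozenset({("outlook", "sunny"), ("windy", "no")}), "play"),
    (frozenset({("outlook", "sunny"), ("windy", "yes")}), "stay"),
    (frozenset({("outlook", "rain"), ("windy", "yes")}), "stay"),
    (frozenset({("outlook", "sunny"), ("windy", "no")}), "play"),
    (frozenset({("outlook", "rain"), ("windy", "no")}), "play"),
]
rules, class_freq = generate_rules(rows, min_count=2, min_conf=Fraction(1, 2))
ranked = rank_rules(rules, class_freq)
default = max(class_freq, key=class_freq.get)  # majority class as fallback
print(predict(ranked, default, frozenset({("outlook", "rain"), ("windy", "yes")})))  # stay
```

On this data, the rules `{windy=yes} -> stay` and `{outlook=sunny, windy=no} -> play` tie on confidence and support, and the antecedent-length key decides between them. Each added key shrinks the set of cases where the final order falls back to generation order, which in this sketch depends on incidental set and row ordering.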

Item Type: Article
Additional Information: UoA 23 (Computer Science and Informatics) Copyright © 2007 Elsevier B.V.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Schools: School of Computing and Engineering

References:
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th international conference on very large data bases (pp. 487–499).
Ali, K., Manganaris, S., & Srikant, R. (1997). Partial classification using association rules. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the third international conference on knowledge discovery and data mining (pp. 115–118).
Antonie, M., & Zaïane, O. (2004). An associative classifier based on positive and negative rules. Proceedings of the 9th ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 64–69). Paris, France.
Antonie, M., Zaïane, O., & Coman, A. (2003). Associative classifiers for medical images. In Mining multimedia and complex data, Lecture notes in artificial intelligence 2797 (pp. 68–83). New York: Springer.
Baralis, E., & Torino, P. (2002). A lazy approach to pruning classification rules. Proceedings of IEEE ICDM'02 (p. 35).
Brin, S., Motwani, R., Ullman, J., & Tsur, S. (1997). Dynamic itemset counting and implication rules for market basket data. Proceedings of the 1997 ACM SIGMOD international conference on management of data (pp. 265–276).
CBA (1998).
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man–Machine Studies, 27(4), 349–370.
Cohen, W. (1995). Fast effective rule induction. Proceedings of the 12th international conference on machine learning (pp. 115–123). Los Altos, CA: Morgan Kaufmann.
Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of IJCAI (pp. 1022–1027).
Frank, E., & Witten, I. (1998). Generating accurate rule sets without global optimisation. Proceedings of the fifteenth international conference on machine learning (pp. 144–151). Madison, WI: Morgan Kaufmann.
Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review, 13(1), 3–54.
Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 1–12). Dallas, TX, May.
Huange, J., Zhao, J., & Xie, Y. (1997). Source classification using pole method of AR model. IEEE ICASSP, 1, 567–570.
Li, W. (2001). Classification based on multiple association rules (MSc thesis). Simon Fraser University, April.
Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. Proceedings of the ICDM'01 (pp. 369–376). San Jose, CA.
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. Proceedings of the KDD (pp. 80–86). New York, NY.
Liu, B., Ma, Y., & Wong, C.-K. (2001). Classification using association rules: weakness and enhancements. In V. Kumar (Ed.), Data mining for scientific
Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases. Irvine, CA: University of California.
Park, J., Chen, M., & Yu, P. (1995). An effective hash-based algorithm for mining association rules. Proceedings of the ACM SIGMOD (pp. 175–186). San Jose, CA.
Pazzani, M., Mani, S., & Shankle, W. (1993). Beyond concise and colourful: Learning intelligible rules. Proceedings of the KDD (pp. 235–238). Menlo Park, CA: AAAI Press.
Quinlan, J. (1987). Generating production rules from decision trees. Proceedings of the 10th international joint conference on artificial intelligence (pp. 304–307). Los Altos, CA: Morgan Kaufmann.
Quinlan, J. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Savasere, A., Omiecinski, E., & Navathe, S. (1995). An efficient algorithm for mining association rules in large databases. Proceedings of the 21st conference on very large databases VLDB '95 (pp. 432–444). Zurich, Switzerland, September 1995.
Shafer, J., Agrawal, R., & Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of the 22nd international conference on very large data bases (pp. 544–555). Bombay, India, September 1996.
Wang, K., He, Y., & Cheung, D. (2001). Mining confidence rules without support requirements. Proceedings of the tenth international conference on information and knowledge management (pp. 89–96). Atlanta, Georgia.
WEKA (2000). Data mining software in Java:
Witten, I., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco, CA: Morgan Kaufmann.
Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. Proceedings of the SDM (pp. 369–376). San Francisco, CA.
Zaki, M. (1999). Parallel and distributed association mining: A survey. IEEE Concurrency, special issue on parallel mechanisms for data mining, 7(4), 14–25. December 1999.
Zaki, M., & Gouda, K. (2003). Fast vertical mining using diffsets. Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 326–335). Washington, DC.
Zaki, M., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for fast discovery of association rules. Proceedings of the third KDD conference (pp. 283–286).

Depositing User: Sara Taylor
Date Deposited: 02 Jul 2007
Last Modified: 15 Jan 2017 18:05


University of Huddersfield, Queensgate, Huddersfield, HD1 3DH. All rights reserved ©