Pruning techniques in associative classification: Survey and comparison

Thabtah, Fadi (2006) Pruning techniques in associative classification: Survey and comparison. Journal of Digital Information Management, 4 (3). pp. 197-202. ISSN 0972-7272

Association rule discovery and classification are common data mining tasks. Integrating the two, an approach known as associative classification, is promising: it derives classifiers whose accuracy is highly competitive with that of traditional classification approaches such as rule induction and decision trees. However, the classifiers generated by associative classification are often large, so pruning becomes an essential task.

In this paper, we survey the rule pruning methods used by current associative classification techniques. We also compare the effect of three pruning methods (database coverage, pessimistic error estimation, and lazy pruning) on the accuracy rate and the number of rules derived from different classification data sets. Experimental results on data sets from the UCI data collection indicate that lazy pruning algorithms may produce classifiers with slightly higher predictive accuracy than those that use database coverage or pessimistic error pruning. However, the potential use of such classifiers is limited because they are difficult for end users to understand and maintain.
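
As an illustration of the first of the three methods named above, the following is a minimal sketch of database coverage pruning in the general style of CBA-like algorithms; it is not taken from the paper. The `Rule` type, the `database_coverage` function, the ranking order (confidence, then support, then shorter antecedent), and the toy data are all assumptions made for this example. The idea sketched is that a ranked rule is kept only if it correctly classifies at least one training case that is still uncovered, after which every case matching its antecedent is removed.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical rule representation, assumed for this sketch.
    antecedent: frozenset  # items that must all appear in a case
    label: str             # predicted class
    confidence: float
    support: float


def database_coverage(rules, data):
    """Keep rules that correctly classify at least one uncovered case.

    data is a list of (items, label) pairs, where items is a frozenset.
    """
    # Assumed ranking: higher confidence, then higher support,
    # then fewer items in the antecedent.
    ranked = sorted(
        rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent))
    )
    remaining = list(data)
    kept = []
    for rule in ranked:
        if not remaining:
            break  # every training case is already covered
        covered = [c for c in remaining if rule.antecedent <= c[0]]
        if any(label == rule.label for _, label in covered):
            kept.append(rule)
            # Drop every case the rule's antecedent matches.
            remaining = [c for c in remaining if not rule.antecedent <= c[0]]
    return kept


# Toy data, invented for illustration only.
data = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"c"}), "no"),
]
rules = [
    Rule(frozenset({"a", "b"}), "yes", 1.0, 1 / 3),
    Rule(frozenset({"a"}), "yes", 1.0, 2 / 3),
    Rule(frozenset({"c"}), "no", 1.0, 1 / 3),
]
kept = database_coverage(rules, data)
```

In this toy run the rule with antecedent `{"a", "b"}` is discarded because the more general rule `{"a"}` has already covered its only matching case. Lazy pruning, by contrast, is described in the abstract as yielding larger classifiers, which is consistent with it discarding fewer rules than this procedure does.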

Item Type: Article
Additional Information: © Reproduced by permission of Journal of Digital Information Management. Published by Digital Information Research Foundation.
Uncontrolled Keywords: Associative Classification, Association Rule, Classification, Data Mining, Rule Pruning
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Schools: School of Computing and Engineering
Depositing User: Sara Taylor
Date Deposited: 05 Jul 2007
Last Modified: 28 Aug 2021 14:18

