Pruning techniques in associative classification: Survey and comparison

Thabtah, Fadi Abdeljaber (2006) Pruning techniques in associative classification: Survey and comparison. Journal of Digital Information Management, 4 (3). pp. 197-202. ISSN 0972-7272

    Abstract

    Association rule discovery and classification
    are common data mining tasks. Associative classification,
    which integrates association rule discovery with
    classification, is a promising approach that derives
    classifiers whose accuracy is highly competitive with
    that of traditional classification approaches such as rule
    induction and decision trees. However, the classifiers
    generated by associative classification are often
    large, so pruning becomes an essential task.
    In this paper, we survey the rule pruning methods
    used by current associative classification techniques.
    Further, we compare the effect of three pruning methods
    (database coverage, pessimistic error estimation, and lazy
    pruning) on the accuracy rate and the number of rules
    derived from different classification data sets. Results
    from experiments on data sets from the
    UCI data collection indicate that lazy pruning algorithms
    may produce classifiers with slightly higher predictive
    accuracy than those that utilise database coverage and
    pessimistic error pruning. However, the potential use of such
    classifiers is limited because they are difficult for the
    end-user to understand and maintain.
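
    As an illustration of the first of the three pruning methods
    named in the abstract, the following is a minimal sketch of
    database coverage pruning as it is commonly described for
    CBA-style classifiers. The abstract does not give the procedure,
    so the rule ranking (confidence, then support, then shorter
    antecedent), the Rule class, and the toy data are assumptions
    made for this example, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule representation: antecedent items imply a class label."""
    antecedent: frozenset  # set of (attribute, value) pairs
    label: str
    confidence: float
    support: float


def covers(rule, items):
    """A rule covers an instance when all antecedent items appear in it."""
    return rule.antecedent <= items


def database_coverage(rules, training):
    """Keep a rule only if it correctly classifies at least one
    still-uncovered training instance; then drop every instance it covers.

    training is a list of (frozenset_of_items, label) pairs.
    """
    # Assumed ranking: higher confidence, then higher support, then fewer items.
    ranked = sorted(
        rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent))
    )
    remaining = list(training)
    kept = []
    for rule in ranked:
        if not remaining:
            break  # every training instance is already covered
        matched = [(items, label) for items, label in remaining
                   if covers(rule, items)]
        if any(label == rule.label for _, label in matched):
            kept.append(rule)
            remaining = [(items, label) for items, label in remaining
                         if not covers(rule, items)]
    return kept


# Toy training data (made up for illustration).
training = [
    (frozenset({("a", 1), ("b", 1)}), "yes"),
    (frozenset({("a", 1), ("b", 2)}), "yes"),
    (frozenset({("a", 2), ("b", 1)}), "no"),
    (frozenset({("a", 2), ("b", 2)}), "no"),
]
rules = [
    Rule(frozenset({("a", 1)}), "yes", 1.0, 0.5),
    Rule(frozenset({("a", 1), ("b", 1)}), "yes", 1.0, 0.25),
    Rule(frozenset({("a", 2)}), "no", 1.0, 0.5),
    Rule(frozenset({("b", 1)}), "yes", 0.5, 0.25),
]
kept = database_coverage(rules, training)
for rule in kept:
    print(sorted(rule.antecedent), "->", rule.label)
```

    On this toy input the sketch keeps two of the four candidate
    rules, because the two general rules already cover every training
    instance. Lazy pruning, by contrast, retains far more rules,
    which is consistent with the abstract's remark that the resulting
    classifiers are harder to understand and maintain.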

    Item Type: Article
    Additional Information: © Reproduced by permission of Journal of Digital Information Management. Published by Digital Information Research Foundation.
    Uncontrolled Keywords: Associative Classification, Association Rule, Classification, Data Mining, Rule Pruning
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    Q Science > QA Mathematics > QA76 Computer software
    Schools: School of Computing and Engineering
    Depositing User: Sara Taylor
    Date Deposited: 05 Jul 2007
    Last Modified: 28 Jul 2010 19:20
    URI: http://eprints.hud.ac.uk/id/eprint/271
