Improving rule sorting, predictive accuracy and training time in associative classification

Thabtah, Fadi Abdeljaber, Cowling, Peter and Hammoud, Suhel (2006) Improving rule sorting, predictive accuracy and training time in associative classification. Expert Systems With Applications, 31 (2). pp. 414-426. ISSN 0957-4174

    Abstract

    Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising approach called associative classification, which mainly uses association rule mining for classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method of the well-known Apriori algorithm to discover rules, which requires multiple passes over the database. Furthermore, they find frequent items in one phase and generate the rules in a separate phase, consuming more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of the vertical format representation and uses an efficient technique for discovering frequent items, recursively intersecting the frequent items of size n to find potential frequent items of size n+1. Moreover, rule ranking plays an important role in classification, and the majority of current associative classifiers, such as CBA and CMAR, select rules mainly in terms of their confidence levels. MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. Experiments on 20 different data sets indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency when compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
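
The vertical-format idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MCAR implementation: the function names (`vertical_layout`, `frequent_itemsets`), the dictionary-per-row input format and the toy data are all assumptions made for illustration. Each item is stored with the set of row ids in which it occurs, so the support of a size n+1 itemset is obtained by intersecting the row-id sets of two size n itemsets instead of rescanning the data.

```python
from itertools import combinations


def vertical_layout(rows):
    """Map each single (attribute, value) item to the set of row ids containing it."""
    tidsets = {}
    for tid, row in enumerate(rows):
        for item in row.items():
            tidsets.setdefault(frozenset([item]), set()).add(tid)
    return tidsets


def frequent_itemsets(rows, min_support):
    """Level-wise mining: intersect row-id sets of size-n itemsets to get size n+1.

    Hypothetical helper for illustration; min_support is a fraction of the rows.
    """
    min_count = min_support * len(rows)
    # Level 1: keep only the single items that are frequent.
    level = {k: v for k, v in vertical_layout(rows).items() if len(v) >= min_count}
    result = dict(level)
    while level:
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            # Only join itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1:
                continue
            # Skip candidates holding two values of the same attribute.
            if len({attr for attr, _ in candidate}) != len(candidate):
                continue
            # Rows containing the union of two itemsets are exactly the
            # intersection of their row-id sets.
            tids = tids_a & tids_b
            if len(tids) >= min_count:
                next_level[candidate] = tids
        result.update(next_level)
        level = next_level
    return result


# Toy data (invented for this example).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "sunny", "windy": "no"},
]
found = frequent_itemsets(rows, min_support=0.5)
for itemset, tids in found.items():
    print(sorted(itemset), sorted(tids))
```

In this sketch the data is read once to build the row-id sets; every later level works only on set intersections, which is the property the abstract contrasts with the multiple database passes of Apriori-style search.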

    Item Type: Article
    Additional Information: UoA 23 (Computer Science and Informatics) Copyright © 2007 Elsevier B.V.
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    Schools: School of Computing and Engineering

    Depositing User: Sara Taylor
    Date Deposited: 02 Jul 2007
    Last Modified: 28 Jul 2010 19:20
    URI: http://eprints.hud.ac.uk/id/eprint/270