In an era of knowledge explosion, the volume of data grows rapidly day by day. Since data storage is a limited resource, reducing the space that data occupies becomes a challenging issue. Data compression offers a good solution because it lowers the required space. Data mining has found many useful applications in recent years because it helps users discover interesting knowledge in large databases. However, existing compression algorithms are not appropriate for data mining. In [1, 2], two different approaches were proposed to compress databases and then perform the data mining process. However, both lack the ability to decompress the data to its original state and to improve data mining performance. In this research, a new approach called Mining Merged Transactions with the Quantification Table (M2TQT) is proposed to solve these problems. M2TQT uses the relationships among transactions to merge related transactions and builds a quantification table to prune candidate itemsets that cannot become frequent, thereby improving the performance of mining association rules. The experiments show that M2TQT performs better than existing approaches.
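The pruning idea can be illustrated with a minimal sketch. The abstract does not specify how the quantification table is laid out, so the layout below is an assumption made for illustration only and is not necessarily the construction used in M2TQT: `table[item][k]` counts the transactions that contain `item` and hold at least `k` items. Since a k-itemset can only be supported by such transactions, the smallest of these counts over the itemset's items is an upper bound on its support, and a candidate whose bound falls below the minimum support can be discarded without scanning the database. The function names and sample data are hypothetical, and the transaction-merging step is not shown.

```python
def build_quantification_table(transactions):
    """Assumed layout: table[item][k] = number of transactions that
    contain `item` and have at least k items."""
    table = {}
    for transaction in transactions:
        items = set(transaction)
        for item in items:
            row = table.setdefault(item, {})
            # A transaction of length n counts toward every k <= n.
            for k in range(1, len(items) + 1):
                row[k] = row.get(k, 0) + 1
    return table


def support_upper_bound(candidate, table):
    """Upper bound on the support of `candidate` read from the table."""
    k = len(candidate)
    return min(table.get(item, {}).get(k, 0) for item in candidate)


def prune(candidates, table, min_support):
    """Keep only candidates whose upper bound reaches min_support."""
    return [c for c in candidates
            if support_upper_bound(c, table) >= min_support]


# Hypothetical sample data.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B"},
    {"A", "B", "C", "D"},
]
table = build_quantification_table(transactions)
candidates = [("A", "B"), ("A", "D"), ("A", "B", "C"), ("B", "C", "D")]
kept = prune(candidates, table, min_support=2)
print(kept)
```

The bound is only a filter: candidates that survive still need their true support counted, but those that fail can never be frequent under this assumed table layout.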
{"references":["M. C. Hung, S. Q. Weng, J. Wu, and D. L. Yang, \"Efficient Mining of Association Rules Using Merged Transactions,\" in WSEAS Transactions on Computers, Issue 5, Vol. 5, pp. 916-923, 2006.","M. Z. Ashrafi, D. Taniar, and K. Smith, \"A Compress-Based Association Mining Algorithm for Large Dataset,\" in Proceedings of International Conference on Computational Science, pp. 978-987, 2003.","U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, \"The KDD process for extracting useful knowledge from volumes of data,\" Communications of the ACM, Vol. 39, pp. 27-34, 1996.","E. Hullermeier, \"Possibilistic Induction in Decision-Tree Learning,\" in Proceedings of the 13th European Conference on Machine Learning, pp. 173-184, 2002.","J. R. Quinlan, \"C4.5: programs for machine learning,\" Morgan Kaufmann Publishers Inc., 1993.","A. K. Jain and R. C. Dubes, \"Algorithms for clustering data,\" Prentice-Hall, Inc., 1988.","R. Agrawal, T. Imielinski, and A. Swami, \"Mining Association Rules Between Sets of Items in Large Databases,\" in Proceedings of the International Conference on Management of Data, pp. 207-216, 1993.","R. Agrawal and R. Srikant, \"Fast Algorithms for Mining Association Rules,\" in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.","D. I. Lin and Z. M. Kedem, \"Pincer-search: an efficient algorithm for discovering the maximum frequent set,\" IEEE Transactions on Knowledge and Data Engineering, Vol. 14, pp. 553-566, 2002.","S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, \"Dynamic Itemset Counting and Implication Rules for Market Basket Data,\" in Proceedings of the International Conference on Management of Data, pp. 255-264, 1997.","A. Savasere, E. Omiecinski, and S. Navathe, \"An Efficient Algorithm for Mining Association Rules in Large Databases,\" in Proceedings of the 21st International Conference on Very Large Data Bases, pp. 432-444, 1995.","J. Han, J. Pei, and Y. Yin, \"Mining Frequent Patterns without Candidate Generation,\" in Proceedings of the International Conference on Management of Data, pp. 1-12, 2000.","D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu, \"MAFIA: A maximal frequent itemset algorithm,\" IEEE Transactions on Knowledge and Data Engineering, Vol. 17, pp. 1490-1504, 2005.","G. Grahne and J. Zhu, \"Fast algorithms for frequent itemset mining using FP-trees,\" IEEE Transactions on Knowledge and Data Engineering, Vol. 17, pp. 1347-1362, 2005.","IBM Almaden Research Center, \"Synthetic Data Generation Code for Associations and Sequential Patterns,\" URL:http://www.almaden.ibm.com/software/quest/, 2006.","D. W. L. Cheung, S. D. Lee, and B. Kao, \"A general incremental technique for maintaining discovered association rules,\" in Proceedings of the 15th International Conference on Database Systems for Advanced Applications, pp. 185-194, 1997.","D. Xin, J. Han, X. Yan, and H. Cheng, \"Mining Compressed Frequent-Pattern Sets,\" in Proceedings of the 31st International Conference on Very Large Data Bases, pp. 709-720, 2005."]}