Thesis: Mining association rules with adjustable interestingness

Thúy Viết Bài · 5/12/13

TABLE OF CONTENTS
Acknowledgements
Abstract
Table of contents
List of tables and figures
CHAPTER 1: Introduction
1.1. What is data mining?
1.2. Data mining versus query tools
1.3. Mining association rules
1.4. Outline of the thesis
CHAPTER 2: Mining association rules with weighted items
2.1. Introduction
2.2. Problem definition
CHAPTER 3: Mining association rules with adjustable interestingness
3.1. Interestingness and interesting itemsets
3.2. Interestingness constraints
3.3. Motivation behind interesting itemsets and adjustable interestingness
CHAPTER 4: Algorithm for mining association rules with adjustable interestingness (MARAI)
4.1. Motivation
4.2. Preliminaries
4.3. Basic properties of itemset-tidset pairs
4.4. MARAI: Algorithm design and implementation
4.5. Experimental Evaluation
CHAPTER 5: Conclusion
References
Appendix

ABSTRACT
Over the last several years, the problem of efficiently generating large numbers of
association rules has been an active research topic in the data mining community.
Many different algorithms have been developed with promising results. There are
currently two approaches to the association rule mining problem. The first is to mine
the frequent itemsets regardless of their coefficients. The second is to assign
weights to the items to reflect their importance to the users. However, both
rely on the use of a minimum support, which can be confusing. In practice, we
may want to mine the rules that are best to our knowledge rather than those that satisfy a
certain threshold, especially if this threshold is an equation. To overcome this problem, we introduce the concept of adjustable interestingness and propose a novel approach to mining association rules based on adjustable interestingness. Our algorithm works only with the most interesting rules, thus significantly reducing the search
space by skipping many uninteresting itemsets and pruning, at an early stage, those that cannot generate interesting itemsets. As a result, the total time needed for
the mining is substantially decreased.

CHAPTER 1
INTRODUCTION
In this chapter, we introduce the concept of data mining and explain why it is regarded as such an important development. The chapter also presents the background of mining
association rules.
1.1. What is data mining?
The terms ‘data mining’ and ‘knowledge discovery in databases (KDD)’ are often
confused. At the first international KDD conference in Montreal in 1995, it was proposed that the term ‘KDD’ be used to describe
the whole process of extracting knowledge from data. An official definition of
KDD is: ‘the non-trivial extraction of implicit, previously unknown and potentially
useful knowledge from data’ [2]. The discovered knowledge must be new and
not obvious, and people must be able to use it for a particular purpose. It was also
proposed that the term ‘data mining’ be used exclusively for the discovery
stage of the KDD process. The complete KDD process consists of selection, preprocessing,
transformation, data mining, and interpretation or evaluation. Data mining has
received the most attention because it is the most significant and most time-consuming of the KDD
steps.
The sudden rise of interest in data mining can partly be explained by the following
factors [2]:
1. In the 1980s, all major organizations built infrastructural databases containing
data about their clients, competitors, and products. These databases form a potential
gold mine; they contain gigabytes of data with much ‘hidden’ information that
cannot easily be traced using SQL (Structured Query Language). Data mining algorithms
can find interesting regularities in databases, whereas SQL is just a query
language; it only helps to find data under constraints of what we already know.
2. As the use of networks continues to grow, it will become increasingly easy to
connect databases. Thus, connecting a client’s file to a file with demographic data
may lead to unexpected views on the spending patterns of certain population
groups.
3. Over the past few years, machine-learning techniques have expanded enormously. Neural networks, genetic algorithms and other simple, generally applicable
learning techniques often make it easier to find interesting connections in databases.
4. The client/server revolution gives the individual knowledge worker access to central information systems from a terminal on his or her desk.
1.2. Data mining versus query tools
What is the difference between data mining and a normal query environment?
What can a data mining tool do that SQL cannot?
It is important to realize that data mining tools are complementary to query tools.
A data mining tool does not replace a query tool but adds many possibilities [2]. Suppose that we have a large file containing millions of records that describe customers’ purchases in a supermarket. There is a wealth of potentially useful knowledge that can be found by issuing normal queries, such as ‘Who bought
butter and bread last week?’, ‘Is the profit of this month more than that of last
month?’ and so on. There is, however, knowledge hidden in the databases that is
much harder to find using SQL. Examples would be the answers to questions such
as ‘What products were often purchased together?’ or ‘What are the subsequent
purchases after buying a gas cooker?’. Of course, these questions could be answered using SQL, but proceeding in such a way could take days or months to solve
the problem, while a data mining algorithm could find the answers automatically in
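As a minimal illustration of the question ‘What products were often purchased together?’, the following Python sketch counts how often each pair of products appears in the same transaction and keeps the pairs that reach a minimum count. The transactions, the function name frequent_pairs and the parameter min_count are hypothetical and chosen only for this example; this brute-force count is not the algorithm developed in this thesis.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count):
    """Return product pairs found together in at least min_count transactions."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort the distinct items so each pair is counted under one canonical key.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical supermarket transactions, one list of products per purchase.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
]

result = frequent_pairs(transactions, min_count=2)
print(result)
```

Unlike a query such as ‘Who bought butter and bread last week?’, the caller does not name the products in advance; the pairs emerge from the data.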

REFERENCES
[1] R. Agrawal, T. Imielinski, and A. Swami, ‘Mining association rules between
sets of items in large databases’, in Proc. of the ACM SIGMOD Conference
on Management of Data, Washington D.C., May 1993.
[2] P. Adriaans, D. Zantinge, ‘Data mining’, Addison-Wesley, 1999.
[3] J. Han, M. Kamber, ‘Data Mining: Concepts and Techniques’, University of
Illinois, 2002.
[4] U. Fayyad, S. Chaudhuri, P. Bradley, ‘Data mining and its role in database
systems’, 1999.
[5] D. V. Thanh, P. T. Hoan, P. X. Hieu, N. T. Trung, ‘Khai phá luật kết hợp với
độ hỗ trợ không giống nhau’ [Mining association rules with different supports], Conference of junior scientists of Vietnam Nat’l Univ. Hanoi, pages
475-483, 2002.
[6] C. H. Cai, ‘Mining association rules with weighted items’, Thesis for degree
of master, Chinese University of Hong Kong, 1998.
[7] M. J. Zaki, C. J. Hsiao, ‘CHARM: An efficient algorithm for closed itemset
mining’, 2002.
[8] L. A. Zadeh, ‘Fuzzy sets’, Informat. Control, 338-353, 1965.
