Thesis: Mining association rules with adjustable interestingness

Discussion in 'Ngôn Ngữ Học' (Linguistics) started by Thúy Viết Bài, 5/12/13.

    TABLE OF CONTENTS
    Acknowledgements
    Abstract
    Table of contents
    List of tables and figures
    CHAPTER 1: Introduction
    1.1. What is data mining?
    1.2. Data mining versus query tools
    1.3. Mining association rules
    1.4. Outline of the thesis
    CHAPTER 2: Mining association rules with weighted items
    2.1. Introduction
    2.2. Problem definition
    CHAPTER 3: Mining association rules with adjustable interestingness
    3.1. Interestingness and interesting itemsets
    3.2. Interestingness constraints
    3.3. Motivation behind interesting itemsets and adjustable interestingness
    CHAPTER 4: Algorithm for mining association rules with adjustable interestingness (MARAI)
    4.1. Motivation
    4.2. Preliminaries
    4.3. Basic properties of itemset-tidset pairs
    4.4. MARAI: Algorithm design and implementation
    4.5. Experimental Evaluation
    CHAPTER 5: Conclusion
    References
    Appendix


    ABSTRACT
    Over the last several years, the problem of efficiently generating large numbers of
    association rules has been an active research topic in the data mining community.
    Many different algorithms have been developed with promising results. There are
    two current approaches to the association rule mining problem. The first mines
    frequent itemsets regardless of item coefficients. The second assigns weights to
    items to reflect their importance to users. However, both rely on a minimum support
    threshold, which users may find confusing to set. In practice, we may want to mine
    the rules that are best according to our knowledge rather than those that satisfy a
    certain threshold, especially when that threshold is given as an equation. To overcome this problem, we introduce the concept of adjustable interestingness and propose a novel approach to mining association rules based on it. Our algorithm works only with the most interesting rules, thus significantly reducing the search
    space by skipping many uninteresting itemsets and pruning, at an early stage, those that cannot generate interesting itemsets. As a result, the total time needed for
    mining is substantially reduced.
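    The contrast drawn above, between keeping every itemset that passes a fixed minimum support and keeping only the best itemsets, can be illustrated with a small Python sketch. This is a brute-force illustration on a hypothetical toy database, not the MARAI algorithm: support is taken as the usual fraction of transactions containing an itemset [1], and the ranking uses plain support as a stand-in, because the adjustable interestingness measure is defined in Chapter 3 and is not reproduced here. All function and variable names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transaction database (illustrative, not from the thesis).
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def all_itemsets(transactions, max_size=3):
    """Enumerate every itemset of size 1..max_size (brute force, exponential)."""
    items = sorted(set().union(*transactions))
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def frequent_by_threshold(transactions, min_support, max_size=3):
    """Threshold style: keep every itemset whose support reaches min_support."""
    result = {}
    for s in all_itemsets(transactions, max_size):
        sup = support(s, transactions)
        if sup >= min_support:
            result[s] = sup
    return result


def top_k_itemsets(transactions, k, max_size=3):
    """Ranking style (stand-in measure: plain support): keep the k best itemsets."""
    scored = [(support(s, transactions), s) for s in all_itemsets(transactions, max_size)]
    # Highest support first; ties broken by sorted item names for a stable order.
    scored.sort(key=lambda pair: (-pair[0], sorted(pair[1])))
    return scored[:k]


if __name__ == "__main__":
    for min_sup in (0.4, 0.6):
        print(min_sup, len(frequent_by_threshold(TRANSACTIONS, min_sup)))
    for sup, s in top_k_itemsets(TRANSACTIONS, 4):
        print(sorted(s), sup)
```

    On this toy data, moving min_support from 0.6 to 0.4 changes the answer from 6 to 7 itemsets, so the user must guess a good threshold in advance, whereas top_k_itemsets always returns exactly k itemsets ranked by the chosen measure.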


    CHAPTER 1
    INTRODUCTION
    In this chapter, we introduce the concept of data mining and explain why it is regarded as such an important development. We also present the background of mining
    association rules.
    1.1. What is data mining?
    There is some confusion about the exact meanings of the terms ‘data mining’ and
    ‘knowledge discovery in databases (KDD)’. At the first international KDD conference in Montreal in 1995, it was proposed that the term ‘KDD’ be used to describe
    the whole process of extracting knowledge from data. An official definition of
    KDD is: ‘the non-trivial extraction of implicit, previously unknown and potentially
    useful knowledge from data’ [2]. The discovered knowledge must be new and
    non-obvious, and humans must be able to use it for a particular purpose. It was also
    proposed that the term ‘data mining’ be used exclusively for the discovery
    stage of the KDD process. The KDD process as a whole consists of selection, preprocessing,
    transformation, data mining, and interpretation or evaluation. Data mining has
    received the most attention because it is the most significant and most time-consuming of the KDD
    steps.
    The sudden rise of interest in data mining can partly be explained by the following
    factors [2]:
    1. In the 1980s, all major organizations built infrastructural databases containing
    data about their clients, competitors, and products. These databases form a potential
    gold mine; they contain gigabytes of data with much ‘hidden’ information that
    cannot easily be traced using SQL (Structured Query Language). Data mining algorithms
    can find interesting regularities in databases, whereas SQL is just a query
    language; it only helps to find data under constraints that we already know.
    2. As the use of networks continues to grow, it will become increasingly easy to
    connect databases. Thus, connecting a client’s file to a file with demographic data
    may lead to unexpected views on the spending patterns of certain population
    groups.
    3. Over the past few years, machine-learning techniques have expanded enormously. Neural networks, genetic algorithms, and other simple, generally applicable
    learning techniques often make it easier to find interesting connections in databases.
    4. The client/server revolution gives the individual knowledge worker access to central information systems from a terminal on his or her desk.
    1.2. Data mining versus query tools
    What is the difference between data mining and a normal query environment?
    What can a data mining tool do that SQL cannot?
    It is important to realize that data mining tools are complementary to query tools.
    A data mining tool does not replace a query tool but offers many additional possibilities [2]. Suppose that we have a large file containing millions of records that describe customers’ purchases in a supermarket. There is a wealth of potentially useful knowledge that can be found by issuing ordinary queries, such as ‘Who bought
    butter and bread last week?’, ‘Is this month’s profit higher than last
    month’s?’, and so on. There is, however, knowledge hidden in the databases that is
    much harder to find using SQL. Examples would be the answers to questions such
    as ‘What products were often purchased together?’ or ‘What are the subsequent
    purchases after buying a gas cooker?’. Of course, these questions could be answered using SQL, but proceeding in such a way could take days or months,
    while a data mining algorithm could find the answers automatically in
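
    The difference between an ordinary query and a mining question can be made concrete with a short Python sketch. The basket data and all function names (who_bought, pair_counts, rules_from_pairs) are hypothetical. who_bought answers a question whose items we already know, much like a SQL WHERE filter, while pair_counts and rules_from_pairs scan every basket for co-occurring products without being told which ones to look for, and report rules X -> Y with confidence count(X, Y) / count(X), in the sense of [1]. The sketch only handles pairs and ignores purchase order, so it does not address the gas cooker question about subsequent purchases.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one per customer visit (illustrative data only).
BASKETS = [
    ["butter", "bread", "milk"],
    ["bread", "butter"],
    ["gas cooker", "pan"],
    ["bread", "jam"],
    ["butter", "bread", "jam"],
]


def who_bought(baskets, required):
    """Query style: indices of baskets that contain all of the required items."""
    return [i for i, b in enumerate(baskets) if set(required) <= set(b)]


def pair_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and fixes a canonical pair order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def rules_from_pairs(baskets, min_count=2):
    """Mining style: rules X -> Y from frequent pairs, confidence = n(X,Y) / n(X)."""
    item_counts = Counter(item for b in baskets for item in set(b))
    rules = []
    for (a, b), n in pair_counts(baskets).items():
        if n < min_count:
            continue
        rules.append((a, b, n / item_counts[a]))
        rules.append((b, a, n / item_counts[b]))
    # Highest confidence first; ties broken by antecedent, then consequent.
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


if __name__ == "__main__":
    print(who_bought(BASKETS, ["butter", "bread"]))
    for lhs, rhs, conf in rules_from_pairs(BASKETS):
        print(f"{lhs} -> {rhs}: confidence {conf:.2f}")
```

    Here who_bought returns [0, 1, 4], while rules_from_pairs discovers on its own that every basket with butter also contains bread (confidence 1.00), a pattern nobody had to ask about explicitly.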


    REFERENCES
    [1] R. Agrawal, T. Imielinski, and A. Swami, ‘Mining association rules between
    sets of items in large databases’, in Proc. of the ACM SIGMOD Conference on
    Management of Data, Washington D.C., May 1993.
    [2] P. Adriaans and D. Zantinge, ‘Data mining’, Addison-Wesley, 1999.
    [3] J. Han and M. Kamber, ‘Data Mining: Concepts and Techniques’, University of
    Illinois, 2002.
    [4] U. Fayyad, S. Chaudhuri, and P. Bradley, ‘Data mining and its role in database
    systems’, 1999.
    [5] D. V. Thanh, P. T. Hoan, P. X. Hieu, and N. T. Trung, ‘Mining association rules with
    different supports’ (in Vietnamese), Conference of Junior Scientists of Vietnam Nat’l Univ. Hanoi, pages
    475-483, 2002.
    [6] C. H. Cai, ‘Mining association rules with weighted items’, Master’s thesis,
    Chinese University of Hong Kong, 1998.
    [7] M. J. Zaki and C. J. Hsiao, ‘CHARM: An efficient algorithm for closed itemset
    mining’, 2002.
    [8] L. A. Zadeh, ‘Fuzzy sets’, Information and Control, pages 338-353, 1965.
     
