Master's Thesis: A Study of the Generalization Ability of Genetic Programming

Contents

Abstract
List of Figures
List of Tables
Abbreviations
0.1 Motivations
0.2 Research Perspectives
0.3 Contributions of the Thesis
0.4 Organization of the Thesis
1 Backgrounds
1.1 Genetic Programming
1.1.1 Representation, Initialization and Operators in GP
1.1.2 Some Variants of GP
1.2 An Example of Problem Domain
1.3 Machine Learning
1.4 GP and Machine Learning Issues
1.4.1 Search Space
1.4.2 Bloat
1.4.3 Generalization and Complexity
1.5 Generalization in GP
1.5.1 Overview of Studies on GP Generalization Ability
1.5.2 Methods for Improving GP Generalization Ability
1.5.3 Problem of Improving Training Efficiency
1.6 Summary
2 Early Stopping, Progressive Sampling, and Layered Learning
2.1 Over-training, Early Stopping and Regularization
2.1.1 Over-training
2.1.2 Early Stopping
2.1.3 Regularization Learning
2.2 Progressive Sampling
2.2.1 Types of Progressive Sampling Schedule
2.2.2 Detection of Convergence
2.3 Layered Learning
2.4 Conclusion
3 An Empirical Study of Early Stopping for GP
3.1 Method
3.1.1 Proposed Classes of Stopping Criteria
3.2 Experimental Settings
3.3 Results and Discussions
3.3.1 Comparison with GP
3.3.2 Comparison with Tarpeian
3.4 Conclusion
4 A Progressive Sampling Framework for GP - An Empirical Approach
4.1 The Proposed Method
4.1.1 The Sampling Schedule
4.1.2 The Initial Sample Set Size
4.1.3 The Number of Learning Layers
4.1.4 The Stopping Criterion for Each Layer
4.2 Experimental Settings
4.2.1 Data Preparation
4.2.2 GP Systems Configurations
4.3 Results and Discussions
4.3.1 Learning Effectiveness Comparison
4.3.2 Learning Efficiency
4.3.3 Solution Complexity
4.3.4 Distribution of Best Solutions by Layers
4.3.5 Synergies between System Components
4.4 Conclusion
5 Conclusions and Future Work
5.1 Contributions
5.2 Future Directions
Bibliography
    List of Figures
1.1 GP's main loop [57]
1.2 GP expression tree representing max(x*x, x+3y) [57]
1.3 Creation of a full tree having maximum depth 2 (and therefore a total of seven nodes) using the Full initialisation method (t = time) [57]
1.4 Creation of a five-node tree using the Grow initialisation method with a maximum depth of 2 (t = time). A terminal is chosen at t = 2, causing the left branch of the root to be terminated at that point even though the maximum depth has not been reached. [57]
1.5 Example of subtree crossover
1.6 Example of subtree mutation [57]
2.1 Idealized training and validation error curves. Vertical axis: errors; horizontal axis: time [88]
2.2 A real validation error curve. Vertical: validation set error; horizontal: time (in training epochs) [88]
2.3 Learning curves and progressive sample
2.4 Regions of the Efficiency of Arithmetic and Geometric Schedule Sampling [89]
3.1 Distribution of stop time (last generation of the run) of the OF criterion and GPM
3.2 Distribution of stop time (last generation of the run) of the VF criterion and GPM
3.3 Distribution of stop time (last generation of the run) of the GL criterion and GPM
3.4 Distribution of stop time (last generation of the run) of the PQ criterion and GPM
3.5 Distribution of stop time (last generation of the run) of the UP criterion and GPM
4.1 PS framework for GP
4.2 GP Evolution of each layer
    List of Tables
1.1 Test Functions
1.2 The real-world data sets
3.1 Parameter settings for the GP systems
3.2 Data Sets for Problems. For the synthetic problems, the notation [min, max] defines the range from which the data points are sampled
3.3 Generalization error and run time of VF, OF, True Adap stopping criterion and GP, Tar
3.4 Size of best individual of VF, OF and True Adap stopping criterion and GP, Tar
3.5 p-value of run time and GE of VF, OF, True Adap stopping criterion and Tar vs GP
3.6 p-value of complexity of VF, OF, True Adap stopping criterion and Tar vs GP
3.7 Relative Generalization Errors and Run Time of OF, VF and True Adap stopping criterion (vs GP)
3.8 Generalization error and run time of GL, PQ and UP stopping criterion
3.9 Size of the best individual of GL, PQ and UP stopping criterion
3.10 p-value of GL, PQ and UP stopping criterion vs GP
3.11 p-value of size of best individual of GL, PQ and UP stopping criterion vs GP
3.12 Relative Generalization Errors and Run Time of GL, PQ and UP stopping criterion (vs GP)
3.13 p-value of GE, run time and size of best solution of True Adap, PQ vs Tarpeian
4.1 Data Sets for the Test Functions
4.2 Evolutionary Parameter Settings for the Genetic Programming Systems
4.3 Mean and SD of Generalization Error, GPLL and GP (14 Problems)
4.4 Relative Average Generalization Errors and p-values (GPLL vs GP) (14 Problems)
4.5 Mean Run Time, GPLL and GP (14 Problems)
4.6 Relative Run Times and p-values (GPLLs vs GP) (14 Problems)
4.7 Mean Solution Size and p-values, GPLL vs GP (14 Problems)
4.8 Number of End-of-run Best Solutions First Found by GPLL in Each Layer
4.9 Average Generalization Error on 14 Problems
4.10 Average Run Time of All Systems
4.11 Average Size of Solutions Found by all Systems
    Abbreviations
    Abbreviation Meaning
ARMA Autoregressive Moving-Average
    EA Evolutionary Algorithm
    DSS Dynamic Subset Selection
    GA Genetic Algorithms
    GGGP Grammar Guided Genetic Programming
    GP Genetic Programming
    GPLL Layered Learning Genetic Programming
    ILP Inductive Logic Programming
    KDD Knowledge Discovery and Data Mining
    LL Layered Learning
    ML Machine Learning
    MSE Mean Squared Error
    PDGP Parallel Distributed GP
    PS Progressive Sampling
    RST Random Sampling Technique
    Introduction
The genetic programming (GP) paradigm, first proposed by Koza [53], can be viewed as the discovery of computer programs that produce desired outputs for particular inputs. A computer program could be an expression, a formula, a plan, a control strategy, a decision tree or a model, depending on the type of problem. GP is a simple and powerful technique that has been applied to a wide range of problems in combinatorial optimization, automatic programming and model induction. The direct encoding of solutions as variable-length computer programs allows GP to provide solutions that can be evaluated and also examined to understand their internal workings. In the field of evolutionary and natural computation, GP plays a special role. In Koza's seminal book, the ambition of GP was to evolve a population of programs that learn automatically from training data. Since its birth, GP has developed rapidly, with a variety of fruitful applications. Some of these developments concern novel mechanisms in the GP evolutionary process; others concern finding new areas to which GP can be successfully applied. Despite these advances in GP research, when GP is applied to learning tasks, its generalization ability as a learning machine has not received the attention it deserves. This thesis focuses on the generalization aspect of GP and proposes mechanisms to improve its generalization ability.
    0.1 Motivations
GP is one of the evolutionary algorithms (EAs). The common underlying idea behind all EA techniques is the same: given a population of individuals, environmental pressure causes natural selection (survival of the fittest), and this causes a rise in the fitness of the population. Given a quality function to be maximised, we can randomly create a set of candidate solutions, i.e., elements of the function's domain, and apply the quality function as an abstract fitness measure - the higher the better. Based on this fitness, some of the better candidates are chosen to seed the next generation by applying recombination and/or mutation to them. Recombination is an operator applied to two or more selected candidates (the so-called parents) and results in one or more new candidates (the children). Mutation is applied to one candidate and results in one new candidate. Executing recombination and mutation leads to a set of new candidates (the offspring) that compete - based on their fitness (and possibly age) - with the old ones for a place in the next generation. This process is iterated until a candidate with sufficient quality (a solution) is found or a previously set computational limit is reached. GP has been proposed as a machine learning (ML) method [53], in which solutions to various learning problems are sought by means of an evolutionary process of program discovery. Since GP often tackles ML problems, generalization - here used synonymously with robustness - is the desirable property at the end of the GP evolutionary process. The main goal when using GP or other ML techniques is not only to create a program that exactly covers the training examples (as in many GP applications so far), but also to obtain good generalization ability: for unseen and future samples of data, the program should output values that closely resemble those of the underlying function. Although the issue of generalization in GP has recently received increasing attention, the use of more traditional machine learning techniques to address it has been rather limited. This is surprising, because generalization has long been studied in machine learning, and many practical techniques have been proposed for improving learning machines such as decision trees and neural networks. It is hoped that adapting these techniques to GP would help to improve GP generalization ability.
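As a minimal illustration of the generic evolutionary loop sketched above, the following Python fragment shows one plausible realisation. The operator names (random_candidate, crossover, mutate), the tournament size and the parameter values are illustrative assumptions, not the GP systems studied in this thesis; fitness is treated as an error measure to be minimised.

```python
import random

def evolve(fitness, random_candidate, crossover, mutate,
           pop_size=500, generations=50, p_crossover=0.9):
    """Generic evolutionary loop: selection, recombination/mutation, replacement."""
    population = [random_candidate() for _ in range(pop_size)]
    best = min(population, key=fitness)  # lower fitness = better (an error measure)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Tournament selection: fitter candidates are more likely to reproduce.
            parents = [min(random.sample(population, 3), key=fitness) for _ in range(2)]
            if random.random() < p_crossover:
                child = crossover(parents[0], parents[1])  # recombination of two parents
            else:
                child = mutate(parents[0])                 # mutation of a single parent
            offspring.append(child)
        population = offspring
        best = min(population + [best], key=fitness)       # remember the best-so-far
    return best
```

In practice such a loop is usually augmented with elitism, size limits and other refinements; the sketch omits them for brevity.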
    0.2 Research Perspectives
The focus of this thesis is on understanding the generalization of GP, the ways it can be improved, and the role it plays in learning. Specifically, the generalization behaviour of GP is analysed to uncover its key features. Based on this analysis, the thesis proposes two new learning mechanisms for GP: early stopping and progressive sampling. In the process of developing better learning frameworks for GP, a clearer description of the scalability of the framework emerges, which facilitates and motivates future enhancements. Current theoretical models of GP are limited in use and applicability due to their complexity; therefore, the majority of theoretical work has been derived from experimentation. The approach taken in this thesis is likewise based on carefully designed experiments and the analysis of experimental results.
    0.3 Contributions of the Thesis
    This thesis makes the following contributions:
1. An early stopping approach for GP based on an estimate of the generalization error, together with several new stopping criteria for GP. The GP training process is stopped early at the point that promises to yield the solution with the smallest generalization error.
2. A progressive sampling framework for GP, which divides the GP learning process into several layers, with the training set size starting small and growing at each layer. Training on each layer is terminated according to certain early stopping criteria (a sketch combining both mechanisms is given after this list).
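The following Python sketch shows, under stated assumptions, how these two contributions could fit together: a geometric sampling schedule defines the layers, and a simple patience-based validation check decides when each layer stops. The schedule, the patience criterion and all names (run_layer, n0, growth) are illustrative assumptions, not the exact criteria and framework proposed in Chapters 3 and 4.

```python
def layered_gp(train_set, valid_set, run_layer, n0=50, growth=2.0, patience=5):
    """run_layer(samples, valid_set, seed_pop, stop) is assumed to evolve one layer
    and return (best_individual, final_population)."""
    # Geometric schedule: n_{i+1} = growth * n_i, ending with the full training set.
    sizes, n = [], n0
    while n < len(train_set):
        sizes.append(int(n))
        n *= growth
    sizes.append(len(train_set))

    seed_pop, best = None, None
    for size in sizes:
        samples = train_set[:size]  # layer i trains on the first n_i examples

        def stop(val_errors):
            # Early stopping: halt the layer when the validation error has not
            # improved for `patience` consecutive generations.
            return (len(val_errors) > patience
                    and min(val_errors[-patience:]) >= min(val_errors[:-patience]))

        best, seed_pop = run_layer(samples, valid_set, seed_pop, stop)
    return best
```

Growing the sample geometrically keeps the cost of the early layers small relative to the final layer, which is part of the motivation for the progressive sampling schedules reviewed in Chapter 2.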
    0.4 Organization of the Thesis
    The organization of the rest of the thesis is as follows:
Chapter 1 first introduces the basic components of GP - the algorithm, representation and operators - as well as some benchmark problem domains (some of which are used for the experiments in this thesis). Two important research issues and a metaphor of GP search are then discussed. Next, the major issues of GP are discussed, especially when GP is considered as a machine learning system. The chapter first overviews the approaches proposed in the literature to improve the generalization ability of GP. Then, solution complexity (code bloat) in the particular context of GP, and its relation to GP learning performance (generalization, training time, solution comprehensibility), are discussed.
Chapter 2 provides background on a number of concepts and techniques subsequently used in the thesis for improving GP learning performance. These include progressive sampling, layered learning, and early stopping in machine learning.
Chapter 3 presents one of the main contributions of the thesis. It proposes several criteria for determining when to stop GP training, with the aim of avoiding over-fitting. Experiments are conducted to assess the effectiveness of the proposed early stopping criteria. The results show that GP with early stopping can find solutions with generalization errors at least similar to those of standard GP, but with much shorter training time and lower solution complexity.
Chapter 4 develops a learning framework for GP based on layered learning and progressive sampling. The results show that the solutions of GP with progressive sampling are significantly smaller and the training time is much shorter than with standard GP, while the generalization error of the obtained solutions is often smaller.
Chapter 5 concludes the thesis, summarizing the main results and proposing suggestions for future work that extends the research in this thesis.
     
