New real coded crossover operators for genetic algorithms based on incomplete dominance and gene memory.
Genetic Algorithms (GA) are applied to solve the optimization problems like TSP, Bioinformatics, knapsack problem etc. They perform reasonably well in spite of the longer convergence time and the local minima problem. Crossover operators are considered to be the backbone of genetic algorithms (Spears, 1995). GA can work better if problem specific crossover operators are designed. Currently there are relatively few operators designed specifically for certain problems.
Real coded GA is a special group of GAs designed to handle applications that demand real encoding for chromosome representation. This forms a major sub-group of GAs and many crossover operators are designed for real coded applications. We have considered the Learning to Rank for IR (LTOR) problem which is a real coded application to evaluate our new crossover operators as it is an important sub area in the area of Search engines and research in this area is gaining momentum due the need to tackle the information overload problem.
Our contributions in this paper are twofold, one is the designing four new crossover operators specific to the real coded GAs called the "One-point averaging", "two-point averaging", "N-point averaging" and the "Uniform averaging" crossover operators which are based on the Natural incomplete dominance phenomenon (Gardner et al., 2006) where some features of the child are a mix of features of both the parents. Our second contribution is the novel use of gene memory (Genetic_memory) concept in crossover so that any previous good value for a gene is remembered through generations and helps in forming the best child.
The rest of the paper is organized as follows: Section 2 gives a brief description of the background of our work. Section 3 discusses in detail about our methodology. Section 4 discusses the evaluation of our work.
We discuss the relevant background details for our work in this section.
A genetic algorithm is a search heuristic that mimics the process of natural evolution. GA begins with solutions to the problem which are encoded into the population, a fitness function is applied for evaluating the fitness of each individual, after which a new generation is created through the process of selection, crossover and mutation. The cycle of forming the new generation and evaluating them continues till the termination condition is met. After the termination of genetic algorithm, an optimal solution is obtained. Real-coded GA is a special type of GA where the chromosome has real encoding.
Crossover is a process that reproduces clones of good strings from parents in their children, but does not create new ones. In general, Genetic algorithms have many different forms and, for each of them, several parameters influence their behavior. Many researchers have studied various forms of genetic algorithms (Spears, 1995), (Mitchell et al., 1991) and as a result, they conclude that there is no general best GA, and that each form of GA can be better than others, depending on the particular application it is used for. In this context, our work tries to improve the crossover operation. Various studies have shown that the employed form of crossover can determine the performance of the GA (Spears, 1995). Therefore, based on this idea, we have tried to improve the performance of the GA by improving the crossover operator.
Now, we discuss some of the other crossover operators. In the case of one-point, two-point and n-point crossover (De Jong, 1975), we randomly choose cut points and swap the alternate segments of both the parents to form two children. The restricted crossover operator is identical to the simple one, with the difference that the cross point can only be chosen between the first and the last position where the parents' chromosomes are different. The uniform crossover operator (Syswerda, 1989) consists in independently choosing, for each locus i from 1 through L - 1, if the parents genes will be swapped or not. This choice depends on a swap probability. The fusion operator (Beasley and Chu, 1996) produces only one child from two parents. For each gene, the child inherits the value from one or the other of the parents with a probability according to its performance.
Wright's (1991) Heuristic crossover had a bias towards the better offspring. Radcliffe's (1991) Flat Crossover generates offspring randomly between the genes of the parents. Michalewicz (1992) developed Arithmetical Crossover and its two variants: uniform arithmetical crossover and non-uniform arithmetical crossover. Eshelman and Schaffer (1993) suggested a blend crossover (BLX-a) operator for two parents. Muhlebein and Voosen (1993) introduced Extended Line Crossover and Extended Intermediate Crossover which is a special case of Blend crossover. Voigt et al. (1995) presented a fuzzy recombination operator. Deb and Agrawal (1995) developed the Simulated Binary Crossover (SBX). Ono and Kobayashi (1997) suggested a unimodal normally distributed crossover operator (UNDX). Herrera and Lozano (1996) introduced real coded cross over based on fuzzy connectives. Some extensions of fuzzy connective based crossover operator are suggested in Herrera et al. (1999).
Tsutsui et al. (1999) Simplex Crossover that works well on functions having multimodality and/or epistasis with a medium number of parents. Deb et al. (2002) proposed another real coded crossover called parent centric crossover operator (PCX). A modified PCX operator proposed by Sinha et al. (2005) depends heavily on a priori knowledge of the problem under consideration. However, it shows a remarkable improvement in performance. Recombination operators such as unimodal normal distribution crossover (UNDX), simplex crossover (SPX), and blend crossover (BLX) are mean-centric approaches, whereas the simulated binary crossover (SBX), fuzzy recombination and parent centric crossover (PCX) are parent-centric approaches. In this light, our operators are partially mean-centric approaches since the mean-centric approach is not used for all the genes.
While almost all the above said crossover operators are for some other applications, the Dissociated crossover operator that employs different crossover techniques for the two children (Vrajitoru, 1998) is a new crossover operator specifically designed for the Learning to rank for Information retrieval problem. Our work is similar to fusion operator and the heuristic operator in a way that only one offspring is produced as a result of the crossover operator. But, Fusion operator uses probability based on performance to choose the child's Gene while Heuristic crossover forms a child by randomly combining the genes of the parents, checking if the child is better than both of the parents, discarding if not, produce another child and so on till the child has a better fitness value and hence we consider these three operators as our baselines. An important reason for using the fusion crossover operator, heuristic crossover and the dissociated crossover operators and not the real-adapted onepoint, two-point, n-point and uniform crossover operators as the baselines to evaluate our work since it is proved that application specific crossover operators perform better than the traditional ones.
It is the norm of most of the existing crossover operators to choose either of the parents' genes to form an offspring. While this helps in maintaining the good genes across the generations, the same happens for the bad genes as well. Though mutation helps to introduce changes to the genes in the population, the effect is minimal since only a few genes undergo changes and there is a high possibility that the bad gene is not mutated due to the randomness in selecting the gene for mutation. Hence, we go in for a mix of both the parental genes as it occurs in the phenomenon of Incomplete Dominance (Gardner et al., 2006).
Crossover using Incomplete Dominance:
Unlike "Dominance", where strictly either red (homozygous and heterozygous) or white (homozygous) flowers should be produced, plants like the snapdragon and four O'clock plant where pure breeds (homozygous) have red and white color flowers and on mixing them, we get a heterozygous pink in their flowers. We use this concept for the real coded operators in the form of averaging the parents' gene values to produce the offspring i.e. normally, when two real coded chromosomes are crossed over, the offspring gets eit her parents' gene and our crossover gives the average of both the parents' gene value to the offspring. This also satisfies the important property of incomplete dominance which is giving the homozygous (both parents are red or white) property only when both the parents are having the same value (red or white) for the gene. Here, two real numbers that are almost close to each other produce another real number as an average which is also closer to the former two. Similarly, when the parents have heterozygous property (one parent is red and the other white), the offspring has a different value (pink) from that of the parents. Here, two real numbers that belongs to two extremes produce another real number as an average that is in no way near to either.
Fig. 1: Crossover using incomplete dominance. 1) Input: Parents p1=[f11,f12.... f1n] p2=[f21, f22 ... f2n] 2) Choose genes randomly for which averaging should be done 3) Do 'X' crossover [X=1P, 2P, NP or Uniform Crossover) where genes selected in step 2 alone undergo averaging 4) Output: Child c1=[c11,c12.... c1n] Child c2=[c21, c22.... c2n) Where, fji= fj1 or fj2 or (f)1+f)2)/2 [depends on steps 2 and 3] (i=1,2 ... n) (j=Child 1 or 2)
Figure 1. Shows how incomplete dominance is incorporated into real coded GAs. Accordingly, we choose the genes for which averaging should be done. Then as shown in step 3, choose one of the following crossover operators namely, one-point, two-point, n-point and uniform crossover operators and perform the selected crossover operation to form children whose genes are either from their parents or an average value of both the parents' genes.
Crossover using Gene Memory:
"Memory is present if the state of a biological system depends on its history in addition to present conditions. If this memory is recorded in the genetic material and stably inherited through cell division, it is genetic memory." (Genetic_memory) According to this theory, history is carried over along with genes. We use this genetic memory to store previous values for a gene that contributed the best to an individual's fitness.
As shown in Figure 2, memory is being maintained for every gene. The memory is responsible for storing the best value of a gene that contributes to the best fitness in the history (i.e., from among the predecessors). Here, the term 'contribution' measures the individual gene's participation in fitness calculation and not the fitness calculation that we do considering all the genes of a chromosome together. We form an offspring at the end of 'N' generations by selecting the best gene value for each gene from the memory of all the members of the population. This is a novel approach since all other existing crossover operators discard or retain the chromosome as a whole based on the combined fitness of all the genes of the chromosomes. However, our genememory based crossover considers each gene separately rather than the chromosome as a whole.
Fig. 2: Crossover using Gene memory. 1) Input: Parents p1=[f11/m11,f12/m12.... f1n/m1n] p2=[f21/m21, f22/m22.... f2n/m2n] 2) Choose genes randomly for which averaging should be done 3) Do 'X' crossover [X=1 P, 2P, NP or Uniform Crossover] where genes selected in step 2 alone undergo averaging 4) Output: Child c1=[c11/mc11,c12/mc12.... c1n/mc1n] Child c2=[c21/mc21,c22/mc22.... c2n/mc2n] Where, mcji= mj1 or mj2 or (mj1+mj2)/2 [depending on which is contributing to the highest fitness] (i=1,2.....n) (j=Child 1 or 2)
As we described earlier, the job of the gene memory is only to record the best ever gene value of a gene so far in history. Though the gene memory keeps track of the best gene value, it does not influence the gene selection for the offspring. This is the major difference of our crossover operators against the existing operators where best gene values are passed on to the offspring. This method cuts short the search space whereas, our method helps to widen the search space so that any good solution is not left out. In addition to this, an important property of the randomness in gene selection for offspring formation is that there is an equal chance of the good gene to be circulated among the generations and to be lost in the due course as well. Our aim is to allow the growth of the search space without losing the good genes and it is achieved by the crossover based on gene memory. We get the best results using this method. However, this improvement in performance is achieved at the cost of more space for memory and time for the calculation of each gene's contribution to the fitness value.
RESULTS AND DISCUSSIONS
Learning to rank for Information Retrieval deals with the construction of a ranking model, e.g., a ranking function, that achieves the best result on test data in the sense of optimization of a performance measure taking a collection of queries, corresponding retrieved documents, and their relevance judgments provided by human annotators as input. The LETOR dataset (Liu et al, 2007), contains a 136-length vector of feature representations for the documents corresponding to the queries for which they were retrieved, their relevance judgments for the respective queries that fetched them. The dataset has been released at the website of Microsoft Research. All the algorithms are implemented in java and the experiments were done on a PIV 2.8 GHz machine with 4 GB RAM under WINDOWS-7 platform. The necessary parameters for GA that we have considered are as follows: Initial population size: 50, Termination condition: reaching generation 50, Chromosome description: 134 length vector with real encoding, fitness function: Average P@5 and P@10.
We have evaluated the P@5 and P@10 values of the GA using the baselines and our proposed crossover operators. We have evaluated them under two different cases namely i) without gene memory and ii) with gene memory. The case 'with gene memory' is not applicable for the baselines fusion, heuristic and dissociated since we included the gene memory concept only for the averaging crossover operators that we have proposed. The same holds good for the P@5, P@10 evaluation and the number of generations required to converge to a solution as well.
Figure 3 shows the P@5 values obtained for the various crossover operators in the absence and presence of gene memory. As we can see, there is a remarkable improvement in the P@5 values of all the crossover operators when the gene memory concept is used along with invariably all the four crossover operators considered. It is also clear from the figure that the uniform averaging crossover operator performs better than its counterparts irrespective of whether the gene memory is used or not.
Figure 4 shows the P@10 values obtained for the various crossover operators in the absence and presence of gene memory. The P@10 values also follow a pattern similar to the P@5 value i.e. crossover operators used along with gene memory show good performance and the uniform averaging crossover performs better than the other crossover operators both in the absence and presence of Gene memory. In general, on comparing Figure 3 and figure 4, we can see that the P@5 values are better than the P@10 values.
Figure 5 shows the number of generations taken to converge to the solution. As we can see, the uniform averaging crossover operator takes more generations to converge to a solution when compared to the others. This is attributed to the power of the uniform crossover to explore a wider area of the search space. We can also conclude that the greater the number of generations is, the better the final solution is. As we can see in the figure, all the operators take 'N' generations to form the optimal solution in the presence of gene memory. As a result of this evaluation, we conclude that our methods provide the benefit of good performance but at the cost of space and time as explained in section 3.
In this paper we have designed four new real-coded crossover operators based on the principle of 'Incomplete Dominance' where instead of choosing a gene from either parent we also choose the mean of the gene values of both the parents. We have also used the Gene memory concept to enhance the performance of all these crossover operators to keep track of the best gene values across generations. Our evaluations show that our crossover operators performed better than the baselines.
Further work could be done to minimize the space and time requirements of our new crossover operators. i.e, new crossover operators that provide better results in less number of generations with less computational overhead can be designed.
Received 12 October 2014
Received in revised form 26 December 2014
Accepted 1 January 2015
Available online 25 February 2015
Beasley, J.E. and P.C. Chu, 1996. A Genetic Algorithm for the Set Covering Problem. European Journal of Operational Research, (94): 392-404.
De Jong, K.A., 1975. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D dissertation, University of Michigan.
Deb, K and R.B. Agrawal, 1995. Simulated Binary Crossover for Continuous Search Space. Complex Systems, pp: 115-148.
Deb, K., A. Anand and D. Joshi, 2002. A Computationally Efficient Evolutionary Algorithm for RealParameter Evolution. Evolutionary Computation Journal, 10(4): 371-395.
Eshelman, L.J. and J.D. Schaffer, 1993. Real-Coded Genetic Algorithms and Interval Schemata. In: D. L. Whitley., (Ed.), Foundation of Genetic Algorithms II, Morgan Kaufmann, San Mateo, CA, pp: 187-202.
Gardner, E.J., M.J. Simmons and D.P. Snustad, 2006. Principles of Genetics. 8th Ed, John Wiley & Sons, pp: 740.
Herrera, F. and M. Lozano, 1996. Adaptation of Genetic Algorithm Parameters Based on Fuzzy Logic Controllers. In: Genetic Algorithms and Soft Computing, pp: 95-125.
Herrera, F., M. Lozano and J.L. Verdegay, 1999. Dynamic and Heuristic Fuzzy Connectives based Crossover Operators for Controlling the Diversity and Convergence of Real-Coded Genetic Algorithms. International Journal of Intelligent System, 2: 1013-1041. http ://en.wikipedia. org/wiki/Genetic_memory_(biology)
Liu, T.Y., J. Xu, T. Qin, W. Xiong and H. Li, 2007. LETOR: Benchmarking Learning to Rank for Information Retrieval. In Proceedings of SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, Amsterdam, Netherlands.
Michalewicz, Z., 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, New York.
Mitchell, M., S. Forrest and J.H. Holland, 1991. The royal road for genetic algorithms: Fitness landscapes and GA performance. In Toward a practice of autonomous systems: proceeding of the first European conference on artificial life, Cambridge (MA): The MIT Press.
Muhlebein, H. and D. Schlierkamp-Voosen, 1993. Predictive Models for Breeder Genetic Algorithms in Continuous Parameter Optimization. Evolutionary Computation, 1(1): 25-49.
Ono, I. and S. Kobayashi, 1997. A Real-Coded Genetic Algorithm for Function Optimization using Unimodal Normal Distribution Crossover. In T. Back (Ed.)., Proceedings of the Seventh International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, pp: 246-253.
Radcliffe, N.J., 1991. Equivalence Class Analysis of Genetic Algorithms. Complex Systems, 2(5): 183-205.
Sinha, A., S. Tiwari and K. Deb, 2005. A Population Based Steady State Procedure for Real Parameter Optimization. KANGAL Technical Report no. 2005004.
Spears, W.M., 1995. Adapting Crossover in Evolutionary Algorithms. In Proceedings of the fourth Annual Conference on Evolutionary Programming, San Diego, CA, USA, pp: 367-384.
Syswerda, G., 1989. Uniform Crossover in Genetic Algorithms. In J. D. Schaffer (Ed.)., Proceedings of the third International Conference on Genetic Algorithms, San Mateo (CA): Morgan Kaufmann Publishers.
Tsutsui, S., M. Yamamura and T. Higuchi, 1999. Multi-Parent Recombination with Simplex Crossover in Real-Coded Genetic Algorithms. In: W. Banzhaf., J. Daida., A. Eiben., M. Garzon., V. Honavar., M. Jakiela., R. Smith (Eds.)., Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-1 1999), pp: 657-664.
Voigt, H.M., H. Muhlenbein and D. Cvetkovic, 1995. Fuzzy Recombination for the Breeder Genetic Algorithms. In: L. J. Eshelmen (Ed.)., Proceedings of the 6th International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, pp: 104-111.
Vrajitoru, D., 1998. Crossover Improvement for the Genetic Algorithm in Information Retrieval. Information Processing & Management, 34(4): 405-415.
Wright, A.H., 1991. Genetic Algorithms for Real Parameter Optimization. In Foundations of Genetic Algorithms I, Morgan Kaufmann, San Mateo, pp: 205-218.
G. Pavai and T.V. Geetha
Research Scholar, Department of Computer Science and Engineering, Anna University, Chennai. Tamil Nadu. India
Corresponding Author: G. Pavai, Research Scholar, Department of Computer Science and Engineering, Anna University, Tamil Nadu, Chennai.
|Printer friendly Cite/link Email Feedback|
|Author:||Pavai, G.; Geetha, T.V.|
|Publication:||Advances in Natural and Applied Sciences|
|Date:||Jun 1, 2015|
|Previous Article:||Sign gesture representation using and-or tree.|
|Next Article:||The COLLID based intrusion detection system for detection against DDOS attacks using trust evaluation.|