Breeding Science 56 : 371–377 (2006)AntMap: Constructing Genetic Linkage Maps Using an Ant Colony Optimization AlgorithmHiroyoshi Iwata* and Seishi NinomiyaNational Agricultural Research Center, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8666, JapanHigh-throughput assay of molecular markers enables to utilize large amounts of markers in linkage mapping. To incorporate numerous markers into a linkage map, the development of a highly efficient method for link- age mapping is indispensable. When the number of loci is large, locus ordering is a major difficulty in link- age mapping, since the number of possible orders becomes very large. To address this problem, we developed a new algorithm for locus ordering and used it in a newly developed computer program called AntMap. The algorithm is based on ant colony optimization, which is a set of metaheuristic algorithms inspired by the co- operative behavior of real ants in finding the shortest path from their nest to a food source. Using this algo- rithm, AntMap seeks the linear order of loci that minimizes the sum of adjacent recombination fractions or that maximizes the log likelihood of locus order. Analyses based on simulated data sets indicated that our algorithm displayed a high efficiency level. The high performance of the algorithm enabled to save time and labor, and also to validate an estimated order by bootstrap tests. Our algorithm and AntMap should enable to construct high-throughput systems for linkage mapping. AntMap is available under a GNU general public license at http://cse.naro.affrc.go.jp/iwatah/antmap/index.html. Source codes and executables of AntMap can be obtained there.Key Words: ant colony optimization (ACO), linkage mapping, locus ordering, traveling salesman problem (TSP), metaheuristics, computer program, simulation analysis.IntroductionLinkage maps have become an essential tool for plant and animal breeding, since they provide a good mechanism for identifying key genes involved in targeted traits. Quanti- tative trait locus analysis enables to locate genes even for complex quantitative traits on linkage maps. Information about the location of genes can be further exploited in a breed- ing program through various approaches such as marker- assisted selection, marker-assisted introgression, and map- based cloning of targeted genes. In recent years, assays of molecular markers have become highly efficient, and in- creasing numbers of markers are available in many species. Future technological innovations should accelerate this trend. To incorporate the large amount of molecular markers into a linkage map, the development of highly efficient methods for constructing linkage maps is indispensable. When the number of loci increases, locus ordering be- comes the main difficulty in linkage mapping, since the sheer number of possible orders of loci becomes very large. With n linked loci, the number is n!/2. When n = 100, the number reaches a value of 4.7 × 10157. In such a case, it is very difficult to examine all possible orders to determine theoptimum one. Even when n = 30, we must use an efficient search algorithm that enables to identify a practically good order within a practical time period. The problem of searching for the best (i.e., optimum) locus order can be considered as a special case of the travel- ing salesman problem (TSP) (Liu 1998). In the TSP, given a set of cities and the distances between each pair of them, the objective is to identify a round trip with minimum total length in which each city is visited exactly once. In locus or- dering, loci and distances between loci, e.g., the recombina- tion fraction, can be considered to correspond to the cities and distances between cities, respectively, in the TSP. To solve the TSP, a number of efficient search algorithms, such as simulated annealing (Kirkpatrick et al. 1983), tabu search (Glover 1989) and genetic algorithms (Holland 1975), have been proposed. In recent years, in some methods for linkage mapping, heuristic algorithms for the TSP have been used to solve the problem of locus ordering (e.g. Schiex and Gaspin 1997, Jansen et al. 2001, Mester et al. 2003a, 2003b). In the present paper, we introduced an algorithm for locus ordering based on ant colony optimization (ACO) (Dorigo et al. 1996), which is a set of algorithms inspired by the cooperative behavior of real ants in finding the shortest path from their nest to a food source. In recent years, ACO has been used successfully to solve various types of discrete optimization problems (Dorigo and Stützle 2004), and it is one of the best algorithms for the TSP (Dorigo et al. 1996).Communicated by K. Yonezawa Received June 14, 2006. Accepted August 20, 2006. *Corresponding author (e-mail:

[email protected])Iwata and Ninomiya372To apply ACO to linkage mapping, we devised algorithms to handle the locus-ordering problem, to satisfy the conven- tional maximum likelihood criterion, and to be robust against missing marker-genotype data. We used our algo- rithm in a new computer program that we named AntMap. We conducted simulation studies to determine the perfor- mance of our algorithm. We describe here the details of our algorithm, the outline of the AntMap system, and the results of simulation studies. Finally, we examined several advan- tages of our system.AlgorithmACO is a metaheuristic algorithm in which a colony of artificial ants cooperates in finding a good solution to a dis- crete optimization problem such as the TSP. In the TSP, a given set of n cities has to be traversed so that every city is visited exactly once and the tour ends in the initial city. The optimization goal is to identify the shortest possible route. ACO can be applied to the TSP with the following steps: (1) Each of m ants begins a tour from an arbitrary city. (2) An ant moves to a city with a probability that is a function of the distance and the amount of pheromone already accumulated on the trail until it completes a tour. (3) Every ant secretes trail pheromone on its route, which evaporates with time. (4) The optimum tour found after iterating steps 1 to 3 T times is considered to correspond to the solution. At iteration t, an ant k in city i chooses city j to move to with the following probability:,(1)where τij(t) is the amount of pheromone on the trail between cities i and j at iteration t, dij is the distance between cities i and j, and Nk is the set of cities that ant k has not yet visited. Parameters α and β determine the relative influence of the pheromone and proximity (i.e., inverse of distance), respec- tively. After completing a tour, each ant deposits the phero- mone according to the following rule:,(2)where Q is a constant, Lk(t) is the length of the tour under- taken by the ant k at iteration t, and Tk(t) is the set of trails be- tween cities constituting the tour. The amount of pheromone on each trail is updated as follows:,(3)where ρ represents the persistence of the pheromone trails. In our system, loci and the absolute values of log likeli- hood (or the recombination fraction) between loci were considered to correspond to the TSP cities and distances be- tween cities, respectively. The locus-ordering problem, how- ever, differed from the conventional TSP in that the path was not closed (i.e., the “final leg of the journey” was not re-quired). To account for this condition, we modified the sys- tem as follows: (1) Ants secrete the pheromone also in the cities at the starting and end points of a tour. (2) Ants select the starting point not randomly but with a probability that is a function of the amount of pheromone. That is, at iteration t, an ant k starts from city i with the following probability:,(4)where τi(t) is the amount of pheromone at city i at iteration t. The rule for secreting the pheromone in a city is as follows:,(5)where Sk(t) is the set of starting and terminal cities of a tour. The amount of pheromone in each city is updated as follows:.(6)When a recombination fraction between loci is consid- ered to correspond to the distance between the TSP cities, our system seeks the order minimizingSARF,(7)where N is the number of loci, and θi,i+1 is the estimate of the recombination fraction between adjacent (i.e., i th and (i+1) th) loci in a given order. This criterion, called the sum of adjacent recombination fractions (SARF), is used as a criterion in locus ordering (Olson and Boehnke 1990, Falk 1992, Liu 1998, Mester et al. 2003b). When the absolute values of log likelihood (ALL) between loci are considered to correspond to the distance between the TSP cities, our system seeks the order minimizingALLni,i+1{θi,i+1logθi,i+1 + (1 − θi,i+1)log(1 − θi,i+1)}, (8)where ni,i+1 is the number of informative sample sizes for es- timating the recombination of adjacent (i.e., i th and (i+1) th) loci in a given order. The log likelihood always takes a neg- ative value because both θi,i+1 and (1 − θi,i+1) are lower than 1. Thus, the minimization of Eq. 8 is synonymous with the maximization ofLLni,i+1{θi,i+1logθi,i+1 + (1 − θi,i+1)log(1 − θi,i+1)}. (9)This is the log likelihood (LL) for a locus order on the as- sumption that there is no interference, i.e., recombination events occur independently for each interval. That is, in this case, the algorithm seeks the optimum order based on the conventional maximum likelihood criterion. An optimization algorithm searches for an optimum solution by iteratively transforming a current candidate so- lution into a new solution, and hopefully results in the iden- fitication of a global optimum. An algorithm, however, ispijkt ( )τijt ( )[]αdij[]β–τilt ( )[]αdil[]β– lNk∈∑------------------------------------------------=jNk∈∀∆τijkt ( )Q Lkt ( )⁄ 0=if i j ,()Tkt ( )∈if i j ,()Tkt ( )∉τijt1+()ρτijt ( )∆τijkm k1=t ( )∑+=pikt ( )τit ( )[]ατit ( )[] i∑α--------------------- -=∆τikt ( )Q Lkt ( )⁄ 0=if iSkt ( )∈if iSkt ( )∉τit1+()ρτit ( )∆τikm k1=t ( )∑+=θi i1+,i1=N1–∑=i1=N1–∑=i1=N1–∑=Linkage mapping based on ant colony optimization373sometimes trapped by a poor local optimum in which the cri- terion value of a solution is much worse than that of a global optimum solution. In our preliminary studies, the system de- scribed above sometimes converged to a poor local optimum when some observations were missing. That is, an order es- timated by the system sometimes showed a worse criterion value than a simulated (i.e., true) order. To enable our sys- tem to avoid such a situation, we introduced a mechanism of random and elite selections, as suggested by Nakamichi and Arita (2004). Random selection is a simple operation where- by a city is selected among unvisited cities with equal prob- ability and thus enables the system to escape from a poor local optimum. The random selection rate λ is the probability that random selection occurs each time an ant selects the next city or the starting point. That is, with the probability λ, an ant selects the next city or the starting point randomly re- gardless of the amount of pheromone, while otherwise (i.e., with the probability 1−λ), the ant selects the next city with Eq. 1 or the starting point with Eq. 4. Elite selection, which is opposite to random selection, places extra emphasis on the best tour found so far, and accelerates the convergence to an optimum. That is, the elite selection covers the shortfall in the convergence force caused by the random selection. The elite selection can be achieved by the modification of the pheromone-updating rules. That is, to introduce a mecha- nism of elite selection, the pheromone-updating rules are modified as follows:,(10),(11),(12),(13)where σ is a weighting parameter, L*(t) is the length of the best tour identified at iteration t, T*(t) is the set of trails be- tween cities that constitutes the best tour, and S*(t) is the set of the starting and terminal cities of the best tour. In our sys- tem, both random and elite selections are implemented in combination. As suggested by Nakamichi and Arita (2004), this system is expected to improve the performance of the ACO algorithm in the balance between diversification (i.e., exploration of the search space) due to random selection and intensification (i.e., exploitation of the previous solutions) due to elite selection. In summary, the algorithm with the random and elite selections proceeds as follows: (1) Each of the m non-elite ants selects the starting city randomly (with probability λ) or according to the probability described in Eq. 4 (with proba- bility 1−λ). (2) During a tour, each of the non-elite ants se- lects the next city randomly (with probability λ) or accord-ing to the probability described in Eq. 1 (with probability 1− λ). (3) After completing a tour, every non-elite ant secretes pheromone on its route according to Eqs. 2 and 5. σ elite ants also secrete pheromone on the route of the best tour accord- ing to Eqs. 11 and 13. The rules for updating the amount of pheromone are included in Eqs. 10 and 12. (4) After iterating steps 1 to 3 T times, the shortest tour is considered to corre- spond to the solution (i.e., an estimated locus order). In the algorithm without the random and elite selections, λ and σ are set to zero.Computer programAntMap consists of a graphical-user-interface (GUI)- based application for linkage mapping (Fig.1). AntMap is written in Java and runs on various operating systems (e.g., Windows, Linux, Solaris, or Mac OS) under the Java 2 Plat- form Standard Edition (J2SE) Java Runtime Environment (JRE). That is, AntMap can be run on any system in which J2SE JRE has been installed. AntMap enables to analyze data derived from proge- nies of several types of crosses, including F2 intercrosses, F2 backcrosses, recombinant inbred lines, and doubled haploid lines. The input file format of AntMap is identical to the *.raw files required by MAPMAKER (Lander et al. 1987). The current version of AntMap, however, does not support two types of crosses, viz., F3 intercross by self-mating (F3 self) and recombinant inbred lines by sib-mating (RI-sib), which are supported by MAPMAKER/EXP. Before locus ordering, a user can perform a segregation test and linkage grouping. In linkage grouping, two grouping methods, i.e., “nearest neighboring locus” and “all combi- nations”, can be selected. In the former, a group is formed by sequentially combining a locus which shows the lowest recombination value against it. This algorithm is used in MAPL (Ukai et al. 1991). In the latter, similar results to those of the “group” command of MAPMAKER can be ob- tained (Lander et al. 1987). In locus ordering, a user can choose two optimization criteria: LL (ALL) or SARF. As described above, AntMap will seek a locus order that maximizes LL (i.e., minimizes ALL) or minimizes SARF. The map distance between mark- ers is calculated from the order estimated by using Haldane’s map function (Haldane 1919) or Kosambi’s map function (Kosambi 1944). Finally, the estimated linkage map is graphed as lin