# The accuracy of phylogenetic estimation using the neighbor-joining method.

Abstract. - We studied the factors affecting the accuracy of the neighbor-joining (NJ) method for estimating phylogenies by simulating character change under different evolutionary models applied to twenty different 8-OTU tree topologies that varied widely with respect to tree imbalance and stemminess. The models incorporated three evolutionary rates - constant, varying among lineages, varying among characters - and three evolutionary contexts concerning patterns of character change relative to speciation events - phyletic, speciational, and punctuational. All combinations of the rate and context models were studied. In addition, three different absolute rates of change were investigated. To measure the accuracy, the strict consensus index was computed between the estimated tree and the tree topology along which the data had been generated. The results were analyzed by analysis of variance and compared to a previous study that evaluated UPGMA clustering and maximum parsimony (MP) as phylogenetic estimation techniques. We found evolutionary context and tree imbalance to be the most important factors affecting the accuracy of the NJ method. NJ was more accurate than UPGMA or MP in terms of the average strict consensus index over all treatments. However, no one method was more accurate than the other two for all combinations of treatments. Higher absolute rate of change generally resulted in higher accuracy for all three methods.Key words. - Accuracy, maximum parsimony, neighbor-joining, phylogenctic estimation, simulation, UPGMA.

The neighbor-joining (NJ) method is a phylogenetic estimation procedure proposed by Saitou and Nei (1987). We describe its algorithm in the Methods section below. The accuracy of NJ estimation has not been extensively studied. Most of the published studies have employed models based on molecular data. Saitou and Nei (1987) studied the NJ method using only two different tree topologies (the most balanced and the most unbalanced) for 8 OTUs. They compared the accuracy of NJ with UPGMA clustering (Sokal and Michener, 1958), the distance Wagner method (Farris, 1972), the modified Farris method (Tateno et al., 1982; Faith, 1985), Sattath and Tversky's 1977) method and Li's (1981) method. The data consisted of DNA sequences simulated along three different length variations of the original two tree topologies and one treatment representing varying rates of change in lineages. Saitou and Imanishi (1989) extended the study to include the Fitch-Margoliash method (Fitch and Margoliash, 1967), maximum parsimony (MP) for DNA (Eck and Dayhoff, 1966; Fitch, 1977), and the maximum-likelihood method (Felsenstein, 1981). The tree topologies used were the same as those of the previous study (Saitou and Nei, 19 8 7). Li et al. (1987) included the NJ method in their study of accuracy of phylogenetic estimation methods under unequal rates of evolution. However, they only examined 4-OTU trees and 3-OTU trees with 2-OTU outgroups in their study. Other work studying the accuracy of the neighbor-joining method has been reported (Nei, 1991; Jin and Nei, 1990; Sourdis and Nei, 1988), but the studies have not included the extensive models we examine here.

In our work we have applied phylogenetic estimation methods to multistate characters of the kind common in nonmolecular systematics. Recently, Rohlf et al. (1990) reported an extensive study examining the various factors that can affect the accuracy of phylogenetic estimation by maximum parsimony (MP) and UPGMA clustering. The data were obtained by simulating the evolution of multistate characters along 20 different 8-OTU trees covering a wide range of topologies. The characters were also simulated under different models of evolutionary character change (see Methods below). They calculated the accuracy of the estimated trees in terms of their closeness to the true tree topology, and then analyzed the results with respect to the treatments used to generate the characters. This resulted in identification of various evolutionary factors that affect the accuracy of these phylogenetic estimation methods.

The present study extends the earlier study of Rohlf et al. (1990) to include the NJ method. We employ the same datasets used by these authors, but add a new treatment and investigate the factors under which a particular method (NJ, MP, or UPGMA) is the best estimator of the true phylogenetic tree. The maximum parsimony method explicitly tries to maximize an optimality criterion, whereas NJ and UPGMA may be associated with optimality criteria, but do not explicitly maximize these properties. In our study, we treat the three different methods as tree-building algorithms, and we are not concerned with how well they approximate some optimality criterion associated with them. We then evaluate how closely the subset relationships obtained by each method match those of the true tree. Therefore, we emphasize that our study compares three different tree building algorithms whose definitions include all of the various options that we employ.

Materials and Methods

Dataset Generation

The dataset used by Rohlf et al. (1990) to evaluate the accuracy of UPGMA and MP as phylogenetic estimation methods was constructed by simulating character evolution along 20 different 8-OTU tree topologies. The selected topologies approximately covered the possible parameter space of two indices characterizing tree topology. The first is tree imbalance (Colless, 1982 as quantified by Shao, 1983). The index has been called [SI.sub.b] by Rohlf et al. (1990) and [I.sub.c] (1) by Shao and Sokal (1990) in their review of the topic. The index is calculated by examining each HTU and calculating the difference in numbers of descendent OTUs between the left and right subtending branches. This difference is summed over all HTUs in a given tree including the root HTU. The measure may also be standardized by dividing with the largest possible value for a given number of OTUs. The imbalance index measures the extent to which branches fail to divide the taxa into equal-sized groups. The second index is noncumulative stemminess, [St.sub.N] defined by Rohlf et al. (1990). For each HTU it calculates the ratio of the length its stem (the distance in time units to the most recent ancestor of the HTU) to the sum of the stem length and the combined length of its descendent nodes. The measure is averaged over all HTUs of a tree. It is a measure of the average distinctness of all the taxonomic subsets of a tree.

Multistate ordered characters (integer values 0, [+ or -] 1, [+ or -] 2,...) were evolved along these tree topologies derived from an ancestral state of zero. For the phyletic model described below, the characters were allowed to change by + 1 or - 1 with a fixed probability at each time step. For the speciational and punctuational models, the amount of character change was sampled from a Poisson distribution with its parameter adjusted to match the total expected amount of change in the phyletic model. The probability of direction of change, positive or negative, was set to 0.5. Character state changes were governed by three treatments with three levels each.

The first treatment represented three different models for the rate of character change along the tree branches. These were: (1) constant rate, where the probability of change is constant for all characters along all branches of the tree; (2) rate varying among lineages, where different branches differ in their probabilities of change but are identical for all characters within a branch; and (3) rate varying among characters, where the probability of change is different for each character but identical for all lineages. The second treatment represented three models that differ in where character state change occurs along a phylogenetic tree. Following Fiala and Sokal (1985), we refer to these as evolutionary contexts. These authors distinguished among: (1) the phyletic context, where character state changes may occur anywhere along a given tree; (2) the speciational context, where character state changes are assumed to occur only during a speciation event and in both daughter lineages; and (3) the punctuational context, where character state changes are assumed to occur only in one daughter species at the time of a speciation event. Finally, the third treatment represented three different levels of absolute probability of change at each character transition event. The levels were set to correspond to an average of 0.9, 1.8, and 3.6 state transition events per character over the entire tree. Because the absolute rate of 1.8 expected character change corresponds to that employed by Rohlf et al. (1990), we call it the standard rate, and term the other two rates for which datasets were newly simulated for this study, half and double, respectively. Analyzing the simulated data, we found an average of 0.692, 1.262, and 2.01 state changes per character, respectively, for the half, standard, and double levels of absolute rate of change. The values are smaller than the expected values because they were calculated from the observed character state distribution at the OTUs, i.e., the tips of the tree. Therefore, the actual number of character state changes are obscured by reversal events. The three levels of each of the three treatments yielded a factorial model of 27 different combinations. Therefore, the 20 tree topologies and 27 treatment combinations yielded a total of 540 different fundamental treatment units. For each of these 540 different units we generated 30 replicate datasets of 50 characters, resulting in 16,200 datasets.

Unfortunately, the original datasets for trees 7 and 8 studied by Rohlf et al. (1990) were lost. We regenerated these datasets by using the same parameters and the same computer program as in the original study. The only difference was that the random number seeds were different from those used earlier. Our procedure was therefore equivalent to generating more replicates of the original data. We tested the consistency of the new datasets with the old datasets for trees 7 and 8 by applying the UPGMA algorithm to the new datasets. Comparison of the mean accuracy from the new results with the original results yielded no statistically significant difference.

The Neighbor-Joining Method

The NJ algorithm, similar to agglomerative clustering (Sneath and Sokal, 1973), starts with a matrix of distances among the tOTUs and a star-like graph (corresponding to an unrooted and completely unresolved phylogenetic tree). The closest pair of OTUs is found, merged into a new HTU (hypothetical taxonomic unit; putative ancestor), and the original pair of OTUs is deleted until the matrix is reduced to a single OTU and the graph is fully resolved. The "closest pair" of OTUs is defined as the pair of OTUs (or HTUs from some previous step) whose merging would result in the maximum reduction in length of the graphs. Examination of Saitou and Nei's (1987) equation (number 4, p. 407) for calculating distances shows that their distance measure can be decomposed into three components. The first component is simply the sum of all the pairwise distances divided by the number of OTUs. Since this value is shared by any pair of OTUs, it will not affect the sequence in which pairs of OTUs are joined. For any pair of OTUs i and j, the second component is the original distance between i and j, while the third component is the sum of all pairwise distances excluding i and j. The second and third component values are weighted such that the former receives a weight of t - 2 (t = number of OTUs) while the latter receives a weight of 1. This transformation has the effect that the distances between pairs of OTUs far from the centroid of all OTUs are contracted, while the distances between those near the centroid are expanded. Therefore, if any pairs of OTUs have undergone rapid evolution and are far from the rest of the OTUs, the NJ method compensates for the unequal rate of change. After the OTU pair with the shortest distance as defined by this transformation is determined, the distance between the new node (the HTU resulting from a merging of OTUs i and j) and an existing node k is taken as the simple average of the distances [d.sub.ik] and [d.sub.jk]. This is a weighted average in terms of the original distance matrix, since it does not take into account the fact that OTUs (or HTUs) i and j may represent more than one original OTU.

The NJ algorithm is relatively fast. For large numbers t of OTUs, the computational effort increases proportional to [t.sup.2]. We programmed the neighbor-joining algorithm in PASCAL from the algorithm given in Saitou and Nei (1987) with the additional assistance of a FORTRAN program obtained from Dr. M. Nei. Recall, that in the NJ method, trees are constructed by linking together the two OTUs or HTUs that are the closest mutual "neighbors." At any step, several equally close neighbors may exist. Sourdis and Nei (1988) also point out this problem and they approach it by varying the input order of the OTUs. We explicitly modified the algorithm so that all such "tied" trees may be found. This is similar to the modification made to the UPGMA clustering algorithm described in Rohlf et al. (1990), based on earlier work by Hart (1983). The algorithm for NJ and UPGMA with ties are available as part of the NTSYSpc package of programs (Rohlf, 1990).

For each of the 16,200 datasets, we first constructed a distance matrix based on the average taxonomic distance (Sneath and Sokal, 1973), the same distance measure used by Rohlf et al. (1990) for UPGMA trees. The tie-modified NJ algorithm was then used to estimate the phylogeny from the distance matrix. Since the NJ algorithm produces an unrooted tree, we rooted the trees using midpoint rooting by choosing the midpoint of the longest path between any pairs of OTUs. Branch lengths were calculated following the algorithm in Saitou and Nei (1987). We used this method instead of including an outgroup in the simulations since all character states were initialized at the origin (zero) and allowed to change in both a positive and negative direction. Midpoint rooting could be questioned in the cases where there is variation of rate among lineages. However, we implemented this treatment by superimposing a random walk of the rates over each lineage. This meant that although parts of the branches of a tree may evolve at different rates, the overall divergence of the branch tips would not be very far from that expected for an equal rate model. Empirical examination of our data also reveals no evidence that there is a special bias or deviation for this particular treatment. Furthermore, this rooting is also identical to that used by Rohlf et al. (1990) for MP trees, we needed to make our treatment of NJ comparable.

Other Methods

The newly simulated 10,800 datasets at the half and double absolute rates were also subjected to UPGMA clustering of average taxonomic distances, maximum parsimony analysis of discrete characters using PAUP (Swofford, 1983) with exhaustive search (BANDB option, branch and bound) followed by midpoint rooting, in addition to the neighbor-joining of average taxonomic distances followed by midpoint rooting. The methods and programs employed were those of Rohlf et al. (1990). Using the strict consensus index (Rohlf, 1982) as the measure of agreement, all 16,200 estimated trees were compared to the tree topologies along which the data had been generated. (High values indicate better performance.) If a particular dataset resulted in more than one estimated tree because of ties, we constructed a majority rule consensus tree (Margush and McMorris, 1981) from all of the possible tied trees. The majority rule consensus tree was used since it yields a more resolved tree for those methods that produce many multiple trees. Some of the results reported below are not based on all 16,200 datasets. Such cases are duly noted. Where we did not carry out the analysis on the full dataset, we report the results based on the 5,400 datasets subjected to the standard (absolute) rate treatment. These analyses will provide results comparable to those of Rohlf et al. (1990).

Results

Neighbor-Joining Method

The mean strict consensus indices (based on the 30 replicates) for each of the 20 trees and nine different treatment combinations are shown in Table 1 for the standard rate of change. As discussed below, the half and double rates of change generally follow the pattern of the standard rate of change and are summarized in a shortened table (Table 3). For the phyletic context, NJ could more accurately estimate the more stemmy and balanced trees than trees with lower stemminess and greater imbalance as was the case with UPGMA and MP studied earlier (Rohlf et al., 1990). The strict consensus index averaged over all trees and treatments (for 16,200 datasets) was 0.578. We carried out two separate analyses of variance. The first ANOVA was based on the results from only the standard rate of change for comparison with the results of Rohlf et al. (1990). The second ANOVA, based on all 16,200 datasets, used the three rates as an added independent model variable. The results of the analysis of variance using the standard rate of change are shown in Table 2. The most important factors are (in descending order): context, imbalance, stemminess x context interaction, stemminess, and imbalance x context interaction. Out of 16 possible sources of variation, 10 are significant at P [is less than or equal to] 0.001 and one is significant at just P [is less than or equal to] 0.05. The results of the ANOVA of the full dataset are described below in a separate section.

Table 2. The results of analysis of variance on the standard rate dataset. The dependent variable is the accuracy of the neighbor-joining method measured by the strict consensus index. The design of the treatments is not balanced because four extra trees were added to furnish extreme values of stemminess and imbalance. Therefore, the sums of squares of each independent source do not necessarily add up to the total model sums of squares. NJ Source of variation df SS Sig. Model 35 211.64 *** Stemminess 1 10.56 *** Imbalance 1 22.52 *** Context 2 46.54 *** Rates 2 2.80 *** Imbal x Stem 1 0.08 Stem x Context 2 20.74 *** Stem x Rate 2 0.03 Imbal x Context 2 3.89 *** Imbal x Rate 2 0.61 *** Context x Rate 4 0.77 *** Imb x Stem x Ctxt 2 0.48 *** Imb x Stem x Rate 2 0.17 Stem x Ctxt x Rate 4 0.06 Imb x Ctxt x Rate 4 0.23 *** Imb x Stem x Ctxt x Rate 4 0.30 * Within 5,634 160.13 * -0.01 < P [is less than or equal to] 0.05; *** = P [is less than or equal to 0.001.

Comparing the Accuracy of Three

Phylogenetic Estimation Methods

By combining the results from Rohlf et al. (1990) with those of this study we are able to compare the accuracy (expressed as mean strict consensus index) of three different methods used to estimate phylogenetic trees. Figure la shows the mean strict consensus index for each estimation method for the standard rate of change. Figure 1a is arranged into nine groups representing the nine different combinations of the rate and context treatments. Since Figure 1a conveys large amounts of information and is therefore somewhat confusing, we direct the reader's attention first to Figure 1b where the layout of Figure 1a is explained. Within each small box enlarged in the top portion of Figure 1b the top (white), the middle (gray), and bottom (black) represent the accuracy for UPGMA, MP, and NJ, respectively. The width of each small box is scaled to a maximum strict consensus value of 1.0 (perfect accuracy). To the left of each small box, the letters U, P, and N for UPGMA, MP, and NJ, respectively, indicate which method is the most accurate for that particular combination of tree topology and evolutionary model. Each box represents a single tree from our 20 different tree topologies. The middle portion of Figure 1b depicts the H-shaped arrangement of 20 boxes representing the 20 topologies. The small boxes are stacked such that from bottom to top, stemminess increases for the trees tested, while from left to right imbalance increases. Each group of 20 boxes representing 20 trees is replicated in a three by three arrangement for the nine different treatment combinations as shown at the bottom of Figure 1b.

Figure 2 shows the relative accuracy as marginal means for two of the treatments, rates (left) and contexts (right). We do not present a similar figure for the results of the three absolute rates of change, since the pattern is simple to describe verbally (see below).

In general, the figures show that no one method is consistently better than the other two methods over all the evolutionary models studied. The three methods apparently differ in their sensitivity to various factors. UPGMA seems to work better than the others when the character evolution is phyletic and the tree topology is unbalanced. MP is more accurate than the others under the phyletic model when the trees are more balanced and especially when the rate of change varies among characters. NJ is more accurate than the other two under the punctuated and speciational context models. Overall, NJ is the most accurate, with the highest average consensus index, which agrees with the findings of Sourdis and Nei (1988). Of the 540 different topology/evolutionary model combinations, UPGMA was most accurate 63 times, MP 126 times, and NJ 331 times (there were 20 two-way ties). However, such direct counts can be somewhat misleading as can be seen by the slight differences between the lengths of bars in many of the boxes.

Within the three different rate models, all three methods are most accurate when the rate of change is constant, the mean consensus index yielding a value of 0.567. When rate of change varies among characters the mean consensus index is 0.549. The average accuracy is lowest when rates vary among lineages (mean = 0.509). Under the three different evolutionary context models, the highest accuracy for all three methods is found under the speciational context model (mean = 0.682), the second highest under the phyletic model (mean = 0.536), and the lowest accuracy under the punctuated model (mean = 0.407).

On average, the greatest differences between the most and the least accurate method are found when the rate of character change varies among lineages and under the punctuational context model. In these cases, NJ is usually most accurate, and UPGMA least accurate. The differences in accuracy are especially pronounced within the punctuational context model when the true trees are more balanced. As shown in Figure 1a, all of the methods are more sensitive to changes in imbalance under this model. This is also true to a lesser degree under the speciational context model.

Effects of Absolute

Evolutionary Rate

To conserve space the results for half and double the absolute rates of evolution are not shown in detail; they can be described in comparison to the results presented above and in Rohlf et al. (1990). An analysis of variance was performed with absolute evolutionary rate as an added factor. For the terms that are in common, the results of the tests of significance for UPGMA are the same as shown in table 4 of Rohlf et al. (1990), except that the following terms are no longer significant: imbalance x rate, context X rate, stemminess x imbalance x context, and stemminess x imbalance x rate. Of the new terms (absolute rate and all of its interactions with the terms listed in Table 2), only absolute rate and the absolute rate x context interaction terms are significant and are at the 0.001 level. For MP the term for imbalance x rate is now significant but only at the 0.05 level and context x rate is now not significant in comparison to the results given in Table 4 of Rohlf et al. (1990). Both absolute rate and the absolute rate x context interaction terms are significant at the 0.001 level. For the NJ method imbalance x rate is only significant at the 0.05 level and context x rate and imbalance x stemminess x context x rate are not significant. Of the new terms, the only ones that are significant are absolute rate at the 0.001 level and absolute rate x context at the 0.01 level.

The mean consensus values for the two-way tables of absolute rate against evolutionary context are shown in Table 3 for the three methods to illustrate the interaction reported above. The consensus values are means averaged over all trees and rate models. This table shows that for the phyletic and the speciational contexts mean accuracy for all three methods is highest at the double rate of evolution. For the punctuational context the accuracy for all methods is highest at the standard rate. For the NJ method in the phyletic context the accuracy for the standard rate is distinctly lower than for the half and double rates. The table also shows that the NJ method is usually the most accurate method (it is tied with MP in one case and slightly lower in another). Except for one case, UPGMA had the lowest accuracy.

Table 3. Marginal totals of the mean strict consensus index shown as a two-way table of absolute rate by evolutionary context for the UPGMA, MP, and NJ methods. Context Absolute rate UPGMA MP NJ Phyletic Half 0.502 0.512 0.558 Standard 0.485 0.508 0.503 Double 0.589 0.565 0.606 Speciational Half 0.584 0.684 0.700 Standard 0.615 0.675 0.696 Double 0.671 0.754 0.754 Punctuational Half 0.321 0.359 0.421 Standard 0.362 0.473 0.525 Double 0.339 0.420 0.438

Discussion

Although the most important factors affecting the accuracy of estimation are similar for all three methods, the order of importance of the factors varies somewhat. Evolutionary context is the most important factor (i.e., explained the most variance) for all three methods. Imbalance is the next most important factor for NJ, whereas it is the fourth most important factor for UPGMA and third for MP. This can be seen in Figure 1a, where the black bars representing mean accuracy for NJ shorten fairly rapidly as the imbalance increases (to the right of the H-shaped groups of boxes). This seems mostly due to NJ not performing well when the trees are highly imbalanced. With respect to the most important factor affecting all three methods, evolutionary context, NJ is more accurate than other methods mostly when the model is punctuational evolution. Since this model can also be seen as a case of extreme variation of rates among lineages (all changes are concentrated within only one of the daughter lineages), this is consistent with the observation that NJ is more accurate than the others when the rates of character state change vary among lineages (see Fig. 2).

All three methods have highest accuracy under the speciational context model (mean consensus index = 0.682). The character changes in the speciational (and punctuational) context model occur as a Poisson process and only during the bifurcation events (there are 7 in an 8-OTU tree). The parameter of the Poisson process [lambda] was set such that the expected number of character change events would be identical for all three context models. Therefore, if the phyletic context model had 1,000 time steps over the entire tree with probability of change p, the parameter [lambda] was set to p x 1,000/14. The number 14 is derived from the fact that there are 14 total change events for the speciational context model. However, the total expected amount of divergence for the different context models is not identical. Consider two OTUs separated by n time steps under the phyletic context model with parameter p. At each time step the probability of character state change is p with equal probability of changing by + 1 or - 1. The probability distribution for this process is given by,

1 = p/2

0 = 1 - p

-1 = p/2

The total amount of divergence after n steps is given by Y = [summation.sup.n.sub.i=1] [X.sub.i], where [X.sub.i] is the individual steps specified by the distribution given above. The expectation of square of Y is simply pn, found by taking the expectation of [Y.sup.2] and ignoring all cross products because of independence). Similar calculations for the speciational context model lead to the expectation ([lambda.up.2] + [lambda])m, where [lambda] is the parameter of the Poisson distribution and m is the number of change events (14 for 8-OTU tree). Since [lambda] m = pn, the expected amount of squared divergence is larger with the speciational context model than the phyletic context model by a factor of [(pn).sup.2]/m. For the parameter values used in the standard absolute rate treatment, the total expected divergence for the speciational context model is 13% larger than the phyletic context model. The discrepancy of expected divergence between the phyletic context model and the speciational context model is expected to increase as a function of p. We believe this is the factor that contributes to the increased accuracy with the speciational context model. Increased rate of divergence for ordered multistate characters is equivalent to having more numbers of characters as noted above.

All three methods have the lowest accuracy under the punctuational context model, which, as mentioned earlier, can be viewed as an extreme case of rates varying among lineages. We note some additional observations on the punctuational model. Under this model, the character changes happen only in one of the daughter lineages at each branching point. This means that the branch length of the daughter lineage lacking any change becomes zero, effectively collapsing that branch. Several such collapsing events at different levels of the tree will result in a tree that effectively (i.e., in terms of the character change) contains several multifurcations. We assign randomly which of the two daughter branches has the potential of character state changes with the result that multifurcating trees are fairly common. Of course, even with the punctuational model, character changes can be assigned such that the tree will not contain any multifurcations. The number of multifurcating HTUs and the degree of multifurcation at each HTU when the changes are randomly assigned to either of the daughter branches is dependent on the original topology. We believe that these multifurcations explain much of the comparatively lower accuracy of the various methods under the punctuational model versus other models.

The maximum parsimony method resulted in 2,588 multifurcating trees, whereas UPGMA yielded 770 such trees, and for NJ produced 23. This might have been a factor in the relative accuracies of the different methods. For the partially resolved trees it would have been interesting to distinguish between cases where two trees differ and conflict, and those where the two trees differ but are consistent. However, owing to the complexity of managing this large number of datasets, we were not able to investigate this matter. The three different methods also tend to preferentially produce trees of specific topology. The 8-OTU tree has 10,395 possible rooted and labeled topologies. However, if only the unlabeled topologies are counted, there are only 23 possible topologies. The trees produced by each method was classified into one of the 23 possible unlabeled topologies. Each of the 23 unlabeled topologies can also be assessed for its imbalance index which characterizes the topology. Figure 3 shows the frequency of each kind of erroneous (i.e., the frequency of each tree is counted only if the estimated tree differs from the true tree) unlabeled topology produced by the three methods. The same figure also shows the frequency of the true trees when classified according to the imbalance index of their unlabeled topologies. As can be seen in Figure 3, all methods tend to produce a surplus of balanced topologies when they are in error. The degree of bias towards balanced trees is more pronounced for the MP and NJ methods than for UP method. This may partially explain why the UP method tends to be relatively more accurate with imbalanced trees than with the balanced trees. From these results, it would seem that both MP and NJ methods (but especially MP) have preferential bias for producing relatively balanced trees compared to the UP method.

The overall means for the half, standard, and double absolute rates of change are, respectively, 0.516, 0.538, and 0.571 averaged over three estimation methods and all other treatments. Although when the means are broken down as in Table 3, the absolute rates show some interactions with the context models, in general, the accuracy rises with increased rates of change. This may seem counterintuitive since we expect higher rates of change to add noise to the data. However, our general model of evolution is one of an ordered character state change increasing or decreasing in small numbers of steps. Therefore, the data can contain mis-information from two different sources. First, is the reversal of characters toward the original state of zero. Second, is the loss of information through saturation when the character reaches some boundary state. The first type of noise depends on the probability taking of forward or backward steps at each change event. Since these were always set equal to each other, the absolute rate of change has no effect. The second kind of loss, saturation, could in theory occur in our model since we artificially set a limit of 32 possible states (16 positive and 16 negative). While this limit to the number of states was set to meet the requirements of the computer program, the bound was in fact never reached in our simulations, and its value is irrelevant. In practice, there were no limits. The average number of character state changes was 2.0 per character per tree.

Because these two sources of noise can be discounted, increasing the absolute rate of change actually increases the information in our datasets, since for a fixed number of ordered characters, more states increase the resolution of the dataset. This can be seen easily when we translate the ordered character set into binary additive characters. The 50 characters employed per tree will translate into more and more binary characters as the number of states in each character increases.

It should be emphasized that the rate parameter of character-change models for multistate ordered characters is fundamentally different from that for binary characters used in more theoretical studies, and cannot be interchanged. The term "rate" in binary characters usually refers to the transition probability of state changes, whereas for multistate ordered characters, its meaning is more ambiguous. Intuitively, rate should measure the expected amount of distance between two (or more) OTUs after a fixed period of character evolution. For binary or unordered characters, this distance has a limited bound whereas with ordered characters no bound exists as long as there are no limitations to the number of steps. The exact meaning of "character change rate" depends on the kind of character and the model of change.

As mentioned above, the accuracy of the NJ method over all treatments was 0.573. This was slightly better than MP with an overall accuracy of 0.550, which in turn was better than UPGMA with an overall accuracy of 0.496. The mean accuracy of all three methods averaged over all treatments is 0.542. This value is somewhat discouraging, indicating that over the evolutionary models studied, we can only expect topological similarity between the true tree and the estimated tree to be slightly over 50%. However, as noted above, the punctuated model we used is rather extreme. The true tree may be effectively a multifurcating tree, lowering the strict consensus index. Leaving out the punctuational context treatment, the average accuracy rises to 0.609. Another encouraging fact is that, as discussed above, the accuracy tended to increase as we raised the absolute rate of change. Since the absolute rate of change indicates to a degree the amount of information in our dataset, we can reasonably expect the accuracy to increase with more characters.

We can use the mean accuracy of each method applied to the 30 replicates of any one tree under a particular factor combination, to obtain estimates of the expected accuracy for the most accurate method, the second most accurate and the least accurate method. Averaged over all treatments, the mean difference is 0.077 between the most accurate and the second most accurate methods, while the mean difference between the second most accurate and the least accurate methods is 0.109. This seems to follow from UPGMA trees being more severely inaccurate than the other two when it fails. For example, for tree 4 (imbalance index = 0.0, stemminess index = 0.181) with the treatment combination of rates varying among characters, punctuational evolution, and standard rate, the most accurate method (NJ) has a mean consensus index of 0.767, the second most accurate method (MP) has a mean consensus value of 0.650, while the least accurate method (UPGMA) has a value of 0.389. The average difference between the least accurate and the second most accurate method is 0.261, more than twice the difference (0.116) between the most accurate and the second most accurate method. It is very important to determine the factors that cause a particular method to fail rather than just compare accuracies of different methods.

We calculated the average benefits to be gained by using the best method for particular factor combinations by averaging the strict consensus index values of only the most accurate method for each treatment. This yielded the value of 0.629. While this is better than using the best overall method all the time (e.g., NJ = 0.573), the gain in accuracy is relatively low, about 10%. This seems mostly because for certain treatment combinations none of the three methods do very well, as in the case of highly unbalanced trees under the punctuational evolution model. However, under other conditions (such as balanced trees under the punctuational evolution model) choosing the right estimation method can result in an improvement of more than 0.3 in terms of the strict consensus index, a 65% increase.

It was of interest to investigate whether the NJ estimated trees resembled the UPGMA trees more than the MP trees. The topologies of the trees estimated by NJ, MP and UPGMA from the same dataset were compared by computing the strict consensus index between UPGMA and NJ trees, NJ and MP trees, and MP and UPGMA trees. We analyzed the triplets of such trees for all 5,400 standard rate datasets. The mean strict consensus values between NJ and UPGMA is 0.622; between NJ and MP it is 0.690; and between MP and UPGMA it is 0.529. This shows that the NJ method produces trees that are more similar to MP trees than they are to UPGMA trees. NJ trees are also more similar to MP trees than UPGMA trees are to MP trees. These results seem to indicate the intermediacy of the NJ method between these other two methods.

The neighbor-joining method has the highest accuracy overall and also the highest accuracy under the marginal means of the three different treatments, rate, context, and absolute rate. It is important to note that the algorithm for NJ is computationally much faster when compared to that for MP with exhaustive search, especially as the number of OTUs increases. However, as can be seen from Figure 1a, a breakdown of the results into those due to different treatments and tree topologies shows that none of the three methods studied is consistently better than the other two. This suggests that it would be important to be able to know something of the underlying evolutionary processes for a given dataset, before deciding on which phylogenetic estimation method to employ. The investigator's prior knowledge of the natural history or the fossil record of a group of organisms under study might be useful for making educated guesses concerning the likelihood of the various models studied above. However, it would be far better if ways were developed to objectively examine a given dataset of a group of organisms and from it compute estimates of the levels of the various factors affecting the accuracy of phylogenetic estimation methods. With this information, the investigator could then decide on an optimal method of phylogenetic estimation. We are currently investigating the possibilities of such an approach in our laboratory.

Literature Cited

Colless, D. H. 1982. Review of phylogenetics: The theory and practice of phylogenetic systematics. Syst. Zool. 31:100-104. Eck, R. V., and M. O. Dayhoff. 1966. Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Silver Spring, MD USA. Faith, D. P. 1985. Distance methods and the approximation of most-parsimonious trees. Syst. Zool. 34:312-325. Farris, J. S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645-668. Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368-376. Fiala, K. L., and R. R. Sokal. 1985. Factors determining the accuracy of cladogram estimation. Evolution 39:609-622. Fitch, W. M. 1977. On the problem of discovering the most parsimonious tree. Am. Nat. 111:223-257. Fitch, W. M., and E. Marroliash. 1967. Construction of phylogenetic trees. Science 155:279-284. Hart, G. 1983. The occurrence of multiple UPGMA phenograms, pp. 254-258. In J. Felsenstein (ed,), Numerical Taxonomy. Springer-Verlag, Berlin, Germany. Jin, L., and M. Nei. 1990. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. 7:82-102. Li, W.-H. 1981. Simple method for constructing phylogenetic trees from distance matrices. Proc. Natl. Acad. Sci. USA 78:1085-1089. Li, W.-H., K. H. Wolfe, J. Sourdis, and P. M. Sharp. 1987. Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harbor Symp. Quant. Biol. 52:847-856. Margush, T., and F. R. McMorris. 1981. Consensus n-trees. Bull. Math. Biol. 43:239-244, Nei, M. 1991. Relative efficiencies of different tree-making methods for molecular data, pp. 90-128. In M. M. Miyamoto and J. L. Cracraft (eds.), Recent Advances in Phylogenetic Studies of DNA Sequences. Oxford University Press, Oxford, UK. Rohlf, F. J. 1982. Consensus indices for comparing classifications. Math. Biosci. 59:131-144, _____. 1990. NTSYS-pc, Version 1.60. Exeter Software, Setauket, N.Y., USA. Rohlf, F. J., W. S. Chang, R. R. Sokal, and J. Kim. 1990. Accuracy of estimated phylogenies: Effects of tree topology and evolutionary model. Evolution 44:1671-1684. Saitou, N., and T. Imanishi. 1989. Relative efficiencies of the Fitch-Margoliash, maximum parsimony, maximum likelihood, minimum evolution, and neighbor-joining methods of phylogenetic tree reconstruction in obtaining the correct tree. Mol. Biol. Evol. 6:514-525. Saitou, N., and M. Nei. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425. Sattath, S., and A. Tversky. 1977. Additive similarity trees. Psychometrika 42:319-345. Shao, K. T. 1983. Consensus methods in numerical taxonomy. Ph.D. Diss. State University of New York, Stony Brook, USA. Shao, K. T., and R. R. Sokal. 1990. Tree balance. Syst. Zool. 39:266-276. Sneath, P. H. A., and R. R. Sokal. 1973. Numerical Taxonomy, pp. 189-301. W. H. Freeman, San Francisco, CA USA. Sokal, R. R., and C. D. Michener. 1958. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38:1409-1438. Sourdis, J. and M. Nex. 1988. Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Mol. Biol. Evol. 5:298-311. Swofford, D. 1983. Phylogeny Analysis Using Parsimony. Illinois Natural History Survey, Champaign, IL USA. Tateno, Y., M. Nei, and F. Tajima. 1982. Accuracy of estimated phylogenetic trees from molecular data. I. Distantly related species. J. Mol. Evol. 18:387-404.

Printer friendly Cite/link Email Feedback | |

Author: | Kim, Junhyong; Rohlf, F. James; Sokal, Robert R. |
---|---|

Publication: | Evolution |

Date: | Apr 1, 1993 |

Words: | 7305 |

Previous Article: | Evolution of social behavior in a resource-rich, structured environment: selection experiments with Medaka (Oryzias latipes). |

Next Article: | Variation in environmental and genotypic sex-determining mechanisms across a latitudinal gradient in the fish, Menidia menidia. |

Topics: |