Description

Infers one or more species networks with a specified number of reticulation nodes using maximum likelihood. The returned species networks have inferred branch lengths and inheritance probabilities. To find the optimal network, steepest descent is used. For every network topology being examined, we use Richard Brent's algorithm (from his book "Algorithms for Minimization without Derivatives", p. 79) to optimize the branch lengths and obtain the maximum likelihood score for that topology. The species network and gene trees must be specified in the Rich Newick Format.

The inference can use only the topologies of the gene trees, or both their topologies and branch lengths. The latter requires the input gene trees to be ultrametric.

The input gene trees can be gene tree distributions inferred by Bayesian methods such as MrBayes. See the second example below.

Usage

InferNetwork_ML geneTreeList numReticulations [-a taxa map] [-bl] [-b threshold] [-s startingNetwork] [-n numNetReturned] [-h {s1 [,s2...]}] [-w (w1,w2,w3,w4)] [-f maxFailure] [-x numRuns] [-m maxNetExamined] [-d maxDiameter] [-p (rel,abs)] [-r maxRound] [-t maxTryPerBr] [-i improveThreshold] [-l maxBL] [-pl numProcessors] [-di] [result output file]

geneTreeList

Comma delimited list of gene tree identifiers or comma delimited list of sets of gene tree identifiers. See details.

mandatory

numReticulations

Maximum number of reticulations to add.

mandatory

-b threshold

Gene trees bootstrap threshold.

optional

-a taxa map

Gene tree / species tree taxa association.

optional

-bl

Use the branch lengths of the gene trees for the inference. 

optional

-s startingNetwork

The network from which to start the search. Default value is the optimal MDC tree.

optional

-n numNetReturned

Number of optimal networks to return. Default value is 1.

optional

-h {s1 [, s2...]}

A set of specified hybrid species. The size of this set equals the number of reticulation nodes in the inferred network. 

optional

-w (w1, w2, w3, w4)

The weights of operations for network arrangement during the network search. Default value is (0.15, 0.15, 0.2, 0.5).

optional

-f maxFailure

The maximum number of consecutive failures before the search terminates. Default value is 100.

optional

-x numRuns 

The number of runs of the search. Default value is 5.

optional

-m maxNetExamined

Maximum number of network topologies to examine. Default value is infinity.

optional

-d maxDiameter

Maximum diameter of a rearrangement operation during the network search. Default value is infinity.

optional

-p (rel, abs)

The original stopping criterion of Brent’s algorithm. Default value is (0.01, 0.001).

optional

-r maxRound

Maximum number of rounds to optimize branch lengths for a network topology. Default value is 100.

optional

-t maxTryPerBr

Maximum number of trials per branch in one round when optimizing branch lengths for a network topology. Default value is 100.

optional

-i improveThreshold

Minimum threshold of improvement to continue the next round of optimization of branch lengths. Default value is 0.001.

optional

-l maxBL

Maximum branch length considered. Default value is 6.

optional

-pl numProcessors 

Number of processors if you want the computation to be done in parallel. Default value is 1.

optional

-di

Output the Rich Newick string of the inferred network that can be read by Dendroscope.

optional

result output file

Optional file destination for command output.

optional

It is mandatory to specify the number of reticulation nodes to add to the starting network. By default, the inference uses only the topologies of the gene trees; however, users can also use both the topologies and the branch lengths of the gene trees by specifying option -bl. The -s option allows users to specify a starting network (which can be a tree) for the network search. Starting from this network, numReticulations reticulation nodes will be added during the network search using steepest descent. If the starting network is not specified, the optimal tree under MDC (command infer_ST_MDC) will be used. If it is not binary, a random resolution will be used. By default, only the first optimal species network will be returned. However, users can use the -n option to ask for multiple optimal networks.

Simple hill climbing is used for the search. Users can specify the weights of the four operations for network rearrangement through option -w. By default, the search terminates when a preset limit of consecutive failures is reached (the default is 100, but users can change it through option -f). In addition, option -m allows users to specify the maximum number of networks examined during the search. Once that number is reached, the program terminates and returns the optimal network found so far. Users can also use option -d to specify the maximum diameter of an operation for network rearrangement, as local-SPR does. To avoid getting stuck at a local optimum, it is recommended to perform the search multiple times, which users can specify through option -x.
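The interplay of the stopping rules described above can be illustrated with a small Python sketch. This is not PhyloNet's implementation: the function names, the integer "network", and the toy score are all hypothetical stand-ins, and the sketch only assumes that a candidate is accepted when it scores strictly higher than the current best.

```python
import random

def hill_climb(start, score, moves, weights, max_failure=100,
               max_examined=float("inf"), rng=None):
    """Simple hill climbing: accept a random neighbour only if it scores higher."""
    rng = rng or random.Random()
    best, best_score = start, score(start)
    failures = examined = 0
    while failures < max_failure and examined < max_examined:
        # Pick one rearrangement operation according to its weight (cf. -w).
        move = rng.choices(moves, weights=weights, k=1)[0]
        candidate = move(best, rng)
        examined += 1                      # counts towards the -m style limit
        cand_score = score(candidate)
        if cand_score > best_score:
            best, best_score = candidate, cand_score
            failures = 0                   # an improvement resets the counter
        else:
            failures += 1                  # one more consecutive failure (cf. -f)
    return best, best_score

def multi_run(start, score, moves, weights, num_runs=5, seed=0, **kw):
    """Repeat the search (cf. -x) and keep the best result over all runs."""
    rng = random.Random(seed)
    results = [hill_climb(start, score, moves, weights, rng=rng, **kw)
               for _ in range(num_runs)]
    return max(results, key=lambda r: r[1])

# Toy stand-in (assumption): the "network" is an integer and the
# "likelihood" peaks at 7; the two "operations" step up or down by one.
def toy_score(x):
    return -(x - 7) ** 2

def step_up(x, rng):
    return x + 1

def step_down(x, rng):
    return x - 1

toy_moves = [step_up, step_down]
best, best_score = multi_run(0, toy_score, toy_moves, weights=[0.5, 0.5])
```

In this toy setting every run climbs to the single peak and then stops after 100 consecutive failures. On a real network space with several local optima, the repeated runs are what give the search a chance to escape a poor one.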

For every network topology being examined, we use Richard Brent's algorithm (from his book "Algorithms for Minimization without Derivatives", p. 79) to optimize the branch lengths. Users can use different options to control this process. Option -p allows users to specify the original stopping criterion of Brent's algorithm. More precisely, rel and abs define a tolerance tol = rel |x| + abs. We optimize the branch lengths one by one. For every branch, the optimization terminates when either maxTryPerBr (option -t) trials have been made or Brent's algorithm suggests so. Users can put an upper bound on the branch lengths through option -l. Optimizing all branch lengths once constitutes a round. After every round, if the likelihood score has improved over the previous round by at least improveThreshold (option -i), the next round starts. A maximum of maxRound (option -r) rounds will be tried.
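The round structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain golden-section search stands in for Brent's algorithm, the stopping test is a simplified use of tol = rel |x| + abs, the lower bound of 0 on branch lengths is assumed, and all function names and the toy likelihood are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden-ratio factor, about 0.618

def maximize_1d(f, lo, hi, rel=0.01, abs_tol=0.001, max_try=100):
    """Golden-section search for a maximum of a unimodal f on [lo, hi].

    Stand-in for Brent's method (assumption). Stops after max_try shrink
    steps (cf. -t) or once the bracket is no wider than 2 * tol, where
    tol = rel * |x| + abs_tol (cf. -p) and x is the bracket midpoint.
    """
    for _ in range(max_try):
        x = (lo + hi) / 2
        if hi - lo <= 2 * (rel * abs(x) + abs_tol):
            break
        a = hi - INV_PHI * (hi - lo)   # lower interior point
        b = lo + INV_PHI * (hi - lo)   # upper interior point
        if f(a) < f(b):
            lo = a                     # maximum lies in [a, hi]
        else:
            hi = b                     # maximum lies in [lo, b]
    return (lo + hi) / 2

def optimize_branches(log_lik, lengths, max_bl=6.0, max_round=100,
                      improve_threshold=0.001, **search_opts):
    """Optimize branch lengths one at a time, round after round."""
    lengths = list(lengths)
    score = log_lik(lengths)
    for _ in range(max_round):                       # cf. -r
        for i in range(len(lengths)):
            def along_branch(x, i=i):
                # Likelihood as a function of branch i alone.
                return log_lik(lengths[:i] + [x] + lengths[i + 1:])
            # Search within [0, max_bl]; max_bl mirrors option -l.
            lengths[i] = maximize_1d(along_branch, 0.0, max_bl, **search_opts)
        new_score = log_lik(lengths)
        improvement = new_score - score
        score = new_score
        if improvement < improve_threshold:          # cf. -i
            break                                    # gain too small, stop
    return lengths, score

# Toy stand-in (assumption): a concave "log-likelihood" over two branches
# whose optimum is at lengths (1.0, 2.5).
def toy_log_lik(ls):
    return -((ls[0] - 1.0) ** 2 + (ls[1] - 2.5) ** 2)

opt_lengths, opt_score = optimize_branches(toy_log_lik, [3.0, 3.0])
```

With the toy function the first round already lands within the tolerance of the optimum, so the second round yields almost no improvement and the loop stops. A larger rel or abs value ends each one-dimensional search earlier at the cost of a coarser branch length.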

By default, it is assumed that only one individual is sampled per species in the gene trees. However, option -a (taxa map) allows multiple alleles to be sampled. If users have prior knowledge of the hybrid species, they can specify them using option -h.

If users want to run the computation in parallel, they can specify the number of processors through option -pl.

Examples

#NEXUS

BEGIN TREES;

Tree geneTree1 = ((C,((B,D),A)),E);
Tree geneTree2 = (B,(D,(C,(A,E))));
Tree geneTree3 = (D,(B,((C,E),A)));
Tree geneTree4 = (D,((B,E),(C,A)));

END;


BEGIN PHYLONET;

InferNetwork_ML (geneTree1,geneTree2,geneTree3,geneTree4) 1;

END;

#NEXUS

BEGIN TREES;

Tree geneTree1 = [&W 0.9] ((C,((B,D),A)),E);
Tree geneTree2 = [&W 0.1] (B,(D,(C,(A,E))));
Tree geneTree3 = [&W 0.6] (D,(B,((C,E),A)));
Tree geneTree4 = [&W 0.4] (D,((B,E),(C,A)));

END;


BEGIN PHYLONET;

InferNetwork_ML (geneTree1,geneTree2,geneTree3,geneTree4) 1;

END;

#NEXUS

BEGIN TREES;

Tree geneTree1 = ((C:3,((B:1,D:1):1,A:2):1):1,E:4);
Tree geneTree2 = (B:4,(D:3,(C:2,(A:1,E:1):1):1):1);
Tree geneTree3 = (D:4,(B:3,((C:1,E:1):1,A:2):1):1);
Tree geneTree4 = (D:3,((B:1,E:1):1,(C:1,A:1):1):1);

END;


BEGIN PHYLONET;

InferNetwork_ML (geneTree1,geneTree2,geneTree3,geneTree4) 1 -bl;

END;

Command References

  • Y. Yu, N. Ristic, and L. Nakhleh. Fast algorithms and heuristics for phylogenomics under hybridization and incomplete lineage sorting. BMC Bioinformatics, vol. 14, no. Suppl 15, p. S6, 2013.
  • Y. Yu, J. Dong, K. Liu, and L. Nakhleh. Probabilistic inference of reticulate evolutionary histories. Under review.


