Description
Infers one or more allopolyploid species networks with a specified number of reticulation nodes under the MDC criterion, using a parsimony-based method. Steepest descent is used to find the optimal network. The species network and gene trees must be specified in the Rich Newick format; however, the method uses only their topologies.
The input gene trees can be gene tree distributions inferred from Bayesian methods like MrBayes.
Usage
InferNetwork_MP_Allopp (gene_tree_ident1 [, gene_tree_ident2...]) numReticulations [-a taxa map] [-b threshold] [-s startingNetwork] [-fs] [-n numNetReturned] [-m maxNetExamined] [-d maxDiameter] [-h {s1 [,s2...]}] [-w (w1,...,w6)] [-f maxFailure] [-x numRuns] [-pl numProcessors] [-di] [resultOutputFile]
gene_tree_ident1 [, gene_tree_ident2...] | Comma delimited list of gene tree identifiers. See details. | mandatory |
numReticulations | Maximum number of reticulations to be added. | mandatory |
-b threshold | Gene trees bootstrap threshold. Edges in the gene trees that have support lower than threshold will be contracted. | optional |
-a taxa map | Gene tree / species tree taxa association. | optional |
-s startingNetwork | Specify the network to start search. Default value is the optimal MDC tree. | optional |
-fs | Fix the start tree during the search. If this option is given together with a start tree (-s), the search keeps the topology of the start tree fixed. | optional |
-n numNetReturned | Number of optimal networks to return. Default value is 1. | optional |
-m maxNetExamined | Maximum number of network topologies to examine. Default value is infinity. | optional |
-d maxDiameter | Maximum diameter to make an arrangement during network search. Default value is infinity. | optional |
-h {s1 [, s2...]} | A set of specified hybrid species. | optional |
-w (w1, ..., w6) | The weights of operations for network arrangement during the network search. Default value is (0.1,0.1,0.15,0.55,0.15,0.15). | optional |
-f maxFailure | The maximum number of consecutive failures before the search terminates. Default value is 100. | optional |
-x numRuns | The number of runs of the search. Default value is 5. | optional |
-pl numProcessors | Number of processors if you want the computation to be done in parallel. Default value is 1. | optional |
-di | Output the Rich Newick string of the inferred network that can be read by Dendroscope. | optional |
resultOutputFile | Optional file destination for command output. | optional |
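To illustrate what option -b describes (contracting gene tree edges whose support falls below the threshold), here is a minimal Python sketch. It is not PhyloNet's implementation. The tree representation (a leaf is a string, an internal node is a `(children, support)` tuple), the function name `contract_low_support`, and the 0-100 support scale are all assumptions made for this example.

```python
def contract_low_support(tree, threshold):
    """Return a copy of `tree` with weakly supported internal edges contracted.

    Hypothetical representation: a leaf is a string; an internal node is a
    tuple (children, support), where support belongs to the edge above the
    node (None for the root).
    """
    if isinstance(tree, str):
        return tree
    children, support = tree
    new_children = []
    for child in children:
        child = contract_low_support(child, threshold)
        is_internal = not isinstance(child, str)
        if is_internal and child[1] is not None and child[1] < threshold:
            # Contract the edge: attach the grandchildren directly to this node.
            new_children.extend(child[0])
        else:
            new_children.append(child)
    return (new_children, support)


# Toy gene tree ((a,b)40,(c,d)90); with threshold 70 the (a,b) clade collapses.
example = ([(["a", "b"], 40), (["c", "d"], 90)], None)
contracted = contract_low_support(example, 70)
```

Contraction can leave a node with more than two children, which is why a thresholded gene tree may no longer be binary.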
It is mandatory to specify the number of reticulation nodes to add to the starting network. If users have prior knowledge of the allopolyploid species, it is recommended to specify them using option -h. The -s option allows users to specify a starting network (which can be a tree) for the network search. Starting from this network, numReticulations reticulation nodes will be added during the steepest descent search. If the starting network is not specified, the optimal tree under MDC (command inferST_MDC) will be used. If it is not binary, a random resolution will be used. By default, only the first optimal species network will be returned. However, users can use option -n to ask for multiple optimal networks.
Simple hill climbing is used for the search. Users can specify the weights of the six operations for network rearrangement through option -w. The six weights correspond to adding a reticulation node, deleting a reticulation node, relocating the head of a reticulation edge, relocating the tail of an edge, reversing the direction of a reticulation edge, and replacing a reticulation edge, respectively. By default, the search terminates when a preset limit of consecutive failures is reached (the default is 100, but users can change it through option -f). In addition, option -m allows users to specify the maximum number of networks examined during the search. Once that number is reached, the program terminates and returns the optimal network found so far. Users can also use option -d to specify the maximum diameter of an operation for network rearrangement, similar to local SPR. To avoid getting stuck at a local optimum, it is recommended to perform the search multiple times, which users can specify through option -x.
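The search controls described above (operation weights, a consecutive-failure limit, a cap on examined candidates, and repeated runs) can be sketched generically in Python. This is a toy illustration under stated assumptions, not PhyloNet's code. The function names, the operation labels, and the integer "network" with a quadratic score are all hypothetical. The weights are treated as relative weights here, which is an assumption.

```python
import random

# Labels follow the order in which the text lists the six operations for -w.
OPERATIONS = [
    "add_reticulation", "delete_reticulation", "relocate_head",
    "relocate_tail", "reverse_edge", "replace_edge",
]
DEFAULT_WEIGHTS = (0.1, 0.1, 0.15, 0.55, 0.15, 0.15)  # default stated for -w


def hill_climb(start, score, propose, weights=DEFAULT_WEIGHTS,
               max_failure=100, max_examined=None, rng=None):
    """Simple hill climbing: accept a candidate only if it lowers the score."""
    rng = rng or random.Random()
    best, best_score = start, score(start)
    failures = 0   # consecutive non-improving proposals (compare option -f)
    examined = 0   # candidates scored so far (compare option -m)
    while failures < max_failure:
        if max_examined is not None and examined >= max_examined:
            break
        # Draw one operation; random.choices treats the weights as relative.
        op = rng.choices(OPERATIONS, weights=weights, k=1)[0]
        candidate = propose(best, op, rng)
        examined += 1
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
            failures = 0
        else:
            failures += 1
    return best, best_score, examined


def multi_run(start, score, propose, num_runs=5, seed=0, **kwargs):
    """Repeat the search (compare option -x) and keep the best result."""
    results = [
        hill_climb(start, score, propose, rng=random.Random(seed + i), **kwargs)
        for i in range(num_runs)
    ]
    return min(results, key=lambda result: result[1])


def toy_score(x):
    # Stand-in for the MDC score of a network; minimized at x == 7.
    return (x - 7) ** 2


def toy_propose(x, op, rng):
    # The toy ignores which operation was drawn and just steps left or right.
    return x + rng.choice([-1, 1])


best, best_score, examined = multi_run(0, toy_score, toy_propose)
```

In the toy problem every run finds the single optimum, so the extra runs add nothing. On a landscape with several local optima, different seeds can end at different states, which is the motivation the text gives for option -x.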
By default, it is assumed that only one individual is sampled per species in gene trees. However, the option [-a taxa map] allows multiple alleles to be sampled.
To run the computation in parallel, specify the number of processors through option -pl.
Examples
#NEXUS

BEGIN TREES;
Tree geneTree1 = ((a,(b,((x1,y1),z1))),(d,(c,((x2,y2),z2))));
Tree geneTree2 = (((a,b),(x1,y1)),(((x2,z2),d),c));
END;

BEGIN PHYLONET;
InferNetwork_MP_Allopp (geneTree1,geneTree2) 1 -a <A:a;B:b;C:c;D:d;X:x1,x2;Y:y1,y2;Z:z1,z2>;
END;

#NEXUS

BEGIN TREES;
Tree geneTree1 = (((C2,C1),(((T3,T4),(B2,B1)),((A2,A1),(T1,T2)))),(O2,O1));
Tree geneTree2 = (((C1,C2),(((A2,A1),(T1,T2)),((T3,T4),(B2,B1)))),(O1,O2));
Tree geneTree3 = ((((A2,(B2,T3)),((A1,((T1,T2),T4)),(C2,C1))),B1),(O2,O1));
END;

BEGIN PHYLONET;
InferNetwork_MP_Allopp (geneTree1,geneTree2,geneTree3) 1 -h {T} -a < O:O2,O1; A:A2,A1; T:T3,T4,T1,T2; B:B2,B1; C:C2,C1>;
END;
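Input files with this layout can also be generated from a script. The Python sketch below assembles a NEXUS file in the same shape as the examples above. The function name `build_nexus` and its parameters are hypothetical. It emits only options that appear in the examples (-h and -a), and placing -h before -a simply mirrors the second example rather than any documented ordering requirement.

```python
def build_nexus(gene_trees, num_reticulations, taxa_map=None, hybrids=None):
    """Return NEXUS text with a TREES block and one InferNetwork_MP_Allopp call.

    gene_trees: dict mapping tree identifiers to Newick strings.
    taxa_map:   ready-made argument for -a, e.g. "<X:x1,x2;Y:y1,y2>".
    hybrids:    list of species names for -h.
    """
    lines = ["#NEXUS", "", "BEGIN TREES;"]
    for name, newick in gene_trees.items():
        # Store exactly one terminating semicolon per tree.
        lines.append(f"Tree {name} = {newick.rstrip(';')};")
    lines += ["END;", "", "BEGIN PHYLONET;"]

    command = [
        f"InferNetwork_MP_Allopp ({','.join(gene_trees)})",
        str(num_reticulations),
    ]
    if hybrids:
        command.append("-h {" + ",".join(hybrids) + "}")
    if taxa_map:
        command.append(f"-a {taxa_map}")
    lines.append(" ".join(command) + ";")
    lines.append("END;")
    return "\n".join(lines) + "\n"


nexus_text = build_nexus(
    {
        "geneTree1": "((a,(b,((x1,y1),z1))),(d,(c,((x2,y2),z2))));",
        "geneTree2": "(((a,b),(x1,y1)),(((x2,z2),d),c));",
    },
    1,
    taxa_map="<A:a;B:b;C:c;D:d;X:x1,x2;Y:y1,y2;Z:z1,z2>",
)
```

Writing `nexus_text` to a file yields an input equivalent to the first example.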
Command References
- Z. Yan, Z. Cao, Y. Liu, and L. Nakhleh. Species Network Inference in the Presence of Polyploid Complexes Using PhyloNet. Manuscript in preparation.
- Y. Yu, R.M. Barnett, and L. Nakhleh. Parsimonious inference of hybridization in the presence of incomplete lineage sorting. Systematic Biology, vol. 62, no. 5, pp. 738-751, 2013.
- Z. Cao, J. Zhu, and L. Nakhleh. Empirical performance of tree-based inference of phylogenetic networks. WABI 2019, vol. 143, pp. 21:1-21:13.