Maximum likelihood estimation of phylogenetic networks given bi-allelic genetic markers (SNPs, AFLPs, etc.).
The number of iterations of simulated annealing. The temperature is reset at the beginning of each iteration and then decreases gradually as more states are examined. Resetting the temperature lets the search escape local optima easily at the start of an iteration; a random walk in the space of phylogenetic networks is then performed for the rest of the iteration. The default value is 100.
The maximum number of states examined during one iteration. During one iteration of simulated annealing, each state is obtained by a random walk in the space of phylogenetic networks. A new state is proposed by randomly altering the topology or parameters of the previous state; the new state is then examined and either accepted or rejected. If the number of states examined exceeds this limit, the current iteration terminates and a new iteration starts. The default value is 50,000.
The number of optimal networks to output. The optimal networks are printed after every iteration. They are the best networks among all states examined in any iteration so far. The default value is 10.
The maximum number of consecutive failures to accept a new state during one iteration. If the number of consecutively rejected proposed states exceeds this limit, the current iteration terminates and a new iteration starts. The default value is 50.
The number of threads running in parallel. The computation of the pseudo-likelihood is parallelized because the likelihoods of trinets can be computed independently. This option sets how many threads are used for that computation. More threads do not necessarily mean faster computation. In practice, the user should find the best number of threads by experimenting on a smaller data set and checking whether the inference gets faster as the number of threads increases. The default value is the number of threads available on your machine.
The maximum number of reticulation nodes in the sampled phylogenetic networks. This number is an upper bound on the number of reticulations that the method explores during the search; the inferred network does not have to reach it. In theory, it can be set to a very large value so as not to impose any real bound. In practice, however, the number of reticulations affects the running time. Furthermore, in the absence of a real criterion for model selection, setting this parameter to a large value might result in overly complex networks. We recommend that the user set the parameter to a value that is “reasonable” to them, based on knowledge of the data set. The default value is 4.
Gene tree / species tree taxa association. By default, it is assumed that only one individual is sampled per species in gene trees. This option allows multiple alleles to be sampled per species. For example, if the gene tree is (((a1,a2),(b1,b2)),c); and the species tree is ((a,b),c);, the option is -tm <a:a1,a2; b:b1,b2;c:c>. If the set of taxa that appear in this mapping is a subset of the taxa in the input data, only that subset of the input data is used for the inference.
Fix the population mutation rates associated with all branches of the phylogenetic network to the given value (theta). By default, we estimate a constant population size across all branches.
Estimate the mean of the prior on population mutation rates.
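To make the interaction between the iteration count, the examination limit, the failure limit, and the number of networks to output more concrete, here is a minimal Python sketch of a restart-style simulated annealing loop. It is an illustration only, not the tool's implementation: the function `anneal`, its parameter names, the geometric cooling schedule, the Metropolis acceptance rule, and the choice to carry the current state over between iterations are all assumptions. The states here are plain integers standing in for networks.

```python
import math
import random


def anneal(score, propose, start, iterations=100, max_examinations=50_000,
           max_failures=50, num_best=10, t0=1.0, cooling=0.999, seed=12345678):
    """Toy simulated annealing with a temperature reset per iteration.

    All names and the cooling schedule are hypothetical; states must be hashable.
    Returns up to `num_best` (state, score) pairs, best first.
    """
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best = {current: current_score}  # best states seen in any iteration

    for _ in range(iterations):
        temperature = t0  # reset at the beginning of each iteration
        examined = 0
        failures = 0      # consecutive rejections
        while examined < max_examinations and failures < max_failures:
            candidate = propose(current, rng)
            candidate_score = score(candidate)
            examined += 1

            # Track the best states among everything examined so far.
            best[candidate] = candidate_score
            if len(best) > num_best:
                del best[min(best, key=best.get)]

            # Metropolis rule (an assumption): always accept improvements,
            # accept worse states with a temperature-dependent probability.
            delta = candidate_score - current_score
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, candidate_score
                failures = 0
            else:
                failures += 1

            # Cool down gradually; the floor avoids division by zero.
            temperature = max(temperature * cooling, 1e-12)

    return sorted(best.items(), key=lambda pair: -pair[1])


# Toy usage: maximize -(x - 7)^2 over the integers, starting from 0.
top = anneal(score=lambda x: -(x - 7) ** 2,
             propose=lambda x, rng: x + rng.choice((-1, 1)),
             start=0, iterations=5, max_examinations=200,
             max_failures=20, num_best=3)
print(top)
```

An iteration in this sketch ends when either limit is reached, which mirrors the two termination conditions described above; whether the tool compares with "reaches" or "exceeds" is not modeled here.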
Starting state settings
Specify the starting network. The input network should be ultrametric, with divergence times in units of expected number of mutations per site, inheritance probabilities, and (optionally) population sizes in units of population mutation rate. See example below. By default, the starting network is the MDC tree computed from the starting gene trees.
Specify the mean of the prior on the population mutation rate (startingThetaPrior). The default value is 0.036. If -esptheta is used, startingThetaPrior is treated as the starting value; otherwise, it is treated as the fixed mean of the prior on population mutation rates.
Data-related settings
Specify whether the sequences are sampled from diploids. If the sequences are from diploids and there are no dominant markers, the characters in each sequence should be ‘0’, ‘1’ or ‘2’. ‘0’ and ‘2’ are the homozygous states and ‘1’ is the heterozygous state.
Specify which marker is dominant if the data are dominant. The dominant marker can be either ‘0’ or ‘1’. Use this option only when “-diploid” is specified. If this option is specified, the characters in each sequence should be ‘0’ or ‘1’.
Specify whether or not to ignore all monomorphic sites. If this option is used, the data are treated as containing only polymorphic sites, and all monomorphic sites are ignored. The frequencies of the monomorphic sites are then computed by the likelihood function.
Specify the stationary distribution of marker "0". The value should be between 0 and 1. If not specified, the stationary distribution is calculated from the input data.
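As an illustration of the data conventions above, the following Python sketch shows one way to prepare such input: coding diploid genotypes as ‘0’, ‘1’, ‘2’ and dropping monomorphic sites. The helper names `encode_diploid` and `drop_monomorphic` are hypothetical and not part of the tool. The sketch assumes that the diploid code counts copies of allele 1 (so which homozygote maps to ‘0’ and which to ‘2’ is an assumption) and that a diploid site made up entirely of heterozygotes counts as polymorphic.

```python
def encode_diploid(genotypes):
    """Code diploid genotypes, given as pairs of 0/1 alleles, as '0', '1', '2'.

    Assumption: the code is the number of copies of allele 1, so (0, 0) -> '0',
    (0, 1) or (1, 0) -> '1' (heterozygote), and (1, 1) -> '2'.
    """
    return "".join(str(a + b) for a, b in genotypes)


def drop_monomorphic(alignment, diploid=False):
    """Remove sites at which only one allele is observed across all taxa.

    `alignment` maps a taxon name to its marker string; all strings must have
    the same length. For diploid data, a site is monomorphic only if every
    taxon is '0' or every taxon is '2' (all-heterozygote sites are kept).
    """
    names = list(alignment)
    columns = list(zip(*(alignment[name] for name in names)))

    def is_monomorphic(column):
        states = set(column)
        if diploid:
            return states == {"0"} or states == {"2"}
        return len(states) == 1

    kept = [column for column in columns if not is_monomorphic(column)]
    return {name: "".join(column[i] for column in kept)
            for i, name in enumerate(names)}


# Toy usage with made-up data.
print(encode_diploid([(0, 0), (0, 1), (1, 1)]))
print(drop_monomorphic({"a": "0101", "b": "0111", "c": "0100"}))
```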
Please download the example instead of copying from this webpage and pasting into your local file!
This command runs maximum pseudo-likelihood estimation for 10 iterations and prints 20 optimal networks. A new iteration starts after 100 consecutive failures to accept a new state, or after 50,000 new states have been examined. The population mutation rate is estimated and is the same across all branches. The number of reticulation nodes is limited to 1. The starting value of the population mutation rate is 0.006. The random seed is 12345678. Finally, we specify the mapping from taxa to species.
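The taxa-to-species mapping uses the format shown for the -tm option, for example <a:a1,a2; b:b1,b2;c:c>. The Python sketch below shows how such a string can be read and used to restrict input data to the mapped taxa. It is only an illustration of the format: the helpers `parse_taxon_map` and `restrict_to_mapped` are hypothetical and are not part of the tool.

```python
def parse_taxon_map(text):
    """Parse a mapping such as '<a:a1,a2; b:b1,b2;c:c>' into a dictionary.

    Returns {species: [allele, ...]}. Surrounding angle brackets and any
    whitespace around names are ignored.
    """
    body = text.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]

    mapping = {}
    for entry in body.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        species, _, alleles = entry.partition(":")
        mapping[species.strip()] = [a.strip() for a in alleles.split(",") if a.strip()]
    return mapping


def restrict_to_mapped(sequences, mapping):
    """Keep only the sequences whose taxon name appears in the mapping."""
    mapped = {allele for alleles in mapping.values() for allele in alleles}
    return {name: seq for name, seq in sequences.items() if name in mapped}


# Toy usage with made-up sequences; 'd1' is not in the mapping and is dropped.
taxon_map = parse_taxon_map("<a:a1,a2; b:b1,b2;c:c>")
data = {"a1": "0101", "a2": "0111", "b1": "0001", "b2": "0011", "c": "1101", "d1": "1111"}
print(taxon_map)
print(sorted(restrict_to_mapped(data, taxon_map)))
```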