Description
- Black box variational inference (BBVI) of evolutionary parameters (node heights of internal nodes and population sizes of internal branches) on a species tree under the coalescent with recombination. The data is a sequence alignment of recombinant DNA. Node heights are estimated in units of generations and population sizes in units of individuals. For each parameter, we estimate the mean and standard deviation of its posterior.
- We use BEAGLE, a high-performance library, to calculate the "Felsenstein likelihood". Full installation instructions can be found here; always follow "Installing from source".
If an error occurs, try the command: java -Djava.library.path="/usr/local/lib" -jar PhyloNet_X.X.X.jar script.nex
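Because node heights are measured in generations from the tips, a starting tree given to this command must be ultrametric. The following is a minimal Python sketch, not part of PhyloNet, that checks this property and lists the internal node heights for a simple Newick string with branch lengths. The helper names `parse_newick` and `node_heights` are hypothetical and exist only for this illustration.

```python
def parse_newick(s):
    """Parse a simple Newick string into nested (name, branch_length, children) tuples."""
    s = s.strip().rstrip(";").replace(" ", "")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(s) and s[pos] not in ":,()":
            pos += 1
        name = s[start:pos]
        length = 0.0  # the root has no branch length
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    return node()


def node_heights(tree, tol=1e-6):
    """Return internal node heights in post-order; raise if the tree is not ultrametric."""
    heights = []

    def height(n):
        _, _, children = n
        if not children:
            return 0.0  # tips sit at height zero
        child_heights = [height(c) + c[1] for c in children]
        if max(child_heights) - min(child_heights) > tol:
            raise ValueError("tree is not ultrametric")
        heights.append(child_heights[0])
        return child_heights[0]

    height(tree)
    return heights


heights = node_heights(parse_newick("((H:160000, C:160000):60000, G:220000);"))
print(heights)  # [160000.0, 220000.0]
```

For the three-taxon tree above, the sketch reports an internal node height of 160,000 generations and a root height of 220,000 generations.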
Usage
|
---|
Starting State Settings | ||
-st startingTree | Specify the starting tree topology and node heights. The input tree should be ultrametric with branch lengths in units of generations. For example, ((H:160000, C:160000):60000, G:220000); specifies a three-taxon tree with an internal node height of 160,000 generations and a root node height of 220,000 generations. See the example below. | mandatory |
-mu mutationRate | The mutation rate in units of expected number of mutations per site per generation. For example, 2.5e-8. | mandatory |
-rho recombinationRate | The recombination rate in units of expected number of recombinations per site per generation. For example, 1.5e-8. | mandatory |
-nhsigma nodeHeightInitialSigma | The starting standard deviation of the variational posterior of each node height. The default value is 20,000. | optional |
-pssigma popSizeInitialSigma | The starting standard deviation of the variational posterior of each population size. The default value is 10,000. | optional |
Prior Settings | ||
-psp popSizePrior | Mean value of the prior of population sizes. The default value is 50,000. | optional |
Likelihood Simulator Settings | ||
-n0 N0ForMS | N0 for | optional |
-r crossoverRate | The cross-over rate that determines the length of the simulation used for building the coalHMM. For details see the ms documentation ("Crossing over") and our paper. A value of 1,000 is a reasonable starting point. | mandatory |
-nb numSubBranch | The number of sub-branches on each internal branch of the species tree, used to refine the coalHMM state space. For details see our paper. A value of 2 is a reasonable starting point. | mandatory |
BBVI Settings | ||
-ns samplePerIter | The number of samples drawn per BBVI iteration to estimate the gradient. The default value is 50. | optional |
-niter numIter | The number of iterations of BBVI. The default value is 200. | optional |
-nhmeanlr nodeHeightMeanLearningRate | Learning rate for the mean parameter of the variational posterior of node heights. The default value is 20,000. | optional |
-psmeanlr popSizeMeanLearningRate | Learning rate for the mean parameter of the variational posterior of population sizes. The default value is 10,000. | optional |
-nhsigmalr nodeHeightSigmaLearningRate | Learning rate for the standard deviation parameter of the variational posterior of node heights. The default value is 500. | optional |
-pssigmalr popSizeSigmaLearningRate | Learning rate for the standard deviation parameter of the variational posterior of population sizes. The default value is 500. | optional |
-nhsigmamin nodeHeightSigmaMinimum | The minimum value of the standard deviation of the variational posterior of node heights. Because BBVI can reach a negative standard deviation if the learning rate is not set carefully, a minimum value is required so that the standard deviation does not drop below the specified value during the BBVI search. The default value is 10,000. | optional |
-pssigmamin popSizeSigmaMinimum | The minimum value of the standard deviation of the variational posterior of population sizes. The default value is 3,000. | optional |
Starting State Settings | ||
-sgt | Specify the starting gene trees for each locus as a comma-delimited list of gene tree identifiers. See details. The gene trees should be ultrametric trees with coalescent times in units of expected number of mutations per site. See example below. The default starting gene trees are UPGMA trees. | optional |
-snet | Specify the starting network. The input network should be ultrametric with divergence times in units of expected number of mutations per site, inheritance probabilities and population sizes in units of population mutation rate (optional). See example below. The default starting network is the MDC trees given starting gene trees. | optional |
-sps | Specify the starting population size. The default value is 0.036. See example below. | optional |
-pre | Specify the number of iterations for pre burn-in, e.g. "-pre 20" means 20x sampleFrequency iterations will be run before the MCMC chain starts. By default, we run 10x sampleFrequency iterations for pre burn-in. | optional |
Substitution Model | ||
-gtr paramList | Set GTR (general time-reversible) as the substitution model. The first four parameters in the list represent base frequencies for A, C, G, T. The remaining six parameters represent transition probabilities for A>C, A>G, A>T, C>G, C>T and G>T. The default substitution model is JC69. | optional |
Phasing | ||
-diploid diploidSpeciesList | Integrates over all possible phasings of heterozygous genotypes when computing likelihoods [2], given a list of diploid species. For example, a list of (Scer, Spar) indicates that species Scer and Spar will be treated as diploid species in the likelihood computation. See Section S4 in the G-PhoCS manual for full details. By default we assume the sequences come from haploid species, or that the sequences are randomly phased. Note that with this option the substitution model is fixed to JC69. | optional |
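To illustrate how the BBVI settings above interact (samples per iteration, number of iterations, separate learning rates for the mean and the standard deviation, and the minimum standard deviation), here is a minimal Python sketch of black box variational inference with a score-function gradient estimator on a single parameter. It is an assumption-laden toy, not PhyloNet's implementation: the target `log_target` is a made-up one-dimensional Gaussian rather than the coalHMM likelihood, the function and argument names are hypothetical, and the learning rates are scaled for this toy problem rather than for parameters measured in generations or individuals.

```python
import math
import random


def log_target(z):
    # Toy unnormalized log posterior (assumption): Gaussian with mean 3.0 and sd 0.5.
    return -((z - 3.0) ** 2) / (2 * 0.5 ** 2)


def log_q(z, mu, sigma):
    # Log density of the Gaussian variational posterior N(mu, sigma^2).
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (z - mu) ** 2 / (2 * sigma ** 2)


def bbvi(mu, sigma, samples_per_iter=50, num_iter=200,
         mean_lr=0.01, sigma_lr=0.01, sigma_min=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(num_iter):
        # Draw samples from the current variational posterior.
        zs = [rng.gauss(mu, sigma) for _ in range(samples_per_iter)]
        fs = [log_target(z) - log_q(z, mu, sigma) for z in zs]
        # Simple baseline from the same samples to reduce variance
        # (this introduces a small bias, acceptable for a sketch).
        baseline = sum(fs) / len(fs)
        grad_mu = 0.0
        grad_sigma = 0.0
        for z, f in zip(zs, fs):
            w = f - baseline
            grad_mu += w * (z - mu) / sigma ** 2                         # d log q / d mu
            grad_sigma += w * ((z - mu) ** 2 - sigma ** 2) / sigma ** 3  # d log q / d sigma
        grad_mu /= len(zs)
        grad_sigma /= len(zs)
        # Gradient ascent on the ELBO with separate learning rates.
        mu += mean_lr * grad_mu
        # Clamp sigma so that a noisy step cannot drive it to zero or below.
        sigma = max(sigma_min, sigma + sigma_lr * grad_sigma)
    return mu, sigma


mu, sigma = bbvi(mu=0.0, sigma=1.0)
print(f"mean={mu:.3f}, sd={sigma:.3f}")
```

The clamp on `sigma` plays the role described for -nhsigmamin and -pssigmamin: without it, a large or noisy step on the standard deviation could make it non-positive, after which sampling from the variational posterior is no longer defined.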