...

 

VI_coalHMM [-bl] -st startingTree -mu mutationRate -rho recombinationRate -r crossoverRate -nb numSubBranch [-len simulationShortRegionLength] [-nhsigma nodeHeightInitialSigma] [-pssigma popSizeInitialSigma] [-blsigma branchLengthInitialSigma] [-psp popSizePrior] [-n0 N0ForMS] [-ns samplePerIter] [-niter numIter] [-nhmeanlr nodeHeightMeanLearningRate] [-psmeanlr popSizeMeanLearningRate] [-blmeanlr branchLengthMeanLearningRate] [-nhsigmalr nodeHeightSigmaLearningRate] [-pssigmalr popSizeSigmaLearningRate] [-blsigmalr branchLengthSigmaLearningRate] [-nhsigmamin nodeHeightSigmaMinimum] [-pssigmamin popSizeSigmaMinimum] [-blsigmamin branchLengthSigmaMinimum]

 

 

Parametrization Settings
-bl

Infer the branch lengths of the species tree instead of the node heights of its internal nodes. This option can reduce the number of illegal tree configurations sampled when the gradient of the ELBO is estimated from Monte Carlo samples.

optional
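Here is a minimal sketch of one way the branch-length parametrization can help. It assumes something this page does not document: that the variational posterior is an independent Gaussian on each parameter, with a common standard deviation σ. With node heights, the internal branch length is the difference of two Gaussian draws, so its variance is 2σ². With branch lengths, that branch is drawn directly, with variance σ². The hypothetical function `illegal_fraction` uses Monte Carlo to estimate how often a three-taxon tree comes out illegal, meaning it has a non-positive branch length.

```python
import random


def illegal_fraction(h_hc, h_root, sigma, parametrization, n=100_000, seed=1):
    """Monte Carlo fraction of draws giving an illegal ((H,C),G) species tree.

    Assumes (hypothetically) independent Gaussian variational posteriors with a
    shared standard deviation `sigma`; not VI_coalHMM's actual variational family.
    """
    rng = random.Random(seed)
    illegal = 0
    for _ in range(n):
        if parametrization == "node_heights":
            # One Gaussian per internal node height (HC ancestor, HCG root).
            hc = rng.gauss(h_hc, sigma)
            root = rng.gauss(h_root, sigma)
            leaf, internal = hc, root - hc
        elif parametrization == "branch_lengths":
            # One Gaussian per branch: the H/C leaf branch and the internal branch.
            leaf = rng.gauss(h_hc, sigma)
            internal = rng.gauss(h_root - h_hc, sigma)
        else:
            raise ValueError(f"unknown parametrization: {parametrization}")
        # The tree is legal only if every branch has positive length.
        if leaf <= 0 or internal <= 0:
            illegal += 1
    return illegal / n


nh = illegal_fraction(150_000, 170_000, 20_000, "node_heights")
bl = illegal_fraction(150_000, 170_000, 20_000, "branch_lengths")
print(f"node heights: {nh:.3f}, branch lengths: {bl:.3f}")
```

The example uses a short 20,000-generation internal branch and σ = 20,000. In that case about 24% of the node-height draws are illegal, against about 16% of the branch-length draws. The size of the gap depends on the σ values chosen.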
Starting State Settings
-st startingTree

Specify the starting tree topology and node heights. The input tree should be ultrametric, with branch lengths in units of generations. For example, ((H:160000, C:160000):60000, G:220000); specifies a three-taxon tree in which the HC ancestor has a node height of 160,000 generations and the root has a node height of 220,000 generations (160,000 + 60,000). See the example below.

mandatory

-mu mutationRate

The mutation rate, in units of expected number of mutations per site per generation. For example, 2.5e-8.

mandatory

-rho recombinationRate

The recombination rate, in units of expected number of recombinations per site per generation. For example, 1.5e-8.

mandatory

-nhsigma nodeHeightInitialSigma

The starting standard deviation of the variational posterior of each node height. The default value is 20,000. (Only used when -bl is not set.)

optional

-pssigma popSizeInitialSigma

The starting standard deviation of the variational posterior of each population size. The default value is 10,000.

optional
-blsigma branchLengthInitialSigma

The starting standard deviation of the variational posterior of each branch length. The default value is 20,000. (Only used when -bl is set.)

optional
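Before passing a tree to -st, it can help to check that it is ultrametric and to read off the node heights it implies. The sketch below is not PhyloNet's parser. It handles only the subset of Newick used on this page: named leaves, unnamed internal nodes and numeric branch lengths. The function names are hypothetical, and each internal node is labelled by concatenating the leaf names beneath it.

```python
def parse_newick(text):
    """Parse simple Newick into (name, branch_length, children) tuples."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in ":,()":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()


def leaves(node):
    name, _, children = node
    return [name] if not children else [leaf for c in children for leaf in leaves(c)]


def node_heights(tree, tol=1e-6):
    """Return {clade label: height in generations}; raise if not ultrametric."""
    heights = {}

    def walk(node):
        _, _, children = node
        if not children:
            return 0.0  # leaves sit at height 0
        # Height of this node as seen through each child branch.
        via_children = [walk(child) + child[1] for child in children]
        if max(via_children) - min(via_children) > tol:
            raise ValueError("tree is not ultrametric")
        heights["".join(leaves(node))] = via_children[0]
        return via_children[0]

    walk(tree)
    return heights


print(node_heights(parse_newick("((H:160000, C:160000):60000, G:220000);")))
```

For the -st example above, this gives a height of 160,000 generations for HC and 220,000 generations for HCG.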
Prior Settings
-psp popSizePrior

The mean of the prior on population sizes. The default value is 50,000.

optional
Likelihood Simulator Settings
-n0 N0ForMS

The reference population size N0 used by ms. The default value is 10,000. For details see the ms documentation (subsection "Two species with population size differences" in section "Some examples") and our paper.

optional
-r crossoverRate

The crossover rate, which determines the length of the simulation used to build the coalHMM. For details see the ms documentation ("Crossing over") and our paper. A value of 1,000 is a reasonable starting point.

mandatory
-nb numSubBranch

The number of sub-branches on each internal branch of the species tree, used to refine the coalHMM state space. For details see our paper. A value of 2 is a reasonable starting point.

mandatory

-len simulationShortRegionLength

When building the HMM, simulating multiple independent short regions can save time compared to simulating one long region. This parameter is the length of each independent short region. The default value is 5,000.

optional
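-n0 sets the reference size for the scaled units that ms works in. The hypothetical helper below sketches those conventions, as stated in the ms documentation:

- θ = 4N0μ, where μ is the mutation rate for the whole locus.
- ρ = 4N0r, where r is the crossover probability between the two ends of the locus.
- Times are measured in units of 4N0 generations.
- Population sizes are given as multiples of N0.

It converts the per-generation quantities used on this page into those units. Two things here are our own assumptions: that each -len region is simulated as one ms locus with nsites equal to -len, and the values plugged in. The exact ms command that VI_coalHMM builds is not documented here.

```python
def ms_units(n0, mu, rho_per_site, nsites, node_heights, pop_sizes):
    """Convert per-generation quantities into ms's scaled units (sketch)."""
    # ms -t: theta = 4 * N0 * mu, with mu the mutation rate for the whole locus.
    theta = 4 * n0 * mu * nsites
    # ms -r: rho = 4 * N0 * r, with r the crossover probability between the
    # ends of the locus, i.e. the per-site rate times (nsites - 1).
    rho = 4 * n0 * rho_per_site * (nsites - 1)
    # ms measures time in units of 4 * N0 generations ...
    times = {label: h / (4 * n0) for label, h in node_heights.items()}
    # ... and population sizes as multiples of N0.
    sizes = {label: n / n0 for label, n in pop_sizes.items()}
    return theta, rho, times, sizes


theta, rho, times, sizes = ms_units(
    n0=10_000, mu=2.5e-8, rho_per_site=1.5e-8, nsites=5_000,
    node_heights={"HC": 150_000, "HCG": 300_000},
    pop_sizes={"HC": 50_000, "HCG": 50_000},
)
print(theta, rho, times, sizes)
```

With the defaults and the rates from the example below, θ is 5 per 5,000-site region and ρ is about 3. The HC and HCG node heights become 3.75 and 7.5 in units of 4N0 generations, and a population size of 50,000 becomes 5 × N0.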
BBVI Settings
-ns samplePerIter

The number of samples drawn per iteration of BBVI to estimate the gradient. The default value is 50.

optional

-niter numIter

The number of iterations of BBVI. The default value is 200.

optional

-nhmeanlr nodeHeightMeanLearningRate

Learning rate for the mean parameter of the variational posterior of node heights. The default value is 20,000. (Only used when -bl is not set.)

optional

-psmeanlr popSizeMeanLearningRate

Learning rate for the mean parameter of the variational posterior of population sizes. The default value is 10,000.

optional

-blmeanlr branchLengthMeanLearningRate

Learning rate for the mean parameter of the variational posterior of branch lengths. The default value is 20,000. (Only used when -bl is set.)

optional

-nhsigmalr nodeHeightSigmaLearningRate

Learning rate for the standard deviation parameter of the variational posterior of node heights. The default value is 500. (Only used when -bl is not set.)

optional

-pssigmalr popSizeSigmaLearningRate

Learning rate for the standard deviation parameter of the variational posterior of population sizes. The default value is 500.

optional

-blsigmalr branchLengthSigmaLearningRate

Learning rate for the standard deviation parameter of the variational posterior of branch lengths. The default value is 500. (Only used when -bl is set.)

optional

-nhsigmamin nodeHeightSigmaMinimum

The minimum standard deviation of the variational posterior of node heights. BBVI can drive a standard deviation negative if the learning rate is not chosen carefully, so the standard deviation is not allowed to drop below this value during the search. The default value is 10,000. (Only used when -bl is not set.)

optional

-pssigmamin popSizeSigmaMinimum

The minimum standard deviation of the variational posterior of population sizes. The default value is 3,000.

optional

-blsigmamin branchLengthSigmaMinimum

The minimum standard deviation of the variational posterior of branch lengths. The default value is 10,000. (Only used when -bl is set.)

optional
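The BBVI options map onto a standard black-box variational inference loop. The sketch below is a generic score-function (REINFORCE) BBVI on a toy one-dimensional Gaussian target; it is not VI_coalHMM's implementation. The parameters correspond to the options as follows:

- `n_samples` plays the role of -ns.
- `n_iter` plays the role of -niter.
- `mean_lr` and `sigma_lr` play the role of the mean and sigma learning-rate options.
- `sigma_min` plays the role of the sigma-minimum options.

We assume plain gradient ascent with fixed step sizes; VI_coalHMM may scale its updates differently. The toy's learning rates are chosen for its own scale and are not the defaults listed above. The target `log_joint` is a placeholder for the coalHMM log-likelihood plus log-prior.

```python
import math
import random


def log_joint(z):
    # Toy unnormalized log target N(3, 1); a placeholder, not a coalHMM likelihood.
    return -0.5 * (z - 3.0) ** 2


def log_q(z, m, s):
    # Log density of the Gaussian variational posterior N(m, s^2).
    return -0.5 * ((z - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)


def bbvi(m, s, n_samples=50, n_iter=500, mean_lr=0.05, sigma_lr=0.02,
         sigma_min=1.5, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iter):
        zs = [rng.gauss(m, s) for _ in range(n_samples)]
        signal = [log_joint(z) - log_q(z, m, s) for z in zs]
        # Subtracting the batch mean reduces variance; in expectation it only
        # shrinks the gradient estimate by a factor (S - 1) / S.
        baseline = sum(signal) / n_samples
        grad_m = grad_s = 0.0
        for z, f in zip(zs, signal):
            # Score function of N(m, s^2) with respect to m and s.
            grad_m += (z - m) / s ** 2 * (f - baseline)
            grad_s += ((z - m) ** 2 - s ** 2) / s ** 3 * (f - baseline)
        m += mean_lr * grad_m / n_samples
        s += sigma_lr * grad_s / n_samples
        # Floor on the standard deviation, in the spirit of the sigma-minimum options.
        s = max(s, sigma_min)
    return m, s


print(bbvi(0.0, 3.0))                  # floor binds: s stays at about 1.5
print(bbvi(0.0, 3.0, sigma_min=0.5))   # floor inactive: s approaches 1
```

The toy target has a true standard deviation of 1. With `sigma_min=1.5` the floor binds, so the mean still converges to about 3 while the standard deviation is held at 1.5. This is the kind of safeguard the sigma-minimum options provide against steps that would otherwise push a standard deviation too low, or below zero.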

Example

Download:   test.nex

#NEXUS
Begin data;
Dimensions ntax=3 nchar=500000;
Format datatype=dna symbols="ACTG" missing=? gap=-;
Matrix

H TCGCTGTCTCATACTATATGGAGAGTCAAGGGGGTTGAGATAATTGTCGCATTGTCTAAGTGAATGGCGTAAAGCGAAC.......
C CCGCTGTCTCATACTATATGGAGAGTCAAGGGGGTTGAGATAATTGTCGCATTGTCTAAGTGTATGGCGTAAAGCGAAC.......
G TCGCTGTCTCATACTATATGGAGAGTCAAGTGGGTTGAGATAATTGTCGCATTGTCTAAGTGAATGGCGTAAAGCGAAC.......
;End;

BEGIN TREES;
Tree t0 = ((H:150000,C:150000):150000, G:300000);
END;

BEGIN PHYLONET;
VI_coalHMM -bl -st (t0) -mu 2.5e-8 -rho 1.5e-8 -r 1000 -nb 2 -psp 50000 -nhsigma 20000 -pssigma 10000 -n0 10000 -ns 50 -niter 200 -nhmeanlr 20000 -psmeanlr 10000 -nhsigmalr 500 -pssigmalr 500 -nhsigmamin 10000 -pssigmamin 3000;
END;

 

This command runs VI_coalHMM on the given data. Because -bl is set, it infers the lengths of one leaf branch and one internal branch, which together determine the divergence times of the HC ancestor and the HCG ancestor. It also infers the population sizes of the HC ancestor and the HCG ancestor. The starting tree is ((H:150000,C:150000):150000, G:300000);. That is, the search starts with an HC ancestor divergence time of 150,000 generations and an HCG ancestor divergence time of 300,000 generations (150,000 generations for each of the two branch lengths). Note that the Newick string must be given in the TREES section and referenced in the PHYLONET section. The mutation rate is set to 2.5e-8 mutations per site per generation, and the recombination rate to 1.5e-8 recombinations per site per generation. The crossover rate -r is set to 1000 and the number of sub-branches -nb to 2; for details of -r and -nb see our paper. All other options take their default values, and the values written explicitly in the command equal those defaults. Because -bl is set, the node-height options (-nhsigma, -nhmeanlr, -nhsigmalr, -nhsigmamin) are ignored. The branch-length options (-blsigma, -blmeanlr, -blsigmalr, -blsigmamin) use their defaults.
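When running several analyses, it can be convenient to generate the NEXUS input from a script. The hypothetical helper below writes the same block layout as test.nex: data, then TREES, then PHYLONET. The Newick string goes in the TREES block, and the command refers to it as (tree_name). Two details are our own choices: bare switches such as -bl come first, and option values are passed as strings so they are written verbatim. The page does not say whether option order matters.

```python
def vi_coalhmm_nexus(matrix, newick, options, tree_name="t0"):
    """Assemble a NEXUS file with data, TREES and PHYLONET blocks (sketch).

    matrix:  {taxon name: aligned DNA sequence}
    options: {flag without '-': value}; a value of True emits a bare switch.
    """
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    switches = [f"-{flag}" for flag, value in options.items() if value is True]
    valued = [f"-{flag} {value}" for flag, value in options.items() if value is not True]
    command = " ".join(["VI_coalHMM", *switches, f"-st ({tree_name})", *valued])
    lines = [
        "#NEXUS",
        "Begin data;",
        f"Dimensions ntax={len(matrix)} nchar={lengths.pop()};",
        'Format datatype=dna symbols="ACTG" missing=? gap=-;',
        "Matrix",
        *(f"{name} {seq}" for name, seq in matrix.items()),
        ";End;",
        "",
        "BEGIN TREES;",
        f"Tree {tree_name} = {newick}",
        "END;",
        "",
        "BEGIN PHYLONET;",
        command + ";",
        "END;",
    ]
    return "\n".join(lines) + "\n"


nex = vi_coalhmm_nexus(
    {"H": "TCGCTG", "C": "CCGCTG", "G": "TCGCTG"},  # toy sequences, not real data
    "((H:150000,C:150000):150000, G:300000);",
    {"bl": True, "mu": "2.5e-8", "rho": "1.5e-8", "r": "1000", "nb": "2"},
)
print(nex)
```

Options left out of the dictionary are simply not written, so VI_coalHMM falls back to its defaults for them.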

...