
Description

  • Black box variational inference (BBVI) of evolutionary parameters (the height of each internal node and the population size of each internal branch) on a species tree under the coalescent with recombination. The data are an alignment of DNA sequences subject to recombination. Node heights are estimated in units of generations and population sizes in units of individuals. For each parameter, we estimate the mean and standard deviation of its posterior.
  • We use BEAGLE, a high-performance library, to calculate the Felsenstein likelihood. Full installation instructions can be found here; always follow "Installing from source".
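BBVI only needs to sample from the variational posterior and evaluate the log joint density (likelihood times prior); it does not need gradients of the likelihood itself, which is why a likelihood library such as BEAGLE can be plugged in directly. The sketch below illustrates this with the score-function gradient estimator for a single parameter whose variational posterior is N(m, s²), averaged over a batch of samples in the way -ns controls. It is not VI_coalHMM's code: the toy Gaussian log joint is an assumed stand-in for the coalHMM likelihood plus priors, and the estimator VI_coalHMM actually uses may add variance-reduction terms not shown here.

```python
import math
import random


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def bbvi_gradient(log_joint, m, s, num_samples, rng):
    """Score-function estimate of the ELBO gradient with respect to (m, s)
    for a Gaussian variational posterior q(x) = N(m, s^2)."""
    grad_m = 0.0
    grad_s = 0.0
    for _ in range(num_samples):
        x = rng.gauss(m, s)
        # Learning signal: log p(x, data) - log q(x)
        w = log_joint(x) - gaussian_logpdf(x, m, s)
        # Score functions: d log q / dm and d log q / ds
        grad_m += w * (x - m) / s ** 2
        grad_s += w * ((x - m) ** 2 / s ** 3 - 1.0 / s)
    return grad_m / num_samples, grad_s / num_samples


if __name__ == "__main__":
    # Toy stand-in for log p(height, data): a N(200000, 10000^2) target
    def toy_log_joint(h):
        return gaussian_logpdf(h, 200_000.0, 10_000.0)

    rng = random.Random(1)
    # 50 samples, matching the -ns default, so the estimate is noisy
    gm, gs = bbvi_gradient(toy_log_joint, 150_000.0, 20_000.0, 50, rng)
    print(f"d/dm: {gm:.3g}  d/ds: {gs:.3g}")
```

For this toy target the exact gradients are 5e-4 for m and -1.5e-4 for s; with only 50 samples a single estimate can be far from these values, which is one reason the learning rates need care.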

Usage

 

VI_coalHMM -st startingTree -mu mutationRate -rho recombinationRate -r crossoverRate -nb numSubBranch [-nhsigma nodeHeightInitialSigma] [-pssigma popSizeInitialSigma] [-psp popSizePrior] [-n0 N0ForMS] [-ns samplePerIter] [-niter numIter] [-nhmeanlr nodeHeightMeanLearningRate] [-psmeanlr popSizeMeanLearningRate] [-nhsigmalr nodeHeightSigmaLearningRate] [-pssigmalr popSizeSigmaLearningRate] [-nhsigmamin nodeHeightSigmaMinimum] [-pssigmamin popSizeSigmaMinimum]

 

 

Starting State Settings
-st startingTree

Specify the starting tree topology and node heights. The input tree should be ultrametric, with branch lengths in units of generations. For example, ((H:160000, C:160000):60000, G:220000); specifies a three-taxon tree with an internal node height of 160,000 generations and a root height of 220,000 generations. See the example below.

mandatory

-mu mutationRate

The mutation rate, in expected number of mutations per site per generation. For example, 2.5e-8.

mandatory

-rho recombinationRate

The recombination rate, in expected number of recombinations per site per generation. For example, 1.5e-8.

mandatory

-nhsigma nodeHeightInitialSigma

The starting standard deviation of the variational posterior of each node height. The default value is 20,000.

optional

-pssigma popSizeInitialSigma

The starting standard deviation of the variational posterior of each population size. The default value is 10,000.

optional
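The -st option expects an ultrametric tree: every leaf must lie at the same distance from the root, and each internal node's height is its branch-length distance to any leaf below it. The sketch below parses a simple Newick string and checks this for the tree given in the -st description. It is a hypothetical helper, not part of PhyloNet, and it assumes plain Newick with leaf names and branch lengths only (no comments, quoted labels, or support values).

```python
def parse_newick(text):
    """Parse a minimal Newick string (names and branch lengths only) into
    nested (children, name, branch_length) tuples."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip the closing ")"
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (children, name, length)

    return node()


def leaves(node):
    """Leaf names below a node."""
    children, name, _ = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


def node_heights(tree, tol=1e-6):
    """Return {clade: height} for internal nodes, raising ValueError if the
    tree is not ultrametric. Heights use the tree's branch-length units."""
    heights = {}

    def height(node):
        children, _, _ = node
        if not children:
            return 0.0
        label = "(" + ",".join(sorted(leaves(node))) + ")"
        # Height measured through each child: child height + branch above it
        via_child = [height(child) + child[2] for child in children]
        if max(via_child) - min(via_child) > tol * max(1.0, max(via_child)):
            raise ValueError(f"tree is not ultrametric at clade {label}")
        heights[label] = via_child[0]
        return via_child[0]

    height(tree)
    return heights
```

For example, node_heights(parse_newick("((H:160000, C:160000):60000, G:220000);")) returns {'(C,H)': 160000.0, '(C,G,H)': 220000.0}, while changing C's branch to 150000 raises ValueError.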
Prior Settings
-psp popSizePrior

The mean of the prior on population sizes. The default value is 50,000.

optional
Likelihood Simulator Settings
-n0 N0ForMS

The reference population size N0 used by ms. The default value is 10,000. For details, see the ms documentation (subsection "Two species with population size differences" in section "Some examples") and our paper.

optional
-r crossoverRate

The cross-over rate, which determines the length of the simulation used to build the coalHMM. For details, see the ms documentation ("Crossing over") and our paper. A value of 1,000 is a reasonable starting point.

mandatory
-nb numSubBranch

The number of sub-branches on each internal branch of the species tree, used to refine the coalHMM state space. For details, see our paper. A value of 2 is a reasonable starting point.

mandatory

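Under the standard ms conventions, time is measured in units of 4N0 generations, population sizes are given as multiples of N0, and the mutation and recombination parameters θ and ρ are scaled by 4N0. The sketch below applies these conversions to the values used in the example on this page, to show what the -n0 setting rescales. The function name is hypothetical, and how VI_coalHMM actually assembles its ms commands is not described here.

```python
def to_ms_units(n0, mu_per_site, rho_per_site, num_sites,
                node_heights_gen, pop_sizes):
    """Convert per-generation quantities to ms's scaled units."""
    four_n0 = 4.0 * n0
    return {
        # ms -t theta: 4 * N0 * (mutation rate for the whole locus)
        "theta": four_n0 * mu_per_site * num_sites,
        # ms -r rho nsites: 4 * N0 * (crossover probability per generation
        # between the two ends of the locus, i.e. num_sites - 1 intervals)
        "rho": four_n0 * rho_per_site * (num_sites - 1),
        # event times (e.g. -ej, -en) are in units of 4 * N0 generations
        "times": [h / four_n0 for h in node_heights_gen],
        # population sizes (e.g. -n, -en) are multiples of N0
        "sizes": [n / n0 for n in pop_sizes],
    }


if __name__ == "__main__":
    # Example values: 500,000 sites, -mu 2.5e-8, -rho 1.5e-8,
    # node heights 150,000 and 300,000, default -n0 and -psp
    print(to_ms_units(10_000, 2.5e-8, 1.5e-8, 500_000,
                      [150_000, 300_000], [50_000]))
```

With the default N0 of 10,000, node heights of 150,000 and 300,000 generations become ms times 3.75 and 7.5, and a population size of 50,000 becomes 5 × N0.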
BBVI Settings
-ns samplePerIter

The number of samples drawn per BBVI iteration to estimate the gradient. The default value is 50.

optional

-niter numIter

The number of BBVI iterations. The default value is 200.

optional

-nhmeanlr nodeHeightMeanLearningRate

Learning rate for the mean parameter of the variational posterior of node heights. The default value is 20,000.

optional

-psmeanlr popSizeMeanLearningRate

Learning rate for the mean parameter of the variational posterior of population sizes. The default value is 10,000.

optional

-nhsigmalr nodeHeightSigmaLearningRate

Learning rate for the standard deviation parameter of the variational posterior of node heights. The default value is 500.

optional

-pssigmalr popSizeSigmaLearningRate

Learning rate for the standard deviation parameter of the variational posterior of population sizes. The default value is 500.

optional

-nhsigmamin nodeHeightSigmaMinimum

The minimum standard deviation of the variational posterior of node heights. Because a poorly chosen learning rate can drive the standard deviation below zero during BBVI, the standard deviation is never allowed to drop below this value during the search. The default value is 10,000.

optional

-pssigmamin popSizeSigmaMinimum

The minimum standard deviation of the variational posterior of population sizes. The default value is 3,000.

optional
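One way to read the BBVI settings together: each iteration estimates a gradient for every mean and standard deviation, moves each by its learning rate times its gradient, and then clips the standard deviation at its minimum. The sketch below assumes a plain gradient-ascent step; whether VI_coalHMM rescales or otherwise transforms gradients is not documented here, and the class and function names and the gradient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GaussianParam:
    """Mean and standard deviation of one variational posterior."""
    mean: float
    sigma: float


def bbvi_step(param, grad_mean, grad_sigma, mean_lr, sigma_lr, sigma_min):
    """One plain gradient-ascent update with a floor on sigma."""
    new_mean = param.mean + mean_lr * grad_mean
    # A large step can push sigma to zero or below, so clip it at sigma_min
    new_sigma = max(param.sigma + sigma_lr * grad_sigma, sigma_min)
    return GaussianParam(new_mean, new_sigma)


# Node-height defaults listed above: -nhmeanlr, -nhsigmalr, -nhsigmamin
NODE_HEIGHT_DEFAULTS = dict(mean_lr=20_000, sigma_lr=500, sigma_min=10_000)

if __name__ == "__main__":
    start = GaussianParam(mean=150_000.0, sigma=20_000.0)
    # Made-up gradient values for illustration
    print(bbvi_step(start, grad_mean=0.5, grad_sigma=-30.0, **NODE_HEIGHT_DEFAULTS))
```

Here the mean moves from 150,000 to 160,000, and because 20,000 + 500 × (-30) = 5,000 falls below the 10,000 floor, sigma is set to 10,000.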

Example

Download: test.nex

 

#NEXUS
Begin data;
Dimensions ntax=3 nchar=500000;
Format datatype=dna symbols="ACTG" missing=? gap=-;
Matrix

H TCGCTGTCTCATACTATATGGAGAGTCAAGGGGGTTGAGATAATTGTCGCATTGTCTAAGTGAATGGCGTAAAGCGAAC.......
C CCGCTGTCTCATACTATATGGAGAGTCAAGGGGGTTGAGATAATTGTCGCATTGTCTAAGTGTATGGCGTAAAGCGAAC.......
G TCGCTGTCTCATACTATATGGAGAGTCAAGTGGGTTGAGATAATTGTCGCATTGTCTAAGTGAATGGCGTAAAGCGAAC.......
;End;

BEGIN TREES;
Tree t0 = ((H:150000,C:150000):150000, G:300000);
END;

BEGIN PHYLONET;
VI_coalHMM -st (t0) -mu 2.5e-8 -rho 1.5e-8 -r 1000 -nb 2;
END;

 

This command runs VI_coalHMM on the given data. The starting tree is ((H:150000,C:150000):150000, G:300000);, so the search starts with an HC divergence time of 150,000 generations and an HCG divergence time of 300,000 generations. Note that the Newick string must be given in the TREES block and referenced in the PHYLONET block. The mutation rate is set to 2.5e-8 mutations per site per generation and the recombination rate to 1.5e-8 recombinations per site per generation. The cross-over rate -r is set to 1000 and the number of sub-branches -nb to 2; for details of -r and -nb, see our paper. All other parameters take their default values.
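When preparing many runs, it can help to generate the PHYLONET command programmatically. The helper below is hypothetical (it is not part of PhyloNet); it only assembles the flags documented on this page and checks that the mandatory ones are present. Values are written verbatim, so pass numbers as the strings you want to appear in the file.

```python
MANDATORY = ("-st", "-mu", "-rho", "-r", "-nb")


def vi_coalhmm_command(tree_name, **options):
    """Build a VI_coalHMM line for a PHYLONET block.

    Keyword names are the flags without the leading dash (e.g. mu="2.5e-8",
    n0=10000); the tree is referenced by its name from the TREES block.
    """
    flags = {"-st": f"({tree_name})"}
    flags.update({"-" + key: str(value) for key, value in options.items()})
    missing = [flag for flag in MANDATORY if flag not in flags]
    if missing:
        raise ValueError(f"missing mandatory flags: {missing}")
    return "VI_coalHMM " + " ".join(f"{k} {v}" for k, v in flags.items()) + ";"


if __name__ == "__main__":
    command = vi_coalhmm_command("t0", mu="2.5e-8", rho="1.5e-8", r=1000, nb=2)
    print("BEGIN PHYLONET;\n" + command + "\nEND;")
```

The call above reproduces the command from the example, VI_coalHMM -st (t0) -mu 2.5e-8 -rho 1.5e-8 -r 1000 -nb 2;.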

Note on Learning Rates

Users can set separate learning rates for four kinds of variational posterior parameters: the mean of node heights, the standard deviation of node heights, the mean of population sizes, and the standard deviation of population sizes. These learning rates must be set carefully for BBVI to converge quickly. During the BBVI search, VI_coalHMM prints the gradient of each parameter (the mean and standard deviation of each demographic parameter) to the console. We recommend setting the four learning rates according to the scale of these gradients so that each parameter moves by a reasonable amount in each iteration.
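A simple heuristic consistent with this advice: run a few iterations, read off typical gradient magnitudes from the console output, decide how far each parameter should move per iteration, and divide. The helper below is hypothetical (VI_coalHMM does not provide it), and the gradient values in the example are invented for illustration.

```python
import statistics


def suggest_learning_rate(observed_grads, target_step):
    """Learning rate giving roughly `target_step` per iteration, based on the
    median absolute gradient seen in a pilot run."""
    typical = statistics.median(abs(g) for g in observed_grads)
    if typical == 0:
        raise ValueError("all observed gradients are zero")
    return target_step / typical


if __name__ == "__main__":
    # Invented gradients for one node-height mean; aim for ~5,000 generations/step
    print(suggest_learning_rate([0.8, -1.2, 1.0, -0.9, 1.1], target_step=5_000))
```

The median is used rather than the mean because BBVI gradient estimates are noisy, and a single extreme value would otherwise dominate the suggested rate.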

If learning rates or starting states are not set properly, you may often see the warning “Illegal value sampled this iteration.” This happens when an illegal configuration is sampled during BBVI gradient estimation (for example, a child node with a greater height than its parent). If you see this message frequently, change the starting states and learning rates so that the variational posteriors of node heights do not overlap and the variational posteriors of population sizes do not cover negative values.
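Assuming each variational posterior is an independent normal distribution (the mean and standard deviation parameterization suggests this, but it is an assumption here), the chance that a single draw violates a constraint has a closed form: a child height exceeds its parent's with probability Φ((m_child - m_parent) / √(σ_child² + σ_parent²)), and a population size is negative with probability Φ(-m/σ), where Φ is the standard normal CDF. The functions below are a hypothetical diagnostic, not part of VI_coalHMM.

```python
import math


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_child_above_parent(m_child, s_child, m_parent, s_parent):
    """P(child height > parent height) for independent normal draws."""
    return normal_cdf((m_child - m_parent) / math.hypot(s_child, s_parent))


def p_negative(m, s):
    """P(draw < 0) for a normal with mean m and standard deviation s."""
    return normal_cdf(-m / s)


if __name__ == "__main__":
    # Default node-height sigma of 20,000 on both nodes
    print(p_child_above_parent(150_000, 20_000, 300_000, 20_000))  # tiny
    print(p_child_above_parent(160_000, 20_000, 220_000, 20_000))  # about 0.017
    # Illustrative population size: mean 50,000, default initial sigma 10,000
    print(p_negative(50_000, 10_000))  # about 3e-7
```

With the default node-height sigma, the example tree (150,000 and 300,000 generations) almost never produces an inverted pair, while the tree from the -st description (160,000 and 220,000) does so in roughly 1.7% of draws for that pair.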

Command References

  1. Xinhao Liu, Huw A. Ogilvie, and Luay Nakhleh. Variational Inference Using Approximate Likelihood Under the Coalescent With Recombination.