...
- Co-estimation of reticulate phylogenies (ILS & hybridization), gene trees, divergence times and population sizes on sequences from multiple independent loci.
- For species phylogeny or phylogenetic network, we infer network topology, divergence times in units of expected number of mutations per site, population sizes in units of population mutation rate per site, and inheritance probabilities.
- For gene trees, we infer gene tree topology and coalescent times in units of expected number of mutations per site.
- To convert the divergence times/coalescent times to units of years, or to coalescent units, see our paper for details (Page 3, Lines 36-43).
- We use BEAGLE, a high-performance library, to calculate the "Felsenstein likelihood". Full installation instructions can be found here; always follow "Installing from source".
If an error occurs, try the command: java -Djava.library.path="/usr/local/lib" -jar PhyloNet_X.X.X.jar script.nex
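The unit conversions mentioned above can be sketched with standard population-genetic identities; the paper remains the authoritative reference. This is a minimal sketch, not part of PhyloNet. It assumes a per-site mutation rate `mu` per generation, a generation time `gen_time` in years, and a population mutation rate defined as `theta = 4*N*mu` (diploid convention), so that one coalescent unit (2N generations) corresponds to `theta/2` mutations per site. All function and parameter names are hypothetical.

```python
def to_years(tau, mu, gen_time):
    """Convert a time in expected mutations per site to years.

    tau      -- time in expected number of mutations per site
    mu       -- assumed mutation rate per site per generation
    gen_time -- assumed generation time in years
    """
    generations = tau / mu
    return generations * gen_time


def to_coalescent_units(tau, theta):
    """Convert a time in expected mutations per site to coalescent units.

    Assumes theta = 4*N*mu, so one coalescent unit (2N generations)
    equals theta / 2 mutations per site.
    """
    return 2.0 * tau / theta


if __name__ == "__main__":
    # Illustrative values only; they are not taken from any data set.
    print(to_years(tau=0.01, mu=1e-9, gen_time=1.0))
    print(to_coalescent_units(tau=0.018, theta=0.036))
```

If the population mutation rate in your analysis follows a different convention (for example `2*N*mu`), the factor in `to_coalescent_units` changes accordingly.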
Usage
Option | Description | Required? |
---|---|---|
MCMC Settings | ||
-loci locusList | The list of loci used in the inference. For example, -loci (YNR008W,YNL313C) performs the inference on the two loci YNR008W and YNL313C. See the format of multilocus data here. The method can handle missing data; see the example below. | optional |
-cl chainLength | The length of the MCMC chain. The default value is 10,000,000. | optional |
-bl burnInLength | The number of iterations in burn-in period. The default value is 2,000,000. | optional |
-sf sampleFrequency | The sample frequency. The default value is 5,000. | optional |
-sd seed | The random seed. The default seed is 12345678. | optional |
-pl parallelThreads | The number of threads running in parallel. The default value is the number of threads on your machine. | optional |
-dir outDirectory | The absolute path to store the output files. The default path is your home directory. | optional |
MC3 Settings | ||
-mc3 temperatureList | The list of temperatures for the Metropolis-coupled MCMC chains. For example, -mc3 (2.0, 3.0) runs two hot chains with temperatures 2.0 and 3.0 alongside the cold chain with temperature 1.0. By default, only the cold chain is run. Note that | optional |
Inference Settings | ||
-mr maxReticulation | The maximum number of reticulation nodes in the sampled phylogenetic networks. The default value is 4. | optional |
-tm taxonMap | Gene tree / species tree taxa association. By default, it is assumed that only one individual is sampled per species in the gene trees. This option allows multiple alleles to be sampled per species. For example, if the gene tree is (((a1,a2),(b1,b2)),c); and the species tree is ((a,b),c);, the command is -tm <a:a1,a2; b:b1,b2; c:c>. Note that the taxa association must cover all species; e.g. -tm <a:a1,a2; b:b1,b2> is incorrect because c:c is omitted. | optional |
-fixps popSize | Fix the population sizes associated with all branches of the phylogenetic network to this given value. By default, we estimate a constant population size across all branches. | optional |
-varyps | Vary the population sizes across all branches. By default, we estimate a constant population size across all branches. | optional |
-murate | Enable the delta exchange operator to model substitution rates that vary across loci. | optional |
Prior Settings | ||
-pp poissonParam | The Poisson parameter in the prior on the number of reticulation nodes. The default value is 1.0. | optional |
-dd | Disable the prior on the diameters of hybridizations. By default, this prior is Exponential(10). | optional |
-ee | Enable the Exponential(10) prior on the divergence times of nodes in the phylogenetic network. By default, a Uniform prior is used. | optional |
Starting State Settings | ||
-sgt | Specify the starting gene trees for each locus, as a comma-delimited list of gene tree identifiers. See details. The gene trees should be ultrametric trees with coalescent times in units of expected number of mutations per site. See example below. The default starting gene trees are UPGMA trees. | optional |
-snet | Specify the starting network. The input network should be ultrametric, with divergence times in units of expected number of mutations per site, inheritance probabilities, and (optionally) population sizes in units of population mutation rate. See example below. The default starting network is the MDC tree inferred from the starting gene trees. | optional |
-sps | Specify the starting population size. The default value is 0.03602. See example below. | optional |
-pre | Specify the number of iterations for pre burn-in, e.g. "-pre 20" means 20x sampleFrequency iterations will be run before the MCMC chain starts. By default, we run 10x sampleFrequency iterations for pre burn-in. | optional |
Substitution Model | ||
-gtr paramList | Set GTR (general time-reversible) as the substitution model. The first four parameters in the list are the base frequencies for A, C, G, T. The remaining six parameters are the substitution rates for A>C, A>G, A>T, C>G, C>T and G>T. The default substitution model is JC69. | optional |
Phasing | ||
-diploid diploidSpeciesList | Integrate over all possible phasings of heterozygous genotypes when computing likelihoods [2] for the given list of diploid species. For example, the list (Scer, Spar) indicates that species Scer and Spar will be treated as diploid in the likelihood computation. See Section S4 in the G-PhoCS manual for full details. By default, we assume the sequences come from haploid species or are randomly phased. Note that with this option the substitution model is fixed to JC69. | optional |
Substitution rate sampling | ||
-mupi paramList | Specify the substitution rates used when sampling locus-specific substitution rates, as a list of double values in the order of the loci in the NEXUS file. The default value for all loci is 1.0. | |
-muweight paramList | Specify the weights for the substitution rates used when sampling locus-specific substitution rates, as a list of integer values in the order of the loci in the NEXUS file. The default value for all loci is 1. | |
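As an illustration of how the options above combine, the following minimal Python sketch assembles an MCMC_SEQ command string, including a -tm taxon map built from a species-to-alleles mapping. The helper names `taxon_map` and `mcmc_seq_command` are hypothetical and not part of PhyloNet; only the -cl, -bl, -sf and -tm options described in the table are used.

```python
def taxon_map(assoc):
    """Format a species -> alleles mapping as a -tm argument.

    Every species must list at least one allele, because the taxa
    association has to cover all species.
    """
    parts = []
    for species, alleles in assoc.items():
        if not alleles:
            raise ValueError(f"species {species!r} has no alleles")
        parts.append(f"{species}:{','.join(alleles)}")
    return "<" + "; ".join(parts) + ">"


def mcmc_seq_command(chain_length, burn_in, sample_freq, assoc=None):
    """Build an MCMC_SEQ command line for the PHYLONET block."""
    cmd = f"MCMC_SEQ -cl {chain_length} -bl {burn_in} -sf {sample_freq}"
    if assoc:
        cmd += " -tm " + taxon_map(assoc)
    return cmd + ";"


if __name__ == "__main__":
    assoc = {"a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c"]}
    print(mcmc_seq_command(250000, 50000, 5000, assoc))
    # prints: MCMC_SEQ -cl 250000 -bl 50000 -sf 5000 -tm <a:a1,a2; b:b1,b2; c:c>;
```

The resulting line is meant to be placed between BEGIN PHYLONET; and END; in the NEXUS script.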
Simple Example
Download: MCMCseq_example0.nex
Please don't copy and paste, since some illegal characters might be copied.
```
#NEXUS
Begin data;
    Dimensions ntax=5 nchar=80;
    Format datatype=dna symbols="ACTG" missing=? gap=-;
    Matrix
[YAL053W, 25, ...]
Scer TCTTTATTGACGTGTATGGACAATT
Spar TCTTTGTTAACGTGCATGGACAATT
Smik TCCTTGCTAACATGCATGGACAATT
Skud TCTTTGCTAACGTGCATGGATAATT
Sbay TCTTTACTAACGTGCATGGATAACT
[YAR007C, 30, ...]
Scer ATGAGCAGTGTTCAACTTTCGAGGGGCGAT
Spar ATGAGCAGCGTTCAACTTTCGAAGGGCGAC
Smik ATGAGCAGCGTGCAACTATCAAAGGGCGAC
Skud ATGAGCAGTGTTCAACTTTCGAAGGGCGAC
Sbay ATGAGCAGCGTTCAACTTTCGAAGGGCGAC
[YBL015W, 25, ...]
Scer TCTAATTTGTTAAAGCAGAGAGTTA
Spar TCTAATTTGTTAAAGCAGAGAGTTA
Smik TCTAATTTGTTAAAACAGAGAGTTC
Skud TCTAATCTGTTGAAGCAGAGAGTTA
Sbay TCTAATCTGTTGAAGCAAAAAGTCA
;End;

BEGIN PHYLONET;
MCMC_SEQ -cl 250000 -bl 50000 -sf 5000;
END;
```
...