Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 

Bayesian Inference

MCMC_GT geneTreeList [-cl chainLength] [-bl burnInLength] [-sf sampleFrequency] [-sd seed] [-pp poissonParameter] [-mr maximumReticulation] [-pl parallelThreads] [-tp temperatureList] [-sn startingNetworkList] [-tm taxonMap] [-pseudo]
geneTreeListComma delimited list of gene tree identifiers or comma delimited list of sets of gene tree identifiers. See details.mandatory

-cl chainLength

The length of the MCMC chain. The default value is 1,100,000.

optional

-bl burnInLengthThe number of iterations in burn-in period. The default value is 100,000.optional

-sf sampleFrequency

The sample frequency. The default value is 1,000.

optional

-sd seedThe random seed. The default seed is 12345678optional

-pp poissonParameter

The Poisson parameter in the prior on the number of reticulation nodes. The default value is 1.0

optional

-mr maximumReticulationThe maximum number of reticulation nodes in the sampled phylogenetic networks. The default value is infinity.optional
-pl parallelThreads The number of threads running in parallel. The default value is 1.optional
-tp temperatureListThe list of temperatures for the Metropolis-coupled MCMC chains. For example, a list (1.0, 2.0, 3.0) indicates a cold chain with temperature 1.0 and two hot chains with temperatures 2.0 and 3.0 respectively will be run. The first value in the list should always be 1.0. The default list is (1.0).optional
-sn startingNetworkListComma delimited list of network identifiers. See details. If the list contains only one network, the network will be used as the starting network for all the chains. If the length of the list equals to the the length of the temperature list, each chain will have a corresponding starting network. The default list is empty and the MDC tree will be used as the starting network.optional
-tm taxonMapGene tree / species tree taxa association. By default, it is assumed that only one individual is sampled per species in gene trees. However, this option allows multiple alleles to be sampled.optional
-pseudoUse pseudo likelihood instead of full likelihood to reduce runtime, see details.optional

Summarization

 

MCMC_GT -sum fileList

...

Output of the second example

 

----------------------- Logger -----------------------

Iteration;    Posterior;  ESS;    Likelihood;    Prior;  ESS;    #Reticulation

0;    -32.31804;    0.00000;    -26.31804;   -6.00000;    0.00000;    0;

(C:1.0,(F:1.0,(G:1.0,K:1.0):1.0):1.0);

1;    -30.67945;    0.00000;    -24.36941;   -6.31004;    0.00000;    0;

(((F:1.100296825380626,G:3.091462635493304):0.0672071335563177,C:1.2631834608546109):0.022439620222693815,K:0.7654486703017418);

......

9;    -39.23243;    5.00000;    -24.51469;   -14.71774;    5.00000;    0;

(C:1.303229359865713,(G:1.184262389767834,(F:9.355326885355721,K:1.9913949707859038)I0:0.8785013943118355)I3:0.005027219859507291)I2;

10;    -47.01271;    6.00000;    -23.75691;   -23.25580;    6.00000;    1;

((((G:0.6775006643112376)I1#H1:0.10254403746151251::0.6222138281679854,(K:0.40305290671104776,F:11.774054935191478)I3:1.0127444936379693)I4:0.11119651081773953,C:0.6245266587065907)I2:0.24534630112752273,I1#H1:0.6332472675472998::0.3777861718320146)I0;

----------------------- Summarization: -----------------------

Burn-in = 5000, Chain length = 10000, Sample size = 5 Acceptance rate = 0.62080 

         --------------- Operations ---------------

Operation:Move-Head; Used:196; Accepted:35 ACrate:0.17857142857142858

Operation:Change-Length; Used:5782; Accepted:5628 ACrate:0.9733656174334141

Operation:Add-Reticulation; Used:92; Accepted:1 ACrate:0.010869565217391304

Operation:Flip-Reticulation; Used:191; Accepted:44 ACrate:0.23036649214659685

Operation:Change-Probability; Used:172; Accepted:165 ACrate:0.9593023255813954

Operation:Move-Tail; Used:3564; Accepted:332 ACrate:0.0931537598204265

Operation:Delete-Reticulation; Used:3; Accepted:3 ACrate:1.0

 

Overall MAP = -39.13178296704008

(((G:1.8467353971326255,F:3.669429912707025)I4:0.72972613976057,C:0.7672463997220461)I0:0.14233512944191162,K:7.34631035390478)I3;

         -------------- 95% credible set of topologies: --------------

Rank = 0; Size = 2; Percent = 40.00; MAP = -39.23243337560144:(C:1.303229359865713,(G:1.184262389767834,(F:9.355326885355721,K:1.9913949707859038)I0:0.8785013943118355)I3:0.005027219859507291)I2; Ave=-39.25394715434493; (C:1.20916762172346,(G:2.057607023787008,(K:4.61838854181666,F:6.605842763421295)I0:0.5911795016873735)I4:0.1657841343330578)I3;

Rank = 1; Size = 1; Percent = 20.00; MAP = -47.01271365298103:((((G:0.6775006643112376)I1#H1:0.10254403746151251::0.6222138281679854,(K:0.40305290671104776,F:11.774054935191478)I3:1.0127444936379693)I4:0.11119651081773953,C:0.6245266587065907)I2:0.24534630112752273,I1#H1:0.6332472675472998::0.3777861718320146)I0; Ave=-47.01271365298103; ((G:0.6775006643112376)I1#H1:0.6332472675472998::0.3777861718320146,(C:0.6245266587065907,((F:11.774054935191478,K:0.40305290671104776)I3:1.0127444936379693,I1#H1:0.10254403746151251::0.6222138281679854)I4:0.11119651081773953)I2:0.24534630112752273)I0;

Rank = 2; Size = 1; Percent = 20.00; MAP = -39.13178296704008:(((G:1.8467353971326255,F:3.669429912707025)I4:0.72972613976057,C:0.7672463997220461)I0:0.14233512944191162,K:7.34631035390478)I3; Ave=-39.13178296704008; (K:7.34631035390478,(C:0.7672463997220461,(F:3.669429912707025,G:1.8467353971326255)I4:0.72972613976057)I0:0.14233512944191162)I3;

Rank = 3; Size = 1; Percent = 20.00; MAP = -40.2382981193348:(((F:7.823288510084807,K:6.50888387662072)I0:0.32861454580215427,C:0.8862007905628368)I4:0.1556168196177095,G:1.218111399301718)I3; Ave=-40.2382981193348; (G:1.218111399301718,(C:0.8862007905628368,(K:6.50888387662072,F:7.823288510084807)I0:0.32861454580215427)I4:0.1556168196177095)I3; 

Total elapsed time : 31.53400 s

Summarization

  • Topologies: the samples (samples from burn-in period are excluded) from all the files are combined. The topologies are ranked on their posterior probabilities. Only the topologies in the 95% credible set are printed out. The first topology is MPP (maximum posterior probability) topology.
  • PSRF (Potential Scale Reduction Factor): The PSRF values of posterior, likelihood and prior are calculated respectively. A PSRF value indicates good mixing when it approaches 1.0. 
  • Sojourns: a method to evaluate convergence/mixing. 'Sojourn' is a consecutive series of samples in which only the topology of interest was found. For each file, we summarize the frequency, the posterior probability, the number of sojourns, and the max and average length of sojourns for each topology in the 95% credible set. Ideally the number of sojourns is large while the length of sojourns is small -- the ability to leave and then return quickly and repeatedly to the same topology suggests good mixing with respect to topologies. 
  • SRQ (Scaled Regeneration Quantile) Plot: a method to evaluate convergence/mixing via topologies. Let Ti be the number of sampled MPP topology in the first i iteration of the stationary phase. For each file, we extract <i/n, Ti/Tn> for every i in 1...n into <x,y>. The slope of <x,y> in an SRQ plot should ideally be close to the posterior probability of MPP. Departures from this indicate that at some points the chain was on a trajectory that should have led to a different final posterior probability. 
  • Trace Plot: a method to evaluate convergence/mixing via posterior values. For each file, we extract a list of posterior values from the sampling phase. The format is compatible with Matlab and Python. 
---------------- Topologies ----------------

PosteriorProbability Topology

0.4500 ((TaB:0.2658858620952823)I2#H1:0.7341141379047177::0.8708100650901477,(TaA:1.0,(TaD:0.32466899225557677,I2#H1:1.3465736635156926::0.1291899349098523)I3:2.4002600314073748)I1:0.19747493612759992)I0;
0.2500 ((TaD:1.0,TaA:1.0):1.0,TaB:1.0);
0.1500 (((TaD:1.57737312713865)I3#H1:0.7217516903113591::0.42596342801648157,TaB:0.3702926554366514)I2:0.5822736350667996,(I3#H1:0.8251033741688848::0.5740365719835184,TaA:0.5942441726893598)I1:0.6915888106669655)I0;
0.1500 ((((TaA:0.2655477426293258)I1#H1:0.11325339702702007::0.4241273916272569,TaD:1.935302259486147)I3:0.786549353980848,TaB:0.40541038525824113)I2:0.2974313193957629,I1#H1:1.0824385015100584::0.5758726083727431)I0;
---------------- PSRF ----------------
Posterior - 1.0498673507956626
Likelihood - 1.09497860895306455
Prior - 1.0576479101618842

---------------- Sojourn ----------------
Topology Frequency Posterior #sojourns max(sjs) ave(sjs)
1 & 4 & 0.4000 & 2 & 3 & 2.00 \\
2 & 4 & 0.4000 & 1 & 4 & 4.00 \\
3 & 1 & 0.1000 & 1 & 1 & 1.00 \\
4 & 1 & 0.1000 & 1 & 1 & 1.00 \\
Topology Frequency Posterior #sojourns max(sjs) ave(sjs)
1 & 8 & 0.8000 & 1 & 8 & 8.00 \\
2 & 2 & 0.2000 & 1 & 2 & 2.00 \\
3 & 0 & 0.0000 & 0 & 0 & NaN \\
4 & 0 & 0.0000 & 0 & 0 & NaN \\
Topology Frequency Posterior #sojourns max(sjs) ave(sjs)
1 & 4 & 0.4000 & 2 & 2 & 2.00 \\
2 & 2 & 0.2000 & 1 & 2 & 2.00 \\
3 & 1 & 0.1000 & 1 & 1 & 1.00 \\
4 & 3 & 0.3000 & 2 & 2 & 1.50 \\
Topology Frequency Posterior #sojourns max(sjs) ave(sjs)
1 & 2 & 0.2000 & 2 & 1 & 1.00 \\
2 & 2 & 0.2000 & 1 & 2 & 2.00 \\
3 & 4 & 0.4000 & 3 & 2 & 1.33 \\
4 & 2 & 0.2000 & 2 & 1 & 1.00 \\

---------------- SRQ Plot ----------------
y0 = [0.25,0.5,0.75,1.0];
x0 = [0.5,0.6,0.7,1.0];
y1 = [0.125,0.25,0.375,0.5,0.625,0.75,0.875,1.0];
x1 = [0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0];
y2 = [0.25,0.5,0.75,1.0];
x2 = [0.3,0.4,0.8,0.9];
y3 = [0.5,1.0];
x3 = [0.5,0.7];
---------------- Trace Plot ----------------

y0 = [-2736.2903,-2487.17354,-2464.78381,-2465.97101,-2452.65531,-2443.28357,-2439.46916,-2441.01877,-2439.93589,-2442.0554,];

y1 = [-2736.2903,-2469.68094,-2438.08212,-2438.52812,-2439.02494,-2438.4918,-2439.33072,-2438.70721,-2438.68125,-2438.68995,];
y2 = [-2736.2903,-2474.94649,-2446.39497,-2438.87233,-2440.56426,-2442.1411,-2438.8736,-2439.18374,-2438.65952,-2439.23713,];
y3 = [-2736.2903,-2466.11338,-2451.75728,-2439.36378,-2439.58731,-2437.75625,-2439.12275,-2438.99142,-2439.20819,-2440.1315,]; 

 

...