...
Detects and reconstructs horizontal gene transfer events from phylogenetic incongruence. The input trees must be specified in the rich newick format Rich Newick Format.
In brief, the algorithm reads in a single species tree and any number of gene trees. It returns the horizontal gene transfer events. The tool performs some algorithmic techniques to speed up the detection of HGT. One technique is to partition the solutions into independent subsets, and the tool then finds equivalent solutions to for every subset. In this way, the tool decreases the size, and thus decreases the running time, of the search space. The second technique is to handle (non-binary) caterpillars by replacing them with a chain of three leaves. In general, species and gene trees can be non-binary. There are various reasons for this, such as tree reconstruction errors, insufficient information to build trees. The tool maximally refines trees so that it does not introduce new HGT before detecting HGT.
The tool prints solutions for each pair of species tree and gene tree. For each pair, the tool first prints the species/gene trees, in this order, with the internal nodes labeled (this labeling is needed for printing the HGT edges). The internal node names in the HGT events refer to the names in the species, and not the gene, tree. Because solutions might share common HGT events, we group and print them by components to make the tool’s ouput more concise and informative. From the tool’s output, one can get a complete solution by selecting a subsolution from each component. For example, let’s consider the following species and gene trees:
Code Block | ||
---|---|---|
| ||
ST = ((e,(f,g)I1)I2,((a,(b,c)I3)I4,d)I0)I5;
GT = (((a:70.0,b:75.0):90.0,c:80.0):60.0,(((e:80.0,f:75.0):95.0, g:60.0):80.0,d:70.0));
|
HGT events for this pair (ST, GT) are computed and printed by the tool as:
Code Block | ||
---|---|---|
| ||
----------------------------------------------------------------------
Component I5:
Subsolution1:
I2 -> d (70.0)
Subsolution2:
d -> I2 (80.0)
Subsolution3:
I5 -> I4 (70.0) [time violation?]
----------------------------------------------------------------------
Component I2:
Subsolution1:
f -> e (95.0)
Subsolution2:
I2 -> g (95.0) [time violation?]
Subsolution3:
e -> f (95.0)
----------------------------------------------------------------------
Component I4:
Subsolution1:
b -> a (90.0)
14
Subsolution2:
I4 -> c (90.0) [time violation?]
Subsolution3:
a -> b (90.0)
**********************************************************************
|
There are 27 possible solutions for this pair of trees, each of which consists of events from sub-solutions of each component. The set {d -> I2, f -> e, a -> b
} is a solution, for example. Note that an HGT event has the format:
Code Block | ||
---|---|---|
| ||
sn → tn
|
where sn
and tn
are the names of nodes in the species tree, and are the source and target, respectively, of an HGT edge. This indicates that an edge should be added from the edge incident into sn
to the edge incident into tn in the species tree.
...
The above output format is compact, and so it is very useful for presentation. However, the tool also allows you to display the full solutions, by using the option -e
. In this case, the tool will present solutions in the form of a network in eNewick format.
Usage
Code Block | ||
---|---|---|
| ||
RIATAHGT species_tree_ident {(gene_tree_ident1 [, gene_tree_ident2...]}) [-u] [-p prefix] [-e] [result output file] |
{species_tree_ident} | The input species tree identifier. | mandatory | |
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="504834a4-0b35-4e30-aee6-0c625a594826"><ac:plain-text-body><![CDATA[ | {gene_tree_ident1 [, gene_tree_ident2...]} | Comma delimited set list of gene tree identifiers. See details. | mandatory]]></ac:plain-text-body></ac:structured-macro> |
-u | Prevent the trees from being refined and contracted. | optional | |
-p prefix | Specifies a name prefix used to label internal nodes. If no prefix is specified, then the default prefix | optional | |
-e | Present full solutions. | optional | |
result output file | Optional file destination for command output. | optional |
...
Examples
Code Block | ||
---|---|---|
| ||
#NEXUS BEGIN TREES; Tree speceiesTree = ((e,(f,g):0.63:.70)::.80,((a,(b,c):0.5:.80):0.5:.67,d):0.6); Tree geneTree1 = (((a,b):0.5:.70,c):0.6:.80,(d,((e,f):0.4:.70,g)::.70)::.80); Tree geneTree2 = ((e,(f,g):0.5:.70):0.6:.80,((a,b):0.5:.90,(c,d):0.5:.87):0.57:.72); END; BEGIN PHYLONET; RIATAHGT speceiesTree {(geneTree1, geneTree2}); END |
Command Refernces
- L. Nakhleh, D. Ruths, and L.S. Wang. RIATA-HGT: A fast and accurate heuristic for reconstrucing horizontal gene transfer. In L. Wang, editor, Proceedings of the Eleventh International Computing and Combinatorics Conference (COCOON 05), pages 84–93, 2005. LNCS #3595.
...