In our module we want to explore how we can build and plot phylogenetic trees based on molecular and morphological criteria and how to compare these trees statistically.
Evolutionary biologists study the ways organisms change over time. This is often done by comparing several species to one another and analyzing the differences between them. These differences can then be used to build a graphical representation of species’ relationships to one another, called a phylogenetic tree. Originally, these trees were constructed based solely on organism morphology using shared derived traits, or synapomorphies, often in the form of character matrices. By focusing on these physical traits, biologists could hypothesize different life histories, like species divergence or common descent. However, using morphological data to determine taxonomic relationships is not foolproof. Some shared, analogous traits may be due to convergent evolution. Homoplasy can be difficult to detect through morphological analysis alone, and has historically resulted in phylogenetic trees that have since been disproved through the use of molecular data.
Modern biologists can now build phylogenetic trees based on the DNA of the organisms in question. By focusing on a few specific genes, evolutionary relationships can be hypothesized with greater accuracy. Examining the polymorphism at different loci can provide information on how distantly related organisms are to one another, and can reduce errors commonly associated with comparing or identifying cryptic species. Molecular data can help resolve unclear phylogenies that were created before genomic methods were available. Biologists can now revisit unresolved trees built from morphological data and revise them using DNA, allowing us to better hypothesize evolutionary relationships.
In order to assess any changes between these hypothesized evolutionary relationships, we can compare the topologies of phylogenetic trees. The topology refers to the branching pattern displayed, which represents the measure of relatedness among taxa. To understand the importance of topology, we must familiarize ourselves with the anatomy of a phylogenetic tree.
Anatomy of a tree:
There are several different ways to estimate phylogenies, each with their own strengths, weaknesses, and appropriate situations in which to apply them. While we won’t go over them in detail, familiarize yourselves by looking at Module 24
There are MANY different phylogenetics software that we can use to employ these methods of estimation, but some of the most common are PAUP, BEAST, MrBayes, and PHYLIP. These can take genetic data (alignments) in the form of fastq files, fasta files, or NEXUS files.
Types of Tree Formats: Newick, Nexus
Newick: A collection of data formatted using specific syntax that includes parentheses, commas, and semicolons to delineate weight, time, and evolutionary distance. “Newick files are simply text files that consist of one or more tree descriptions in the Newick notation. In contrast to Nexus files they contain no further syntax elements or other information than the trees.”
We will be comparing two graphical hypotheses of the genus adelpha, a group of butterflies.
In this module, we will compare tree topologies for two different analyses of the genus Adelpha; one based on morphological data and the other based on molecular data. The morphological-based tree is from Keith Wilmott’s 2003 paper “Cladistic analysis of the Neotropical butterfly genus Adelpha (Lepidoptera: Nymphalidae), with comments on the subtribal classification of Limenitidini.” Our molecular tree is from Emily Ebel’s 2015 paper “Rapid diversification associated with ecological specialization in Neotropical Adelpha butterflies.”
For starters, you will need to install {ape} {phytools} {fastmatch} {quadprog} {phangorn} {geiger} {ggplot2} {gridExtra} {ggtree} {dendexend}
Unlike many packages we’ve used this semester, you will need to install ggtree from the BiocManager, not baseR. You can do this using the code below
if (!requireNamespace("BiocManager", quietly = TRUE))
+ install.packages("BiocManager")
BiocManager::install("ggtree")
## Bioconductor version 3.13 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'ggtree'
## Old packages: 'Matrix'
library(ape)
library(fastmatch)
library(quadprog)
library(phangorn)
library(phytools)
## Loading required package: maps
library(geiger)
library(ggplot2)
library(ggtree)
## ggtree v3.0.4 For help: https://yulab-smu.top/treedata-book/
##
## If you use ggtree in published research, please cite the most appropriate paper(s):
##
## 1. Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi:10.1002/cpbi.96
## 2. Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, 35(12):3041-3043. doi:10.1093/molbev/msy194
## 3. Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
##
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
##
## rotate
For these packages to work effectively with the code below, its important to make sure you’re using the most up-to-date versions.
You can check that you have the most up-to-date version of R by running the command “R.Version”R.version
## _
## platform x86_64-apple-darwin17.0
## arch x86_64
## os darwin17.0
## system x86_64, darwin17.0
## status
## major 4
## minor 1.1
## year 2021
## month 08
## day 10
## svn rev 80725
## language R
## version.string R version 4.1.1 (2021-08-10)
## nickname Kick Things
packageVersion("ape")
## [1] '5.5'
To start off, we want to understand how a tree is created. For this example we will make our own tree using Newick format.
Here is a great explanation of Newick format: “Put simply, monophyletic clades are surrounded by parentheses and sister clades are separated by commas. For example, a simple tree could be written as (((A,B),C),(D,E)).” Let’s try making that!
If you already know the relationships between the groups of species that you want to plot, you can simply write out the tree as a text string in Newick format! Let’s try it first with letters. Try to recreate the tree in the image below by writing it as a text string in Newick format
text.string<-
"(((A,B),C),(D,E));"
example.tree<-read.tree(text=text.string)
plot(example.tree,no.margin=TRUE,edge.width=2)
Hey! Guess What?? Whales are cool!
For our second challenge, try recreating the whale clade in the image below using Newick notation and the read.tree() function
text.string<-
"((((humpback wahle, fin whale), (Antarctic minke whale, common minke whale)), bowhead whale), sperm whale);"
whale.tree<-read.tree(text=text.string)
plot(whale.tree,no.margin=TRUE,edge.width=2)
roundPhylogram(whale.tree) #creates rounded branches
plot(unroot(whale.tree),type="unrooted",no.margin=TRUE,lab4ut="axial",
edge.width=4) #creates an unrooted tree
plot(whale.tree, type = "fan", edge.width = 3, show.tip.label = FALSE)
While these trees all show the same information it is helpful to have different representations. For example this fan tree helps visualize that the evolutionary relationships between species are non linear. It is often confused that certain species or genera are more “evolved” or “older” but tree formats like fan trees are a helpful way to show how this is misleading.
Try using the plot() function to plot our whale tree as a phylogram with different branch and tip label colors
plot(whale.tree, type = "phylogram", edge.color = "red")
plot(whale.tree, type = "phylogram", edge.color = "red", tip.color = "darkgreen")
plot(whale.tree, type = "phylogram", edge.color = "red", tip.color = "darkgreen", font = 4)
Now that we are a little more comfortable working with phylogenetic trees in R, we can load our first tree!
First, copy the tree data at {This Link} Then go to your own working repo, select the “Create New File” option and paste the tree data into that file.
Once you have the tree file saved in your repo, you can load the tree into your R markdown file using the “read.nexus” commandmol.tree<-read.nexus(file="NJst.tre")
mol.tree
##
## Phylogenetic tree with 66 tips and 64 internal nodes.
##
## Tip labels:
## A_rothschildi, A_sichaeus, A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, ...
##
## Unrooted; includes branch lengths.
par(mfrow = c(3, 3))
plot(mol.tree, type = "phylogram", edge.color = "purple", font = 1, show.tip.label = FALSE)
plot(unroot(mol.tree),type="unrooted",cex=0.6,
use.edge.length=FALSE,lab4ut="axial",
no.margin=TRUE, show.tip.label = FALSE, edge.color = "red")
plot(mol.tree, type = "radial", edge.color = "darkgreen", tip.color = "black", font = 1, show.tip.label = FALSE)
plot(mol.tree, type = "fan", edge.color = "blue", tip.color = "black", font = 1, show.tip.label = FALSE)
plot(mol.tree, type = "cladogram", edge.color = "orange", tip.color = "black", font = 1, show.tip.label = FALSE)
Ntip(mol.tree) ##66 species in this tree
## [1] 66
##all the species names
mol.tree$tip.label
## [1] "A_rothschildi" "A_sichaeus"
## [3] "A_boreas_boreas" "A_saundersii_saundersii"
## [5] "A_attica_attica" "A_leuceroides"
## [7] "A_zina_irma" "A_zina_zina"
## [9] "A_jordani" "A_justina_valentina"
## [11] "A_olynthia" "A_boeotia_boeotia"
## [13] "A_malea_aethalia" "A_delinita"
## [15] "A_heraclea" "A_naxia"
## [17] "A_capucinus_capucinus" "A_mesentina"
## [19] "A_phylaca_pseudaethalia" "A_lycorias_lara"
## [21] "A_lycorias_spruceana" "A_erotia_erotia"
## [23] "A_pollina" "A_irmina_tumida"
## [25] "A_leucophthalma_irminella" "A_cocala_cocala"
## [27] "L_lorquini" "L_weidemeyerii"
## [29] "L_arthemis_arizonensis" "L_arthemis_arthemis"
## [31] "L_archippus_floridanensis" "L_populi"
## [33] "L_sydyi" "L_amphyssa"
## [35] "L_moltrechti" "L_doerriesi"
## [37] "L_helmanni" "L_camilla"
## [39] "L_homeyeri" "L_glorifica"
## [41] "L_reducta" "A_donysa_donysa"
## [43] "A_pithys" "A_alala_negra"
## [45] "A_tracta" "A_corcyra_aretina"
## [47] "Athyma_selenophora" "Pandita_sinope"
## [49] "Sumalia_daraxa" "Parasarpa_zayla"
## [51] "Moduza_urdaneta" "A_seriphia_aquillia"
## [53] "A_seriphia_therasia" "A_serpa_celerio"
## [55] "A_melona_leucocoma" "A_cytherea_cytherea"
## [57] "A_cytherea_daguana" "A_salmoneus_colada"
## [59] "A_iphicleola_thessalita" "A_iphiclus_iphiclus"
## [61] "A_thessalia_thessalia" "A_epione_agilla"
## [63] "A_ethelda_ethelda" "A_basiloides"
## [65] "A_plesaure_phliassa" "A_shuara"
A great way to draw attention to a specific species in a tree plot is to add an arrow pointing to a specific species. Now that you can see all the species names, pick a species and use add.arrow() to draw an arrow to it!
##add an arrow on a specific branch tip
plot(mol.tree)
add.arrow(mol.tree,tip="A_saundersii_saundersii",arrl=4)
plot(mol.tree,no.margin=TRUE,edge.width=2)
nodelabels(text=1:mol.tree$Nnode,node=1:mol.tree$Nnode+Ntip(mol.tree))
plot(mol.tree, "fan", main="Molecular Tree", edge.width = 3, edge.color = "pink", show.tip.label = FALSE)
arc.cladelabels(tree=mol.tree, "A. rothschildi", node = 4, cex=1, orientation="curved")
string <- "((A_alala_negra, A_corcyra_aretina, A_tracta, A_pithys, A_donysa_donysa), ((A_serpacelerio, A_seriphiaaquillia, A_seriphiatherasia), (A_melonaleucocoma, A_salmoneus_colada, A_cytherea_cytherea, A_cythereadaguana, A_epioneagilla, A_etheldaethelda, A_thessaliathessalia, A_iphicleolathessalita, A_iphiclusiphiclus, A_shuara, A_plesaurephliassa, A_basiloides, A_atticaattica, A_leucerioides, A_saundersiisaundersii, A_boreasboreas, A_rothschildi, A_sichaeus, A_cocalacocala, A_leucophthalmairminella, A_irminatumida, A_pollina, A_lycoriaslara, A_lycoriasspruceana, A_erotia_erotia, A_mesentina, A_phylaca_pseudaethalia, A_capucinus_capucinus, A_heraclea_heraclea, A_naxia_naxia, A_justina_valentina, A_olynthia, A_jordani, A_zinazina, A_zinairma, A_delinita_delinita, A_boeotia_boeotia, A_maleaaethalia)));"
morph.tree<-read.tree(text=string)
plot(morph.tree,no.margin=TRUE,edge.width=2)
par(mfrow = c(3, 3))
plot(morph.tree, type = "phylogram", edge.color = "purple", font = 1, show.tip.label = FALSE)
plot(unroot(morph.tree),type="unrooted",cex=0.6,
use.edge.length=FALSE,lab4ut="axial",
no.margin=TRUE, show.tip.label = FALSE, edge.color = "red")
plot(morph.tree, type = "radial", edge.color = "darkgreen", tip.color = "black", font = 1, show.tip.label = FALSE)
plot(morph.tree, type = "fan", edge.color = "blue", tip.color = "black", font = 1, show.tip.label = FALSE)
plot(morph.tree, type = "cladogram", edge.color = "orange", tip.color = "black", font = 1, show.tip.label = FALSE)
#Morphological tree plot
library(ggtree)
p <- ggtree(morph.tree, color="purple", size=0.5)
p <- p + geom_tiplab(size=2)
p
#Molecular tree plot
g <- ggtree(mol.tree, color="darkgreen", size=0.5)
g <- g + geom_tiplab(size=2)
g
Now we want to see these trees side by side, we’ll go over direct comparison later in the module but for now, use the grid.arrange() command in gridExtra
to place these trees side by side. Notice the major differences between the two, how might these compare?
library("gridExtra")
grid.arrange(
ggtree(mol.tree, color="purple") + theme_tree("hotpink"),
ggtree(morph.tree, color="hotpink") + theme_tree("purple"),
ncol=2)
The first thing you probably notice is the difference in complexity, the molecular tree has a more detailed relationship between species than the morphological tree.
all.equal(mol.tree, morph.tree, use.edge.length = TRUE, use.tip.label = TRUE)
## [1] FALSE
It’s a good exploratory function, but after we learn that they are not the same, we have to dig deeper.
Next we’ll try comparing phylogenetic tree topographies using the comparePhylo() function from the APE package. Check the output to see if these trees are ultrametric and compare the differences in nodes. How many nodes does the morphological tree have? How many does the molecular have? Going one step further, how many of the clades in the molecular tree are not represented in the morphological tree.
comparePhylo(morph.tree, mol.tree, plot = FALSE, force.rooted = TRUE, use.edge.length = FALSE)
## => Comparing morph.tree with mol.tree.
## Trees have different numbers of tips: 46 and 66.
## Tips in morph.tree not in mol.tree : A_serpacelerio, A_seriphiaaquillia, A_seriphiatherasia, A_melonaleucocoma, A_cythereadaguana, A_epioneagilla, A_etheldaethelda, A_thessaliathessalia, A_iphicleolathessalita, A_iphiclusiphiclus, A_plesaurephliassa, A_atticaattica, A_leucerioides, A_saundersiisaundersii, A_boreasboreas, A_cocalacocala, A_leucophthalmairminella, A_irminatumida, A_lycoriaslara, A_lycoriasspruceana, A_heraclea_heraclea, A_naxia_naxia, A_zinazina, A_zinairma, A_delinita_delinita, A_maleaaethalia.
## Tips in mol.tree not in morph.tree : A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, A_zina_irma, A_zina_zina, A_malea_aethalia, A_delinita, A_heraclea, A_naxia, A_lycorias_lara, A_lycorias_spruceana, A_irmina_tumida, A_leucophthalma_irminella, A_cocala_cocala, L_lorquini, L_weidemeyerii, L_arthemis_arizonensis, L_arthemis_arthemis, L_archippus_floridanensis, L_populi, L_sydyi, L_amphyssa, L_moltrechti, L_doerriesi, L_helmanni, L_camilla, L_homeyeri, L_glorifica, L_reducta, Athyma_selenophora, Pandita_sinope, Sumalia_daraxa, Parasarpa_zayla, Moduza_urdaneta, A_seriphia_aquillia, A_seriphia_therasia, A_serpa_celerio, A_melona_leucocoma, A_cytherea_daguana, A_iphicleola_thessalita, A_iphiclus_iphiclus, A_thessalia_thessalia, A_epione_agilla, A_ethelda_ethelda, A_plesaure_phliassa.
## Trees have different numbers of nodes: 5 and 64.
## morph.tree is rooted, mol.tree is unrooted.
## Both trees are not ultrametric.
## 4 clades in morph.tree not in mol.tree.
## 63 clades in mol.tree not in morph.tree.
This tells you that both the trees are non-ultrametric. Essentially, both trees have structure and are not one big polytomy, and branch lengths differ, although one phylogeny is more structured than the other. Can you tell which one it is?
It also tells us that they have different numbers of tips (46 and 66) which makes sense because they aren’t looking at the same number of individuals. Naturally because of this, these trees will inherently be different from each other structurally. The morphological tree has fewer nodes not only because it’s hypothesizing the relationships among fewer individuals, but also because there is more uncertainty. This tree is comb like, and we can interpret this as an unresolved phylogeny.
When both trees are non-ultrametric, the number of analyses that we can perform is limited. Let’s explore what more we can learn about these trees.
Check to see if they are binary. According to the APE package, “The test differs whether the tree is rooted or not. An urooted tree is considered binary if all its nodes are of degree three (i.e., three edges connect to each node). A rooted tree is considered binary if all nodes (including the root node) have exactly two descendant nodes, so that they are of degree three expect the root which is of degree 2.” Our molecular tree is unrooted and our morphological tree is rooted, so keep that in mind.is.binary(mol.tree)
## [1] TRUE
is.binary(morph.tree)
## [1] FALSE
library(ape)
library(phytools)
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.15.2
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags:
## https://stackoverflow.com/questions/tagged/dendextend
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
## Attaching package: 'dendextend'
## The following object is masked from 'package:ggtree':
##
## rotate
## The following object is masked from 'package:geiger':
##
## is.phylo
## The following object is masked from 'package:phytools':
##
## untangle
## The following objects are masked from 'package:ape':
##
## ladderize, rotate
## The following object is masked from 'package:stats':
##
## cutree
library(phylogram)
##
## Attaching package: 'phylogram'
## The following object is masked from 'package:dendextend':
##
## prune
tree1 <- mol.tree
tree1 <- midpoint.root(tree1)
tree2 <- morph.tree
tree1 <- compute.brlen(tree1)
tree2 <- compute.brlen(tree2)
tree1<- as.dendrogram(tree1)
tree2<- as.dendrogram(tree2)
dndlist <- dendextend::dendlist(tree1, tree2)
prunedplot <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
prunedplotri <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "t")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
labels(tree1) #A_tracta is 19
## [1] "L_arthemis_arizonensis" "L_arthemis_arthemis"
## [3] "L_lorquini" "L_weidemeyerii"
## [5] "L_archippus_floridanensis" "L_populi"
## [7] "L_sydyi" "L_amphyssa"
## [9] "L_moltrechti" "L_doerriesi"
## [11] "L_helmanni" "L_camilla"
## [13] "L_homeyeri" "L_glorifica"
## [15] "L_reducta" "A_donysa_donysa"
## [17] "A_pithys" "A_alala_negra"
## [19] "A_tracta" "A_corcyra_aretina"
## [21] "Athyma_selenophora" "Pandita_sinope"
## [23] "Sumalia_daraxa" "Parasarpa_zayla"
## [25] "Moduza_urdaneta" "A_seriphia_aquillia"
## [27] "A_seriphia_therasia" "A_serpa_celerio"
## [29] "A_melona_leucocoma" "A_rothschildi"
## [31] "A_sichaeus" "A_boreas_boreas"
## [33] "A_saundersii_saundersii" "A_attica_attica"
## [35] "A_leuceroides" "A_zina_irma"
## [37] "A_zina_zina" "A_jordani"
## [39] "A_justina_valentina" "A_olynthia"
## [41] "A_boeotia_boeotia" "A_malea_aethalia"
## [43] "A_delinita" "A_heraclea"
## [45] "A_naxia" "A_capucinus_capucinus"
## [47] "A_mesentina" "A_phylaca_pseudaethalia"
## [49] "A_lycorias_lara" "A_lycorias_spruceana"
## [51] "A_erotia_erotia" "A_pollina"
## [53] "A_irmina_tumida" "A_leucophthalma_irminella"
## [55] "A_cocala_cocala" "A_basiloides"
## [57] "A_plesaure_phliassa" "A_shuara"
## [59] "A_iphicleola_thessalita" "A_iphiclus_iphiclus"
## [61] "A_thessalia_thessalia" "A_epione_agilla"
## [63] "A_ethelda_ethelda" "A_cytherea_cytherea"
## [65] "A_cytherea_daguana" "A_salmoneus_colada"
tree1 <- color_labels(tree1, col = "red", labels = labels(tree1)[c(19)])
labels(tree2)
## [1] "A_alala_negra" "A_corcyra_aretina"
## [3] "A_tracta" "A_pithys"
## [5] "A_donysa_donysa" "A_serpacelerio"
## [7] "A_seriphiaaquillia" "A_seriphiatherasia"
## [9] "A_melonaleucocoma" "A_salmoneus_colada"
## [11] "A_cytherea_cytherea" "A_cythereadaguana"
## [13] "A_epioneagilla" "A_etheldaethelda"
## [15] "A_thessaliathessalia" "A_iphicleolathessalita"
## [17] "A_iphiclusiphiclus" "A_shuara"
## [19] "A_plesaurephliassa" "A_basiloides"
## [21] "A_atticaattica" "A_leucerioides"
## [23] "A_saundersiisaundersii" "A_boreasboreas"
## [25] "A_rothschildi" "A_sichaeus"
## [27] "A_cocalacocala" "A_leucophthalmairminella"
## [29] "A_irminatumida" "A_pollina"
## [31] "A_lycoriaslara" "A_lycoriasspruceana"
## [33] "A_erotia_erotia" "A_mesentina"
## [35] "A_phylaca_pseudaethalia" "A_capucinus_capucinus"
## [37] "A_heraclea_heraclea" "A_naxia_naxia"
## [39] "A_justina_valentina" "A_olynthia"
## [41] "A_jordani" "A_zinazina"
## [43] "A_zinairma" "A_delinita_delinita"
## [45] "A_boeotia_boeotia" "A_maleaaethalia"
tree2 <- color_labels(tree2, col = "red", labels = labels(tree2)[c(3)])
#highlighting certain taxa in each tree manually
dndlist <- dendextend::dendlist(tree1, tree2)
prunedplottaxa <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
Now it’s your turn! Try selecting a species to isolate and then identifying its number. You can also isolate multiple species or change the color of the label.
tree1 <- color_labels(tree1, col = "hotpink", labels = labels(tree1)[c(38)])
labels(tree2)
## [1] "A_alala_negra" "A_corcyra_aretina"
## [3] "A_tracta" "A_pithys"
## [5] "A_donysa_donysa" "A_serpacelerio"
## [7] "A_seriphiaaquillia" "A_seriphiatherasia"
## [9] "A_melonaleucocoma" "A_salmoneus_colada"
## [11] "A_cytherea_cytherea" "A_cythereadaguana"
## [13] "A_epioneagilla" "A_etheldaethelda"
## [15] "A_thessaliathessalia" "A_iphicleolathessalita"
## [17] "A_iphiclusiphiclus" "A_shuara"
## [19] "A_plesaurephliassa" "A_basiloides"
## [21] "A_atticaattica" "A_leucerioides"
## [23] "A_saundersiisaundersii" "A_boreasboreas"
## [25] "A_rothschildi" "A_sichaeus"
## [27] "A_cocalacocala" "A_leucophthalmairminella"
## [29] "A_irminatumida" "A_pollina"
## [31] "A_lycoriaslara" "A_lycoriasspruceana"
## [33] "A_erotia_erotia" "A_mesentina"
## [35] "A_phylaca_pseudaethalia" "A_capucinus_capucinus"
## [37] "A_heraclea_heraclea" "A_naxia_naxia"
## [39] "A_justina_valentina" "A_olynthia"
## [41] "A_jordani" "A_zinazina"
## [43] "A_zinairma" "A_delinita_delinita"
## [45] "A_boeotia_boeotia" "A_maleaaethalia"
tree2 <- color_labels(tree2, col = "hotpink", labels = labels(tree2)[c(41)])
dndlist <- dendextend::dendlist(tree1, tree2)
prunedplottaxa <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
prunedplotcolor <- dendextend::tanglegram(dndlist, faster = TRUE, margin_inner = 7, lab.cex = .88, lwd =1, edge.lwd = 0.5, type = "r", main_left = "Molecular", main_right = "Morphological", highlight_branches_col = TRUE, highlight_branches_lwd = TRUE, k_branches = 5, k_labels=2, center=TRUE)
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
## Warning in cutree.dendrogram(dend, k = k, h = h, order_clusters_as_data =
## FALSE): It is impossible to produce a one-to-one cut for the k/h you specidied.
## 0's have been introduced. Note that this result would be different from R's
## default cutree output.
#color coded tanglegram
x <- prunedplot
x %>% plot(main = paste("entanglement =", round(entanglement(x), 2)))
The entanglement score of our trees is 0.44
Next we are going to work with the command untangle() in dendextend. This useful command will rearrange the tree into the simplest configuration without changing the phylogenetic relationships.prunedplotuntangle<-prunedplot %>% untangle(method = "step1side") %>% tanglegram(common_subtrees_color_lines = TRUE, faster = TRUE, margin_inner = 7, lab.cex = .88, lwd =1, edge.lwd = 0.5, type = "r", main_left = "Molecular", main_right = "Morphological", highlight_branches_col = TRUE, highlight_branches_lwd = TRUE, k_branches = 5, k_labels=2, center=TRUE)
## Warning in cutree.dendrogram(dend, k = k, h = h, order_clusters_as_data =
## FALSE): It is impossible to produce a one-to-one cut for the k/h you specidied.
## 0's have been introduced. Note that this result would be different from R's
## default cutree output.
#untangled color coded tanglegram
Now we can clearly see the associations between these trees and the individuals and how they were rearranged. You can go branch by branch and follow the shaded lines to view where they are on the opposite tree.
As we discussed above, untangling your tanglegram will help remove any patterns in the tanglegram and the entanglement analysis that are not due to phylogentic relationships. Try using the entanglement function to get an entanglement score for the untangled tanglegram (say that 5 times fast).
y <- prunedplotuntangle
y %>% plot(main = paste("entanglement =", round(entanglement(y), 2)))
The entanglement score of our untangled trees is 0.01 which is lower than the entanglement score of the original tanglegram. Generally an entanglement from an untangled tanglegram will be more representative of the true phylogenetic differences between two trees.
For those interested in comparing between morphological and molecular trees using a statistical analysis, there will be several things that may hinder you. Your tree must have branch lengths greater than zero, which may prove difficult if your morphological tree lacks structure like Wilmott 2003. Having branch lengths of zero may also be a common problem since morphological trees are largely unresolved and have high amounts of polytomy. Currently, there are no accessible ,reliable ways to statistically compare morphological trees to molecular trees, but biologists are working towards a solution for this. (Lee & Palci, 2015).
This may also be difficult if you are using public data, like from dryad, because for an in-depth analysis, it’s often necessary to have the character matrix of morphological traits that the tree was originally built on, and that information may not be included (Sil et al, 2018).
Also, comparing just one morphological tree to just one molecular tree gives us a sample size of two, which would not yield reliable results. In similar studies measuring congruence of morphological vs molecular trees, the sample size is much higher, over 200 (Pisani, Benton & Wilkinson, 2007).
Our original intention was to investigate how to compare morphological and molecular trees in order to see the differences in efficiency and how data visualization has progressed over the years. However, what we came to realize was that it’s not possible to statistically compare these two kinds of trees without a common factor like branch length. It would be desirable to make this comparison because it would give us a measure of how our scientific techniques have advanced but also give us a more accurate representation of evolution.