1 Tree Comparisons

In our module we want to explore how we can build and plot phylogenetic trees based on molecular and morphological criteria and how to compare these trees statistically.

1.1 Objectives

  1. Practice writing and plotting trees in Newick format
  2. Load phylogenetic tree data into R and plot in baseR and ggtree
  3. Compare tree topologies

1.2 Introduction

Evolutionary biologists study the ways organisms change over time. This is often done by comparing several species to one another and analyzing the differences between them. These differences can then be used to build a graphical representation of species’ relationships to one another, called a phylogenetic tree. Originally, these trees were constructed based solely on organism morphology using shared derived traits, or synapomorphies, often in the form of character matrices. By focusing on these physical traits, biologists could hypothesize different life histories, like species divergence or common descent. However, using morphological data to determine taxonomic relationships is not foolproof. Some shared, analogous traits may be due to convergent evolution. Homoplasy can be difficult to detect through morphological analysis alone, and has historically resulted in phylogenetic trees that have since been disproved through the use of molecular data.

Modern biologists can now build phylogenetic trees based on the DNA of the organisms in question. By focusing on a few specific genes, evolutionary relationships can be hypothesized with greater accuracy. Examining the polymorphism at different loci can provide information on how distantly related organisms are to one another, and can reduce errors commonly associated with comparing or identifying cryptic species. Molecular data can help resolve unclear phylogenies that were created before genomic methods were available. Biologists can now revisit unresolved trees built from morphological data and revise them using DNA, allowing us to better hypothesize evolutionary relationships.

In order to assess any changes between these hypothesized evolutionary relationships, we can compare the topologies of phylogenetic trees. The topology refers to the branching pattern displayed, which represents the measure of relatedness among taxa. To understand the importance of topology, we must familiarize ourselves with the anatomy of a phylogenetic tree.

Anatomy of a tree:

1.3 Methods

There are several different ways to estimate phylogenies, each with their own strengths, weaknesses, and appropriate situations in which to apply them. While we won’t go over them in detail, familiarize yourselves by looking at Module 24
There are MANY different phylogenetics software that we can use to employ these methods of estimation, but some of the most common are PAUP, BEAST, MrBayes, and PHYLIP. These can take genetic data (alignments) in the form of fastq files, fasta files, or NEXUS files.

Types of Tree Formats: Newick, Nexus

Newick: A collection of data formatted using specific syntax that includes parentheses, commas, and semicolons to delineate weight, time, and evolutionary distance. “Newick files are simply text files that consist of one or more tree descriptions in the Newick notation. In contrast to Nexus files they contain no further syntax elements or other information than the trees.”

1.4 Comparing Trees

We will be comparing two graphical hypotheses of the genus adelpha, a group of butterflies. image

In this module, we will compare tree topologies for two different analyses of the genus Adelpha; one based on morphological data and the other based on molecular data. The morphological-based tree is from Keith Wilmott’s 2003 paper “Cladistic analysis of the Neotropical butterfly genus Adelpha (Lepidoptera: Nymphalidae), with comments on the subtribal classification of Limenitidini.” Our molecular tree is from Emily Ebel’s 2015 paper “Rapid diversification associated with ecological specialization in Neotropical Adelpha butterflies.”

1.5 Packages

For starters, you will need to install {ape} {phytools} {fastmatch} {quadprog} {phangorn} {geiger} {ggplot2} {gridExtra} {ggtree} {dendexend}

Unlike many packages we’ve used this semester, you will need to install ggtree from the BiocManager, not baseR. You can do this using the code below

 if (!requireNamespace("BiocManager", quietly = TRUE))
+     install.packages("BiocManager")
 BiocManager::install("ggtree")
## Bioconductor version 3.13 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
##   re-install: 'ggtree'
## Old packages: 'Matrix'
Once all our packages are installed, we can load them into our R markdown file!
library(ape)
library(fastmatch)
library(quadprog)
library(phangorn)
library(phytools)
## Loading required package: maps
library(geiger)
library(ggplot2)
library(ggtree)
## ggtree v3.0.4  For help: https://yulab-smu.top/treedata-book/
## 
## If you use ggtree in published research, please cite the most appropriate paper(s):
## 
## 1. Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics, 2020, 69:e96. doi:10.1002/cpbi.96
## 2. Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution 2018, 35(12):3041-3043. doi:10.1093/molbev/msy194
## 3. Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 2017, 8(1):28-36. doi:10.1111/2041-210X.12628
## 
## Attaching package: 'ggtree'
## The following object is masked from 'package:ape':
## 
##     rotate

For these packages to work effectively with the code below, its important to make sure you’re using the most up-to-date versions.

You can check that you have the most up-to-date version of R by running the command “R.Version”
R.version
##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          1.1                         
## year           2021                        
## month          08                          
## day            10                          
## svn rev        80725                       
## language       R                           
## version.string R version 4.1.1 (2021-08-10)
## nickname       Kick Things
You can make sure you have the most up-to-date version of each package by using the command “packageVersion(”package-name")
packageVersion("ape")
## [1] '5.5'

1.6 Building Trees in Newick Notation

To start off, we want to understand how a tree is created. For this example we will make our own tree using Newick format.

Here is a great explanation of Newick format: “Put simply, monophyletic clades are surrounded by parentheses and sister clades are separated by commas. For example, a simple tree could be written as (((A,B),C),(D,E)).” Let’s try making that!

1.6.1 CHALLENGE 1

If you already know the relationships between the groups of species that you want to plot, you can simply write out the tree as a text string in Newick format! Let’s try it first with letters. Try to recreate the tree in the image below by writing it as a text string in Newick format

image

1.6.1.1 SOLUTION 1

Once we have the tree written in Newick notation, we can use the command “read.tree()” to read the text string into R as a phylogenetic tree. Once we have a tree, we can plot it using the “plot()” command
text.string<-
    "(((A,B),C),(D,E));"
example.tree<-read.tree(text=text.string)
plot(example.tree,no.margin=TRUE,edge.width=2) 

1.6.2 CHALLENGE 2

Hey! Guess What?? Whales are cool! image

For our second challenge, try recreating the whale clade in the image below using Newick notation and the read.tree() function

image

1.6.2.1 SOLUTION 2

Great Job!
text.string<-
    "((((humpback wahle, fin whale), (Antarctic minke whale, common minke whale)), bowhead whale), sperm whale);"
whale.tree<-read.tree(text=text.string)
plot(whale.tree,no.margin=TRUE,edge.width=2)

1.7 Visualizing Trees in baseR

There are many different commands that will allow you to visualize your tree. Let’s try a few a few in baseR!
roundPhylogram(whale.tree) #creates rounded branches 

The tree we wrote in Newick format is rooted, but we can unroot it in our plot using the code below
plot(unroot(whale.tree),type="unrooted",no.margin=TRUE,lab4ut="axial",
    edge.width=4) #creates an unrooted tree 

We can also make a fan tree! the command “edge.width” allows you to adjust the thickness of tree branches, and you can use “show.tip.label” to remove the tip labels.
plot(whale.tree, type = "fan", edge.width = 3, show.tip.label = FALSE)

While these trees all show the same information it is helpful to have different representations. For example this fan tree helps visualize that the evolutionary relationships between species are non linear. It is often confused that certain species or genera are more “evolved” or “older” but tree formats like fan trees are a helpful way to show how this is misleading.

1.7.1 CHALLENGE 3

Try using the plot() function to plot our whale tree as a phylogram with different branch and tip label colors

1.7.1.1 SOLUTION 3

plot(whale.tree, type = "phylogram",  edge.color = "red")

plot(whale.tree, type = "phylogram",  edge.color = "red", tip.color = "darkgreen")

plot(whale.tree, type = "phylogram",  edge.color = "red", tip.color = "darkgreen", font = 4)

1.8 Load Molecular Tree Data

Now that we are a little more comfortable working with phylogenetic trees in R, we can load our first tree!

First, copy the tree data at {This Link} Then go to your own working repo, select the “Create New File” option and paste the tree data into that file.

Once you have the tree file saved in your repo, you can load the tree into your R markdown file using the “read.nexus” command
mol.tree<-read.nexus(file="NJst.tre")
mol.tree
## 
## Phylogenetic tree with 66 tips and 64 internal nodes.
## 
## Tip labels:
##   A_rothschildi, A_sichaeus, A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, ...
## 
## Unrooted; includes branch lengths.
Once the tree is loaded, we can try plotting it! Here’s a couple ways you can try plotting it:
par(mfrow = c(3, 3))
plot(mol.tree, type = "phylogram",  edge.color = "purple", font = 1, show.tip.label = FALSE)

plot(unroot(mol.tree),type="unrooted",cex=0.6,
    use.edge.length=FALSE,lab4ut="axial",
    no.margin=TRUE, show.tip.label = FALSE, edge.color = "red")

plot(mol.tree, type = "radial",  edge.color = "darkgreen", tip.color = "black", font = 1, show.tip.label = FALSE)

plot(mol.tree, type = "fan",  edge.color = "blue", tip.color = "black", font = 1, show.tip.label = FALSE)

plot(mol.tree, type = "cladogram",  edge.color = "orange", tip.color = "black", font = 1, show.tip.label = FALSE)

the Ntip() function will tell you how many different species (or tips) are represented in your tree
Ntip(mol.tree) ##66 species in this tree
## [1] 66
To generate a list of species names, use mol.tree$tip.label
##all the species names 
mol.tree$tip.label
##  [1] "A_rothschildi"             "A_sichaeus"               
##  [3] "A_boreas_boreas"           "A_saundersii_saundersii"  
##  [5] "A_attica_attica"           "A_leuceroides"            
##  [7] "A_zina_irma"               "A_zina_zina"              
##  [9] "A_jordani"                 "A_justina_valentina"      
## [11] "A_olynthia"                "A_boeotia_boeotia"        
## [13] "A_malea_aethalia"          "A_delinita"               
## [15] "A_heraclea"                "A_naxia"                  
## [17] "A_capucinus_capucinus"     "A_mesentina"              
## [19] "A_phylaca_pseudaethalia"   "A_lycorias_lara"          
## [21] "A_lycorias_spruceana"      "A_erotia_erotia"          
## [23] "A_pollina"                 "A_irmina_tumida"          
## [25] "A_leucophthalma_irminella" "A_cocala_cocala"          
## [27] "L_lorquini"                "L_weidemeyerii"           
## [29] "L_arthemis_arizonensis"    "L_arthemis_arthemis"      
## [31] "L_archippus_floridanensis" "L_populi"                 
## [33] "L_sydyi"                   "L_amphyssa"               
## [35] "L_moltrechti"              "L_doerriesi"              
## [37] "L_helmanni"                "L_camilla"                
## [39] "L_homeyeri"                "L_glorifica"              
## [41] "L_reducta"                 "A_donysa_donysa"          
## [43] "A_pithys"                  "A_alala_negra"            
## [45] "A_tracta"                  "A_corcyra_aretina"        
## [47] "Athyma_selenophora"        "Pandita_sinope"           
## [49] "Sumalia_daraxa"            "Parasarpa_zayla"          
## [51] "Moduza_urdaneta"           "A_seriphia_aquillia"      
## [53] "A_seriphia_therasia"       "A_serpa_celerio"          
## [55] "A_melona_leucocoma"        "A_cytherea_cytherea"      
## [57] "A_cytherea_daguana"        "A_salmoneus_colada"       
## [59] "A_iphicleola_thessalita"   "A_iphiclus_iphiclus"      
## [61] "A_thessalia_thessalia"     "A_epione_agilla"          
## [63] "A_ethelda_ethelda"         "A_basiloides"             
## [65] "A_plesaure_phliassa"       "A_shuara"

1.8.1 CHALLENGE 4

A great way to draw attention to a specific species in a tree plot is to add an arrow pointing to a specific species. Now that you can see all the species names, pick a species and use add.arrow() to draw an arrow to it!

1.8.1.1 SOLUTION 4

##add an arrow on a specific branch tip 
plot(mol.tree)
add.arrow(mol.tree,tip="A_saundersii_saundersii",arrl=4)

With Nexus trees, you can also create node labels. This is especially useful if you want to highlight a specific node.
plot(mol.tree,no.margin=TRUE,edge.width=2)
nodelabels(text=1:mol.tree$Nnode,node=1:mol.tree$Nnode+Ntip(mol.tree))

plot(mol.tree, "fan", main="Molecular Tree", edge.width = 3, edge.color = "pink", show.tip.label = FALSE)
arc.cladelabels(tree=mol.tree, "A. rothschildi", node = 4, cex=1, orientation="curved")

1.9 Morphological Tree

In order to compare molecular and morphological trees of the Adelpha lineage, we needed to build a tree in R from a phylogeny constructed based on morphological data, like the one reported in Keith Wilmott’s paper. However, because this tree was created in 2003 and is therefore not in a format that is easily compatible with current R software, we ended up transcribing the tree into R in Newick format based on the phylogenetic relationships presented in Wilmott’s Figure 8. This is great for visualizing the tree but there are some drawbacks to this method. Having a tree without character matrix data makes it harder to run statistical tests- which is why molecular data is helpful for phylogenetic analysis as it provides that data.
string <- "((A_alala_negra, A_corcyra_aretina, A_tracta, A_pithys, A_donysa_donysa), ((A_serpacelerio, A_seriphiaaquillia, A_seriphiatherasia), (A_melonaleucocoma, A_salmoneus_colada, A_cytherea_cytherea, A_cythereadaguana, A_epioneagilla, A_etheldaethelda, A_thessaliathessalia, A_iphicleolathessalita, A_iphiclusiphiclus, A_shuara, A_plesaurephliassa, A_basiloides, A_atticaattica, A_leucerioides, A_saundersiisaundersii, A_boreasboreas, A_rothschildi, A_sichaeus, A_cocalacocala, A_leucophthalmairminella, A_irminatumida, A_pollina, A_lycoriaslara, A_lycoriasspruceana, A_erotia_erotia, A_mesentina, A_phylaca_pseudaethalia, A_capucinus_capucinus, A_heraclea_heraclea, A_naxia_naxia, A_justina_valentina, A_olynthia, A_jordani, A_zinazina, A_zinairma, A_delinita_delinita, A_boeotia_boeotia, A_maleaaethalia)));"

morph.tree<-read.tree(text=string)
plot(morph.tree,no.margin=TRUE,edge.width=2)

par(mfrow = c(3, 3))
plot(morph.tree, type = "phylogram",  edge.color = "purple", font = 1, show.tip.label = FALSE)

plot(unroot(morph.tree),type="unrooted",cex=0.6,
    use.edge.length=FALSE,lab4ut="axial",
    no.margin=TRUE, show.tip.label = FALSE, edge.color = "red")

plot(morph.tree, type = "radial",  edge.color = "darkgreen", tip.color = "black", font = 1, show.tip.label = FALSE)

plot(morph.tree, type = "fan",  edge.color = "blue", tip.color = "black", font = 1, show.tip.label = FALSE)

plot(morph.tree, type = "cladogram",  edge.color = "orange", tip.color = "black", font = 1, show.tip.label = FALSE)

1.10 Modifying trees with ggtree

You can also plot these trees using the ggplot2 and ggtree packages
#Morphological tree plot
library(ggtree)
p <- ggtree(morph.tree, color="purple", size=0.5)
p <- p + geom_tiplab(size=2)
p

#Molecular tree plot 
g <- ggtree(mol.tree, color="darkgreen", size=0.5)
g <- g + geom_tiplab(size=2)
g

1.10.1 CHALLENGE 5

Now we want to see these trees side by side, we’ll go over direct comparison later in the module but for now, use the grid.arrange() command in gridExtra to place these trees side by side. Notice the major differences between the two, how might these compare?

1.10.1.1 SOLUTION 5

library("gridExtra")
grid.arrange(
    ggtree(mol.tree, color="purple") + theme_tree("hotpink"),
    ggtree(morph.tree, color="hotpink") + theme_tree("purple"),
    ncol=2)

The first thing you probably notice is the difference in complexity, the molecular tree has a more detailed relationship between species than the morphological tree.

1.11 Comparison

Now we will get into comparing our morphological tree and our molecular tree. First we will use the function all.equal from the APE package. This is a good place to start when comparing trees because it will tell you the basics: are these trees the same?
all.equal(mol.tree, morph.tree, use.edge.length = TRUE, use.tip.label = TRUE)
## [1] FALSE

It’s a good exploratory function, but after we learn that they are not the same, we have to dig deeper.

1.11.1 CHALLENGE 6

Next we’ll try comparing phylogenetic tree topographies using the comparePhylo() function from the APE package. Check the output to see if these trees are ultrametric and compare the differences in nodes. How many nodes does the morphological tree have? How many does the molecular have? Going one step further, how many of the clades in the molecular tree are not represented in the morphological tree.

1.11.1.1 SOLUTION 6

comparePhylo(morph.tree, mol.tree, plot = FALSE, force.rooted = TRUE, use.edge.length = FALSE)
## => Comparing morph.tree with mol.tree.
## Trees have different numbers of tips: 46 and 66.
## Tips in morph.tree not in mol.tree : A_serpacelerio, A_seriphiaaquillia, A_seriphiatherasia, A_melonaleucocoma, A_cythereadaguana, A_epioneagilla, A_etheldaethelda, A_thessaliathessalia, A_iphicleolathessalita, A_iphiclusiphiclus, A_plesaurephliassa, A_atticaattica, A_leucerioides, A_saundersiisaundersii, A_boreasboreas, A_cocalacocala, A_leucophthalmairminella, A_irminatumida, A_lycoriaslara, A_lycoriasspruceana, A_heraclea_heraclea, A_naxia_naxia, A_zinazina, A_zinairma, A_delinita_delinita, A_maleaaethalia.
## Tips in mol.tree not in morph.tree : A_boreas_boreas, A_saundersii_saundersii, A_attica_attica, A_leuceroides, A_zina_irma, A_zina_zina, A_malea_aethalia, A_delinita, A_heraclea, A_naxia, A_lycorias_lara, A_lycorias_spruceana, A_irmina_tumida, A_leucophthalma_irminella, A_cocala_cocala, L_lorquini, L_weidemeyerii, L_arthemis_arizonensis, L_arthemis_arthemis, L_archippus_floridanensis, L_populi, L_sydyi, L_amphyssa, L_moltrechti, L_doerriesi, L_helmanni, L_camilla, L_homeyeri, L_glorifica, L_reducta, Athyma_selenophora, Pandita_sinope, Sumalia_daraxa, Parasarpa_zayla, Moduza_urdaneta, A_seriphia_aquillia, A_seriphia_therasia, A_serpa_celerio, A_melona_leucocoma, A_cytherea_daguana, A_iphicleola_thessalita, A_iphiclus_iphiclus, A_thessalia_thessalia, A_epione_agilla, A_ethelda_ethelda, A_plesaure_phliassa.
## Trees have different numbers of nodes: 5 and 64.
## morph.tree is rooted, mol.tree is unrooted.
## Both trees are not ultrametric.
## 4 clades in morph.tree not in mol.tree.
## 63 clades in mol.tree not in morph.tree.

This tells you that both the trees are non-ultrametric. Essentially, both trees have structure and are not one big polytomy, and branch lengths differ, although one phylogeny is more structured than the other. Can you tell which one it is?

It also tells us that they have different numbers of tips (46 and 66) which makes sense because they aren’t looking at the same number of individuals. Naturally because of this, these trees will inherently be different from each other structurally. The morphological tree has fewer nodes not only because it’s hypothesizing the relationships among fewer individuals, but also because there is more uncertainty. This tree is comb like, and we can interpret this as an unresolved phylogeny.

When both trees are non-ultrametric, the number of analyses that we can perform is limited. Let’s explore what more we can learn about these trees.

Check to see if they are binary. According to the APE package, “The test differs whether the tree is rooted or not. An urooted tree is considered binary if all its nodes are of degree three (i.e., three edges connect to each node). A rooted tree is considered binary if all nodes (including the root node) have exactly two descendant nodes, so that they are of degree three expect the root which is of degree 2.” Our molecular tree is unrooted and our morphological tree is rooted, so keep that in mind.
is.binary(mol.tree)
## [1] TRUE
is.binary(morph.tree)
## [1] FALSE
Now, let’s try to plot these trees and view the characters that they have in common. Let’s use the package dendextend. You will get some errors about the mol.tree being non-ultrametric and the morph.tree having a branch length of zero, but keep going through to plot(prunedplot).
library(ape)
library(phytools)
library(dendextend)
## 
## ---------------------
## Welcome to dendextend version 1.15.2
## Type citation('dendextend') for how to cite the package.
## 
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
## 
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags: 
##   https://stackoverflow.com/questions/tagged/dendextend
## 
##  To suppress this message use:  suppressPackageStartupMessages(library(dendextend))
## ---------------------
## 
## Attaching package: 'dendextend'
## The following object is masked from 'package:ggtree':
## 
##     rotate
## The following object is masked from 'package:geiger':
## 
##     is.phylo
## The following object is masked from 'package:phytools':
## 
##     untangle
## The following objects are masked from 'package:ape':
## 
##     ladderize, rotate
## The following object is masked from 'package:stats':
## 
##     cutree
library(phylogram)
## 
## Attaching package: 'phylogram'
## The following object is masked from 'package:dendextend':
## 
##     prune
tree1 <- mol.tree
tree1 <- midpoint.root(tree1)
tree2 <- morph.tree
tree1 <- compute.brlen(tree1)
tree2 <- compute.brlen(tree2)
tree1<- as.dendrogram(tree1)
tree2<- as.dendrogram(tree2)

dndlist <- dendextend::dendlist(tree1, tree2)
prunedplot <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.

This code will give you a triangle tanglegram, which is another way to visualize the two trees.
prunedplotri <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "t")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.

Next we want to try isolating certain taxa to compare them between the two trees. To do this you have to find the you have to find the number associated for the label you want to highlight. For this example we want to isolate the species A. tracta and then use the command color_labels().
labels(tree1) #A_tracta is 19
##  [1] "L_arthemis_arizonensis"    "L_arthemis_arthemis"      
##  [3] "L_lorquini"                "L_weidemeyerii"           
##  [5] "L_archippus_floridanensis" "L_populi"                 
##  [7] "L_sydyi"                   "L_amphyssa"               
##  [9] "L_moltrechti"              "L_doerriesi"              
## [11] "L_helmanni"                "L_camilla"                
## [13] "L_homeyeri"                "L_glorifica"              
## [15] "L_reducta"                 "A_donysa_donysa"          
## [17] "A_pithys"                  "A_alala_negra"            
## [19] "A_tracta"                  "A_corcyra_aretina"        
## [21] "Athyma_selenophora"        "Pandita_sinope"           
## [23] "Sumalia_daraxa"            "Parasarpa_zayla"          
## [25] "Moduza_urdaneta"           "A_seriphia_aquillia"      
## [27] "A_seriphia_therasia"       "A_serpa_celerio"          
## [29] "A_melona_leucocoma"        "A_rothschildi"            
## [31] "A_sichaeus"                "A_boreas_boreas"          
## [33] "A_saundersii_saundersii"   "A_attica_attica"          
## [35] "A_leuceroides"             "A_zina_irma"              
## [37] "A_zina_zina"               "A_jordani"                
## [39] "A_justina_valentina"       "A_olynthia"               
## [41] "A_boeotia_boeotia"         "A_malea_aethalia"         
## [43] "A_delinita"                "A_heraclea"               
## [45] "A_naxia"                   "A_capucinus_capucinus"    
## [47] "A_mesentina"               "A_phylaca_pseudaethalia"  
## [49] "A_lycorias_lara"           "A_lycorias_spruceana"     
## [51] "A_erotia_erotia"           "A_pollina"                
## [53] "A_irmina_tumida"           "A_leucophthalma_irminella"
## [55] "A_cocala_cocala"           "A_basiloides"             
## [57] "A_plesaure_phliassa"       "A_shuara"                 
## [59] "A_iphicleola_thessalita"   "A_iphiclus_iphiclus"      
## [61] "A_thessalia_thessalia"     "A_epione_agilla"          
## [63] "A_ethelda_ethelda"         "A_cytherea_cytherea"      
## [65] "A_cytherea_daguana"        "A_salmoneus_colada"
tree1 <- color_labels(tree1, col = "red", labels = labels(tree1)[c(19)])
labels(tree2)
##  [1] "A_alala_negra"            "A_corcyra_aretina"       
##  [3] "A_tracta"                 "A_pithys"                
##  [5] "A_donysa_donysa"          "A_serpacelerio"          
##  [7] "A_seriphiaaquillia"       "A_seriphiatherasia"      
##  [9] "A_melonaleucocoma"        "A_salmoneus_colada"      
## [11] "A_cytherea_cytherea"      "A_cythereadaguana"       
## [13] "A_epioneagilla"           "A_etheldaethelda"        
## [15] "A_thessaliathessalia"     "A_iphicleolathessalita"  
## [17] "A_iphiclusiphiclus"       "A_shuara"                
## [19] "A_plesaurephliassa"       "A_basiloides"            
## [21] "A_atticaattica"           "A_leucerioides"          
## [23] "A_saundersiisaundersii"   "A_boreasboreas"          
## [25] "A_rothschildi"            "A_sichaeus"              
## [27] "A_cocalacocala"           "A_leucophthalmairminella"
## [29] "A_irminatumida"           "A_pollina"               
## [31] "A_lycoriaslara"           "A_lycoriasspruceana"     
## [33] "A_erotia_erotia"          "A_mesentina"             
## [35] "A_phylaca_pseudaethalia"  "A_capucinus_capucinus"   
## [37] "A_heraclea_heraclea"      "A_naxia_naxia"           
## [39] "A_justina_valentina"      "A_olynthia"              
## [41] "A_jordani"                "A_zinazina"              
## [43] "A_zinairma"               "A_delinita_delinita"     
## [45] "A_boeotia_boeotia"        "A_maleaaethalia"
tree2 <- color_labels(tree2, col = "red", labels = labels(tree2)[c(3)])
#highlighting certain taxa in each tree manually

dndlist <- dendextend::dendlist(tree1, tree2)
prunedplottaxa <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.

1.11.2 CHALLENGE 7

Now it’s your turn! Try selecting a species to isolate and then identifying its number. You can also isolate multiple species or change the color of the label.

1.11.2.1 SOLUTION 7

tree1 <- color_labels(tree1, col = "hotpink", labels = labels(tree1)[c(38)])
labels(tree2)
##  [1] "A_alala_negra"            "A_corcyra_aretina"       
##  [3] "A_tracta"                 "A_pithys"                
##  [5] "A_donysa_donysa"          "A_serpacelerio"          
##  [7] "A_seriphiaaquillia"       "A_seriphiatherasia"      
##  [9] "A_melonaleucocoma"        "A_salmoneus_colada"      
## [11] "A_cytherea_cytherea"      "A_cythereadaguana"       
## [13] "A_epioneagilla"           "A_etheldaethelda"        
## [15] "A_thessaliathessalia"     "A_iphicleolathessalita"  
## [17] "A_iphiclusiphiclus"       "A_shuara"                
## [19] "A_plesaurephliassa"       "A_basiloides"            
## [21] "A_atticaattica"           "A_leucerioides"          
## [23] "A_saundersiisaundersii"   "A_boreasboreas"          
## [25] "A_rothschildi"            "A_sichaeus"              
## [27] "A_cocalacocala"           "A_leucophthalmairminella"
## [29] "A_irminatumida"           "A_pollina"               
## [31] "A_lycoriaslara"           "A_lycoriasspruceana"     
## [33] "A_erotia_erotia"          "A_mesentina"             
## [35] "A_phylaca_pseudaethalia"  "A_capucinus_capucinus"   
## [37] "A_heraclea_heraclea"      "A_naxia_naxia"           
## [39] "A_justina_valentina"      "A_olynthia"              
## [41] "A_jordani"                "A_zinazina"              
## [43] "A_zinairma"               "A_delinita_delinita"     
## [45] "A_boeotia_boeotia"        "A_maleaaethalia"
tree2 <- color_labels(tree2, col = "hotpink", labels = labels(tree2)[c(41)])
dndlist <- dendextend::dendlist(tree1, tree2)
prunedplottaxa <- dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 5, lab.cex = 1, lwd = 1, edge.lwd = .1, type = "r")
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.

There are also arguments within tanglegram() to highlight species and branches. k_branches and k_labels will assign colors to the number of branches and clades you specify respectively. We used the k_labels arguments to differentiate between the lowland clade in blue and the montane in red.
prunedplotcolor <- dendextend::tanglegram(dndlist, faster = TRUE, margin_inner = 7, lab.cex = .88, lwd =1, edge.lwd = 0.5, type = "r", main_left = "Molecular", main_right = "Morphological", highlight_branches_col = TRUE, highlight_branches_lwd = TRUE, k_branches = 5, k_labels=2, center=TRUE) 
## Warning in intersect_trees(dend1, dend2, warn = TRUE): The labels in both tree
## had different values - trees were pruned.
## Warning in cutree.dendrogram(dend, k = k, h = h, order_clusters_as_data =
## FALSE): It is impossible to produce a one-to-one cut for the k/h you specidied.
## 0's have been introduced. Note that this result would be different from R's
## default cutree output.

#color coded tanglegram
We can use the untangle function to improve the quality of alignment between two trees. When both trees are arranged in their most untangled state, we can then assess the quality of the alignment between these two trees using “entanglement”. Entanglement is a measure of similarity between two phylogenetic trees and ranges from 0 to 1. An entanglement score of 0 would mean that the trees on either side of the tanglegram are identical. First let’s find the entanglement score for the tanglegram.
x <- prunedplot
x %>% plot(main = paste("entanglement =", round(entanglement(x), 2)))

The entanglement score of our trees is 0.44

Next we are going to work with the command untangle() in dendextend. This useful command will rearrange the tree into the simplest configuration without changing the phylogenetic relationships.
prunedplotuntangle<-prunedplot %>% untangle(method = "step1side") %>% tanglegram(common_subtrees_color_lines = TRUE,  faster = TRUE, margin_inner = 7, lab.cex = .88, lwd =1, edge.lwd = 0.5, type = "r", main_left = "Molecular", main_right = "Morphological", highlight_branches_col = TRUE, highlight_branches_lwd = TRUE, k_branches = 5, k_labels=2, center=TRUE)
## Warning in cutree.dendrogram(dend, k = k, h = h, order_clusters_as_data =
## FALSE): It is impossible to produce a one-to-one cut for the k/h you specidied.
## 0's have been introduced. Note that this result would be different from R's
## default cutree output.

#untangled color coded tanglegram

Now we can clearly see the associations between these trees and the individuals and how they were rearranged. You can go branch by branch and follow the shaded lines to view where they are on the opposite tree.

1.11.3 CHALLENGE 8 (aka an entanglement score for an untangled tanglegram)

As we discussed above, untangling your tanglegram will help remove any patterns in the tanglegram and the entanglement analysis that are not due to phylogentic relationships. Try using the entanglement function to get an entanglement score for the untangled tanglegram (say that 5 times fast).

1.11.3.1 SOLUTION 8

y <- prunedplotuntangle
y %>% plot(main = paste("entanglement =", round(entanglement(y), 2)))

The entanglement score of our untangled trees is 0.01 which is lower than the entanglement score of the original tanglegram. Generally an entanglement from an untangled tanglegram will be more representative of the true phylogenetic differences between two trees.

2 Concluding Remarks

For those interested in comparing between morphological and molecular trees using a statistical analysis, there will be several things that may hinder you. Your tree must have branch lengths greater than zero, which may prove difficult if your morphological tree lacks structure like Wilmott 2003. Having branch lengths of zero may also be a common problem since morphological trees are largely unresolved and have high amounts of polytomy. Currently, there are no accessible ,reliable ways to statistically compare morphological trees to molecular trees, but biologists are working towards a solution for this. (Lee & Palci, 2015).

This may also be difficult if you are using public data, like from dryad, because for an in-depth analysis, it’s often necessary to have the character matrix of morphological traits that the tree was originally built on, and that information may not be included (Sil et al, 2018).

Also, comparing just one morphological tree to just one molecular tree gives us a sample size of two, which would not yield reliable results. In similar studies measuring congruence of morphological vs molecular trees, the sample size is much higher, over 200 (Pisani, Benton & Wilkinson, 2007).

Our original intention was to investigate how to compare morphological and molecular trees in order to see the differences in efficiency and how data visualization has progressed over the years. However, what we came to realize was that it’s not possible to statistically compare these two kinds of trees without a common factor like branch length. It would be desirable to make this comparison because it would give us a measure of how our scientific techniques have advanced but also give us a more accurate representation of evolution.

3 Works Cited