Introduction

In this document, I outline the reanalyis of Somerville et al. (2016), who examine the evidence for captive management of leproids at Teotihuacan, one of the earliest and largest urban centers in the Americas. Somerville et al. (2016) analysed the carbon and oxygen stable isotopes from the bone apatite of archaeological specimens from two genera of lepoids, Lepus (hares), and Sylvilagus (cottontails). Carbon is used as a proxy for human impact on the diet of these animals: higher δ13C apatite ratios imply an larger component of the diet was made up by human cultivated C4 or CAM plants such as maize. They incorperate the stable carbon isotope analysis with stable oxygen isotope analysis, as δO18 apatite ratios can be used in this context as a proxy for local humidity, providing a measure of control over environmental changes that might have influence baseline signatures of δ13C.

Their dataset (S2_Table.csv) contains the carbon and isotope ratios from five different parts of the city, some of which overlap slightly in period of occupation. The Moon Pyramid, Teopancazco and Oztoyahualco sites are the three that cover the classic period (150BC-550AD) occupation of Teotihuacan, and overlap somewhat in their occupation ranges. Puerta 5 and Cuevas (here Tunels y Cuevas), refer respectively to the epi-classic (600-1150AD) and post-classic (Aztec, 1150-1500AD) periods of occupation at the site. They also analysed a sample of “modern” (1892-1966AD) comparative specimens from the Basin of Mexico (S1_Table.csv), although these were not taken from Teotihuacan valley, they should be fairly representative of the recent agricultural landscape in the area.

As previous research had indicated the neighborhood of Ozotoyahualco might have specialised in raising cottontail rabbits, Somerville et al. (2016) hypothesised that this site would have a higher δ13C than other parts of the city. Similarly, due to previous research on urban provisioning and the declining prevalence of deer in the region during the classic period, they hypothesised that management of small animals such as leproids would become more important over time, and that this should be associated with an increase in δ13C across the sites during Teotihuacan’s classic period occupation. Finally, due to certain biological and behavioural differences between Lepus and Sylvilagus, the authors hypotheses that while both may have been managed by humans, Sylvilagus would be more suitable to captive management in an urban center, and thus would likely have a higher δ13C than Lepus.

Outline of their project and my replication

In their analysis, they perform a variety of descriptive and interpretive analyses, as well as vizualisation of the data. I plan to replicate all of their tests.

  1. Load and clean data to remove outliers and any samples that failed their quality check
  2. Test for statistical differences between genera by calculating simple descriptive statistics and analysis with student’s T-test.
  3. Two, two-factor analysis of variance tests to evaluate the effect of genus and site on δ13C and δ18O as well as potential for interaction effects between genus and site. They also present a summary table of means and standard deviation broken down by site and genus.
  4. After genus was found to have no effect, the analysis focused on site. They performed a one-way ANOVA and a post-hoc Tukey test on each of δ13C and δ18O to assess which sites had an effect on the two variables. One site was found to have a sigificant effect on δ13C, so a Mann-Whitney-U (two sample Wilcox rank sum test) test was employed to compare the means of that one site to the remainder of the sample.
  5. The final step was visualising the data in a scatterplot of the means and standard deviation of each site for the two main variables (δ13C and δ18O) and two boxplots grouped by site. In this step they interpret ideas about how δ13C increased over time at Teotihuacan, and use Pearson’s test for correlation between the two variables to rule out environmental causes for the interpretations about δ13C.

Step 1. Load and clean data

First I run a code to clear the workspace, to avoid too many variables building up in my environment and start with a blank slate each time I run the entire length of code.

rm(list=ls())

Load data

I read in the archaeological data (S1_table). In excel, I deleted any rows of non-table data, i.e. title, and a note, and saved the file (S2_Table) as a CSV in the project directory. Then I read it into R and converted it to a table using dplyr. I trimed white space at beginnings and ends of things where there shouldn’t be any, and kept strings as for now. Initially I used the skip code to remove the headers of the table instead of excel, but I had trouble at some point and simply removed these in the saved file.

require(curl)
## Loading required package: curl
## Warning: package 'curl' was built under R version 3.4.2
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
f <- curl("https://raw.githubusercontent.com/mazattack/data-reanalysis-assignment/master/S2_Table.csv") #This is the archaeological data.
iso_total <- read.csv(f, header = TRUE, stringsAsFactors = FALSE, strip.white=TRUE)
iso_total<-tbl_df(iso_total)
head(iso_total)
## # A tibble: 6 x 16
##   X.U.FEFF.AS.Lab.. Spec.or.Tag..         Site Unit.Project
##               <chr>         <chr>        <chr>        <chr>
## 1           AS-0315    A-1224.001 Moon Pyramid  Structure 2
## 2           AS-0321    A-1955.002 Moon Pyramid  Structure 1
## 3           AS-0322    A-1985.002 Moon Pyramid  Structure 1
## 4           AS-0323    A-1592.001 Moon Pyramid  Structure 1
## 5           AS-0324    A-1917.001 Moon Pyramid  Structure 1
## 6           AS-0325    A-1955.001 Moon Pyramid  Structure 1
## # ... with 12 more variables: Provenience.Location.A <chr>,
## #   Provenience.Location.B <chr>, Phase <chr>, Genus <chr>, Species <chr>,
## #   Element <chr>, Side <chr>, d13C.apatite.VPDB.... <dbl>,
## #   d18O.apatite.VSMOW.... <dbl>, C.P <dbl>, IR.SF <dbl>, Notes <chr>
str(iso_total)
## Classes 'tbl_df', 'tbl' and 'data.frame':    136 obs. of  16 variables:
##  $ X.U.FEFF.AS.Lab..     : chr  "AS-0315" "AS-0321" "AS-0322" "AS-0323" ...
##  $ Spec.or.Tag..         : chr  "A-1224.001" "A-1955.002" "A-1985.002" "A-1592.001" ...
##  $ Site                  : chr  "Moon Pyramid" "Moon Pyramid" "Moon Pyramid" "Moon Pyramid" ...
##  $ Unit.Project          : chr  "Structure 2" "Structure 1" "Structure 1" "Structure 1" ...
##  $ Provenience.Location.A: chr  "Tunel" "Tunel 3" "Tunel 3" "Tunel 3" ...
##  $ Provenience.Location.B: chr  "LII-LIII" "LXXIX, LXXX" "LXXIX y LXXX" "LVI y LXXIX" ...
##  $ Phase                 : chr  "Patlachique-Tlamimilolpa" "Patlachique-Tlamimilolpa" "Patlachique-Tlamimilolpa" "Patlachique-Tlamimilolpa" ...
##  $ Genus                 : chr  "Lepus" "Sylvilagus" "Sylvilagus" "Sylvilagus" ...
##  $ Species               : chr  "" "" "" "" ...
##  $ Element               : chr  "Femur" "Femur" "Tibia" "Femur" ...
##  $ Side                  : chr  "R" "R" "L" "" ...
##  $ d13C.apatite.VPDB.... : num  -9.1 -8.9 -10.6 -8.8 -5.8 -8.9 -6.4 -9.2 -9.9 -10.3 ...
##  $ d18O.apatite.VSMOW....: num  29.9 29.2 27.3 24.8 27.1 28.5 25.4 26.1 27.6 27.3 ...
##  $ C.P                   : num  NA 0.3 0.21 0.18 0.15 0.2 0.15 0.11 NA 0.2 ...
##  $ IR.SF                 : num  NA 2.8 3.04 3.06 3.17 3.02 3.09 3.09 NA 2.78 ...
##  $ Notes                 : chr  "" "" "" "" ...

I did the same with the modern data.

f <- curl("https://raw.githubusercontent.com/mazattack/data-reanalysis-assignment/master/S1_Table.csv") #This is the archaeological data
modern<-read.csv(f, header=TRUE, sep = ",",strip.white=TRUE, stringsAsFactors = FALSE) 

modern<-tbl_df(modern)
head(modern)
## # A tibble: 6 x 11
##   Lab_num Cat_num                        Species            State
##     <chr>   <int>                          <chr>            <chr>
## 1 AS-0655   50078 Sylvilagus floridanus orizabae Distrito Federal
## 2 AS-0656   50079 Sylvilagus floridanus orizabae Distrito Federal
## 3 AS-0657   51111 Sylvilagus floridanus orizabae Distrito Federal
## 4 AS-0660   55592 Sylvilagus floridanus orizabae          Hidalgo
## 5 AS-0661   55599 Sylvilagus floridanus orizabae          Hidalgo
## 6 AS-0677   78468    Lepus californicus texianus        Queretaro
## # ... with 7 more variables: Location <chr>, Biome <chr>, Ecoregion <chr>,
## #   Elev..m. <int>, Collection.date..yrs.AD. <int>,
## #   X13C.apatite.VPDB <dbl>, X18O.apatite.VSMOW <dbl>
str(modern)
## Classes 'tbl_df', 'tbl' and 'data.frame':    13 obs. of  11 variables:
##  $ Lab_num                 : chr  "AS-0655" "AS-0656" "AS-0657" "AS-0660" ...
##  $ Cat_num                 : int  50078 50079 51111 55592 55599 78468 81463 81528 540914 540914 ...
##  $ Species                 : chr  "Sylvilagus floridanus orizabae" "Sylvilagus floridanus orizabae" "Sylvilagus floridanus orizabae" "Sylvilagus floridanus orizabae" ...
##  $ State                   : chr  "Distrito Federal" "Distrito Federal" "Distrito Federal" "Hidalgo" ...
##  $ Location                : chr  "Tlalpan" "Tlalpan" "Tlalpan" "Tulancingo" ...
##  $ Biome                   : chr  "Tropical and Subtropical Coniferous Forests" "Tropical and Subtropical Coniferous Forests" "Tropical and Subtropical Coniferous Forests" "Deserts and Xeric Shrublands" ...
##  $ Ecoregion               : chr  "Trans-Mexican Volcanic Belt pine-oak forests" "Trans-Mexican Volcanic Belt pine-oak forests" "Trans-Mexican Volcanic Belt pine-oak forests" "Meseta Central matorral" ...
##  $ Elev..m.                : int  2438 2316 2316 2393 2591 1981 1890 1890 2075 2075 ...
##  $ Collection.date..yrs.AD.: int  1892 1892 1892 1893 1893 1896 1896 1896 1966 1966 ...
##  $ X13C.apatite.VPDB       : num  -11.7 -13.6 -12.8 -8.4 -14.3 -10.2 -11.9 -10.8 -14 -12.8 ...
##  $ X18O.apatite.VSMOW      : num  26.7 24.6 24.7 25.5 23.4 27.5 27 26.5 26.1 25.6 ...

Clean data

To make life easier, I changed the names of two key variables because they were long and involved weird characters. First I tried 18O and 13C as the variable names, but it threw me an error, I think by starting with a number.

names(iso_total)[12]<-"C13ap" #I used names(iso_total) to get the position of these for renaming
names(iso_total)[13]<-"O18ap"
names(iso_total)[1]<-"lab_n"
names(modern)[10]<-"modC13"
names(modern)[11]<-"modO18"

The first step in their analysis is to remove samples potentially affected by diagenisis. In this article, they tested a random sample (n=101) of bone for potential post-depositional alteration to the mineral structure. Using FTIR, they tested the carbonate to phospate ratios (C/P) and infrared splitting factor (IR-SF) using FTIR and removed samples which fell outside of the expected ranges of 0.1 and 0.5 for C/P and 2.0 and 4.0 for IR-SF. The samples that they removed are marked in grey in the supplimentary data.

Here, I used the filter function to select only those rows where C/P and IR-SF fell within the expected ranges, and saved this as a new dataset “iso”. I also needed to make sure the rows with NA values were still included, since only a random sample was analysed. I did this using the | notation which functions like “OR”. Since the FTIR analysis returned both C/P and IR-SF values, I only need to include one is.na function here. I also excluded an outlier with an extremely low δO18 that they excluded by converting this number to NA, as the sample’s δ13C value was still incorperated in the analysis.

#to filter out only Lepus and Sylvilagus genus, I first needed to define the column as a factor
iso_total$Genus<-as.factor(iso_total$Genus)
iso<-filter(iso_total, (Genus=="Lepus" | Genus=="Sylvilagus") & between (C.P, 0.1, 0.5)&between (IR.SF, 2.0, 4.0) | (is.na(IR.SF))) #brackets around the Genus filter were important here, otherwise the | worked on everything that came after
iso$O18ap[iso$O18ap==11.9]<-NA #couldn't figure out how to get this to work with dplyr or chaining
iso            
## # A tibble: 121 x 16
##      lab_n Spec.or.Tag..         Site Unit.Project Provenience.Location.A
##      <chr>         <chr>        <chr>        <chr>                  <chr>
##  1 AS-0315    A-1224.001 Moon Pyramid  Structure 2                  Tunel
##  2 AS-0321    A-1955.002 Moon Pyramid  Structure 1                Tunel 3
##  3 AS-0322    A-1985.002 Moon Pyramid  Structure 1                Tunel 3
##  4 AS-0323    A-1592.001 Moon Pyramid  Structure 1                Tunel 3
##  5 AS-0324    A-1917.001 Moon Pyramid  Structure 1                Tunel 3
##  6 AS-0325    A-1955.001 Moon Pyramid  Structure 1                Tunel 3
##  7 AS-0326    A-1962.001 Moon Pyramid  Structure 1                Tunel 3
##  8 AS-0327    A-1962.002 Moon Pyramid  Structure 1                Tunel 3
##  9 AS-0328    A-1962.003 Moon Pyramid  Structure 1                Tunel 3
## 10 AS-0329    A-1962.004 Moon Pyramid  Structure 1                Tunel 3
## # ... with 111 more rows, and 11 more variables:
## #   Provenience.Location.B <chr>, Phase <chr>, Genus <fctr>,
## #   Species <chr>, Element <chr>, Side <chr>, C13ap <dbl>, O18ap <dbl>,
## #   C.P <dbl>, IR.SF <dbl>, Notes <chr>

There were a couple rows at the end of the dataset with no data in them so this excluded those from the dataset too.

iso<-filter(iso, !(is.na(iso$Genus))) #removes empty rows

Step 2. Test for differences between genera

Descriptive statistics

In order to test whether there was a significant difference in the isotopic values between Lepus and Sylvilagus, Somerville et al. (2016) first provide some summary statistics for the dataset which I calculated to ensure I had correctly removed all the outliers from the same.

Descriptive statistics from the article (Somerville et al. 2016:10)

Archaological specimens:

δ13Capatite = -7.8 ± 2.4‰(N = 114, 1 S.D.)

δ18Oapatite = 26.0‰(N = 113, 1 S.D.)

Modern reference specimens:

δ13Capatite = -12.4 ± 1.8‰(N = 13, 1 S.D.)

δ18Oapatite = 26.1‰± 1.4 (N = 13, 1 S.D.)

my_sum <- 
  list("Archaeological samples" = 
        round (c("δC13 n" = sum(!is.na(iso$C13ap)),
              "δC13 mean" =  mean(iso$C13ap, na.rm=TRUE),
              "δC13 sd" =  sd(iso$C13ap, na.rm=TRUE),
              "δO18 n" = sum(!is.na(iso$O18ap)),
              "δO18 mean" =  mean(iso$O18ap, na.rm=TRUE),
              "δO18 sd" =  sd(iso$O18ap, na.rm=TRUE)),2),
       "Modern samples" = 
         round(c("δC13 n" = sum(!is.na(modern$modC13)),
              "δC13 mean" =  mean(modern$modC13, na.rm=TRUE),
              "δC13 sd" =  sd(modern$modC13, na.rm=TRUE),
              "δO18 n" = sum(!is.na(modern$modO18)),
              "δO18 mean" =  mean(modern$modO18, na.rm=TRUE),
              "δO18 sd" =  sd(modern$modO18, na.rm=TRUE)),2)
       )
    
my_sum
## $`Archaeological samples`
##    dC13 n dC13 mean   dC13 sd    dO18 n dO18 mean   dO18 sd 
##    114.00     -7.75      2.34    113.00     26.01      2.19 
## 
## $`Modern samples`
##    dC13 n dC13 mean   dC13 sd    dO18 n dO18 mean   dO18 sd 
##     13.00    -12.39      1.80     13.00     26.08      1.40

My summary statistics closely match those presented in the article.

Note: the first time I tried this it was not a match and so I realised I needed to keep the na values in the filter above. The second time it was also not correct so I added two lines to count the number of values in the two columns of interest to see if the sample sizes match. I noted the single value from delta O18 to remove but I still have one extra row in there than their sample number. I checked on the excel, and there is no evidence of this extra row that they excluded. I took a guess that this may have been a species listed in the spreadsheet as Lepus? and removal of this specimen now produces averages that match the data presented in the article.

T-tests

They then calculate student’s T-tests to evaluate whether there is a significant difference in the means of the δ13C and δ18O between Lepus and Sylvilagus.

First I created a short hand name for each variable group

LC<-iso$C13ap[iso$Genus=="Lepus"]
SC<-iso$C13ap[iso$Genus=="Sylvilagus"]
LO<-iso$O18ap[iso$Genus=="Lepus"]
SO<-iso$O18ap[iso$Genus=="Sylvilagus"]

Carbon apatite two sample T-test

t.test(LC, SC, var.equal =TRUE, alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  LC and SC
## t = 0.5185, df = 112, p-value = 0.6051
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7374447  1.2602033
## sample estimates:
## mean of x mean of y 
## -7.558621 -7.820000

Oxygen apatite two sample T-test

t.test(SO, LO, var.equal =TRUE, alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  SO and LO
## t = 0.83133, df = 111, p-value = 0.4076
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.542769  1.327334
## sample estimates:
## mean of x mean of y 
##  26.10952  25.71724

Summary of my results

Carbon apatite t-sample t-test comparing Lepus and Sylvilagus: t = 0.5185, df = 112, p-value = 0.6051

Oxygen apatite t-sample t-test comparing Lepus and Sylvilagus: t = -0.83133, df = 111, p-value = 0.4076

Test values reported in Somerville et al. (2016:10)

δ13Capatite values (t[110] = -0.806, p = 0.422)

δ18Oapatite values (t[110] = 0.428, p = 0.669)

Both of these tests returned different test statistics, degrees of freedom and p-values to those reported by Somerville et al. (2016). I tried multiple variations of the t-test in R, from Welsh t-test and one sided t-tests to one sided Welsh t-tests and none of the tests produced a similar result to those reported by Somerville. It later became clear there was an error in either their reported data or supplimentary data - see step 3 for explanation. That said, the outcome of all tests is the same, there is no significant difference in the means of the samples.

Test for normality

While they did not report any tests for normality, I also decided to do a cursory check of the suitability of students T-test to these data. I check that the data fits a normal distribution by viewing their histogram, qnorm and qqline plots as well as performing the shapiro test for normality. Finally, I also test for equal variance between the samples.

δ13C

require(ggplot2)
## Loading required package: ggplot2
a_hist<-ggplot(iso, aes(C13ap))
a_hist+geom_histogram(bins= 15)+ facet_wrap(~Genus, scales="free") #I couldn't figure out how to get rid of the blank chart at the start
## Warning: Removed 7 rows containing non-finite values (stat_bin).

qqnorm(LC,xlab="Theoretical",main="Lepus")
qqline(LC)

qqnorm(SC,xlab="Theoretical",main="Sylvilagus")
qqline(SC)

shapiro.test(LC) #Lepus
## 
##  Shapiro-Wilk normality test
## 
## data:  LC
## W = 0.94867, p-value = 0.169
shapiro.test(SC) #Sylvilagus
## 
##  Shapiro-Wilk normality test
## 
## data:  SC
## W = 0.98596, p-value = 0.4885
var.test(LC, SC) #test of equal variance
## 
##  F test to compare two variances
## 
## data:  LC and SC
## F = 1.6153, num df = 28, denom df = 84, p-value = 0.09787
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.9160149 3.1318232
## sample estimates:
## ratio of variances 
##           1.615282

In these tests, Sylvilagus looks approximately normal while Lepus has a more noticeable skew. However both pass the Shapiro-Wilk test and have equal variance (although would have failed this test if alpha were 0.1) so are likely normal enough to trust the t-test results.

δ18O

require(ggplot2)
a_hist<-ggplot(iso, aes(O18ap))
a_hist+geom_histogram(bins=15)+ facet_wrap(~Genus, scales="free")
## Warning: Removed 8 rows containing non-finite values (stat_bin).

qqnorm(LO,xlab="Theoretical",main="Lepus")
qqline(LO)

qqnorm(SO,xlab="Theoretical",main="Sylvilagus")
qqline(SO)

shapiro.test(LO) #Lepus
## 
##  Shapiro-Wilk normality test
## 
## data:  LO
## W = 0.98809, p-value = 0.9802
shapiro.test(SO) #Sylvilagus
## 
##  Shapiro-Wilk normality test
## 
## data:  SO
## W = 0.97626, p-value = 0.1228
var.test(LO, SO)
## 
##  F test to compare two variances
## 
## data:  LO and SO
## F = 0.8248, num df = 28, denom df = 83, p-value = 0.5763
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.4671707 1.6002060
## sample estimates:
## ratio of variances 
##          0.8247966

Here, Lepus fits the normal distrbution well, while Sylvilagus is divergent. Looking at the histgram again, it could potentitally be bimodal. If this were my data, I would explore that possibility further as it could potentially indicate sylvilagus a group of rabbits with different environmental backgrounds. However, both past the Shapiro-Wilk test and appear to have equal variances so the t-test should be ok.

Step 3: Test for effects of site and genus

The article uses two-factor analysis of variance tests to evaluate the effect of genus and site on δ13C and δ18O as well as potential for interaction effects between genus and site. They also present a summary table of means and standard deviation broken down by site and genus.

Summary table:

I created the summary table to compare to their Table 2

First I factorize Genus and define levels of the modern dataset in order to merge with the archaeological dataset

Genus<- as.factor(modern$Species) 
modern<-cbind(modern, Genus)
levels(modern$Genus)[levels(modern$Genus)=="Lepus californicus texianus"]<-"Lepus"
levels(modern$Genus)[levels(modern$Genus)=="Sylvilagus floridanus"]<-"Sylvilagus"
levels(modern$Genus)[levels(modern$Genus)=="Sylvilagus floridanus orizabae"]<-"Sylvilagus"
levels(modern$Genus)    
## [1] "Lepus"      "Sylvilagus"

Here I code the merge, selecting only the necessary columns

new<-select(iso, lab_n, Site, Phase, Genus, C13ap, O18ap)
new2<-select(modern, lab_n=Lab_num, Site=State, Phase=Collection.date..yrs.AD., Genus, C13ap=modC13, O18ap=modO18)%>%mutate(Site="modern") #Giving the modern dataset a "site" variable
iso2<-rbind(filter(new,(!(is.na(C13ap)))), new2) #merging with rbind
iso2
## # A tibble: 127 x 6
##      lab_n         Site                    Phase      Genus C13ap O18ap
##      <chr>        <chr>                    <chr>     <fctr> <dbl> <dbl>
##  1 AS-0315 Moon Pyramid Patlachique-Tlamimilolpa      Lepus  -9.1  29.9
##  2 AS-0321 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus  -8.9  29.2
##  3 AS-0322 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus -10.6  27.3
##  4 AS-0323 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus  -8.8  24.8
##  5 AS-0324 Moon Pyramid Patlachique-Tlamimilolpa      Lepus  -5.8  27.1
##  6 AS-0325 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus  -8.9  28.5
##  7 AS-0326 Moon Pyramid Patlachique-Tlamimilolpa      Lepus  -6.4  25.4
##  8 AS-0327 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus  -9.2  26.1
##  9 AS-0328 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus  -9.9  27.6
## 10 AS-0329 Moon Pyramid Patlachique-Tlamimilolpa Sylvilagus -10.3  27.3
## # ... with 117 more rows

Finally I put together the summary table. I decided not to also include the subtotal row in this table, as that would be needlessly complicated and is not something I would typically use R for.

iso2$Genus <-as.factor(iso2$Genus)
iso2_g <- group_by(iso2, Site, Genus)
sum_tot<- summarize(iso2_g, length(C13ap), mean(C13ap, na.rm=T), sd(C13ap, na.rm=T), length(O18ap), mean(O18ap, na.rm=T), sd(O18ap, na.rm=T))

tot_tot<-iso2%>%
  group_by(Genus)%>%
  summarize(length(C13ap), mean(C13ap, na.rm=T), sd(C13ap, na.rm=T), length(O18ap), mean(O18ap, na.rm=T), sd(O18ap, na.rm=T))
tot_tot<-mutate(tot_tot, Site="Total")
tot<-rbind.data.frame(sum_tot, tot_tot)
require(knitr)
## Loading required package: knitr
kable(tot)#using knitr to ensure the entire table is visible in the html output
Site Genus length(C13ap) mean(C13ap, na.rm = T) sd(C13ap, na.rm = T) length(O18ap) mean(O18ap, na.rm = T) sd(O18ap, na.rm = T)
modern Lepus 1 -10.200000 NA 1 27.50000 NA
modern Sylvilagus 12 -12.575000 1.7467502 12 25.95833 1.3976983
Moon Pyramid Lepus 16 -8.406250 3.1501257 16 25.14375 2.0886898
Moon Pyramid Sylvilagus 34 -8.658824 2.0647280 34 26.76970 1.9265391
Oztoyahualco Lepus 3 -5.000000 1.4177447 3 27.03333 2.3501773
Oztoyahualco Sylvilagus 11 -6.218182 2.3029625 11 24.94545 2.3526001
Puerta 5 Lepus 2 -6.150000 0.7778175 2 24.80000 1.8384776
Puerta 5 Sylvilagus 19 -7.278947 2.0868847 19 24.53684 1.5699294
Teopancazco Lepus 4 -6.875000 2.5342652 4 26.97500 0.9105859
Teopancazco Sylvilagus 15 -6.953333 1.1268583 15 27.56667 1.5723807
Túneles y Cuevas Lepus 4 -7.475000 1.3720423 4 26.22500 2.1700614
Túneles y Cuevas Sylvilagus 6 -9.883333 1.7679555 6 25.95000 3.3303153
Total Lepus 30 -7.646667 2.7676373 30 25.77667 2.0261537
Total Sylvilagus 97 -8.408247 2.6444781 97 26.09062 2.1485897

Note the differences across these summary statistics compared to the table provided in the article (Table 2).

This reflects a difference in the data from the article compared to what they provided in the supplimentary data. I checked this in excel to confirm. It seems a few cases were moved between contexts and genus variables, so while this difference is not enough to substantially throw off my results, most of the tests do not match up exactly.

Two-way ANOVA

The article employed two, two-factor analysis of variance tests to evaluate the effect of genus and site on δ13C and δ18O as well as potential for interaction effects between genus and site. I tested the type II anova, which would be better for the unbalanced design, however they gave almost identical p-values so I continued with type I for ease here.

carbon<-lm(iso$C13ap~iso$Genus*iso$Site)
anova(carbon)#not sure why but moon pyramid is missing from all these ANOVA, and then we had that class and now I know.
## Analysis of Variance Table
## 
## Response: iso$C13ap
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## iso$Genus            1   1.48  1.4773  0.3175 0.574334    
## iso$Site             4 120.18 30.0446  6.4571 0.000111 ***
## iso$Genus:iso$Site   4  11.34  2.8356  0.6094 0.656749    
## Residuals          104 483.91  4.6529                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(carbon$residuals)# I also decided to check out the residuals

oxygen<-lm(iso$O18ap~iso$Genus*iso$Site)
anova(oxygen)
## Analysis of Variance Table
## 
## Response: iso$O18ap
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## iso$Genus            1   3.32  3.3174  0.8436 0.360510    
## iso$Site             4  96.28 24.0694  6.1209 0.000185 ***
## iso$Genus:iso$Site   4  31.51  7.8764  2.0030 0.099594 .  
## Residuals          103 405.03  3.9323                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(oxygen$residuals)# I also decided to check out the residuals

Somerville et al’s (2016) two-way ANOVA results:

“For δ13C apatite, the main effect for genus was not significant (F[1,102] = 3.361), p = 0.070), but the main effect of site context was significant (F[4,102] = 5.365, p = 0.001). Importantly, the interaction between genus and site was not significant for δ13Capatite (F[4,102] = 0.590, p<0.670) or for δ18Oapatite (F[4,102] = 0.834, p<0.507).”

While the values do not match exactly due to the error in the dataset, my two way Anova summaries match the conclusions they derived in the article. That Genus had no significant effect on either δ13C or δO18 apatite ratios. However there is a significant effect of Site location on these variables. Also, the residuals show no clear patterns that would indicate another major influencing variable on these data.

Step 4. Identifying the site factors effecting the isotopic signature

After genus was found to have no effect, the analysis focused on site. Somerville et al. (2016) performed a one-way ANOVA and a post-hoc Tukey test on each of δ13C and δ18O to assess which sites had an effect on the two variables. One site was found to have a sigificant effect on δ13C, so a Mann-Whitney-U (two sample Wilcox rank sum test) test was employed to compare the means of that one site to the remainder of the sample.

One way ANOVA tests and post-hoc Tukey test

δ13C

carbon2<-lm(iso$C13ap~iso$Site) 
m <- aov(iso$C13ap~iso$Site)
summary(m)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## iso$Site      4  112.6  28.140   6.082 0.000187 ***
## Residuals   109  504.3   4.627                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 7 observations deleted due to missingness
posthoc <- TukeyHSD(m, "iso$Site", conf.level = 0.95)
posthoc 
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = iso$C13ap ~ iso$Site)
## 
## $`iso$Site`
##                                     diff         lwr        upr     p adj
## Oztoyahualco-Moon Pyramid      2.6208571  0.81662427  4.4250900 0.0009681
## Puerta 5-Moon Pyramid          1.4065714 -0.14505136  2.9581942 0.0948394
## Teopancazco-Moon Pyramid       1.6411579  0.03305302  3.2492628 0.0430481
## Túneles y Cuevas-Moon Pyramid -0.3420000 -2.40900843  1.7250084 0.9907453
## Puerta 5-Oztoyahualco         -1.2142857 -3.27307539  0.8445040 0.4778465
## Teopancazco-Oztoyahualco      -0.9796992 -3.08138499  1.1219865 0.6960159
## Túneles y Cuevas-Oztoyahualco -2.9628571 -5.43340476 -0.4923095 0.0103261
## Teopancazco-Puerta 5           0.2345865 -1.65468851  2.1238614 0.9969330
## Túneles y Cuevas-Puerta 5     -1.7485714 -4.04114259  0.5439997 0.2209070
## Túneles y Cuevas-Teopancazco  -1.9831579 -4.31432724  0.3480115 0.1344336

The one way ANOVA test for δ13C also identified a significant effect of site on δ13C and the post-hoc Tukey test highlights significant differences in the δ13C between Oztoyahualco-Moon Pyramid, Teopancazco-Moon Pyramid and Túneles y Cuevas-Oztoyahualco. This mirrors Somerville et al.’s (2016) findings and implys that leproids at Oztoyahualco had a higher levels of C4 plants in the diet. This supports the theory that residents at Oztoyahualco may have been raising leproids.

Somerville et al.’s (2016:10) One-way ANOVA and Tukey results: “Significant differences were found in δ13Capatite (F[4, 107] = 5.8, p<0.001). Post-hoc Tukey tests indicate that Oztoyahualco leporids had significantly higher δ13Capatite values than those of the Moon Pyramid and the Cuevas”

δ18O

m <- aov(iso$O18ap~iso$Site)
summary(m)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## iso$Site      4   90.9  22.731   5.514 0.000445 ***
## Residuals   108  445.2   4.122                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 8 observations deleted due to missingness
posthoc <- TukeyHSD(m, "iso$Site", conf.level = 0.95)
posthoc  
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = iso$O18ap ~ iso$Site)
## 
## $`iso$Site`
##                                     diff         lwr        upr     p adj
## Oztoyahualco-Moon Pyramid     -0.8459184 -2.55296779  0.8611311 0.6450024
## Puerta 5-Moon Pyramid         -1.6768707 -3.14606497 -0.2076765 0.0168045
## Teopancazco-Moon Pyramid       1.2033298 -0.31903070  2.7256902 0.1902292
## Túneles y Cuevas-Moon Pyramid -0.1787755 -2.13341129  1.7758603 0.9990725
## Puerta 5-Oztoyahualco         -0.8309524 -2.77451365  1.1126089 0.7592704
## Teopancazco-Oztoyahualco       2.0492481  0.06519165  4.0333046 0.0393030
## Túneles y Cuevas-Oztoyahualco  0.6671429 -1.66513066  2.9994164 0.9319595
## Teopancazco-Puerta 5           2.8802005  1.09666637  4.6637346 0.0001784
## Túneles y Cuevas-Puerta 5      1.4980952 -0.66616299  3.6623535 0.3128666
## Túneles y Cuevas-Teopancazco  -1.3821053 -3.58280138  0.8185909 0.4126923

The one way ANOVA test for δ18O also identified a significant effect of site on δ18O and the post-hoc Tukey test highlights significant differences in the δ18O between Puerta 5-Moon Pyramid, Teopancazco-Oztoyahualco and Teopancazco-Puerta 5. This mirrors Somerville et al.’s (2016) findings.

Somerville et al.’s (2016:11) One-way ANOVA and Tukey results: “Significant differences were also found in δ18Oapatite means between site contexts (F[5, 120] = 4.704, p<0.001), but no obvious temporal trends were observed (Fig 6). Post-hoc Tukey tests revealed significant differences in δ18O apatite between Oztoyahualco and Teopancazco (p = 0.046), between Puerta 5 and Teopancazco (p<0.001), and between Puerta 5 and the Moon Pyramid (p = 0.008).”

Mann-Whitney-U test

Finally, to confirm that the mean δ13C samples from Oztoyahualco are significantly different from the remainder of the site, they performed a Mann-Whitney-U test.

group1<-(iso2) %>%
  filter(Site!="modern", Site!="Oztoyahualco") %>%
  select(C13ap)
group2<-(iso2) %>%
  filter(Site=="Oztoyahualco") %>%
  select(C13ap)
group1<-as.numeric(unlist(group1)) #there was some weird stuff going on, but unisting it worked
group2<-as.numeric(unlist(group2))
wilcox.test(group1, group2)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  group1 and group2
## W = 339, p-value = 0.001853
## alternative hypothesis: true location shift is not equal to 0

As they expected, mean δ13C is siginifancly higher than other areas at Teotihuacan which supports the theory that residents at Oztoyahualco may have been raising leproids.

Somerville et al.’s (2016:11) Mann-Whitney-U test for δ13C apatite: (U = 329, p = 0.001)

group1<-(iso2) %>%
  filter(Site!="modern", Site!="Oztoyahualco") %>%
  select(O18ap)
group2<-(iso2) %>%
  filter(Site=="Oztoyahualco") %>%
  select(O18ap)
group1<-as.numeric(unlist(group1))
group2<-as.numeric(unlist(group2))
wilcox.test(group1, group2)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  group1 and group2
## W = 782.5, p-value = 0.4379
## alternative hypothesis: true location shift is not equal to 0

No significant differences were identified for the δ18O between Oztoyahualco and the remainder of the site. They do not interpret this result, but I imagine it supports the conclusion they make later which is that differences in δ13C cannot be attributed to environmental change as measured by δ18O values. This result closely matches their test.

Somerville et al.’s (2016:11) Mann-Whitney-U test for δ18O apatite:(U = 636, p = 0.434).

Step 5: Data visualisation

The final step was visualising the data in a scatterplot of the means and standard deviation of each site for the two main variables (δ13C and δ18O) and two boxplots grouped by site. In this step they interpret ideas about how δ13C increased over time at Teotihuacan, and use Pearson’s test for correlation between the two variables to rule out environmental causes for the interpretations about δ13C.

Scatterplot

First I relabel the site factors to include the temporal information

iso2$Site = factor(iso2$Site, c("Moon Pyramid", "Teopancazco", "Oztoyahualco", "Puerta 5", "Túneles y Cuevas", "modern"), labels=c("Moon Pyramid (150 BC-AD 450)", "Teopancazco (AD 200-550)", "Oztoyahualco (AD 350-550)", "Puerta 5 (AD 600-1150)", "Túneles y Cuevas (AD 1150-1500)", "modern (AD 1892-1966)"))

Then calculate summary statistics to create the summary graph Figure 4.

gm <- iso2 %>% 
        group_by(Site) %>% 
        summarise(mean_C13ap = mean(C13ap),
                  mean_O18ap  = mean(O18ap, na.rm=T),
                  C13SD=sd(C13ap),
                  O18SD=sd(O18ap, na.rm=T))
gm
## # A tibble: 6 x 5
##                              Site mean_C13ap mean_O18ap    C13SD    O18SD
##                            <fctr>      <dbl>      <dbl>    <dbl>    <dbl>
## 1    Moon Pyramid (150 BC-AD 450)  -8.578000   26.23878 2.433716 2.105035
## 2        Teopancazco (AD 200-550)  -6.936842   27.44211 1.434964 1.456905
## 3       Oztoyahualco (AD 350-550)  -5.957143   25.39286 2.158245 2.428505
## 4          Puerta 5 (AD 600-1150)  -7.171429   24.56190 2.016220 1.547086
## 5 Túneles y Cuevas (AD 1150-1500)  -8.920000   26.06000 1.977541 2.784162
## 6           modern (AD 1892-1966) -12.392308   26.07692 1.797434 1.404845
require(ggplot2)
g<-ggplot(gm, aes(x=mean_C13ap, y=mean_O18ap, shape = Site))
g+geom_point(data=gm, size=4)+ xlim(-16, -1) +ylim(21, 31)+
      geom_errorbarh(data = gm,aes(xmin = mean_C13ap - C13SD,xmax = mean_C13ap + C13SD,y = mean_O18ap)) + 
    geom_errorbar(data = gm,aes(ymin = mean_O18ap - O18SD,ymax = mean_O18ap + O18SD,x = mean_C13ap)) +
  labs(x="δ13C apatite-VPDB (‰)", y="δ18O apatite-VSMOW (‰)") +
  annotate(geom = "text", x = -12, y = 22, label = "note error bars are SD")

Compare this to their figure 4 (below)

Box plots

δ13C

require(stringr)
## Loading required package: stringr
g<-ggplot(iso2, aes(x=Site, y=C13ap))
g<-g+geom_boxplot(aes(fill=Genus))+
  ylim(-16, -2) +
  labs(x="" ,y="δ13C apatite-VPDB (‰)") +
  scale_color_manual(values=c("black", "black"))+
  scale_fill_manual(values=c( "grey90", "grey45"))+
  theme(axis.text.x = element_text(size=8, angle=45, hjust = 1))+
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+#this wraps the text on the x axis
   coord_fixed(.45)+
    geom_vline(aes(xintercept = 3.5), linetype=2) #this line separates classic period of Teotihuacan from epi and post Classic phases
g

ggsave(filename = "carbon_box.png", g,
       width = 8, height = 9, dpi = 300, units = "in", device='png')

There are clear temporal trends in the δ13C values at Teotihuacan, generally increasing in the Classic period and decreasing after the classic period. Compare to Figure 5 (below)

δ18O

require(stringr)
g<-ggplot(iso2, aes(x=Site, y=O18ap))
g<-g+geom_boxplot(aes(fill=Genus))+
  ylim(18, 32) +
  labs(x="" ,y="δ18O apatite-VSMOW (‰)") +
  scale_color_manual(values=c("black", "black"))+
  scale_fill_manual(values=c( "grey90", "grey45"))+
  theme(axis.text.x = element_text(size=8, angle=45, hjust = 1))+
  scale_x_discrete(labels = function(x) str_wrap(x, width = 15))+
   coord_fixed(.45)+
   geom_vline(aes(xintercept = 3.5), linetype=2)
g
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

ggsave(filename = "oxygen_box.png", g,
       width = 8, height = 9, dpi = 300, units = "in", device='png')
## Warning: Removed 1 rows containing non-finite values (stat_boxplot).

There is no obvious temporal trend in oxygen isotopes. Compare to Figure 6 (below)

The observations from the previous plots suggest no major correlation between Oxygen and Carbon isotope. This was tested using Pearson’s correlation test and the results supported their observation of no significant correlation.

Pearson’s Correlation

cor.test(x=iso2$C13ap, y=iso2$O18ap, method= "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  iso2$C13ap and iso2$O18ap
## t = 0.2772, df = 124, p-value = 0.7821
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1506772  0.1989266
## sample estimates:
##        cor 
## 0.02488555

Somerville et al.’s (2016:14) Pearson’s correlation: (R = -0.003, p = 0.971).