DataReAnalysis

A Brief Introduction

Arboreal embryos of red-eyed treefrogs, Agalychnis callidryas, hatch prematurely to escape from egg predators, cued by vibrations in attacks. Young embryos modulate hatching based on multiple frequency and temporal properties of cues, reducing false alarms that unnecessarily expose them to risk in the water. Because the cost of false alarms decreases developmentally we hypothesized that, if sampling costs are high or stimuli ambiguous, older embryos would accept more false alarms.

Methods, again briefly

We assessed changes in sensitivity to sampling costs using vibration playbacks (to trays of embryos, pictured above, sideways) at two developmental stages. We designed sets of 3 stimuli, based on prior results with younger embryos, so one elicited high hatching and two elicited similarly low hatching, but sampling costs differed between low-hatching stimuli.

File logistics

The original paper and full dataset used are uploaded onto the repository in pdf and csv form, respectively. I will replicate all analyses included in the paper, including mann-whitney-wilcoxon tests, two-sample t-tests, wilcoxon rank sum tests, binomial GLMs, interaction plots, ANOVAs, and AIC model comparisons.

A report of findings in paragraph form (as presented in the original paper) can be found at the end of the report.

First, we will clear the environment and set our working directory:

rm(list=ls()) #clear environment
setwd('/Users/juliejung/Documents/GitHub/AN 597/data-reanalysis-assignment') #set working directory

User Defined Functions

Here we’ll define some functions that will help us later:

#####################################################################
#########                f u n c t i o n s          #################
#####################################################################

The following function will give us the mode:

# gives mode
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

The following function will give us the count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%). Details are in annotations within script.

Important Variables: - data: a data frame. - measurevar: the name of a column that contains the variable to be summarized - groupvars: a vector containing names of columns that contain grouping variables - na.rm: a boolean that indicates whether to ignore NA’s - conf.interval: the percent range of the confidence interval (default is 95%)

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
  library(plyr)
  # New version of length which can handle NA's: if na.rm==T, don't count them
  length2 <- function (x, na.rm=FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop=.drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm=na.rm),
                     mean = mean   (xx[[col]], na.rm=na.rm),
                     sd   = sd     (xx[[col]], na.rm=na.rm)
                   )
                 },
                 measurevar
  )
  # Rename the "mean" column    
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval: 
  # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
  ciMult <- qt(conf.interval/2 + .5, datac$N-1)
  datac$ci <- datac$se * ciMult
  return(datac)
}

Preparation: Load packages

Here, we load packages that we’ll need later on in the code.

library(curl)

## Warning: package 'curl' was built under R version 3.1.3

#library(xlsx)
library(sciplot)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.1.3

library(MASS)

## Warning: package 'MASS' was built under R version 3.1.3

library(multcomp)

## Warning: package 'multcomp' was built under R version 3.1.3

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: splines

## Loading required package: TH.data

library(AICcmodavg)
library(car)

## Warning: package 'car' was built under R version 3.1.3

Preparation: read in data

Next, we want to read in the relevant datafile (from the data-reanalysis-assignment repo) and show a few lines of raw data in your output (e.g., using head()).

f <- curl("https://raw.githubusercontent.com/jamjulie/data-reanalysis-assignment/master/WarkentinJungRuedaMcDaniel-DataForDeposit.csv")
data <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(data)

##       Date Age.d. Set TestOrder STIMULUS SetupTime.h.  Tray
## 1 20151013    5.2   1         1       LS         3.02 390-2
## 2 20151013    5.2   1         2       LF         3.35   390
## 3 20151013    5.2   1         3       HF         3.75   370
## 4 20151013    5.2   2         4       HF         4.08 388-2
## 5 20151013    5.2   2         5       LS         4.45   388
## 6 20151013    5.2   2         6       LF         4.82 383-2
##   AlreadyHatchedorRuptured HatchedinSetup LessDeveloped TestEggs
## 1                        1              1             0       13
## 2                        3              0             0       12
## 3                        2              0             0       13
## 4                        1              2             0       12
## 5                        0              0             0       15
## 6                        3              0             1       11
##   FirstHatch.s. FirstHatch.s.600 CycleLength.s. FirstHatch.cycles.
## 1            90               90             17           5.294118
## 2            79               79              2          39.500000
## 3            NA              600              2                 NA
## 4            NA              600              2                 NA
## 5            67               67             17           3.941176
## 6            24               24              2          12.000000
##   FirstHatch.cycles.600 X5minHatch X10minHatch ProportionHatched
## 1              5.294118          2           2             0.154
## 2             39.500000          2           2             0.167
## 3            300.000000          0           0             0.000
## 4            300.000000          0           0             0.000
## 5              3.941176          3           3             0.200
## 6             12.000000          1           2             0.182
##   GutCoilStage T1length T2length T3length MeanLength
## 1           NA   10.793    8.124   11.124     10.014
## 2           NA   10.514   10.698   11.134     10.782
## 3           NA   10.489   10.527   11.000     10.672
## 4            0   10.244   10.768   10.644     10.552
## 5            0   10.655   10.346   11.501     10.834
## 6            1   10.856   11.158   10.323     10.779

Results

The following results follow the order of presented results in the paper (pdf in repo).

First, we will check there are at least 8 eggs per tray.

min(data$TestEggs, na.rm=T)

## [1] 8

Now let’s look at the structure of our data, and make sure the variables are defined as we want them to be defined.

str(data)

## 'data.frame':    81 obs. of  24 variables:
##  $ Date                    : int  20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 ...
##  $ Age.d.                  : num  5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 ...
##  $ Set                     : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ TestOrder               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ STIMULUS                : Factor w/ 3 levels "HF","LF","LS": 3 2 1 1 3 2 3 1 2 1 ...
##  $ SetupTime.h.            : num  3.02 3.35 3.75 4.08 4.45 4.82 5.27 5.77 6.08 6.52 ...
##  $ Tray                    : Factor w/ 81 levels "370","371-2",..: 27 26 1 24 23 18 17 25 22 19 ...
##  $ AlreadyHatchedorRuptured: int  1 3 2 1 0 3 1 1 0 0 ...
##  $ HatchedinSetup          : int  1 0 0 2 0 0 0 0 0 1 ...
##  $ LessDeveloped           : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ TestEggs                : int  13 12 13 12 15 11 14 14 15 14 ...
##  $ FirstHatch.s.           : int  90 79 NA NA 67 24 NA 58 34 NA ...
##  $ FirstHatch.s.600        : int  90 79 600 600 67 24 600 58 34 600 ...
##  $ CycleLength.s.          : int  17 2 2 2 17 2 17 2 2 2 ...
##  $ FirstHatch.cycles.      : num  5.29 39.5 NA NA 3.94 ...
##  $ FirstHatch.cycles.600   : num  5.29 39.5 300 300 3.94 ...
##  $ X5minHatch              : int  2 2 0 0 3 1 0 2 5 0 ...
##  $ X10minHatch             : int  2 2 0 0 3 2 1 2 5 0 ...
##  $ ProportionHatched       : num  0.154 0.167 0 0 0.2 0.182 0.071 0.143 0.333 0 ...
##  $ GutCoilStage            : int  NA NA NA 0 0 1 1 0 0 1 ...
##  $ T1length                : num  10.8 10.5 10.5 10.2 10.7 ...
##  $ T2length                : num  8.12 10.7 10.53 10.77 10.35 ...
##  $ T3length                : Factor w/ 80 levels "10.323","10.368",..: 21 22 16 7 41 1 10 13 2 20 ...
##  $ MeanLength              : num  10 10.8 10.7 10.6 10.8 ...

data$GutCoilStage<-as.numeric(as.character(data$GutCoilStage))
data$T3length<-as.numeric(as.character(data$T3length))

## Warning: NAs introduced by coercion

data$ProportionHatched<-as.numeric(as.character(data$ProportionHatched))
data$STIMULUS<-as.factor(data$STIMULUS)
data$FirstHatch.s. <-as.numeric(as.character(data$FirstHatch.s. ))
data$FirstHatch.cycles. <-as.numeric(as.character(data$FirstHatch.cycles. ))

Here let’s subset the data into 2 age groups: younger and older.

younger<-subset(data, Age.d.==5.2, na.rm=T)
older<-subset(data, Age.d.==5.7, na.rm=T)

Since the paper reports the mode of stages for each age category, let’s find that:

Mode(younger$GutCoilStage)

## [1] 1

Mode(older$GutCoilStage)

## [1] 3

Here we’ll create a table summary of statistics for hatching rates (in proportion hatched) for each age category, grouped by stimulus treatment.

“summarySE”" is a user defined function, which you’ll find above in the functions section.

younger_errorstats_g8 <- summarySE(younger, measurevar="ProportionHatched", groupvars="STIMULUS")

## Warning: package 'plyr' was built under R version 3.1.3

younger_errorstats_g8$AgeGroup<-"younger"

older_errorstats_g8 <- summarySE(older, measurevar="ProportionHatched", groupvars="STIMULUS")
older_errorstats_g8$AgeGroup<-"older"

Should we use parametric or non-parametric tests?

hist(data$GutCoilStage) #non parametric

hist(younger$GutCoilStage)

hist(older$GutCoilStage)

We’ll use mann-whitney-wilcoxon test because our 2 data samples (younger and older) are independent/come from distinct populations, so the samples do not affect each other.

Also because our data are non-parametric.

wtest<-wilcox.test(data$GutCoilStage~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)

## Warning in wilcox.test.default(x = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(x = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, :
## cannot compute exact confidence intervals with ties

qnorm(wtest$p.value) #z-value to report

## [1] -6.606223

wtest

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$GutCoilStage by data$Age.d.
## W = 88.5, p-value = 1.971e-11
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -1.999996 -1.000003
## sample estimates:
## difference in location 
##              -1.999985

Moving right along, here we’ll calculation the total lengths (mean and se) of hatchlings in each age group.

mean(younger$MeanLength)

## [1] 11.06536

mean(older$MeanLength)

## [1] 11.64307

se(younger$MeanLength)

## [1] 0.07123206

se(older$MeanLength)

## [1] 0.0546835

Since the data are normal and we get equal variances, we’ll perform a t-test.

hist(data$MeanLength)#normal

var.test(younger$MeanLength, older$MeanLength) # equal variances

## 
##  F test to compare two variances
## 
## data:  younger$MeanLength and older$MeanLength
## F = 1.5756, num df = 38, denom df = 41, p-value = 0.1554
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8396433 2.9795488
## sample estimates:
## ratio of variances 
##           1.575628

t.test(younger$MeanLength, older$MeanLength, alternative=c("two.sided"), paired=FALSE, var.equal=TRUE, conf.level=0.95)

## 
##  Two Sample t-test
## 
## data:  younger$MeanLength and older$MeanLength
## t = -6.4874, df = 79, p-value = 7.01e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7549655 -0.4004594
## sample estimates:
## mean of x mean of y 
##  11.06536  11.64307

We want to calculate some means and ses for spontaneous hatching before the test period.

mean(younger$AlreadyHatchedorRuptured)

## [1] 0.5128205

se(younger$AlreadyHatchedorRuptured)

## [1] 0.1317985

mean(older$AlreadyHatchedorRuptured)

## [1] 0.9047619

se(older$AlreadyHatchedorRuptured)

## [1] 0.1219733

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$AlreadyHatchedorRuptured)#non-parametric, mann-whitney

wtest<-wilcox.test(data$AlreadyHatchedorRuptured~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)

## Warning in wilcox.test.default(x = c(1L, 3L, 2L, 1L, 0L, 3L, 1L, 1L, 0L, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(x = c(1L, 3L, 2L, 1L, 0L, 3L, 1L, 1L, 0L, :
## cannot compute exact confidence intervals with ties

qnorm(wtest$p.value) #z-value to report

## [1] -2.439028

wtest

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$AlreadyHatchedorRuptured by data$Age.d.
## W = 559.5, p-value = 0.007363
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -9.999772e-01 -6.210509e-05
## sample estimates:
## difference in location 
##          -2.881878e-05

We want to calculate some means and ses for the number of individuals that hatched before the test period.

# hatching from set up
mean(younger$HatchedinSetup)

## [1] 0.8974359

se(younger$HatchedinSetup)

## [1] 0.2317418

mean(older$HatchedinSetup)

## [1] 2.452381

se(older$HatchedinSetup)

## [1] 0.2974971

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$HatchedinSetup)#non-parametric, mann-whitney

wtest<-wilcox.test(data$HatchedinSetup~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)

## Warning in wilcox.test.default(x = c(1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(x = c(1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, :
## cannot compute exact confidence intervals with ties

qnorm(wtest$p.value) #z-value to report

## [1] -3.886517

wtest

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$HatchedinSetup by data$Age.d.
## W = 404.5, p-value = 5.085e-05
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -2.0000029 -0.9999489
## sample estimates:
## difference in location 
##              -1.000005

We want to calculate some means and ses for the number of test eggs per tray in older vs. younger.

#smaller number of test eggs per tray in older than younger
mean(younger$TestEggs)

## [1] 13.46154

se(younger$TestEggs)

## [1] 0.2404622

mean(older$TestEggs)

## [1] 11.45238

se(older$TestEggs)

## [1] 0.3089873

min(younger$TestEggs)

## [1] 8

max(younger$TestEggs)

## [1] 15

min(older$TestEggs)

## [1] 8

max(older$TestEggs)

## [1] 15

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$TestEggs)#non-parametric, mann-whitney

wtest<-wilcox.test(data$TestEggs~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)

## Warning in wilcox.test.default(x = c(13L, 12L, 13L, 12L, 15L, 11L, 14L, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(x = c(13L, 12L, 13L, 12L, 15L, 11L, 14L, :
## cannot compute exact confidence intervals with ties

qnorm(wtest$p.value) #z-value to report

## [1] -4.361872

wtest

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$TestEggs by data$Age.d.
## W = 1289.5, p-value = 6.448e-06
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  1.000044 2.999979
## sample estimates:
## difference in location 
##               2.000011

Next, we want to know whether age, stimulus, and/or their interaction affected the hatching response of embryos in playbacks

We’ll do this by performing a binomial GLM:

#binomial glm
glm1<-glm(cbind(X10minHatch,TestEggs)~Age.d., family=binomial(logit), data=data)
glm2<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=data)
glm3<-glm(cbind(X10minHatch,TestEggs)~Age.d.+STIMULUS, family=binomial(logit), data=data)
glm4<-glm(cbind(X10minHatch,TestEggs)~Age.d.*STIMULUS, family=binomial(logit), data=data)

glms<-list(glm1, glm2, glm3, glm4)
aictab(glms)

## Warning in aictab.AICglm.lm(glms): 
## Model names have been supplied automatically in the table

## 
## Model selection based on AICc :
## 
##      K   AICc Delta_AICc AICcWt Cum.Wt      LL
## Mod4 6 329.47       0.00   0.97   0.97 -158.17
## Mod3 4 336.80       7.33   0.03   1.00 -164.14
## Mod1 2 390.29      60.82   0.00   1.00 -193.07
## Mod2 3 394.16      64.69   0.00   1.00 -193.93

Anova(glm4)

## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            59.578  1  1.175e-14 ***
## STIMULUS          57.863  2  2.724e-13 ***
## Age.d.:STIMULUS   11.934  2   0.002562 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This is the plot for our proportion hatched data (Figure 6 in paper)

interaction.plot(data$Age.d., data$STIMULUS, data$ProportionHatched, 
                 leg.bty="0", pch=c(18,24), 
                 ylab="Proportion hatched", 
                 xlab="Age (days)", 
                 trace.label="")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’re looking at part A of the figure.

ALTERNATIVELY, we can use ggplot to create a much prettier plot, with standard errors. Note that ideally the order of younger and older would be switched, but the data presented are the same as presented in the original paper.

If I wanted to, I could switch this order by following the steps through this link:

ggplot(data, aes(x=as.factor(AgeGroup), y=ProportionHatched, colour=STIMULUS)) + 
  geom_point(data=younger_errorstats_g8 , size=3) +
  geom_errorbar(data=younger_errorstats_g8, aes(ymin=ProportionHatched-se, ymax=ProportionHatched+se), width=0.05)+ 
  geom_point(data=older_errorstats_g8, size=3) +
  geom_line(data=older_errorstats_g8)+
  geom_errorbar(data=older_errorstats_g8, aes(ymin=ProportionHatched-se, ymax=ProportionHatched+se), width=0.05)+ 
  theme_bw(20)+
  ylab("Proportion of tray hatched\n")+
  xlab("\n Age group")

Moving on: Here we perform separate binomial GLMs at each age, with grouped categories of treatment stimuli.

#binomial glm for YOUNGER
glm2<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=younger)
Anova(glm2)

## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)    
## STIMULUS   40.133  2  1.929e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#binomial glm for OLDER
glm2<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=older)
Anova(glm2)

## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)    
## STIMULUS   29.664  2  3.618e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Latency Analysis:

Because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

min(data$FirstHatch.s., na.rm=T)

## [1] 9

max(data$FirstHatch.s., na.rm=T)

## [1] 420

data$FirstHatch.cycles.

##  [1]   5.2941176  39.5000000          NA          NA   3.9411765
##  [6]  12.0000000          NA  29.0000000  17.0000000          NA
## [11]  12.3529412  29.5000000          NA   9.5000000  80.0000000
## [16]  61.5000000  12.5000000   1.5882353          NA 210.0000000
## [21]  60.0000000 125.5000000  12.0000000   3.7647059  13.0000000
## [26]  13.9411765  22.5000000          NA          NA          NA
## [31]          NA          NA          NA   7.2941176   5.5000000
## [36] 115.5000000   6.5000000  22.5000000   4.0588235  17.0000000
## [41]   0.8235294  12.5000000  47.5000000   6.0000000   1.8823529
## [46]   2.6470588  13.5000000  15.0000000  18.0000000   7.5000000
## [51]   1.7058824          NA          NA  10.5000000          NA
## [56]   4.1176471   8.5000000   2.0000000          NA  11.5000000
## [61]   8.0000000          NA   1.4705882   0.7647059  10.0000000
## [66]   6.5000000  12.0000000   1.3529412  10.0000000   5.5000000
## [71]   0.5294118  12.5000000   2.1176471   9.0000000  19.5000000
## [76]  13.5000000   8.0000000   4.4705882   1.0588235  13.0000000
## [81]   5.5000000

Are our data parametric?

hist(data$FirstHatch.s.) #nonparametric

hist(data$FirstHatch.cycles.) #nonparametric

hist(log(data$FirstHatch.s.)) #parametric!

hist(log(data$FirstHatch.cycles.)) #parametric!

Since they are, we can use ANOVAs of log-transformed data to test for effects of age, stimulus and age-by-stimulus interaction on the latency to hatch.

# IN SECONDS, Anova
aov1 <- aov(FirstHatch.s. ~ Age.d.*STIMULUS, data=data)
summary(aov1)

##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## Age.d.           1  77870   77870  25.158 5.31e-06 ***
## STIMULUS         2  45922   22961   7.418  0.00135 ** 
## Age.d.:STIMULUS  2  32665   16332   5.277  0.00785 ** 
## Residuals       58 179525    3095                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 17 observations deleted due to missingness

# IN CYCLES, Anova
aov2 <- aov(FirstHatch.cycles. ~ Age.d.*STIMULUS, data=data)
summary(aov2)

##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## Age.d.           1  10440   10440  17.838 8.59e-05 ***
## STIMULUS         2  18059    9029  15.429 4.24e-06 ***
## Age.d.:STIMULUS  2  10953    5476   9.358    3e-04 ***
## Residuals       58  33943     585                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 17 observations deleted due to missingness

Alternatively, we can use GLMs:

# IN SECONDS, GLM METHOD
glm1<-glm(FirstHatch.s.~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.s.~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.s.~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.s.~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
aictab(glms)

## Warning in aictab.AICglm.lm(glms): 
## Model names have been supplied automatically in the table

## 
## Model selection based on AICc :
## 
##      K   AICc Delta_AICc AICcWt Cum.Wt      LL
## Mod4 7 705.73       0.00   0.95   0.95 -344.87
## Mod3 5 711.47       5.73   0.05   1.00 -350.22
## Mod1 3 719.37      13.64   0.00   1.00 -356.48
## Mod2 4 729.28      23.55   0.00   1.00 -360.30

Anova(glm4)

## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            25.396  1  4.669e-07 ***
## STIMULUS          14.836  2  0.0006002 ***
## Age.d.:STIMULUS   10.553  2  0.0051100 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we plot the latencies for the first embryo in each tray to hatch:

In order to do this, we’ll get the SEs using our summarySE function.

LtoH_younger_errorstats_g8 <- summarySE(younger, measurevar="FirstHatch.s.", groupvars="STIMULUS", na.rm=TRUE)
LtoH_younger_errorstats_g8$AgeGroup<-"younger"

LtoH_older_errorstats_g8 <- summarySE(older, measurevar="FirstHatch.s.", groupvars="STIMULUS", na.rm=TRUE)
LtoH_older_errorstats_g8$AgeGroup<-"older"

Now we can make the plot!

ggplot(data, aes(x=as.factor(AgeGroup), y=FirstHatch.s., colour=STIMULUS)) + 
  geom_point(data=LtoH_younger_errorstats_g8 , size=3) +
  geom_errorbar(data=LtoH_younger_errorstats_g8, aes(ymin=FirstHatch.s.-se, ymax=FirstHatch.s.+se), width=0.05)+ 
  geom_point(data=LtoH_older_errorstats_g8, size=3) +
  geom_errorbar(data=LtoH_older_errorstats_g8, aes(ymin=FirstHatch.s.-se, ymax=FirstHatch.s.+se), width=0.05)+ 
  theme_bw(20)+
  ylab("Latency to hatch (seconds)\n")+
  xlab("\n Age group")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’ve replotted part B of this figure.

Recall that because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles, we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

## IN CYCLES, GLM METHOD

glm1<-glm(FirstHatch.cycles.~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.cycles.~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.cycles.~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.cycles.~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
aictab(glms)

## Warning in aictab.AICglm.lm(glms): 
## Model names have been supplied automatically in the table

## 
## Model selection based on AICc :
## 
##      K   AICc Delta_AICc AICcWt Cum.Wt      LL
## Mod4 7 599.13       0.00      1      1 -291.57
## Mod3 5 612.07      12.93      0      1 -300.52
## Mod2 4 621.83      22.70      0      1 -306.58
## Mod1 3 629.07      29.93      0      1 -311.33

Anova(glm4)

## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.cycles.
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            15.999  1  6.337e-05 ***
## STIMULUS          30.858  2  1.992e-07 ***
## Age.d.:STIMULUS   18.715  2  8.630e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The shortest latency times were for the LF stimulus (younger: 39.5 ± 10.1 s; older 18.6 ± 1.9 s), while latencies for the LS stimulus included the fewest cycles of the vibration pattern (younger: 6.5 ± 1.6; older: 1.9 ± 0.3 cycles).

HF<-subset(data, STIMULUS=="HF", na.rm=T)
LF<-subset(data, STIMULUS=="LF", na.rm=T)
LS<-subset(data, STIMULUS=="LS", na.rm=T)

HFyounger<-subset(HF, Age.d.==5.2, na.rm=T)
HFolder<-subset(HF, Age.d.==5.7, na.rm=T)
LFyounger<-subset(LF, Age.d.==5.2, na.rm=T)
LFolder<-subset(LF, Age.d.==5.7, na.rm=T)
LSyounger<-subset(LS, Age.d.==5.2, na.rm=T)
LSolder<-subset(LS, Age.d.==5.7, na.rm=T)

mean(LFyounger$FirstHatch.s., na.rm=T)

## [1] 39.45455

se(LFyounger$FirstHatch.s., na.rm=T)

## [1] 10.11692

mean(LFolder$FirstHatch.s., na.rm=T)

## [1] 18.64286

se(LFolder$FirstHatch.s., na.rm=T)

## [1] 1.877007

mean(LSyounger$FirstHatch.cycles., na.rm=T)

## [1] 6.529412

se(LSyounger$FirstHatch.cycles., na.rm=T)

## [1] 1.557356

mean(LSolder$FirstHatch.cycles., na.rm=T)

## [1] 1.918552

se(LSolder$FirstHatch.cycles., na.rm=T)

## [1] 0.3367673

We conducted analyses of latency both on the subset of trays from which at least one individual hatched and also on the full dataset, assigning a latency of 600 s (i.e., the full playback plus post-playback observation period) to trays in which no embryos hatched.

# IN SECONDS, Anova
aov1 <- aov(FirstHatch.s.600 ~ Age.d.*STIMULUS, data=data)
summary(aov1)

##                 Df  Sum Sq Mean Sq F value  Pr(>F)   
## Age.d.           1  497562  497562  11.521 0.00110 **
## STIMULUS         2  501036  250518   5.801 0.00455 **
## Age.d.:STIMULUS  2   51388   25694   0.595 0.55418   
## Residuals       75 3239146   43189                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# IN SECONDS, GLM METHOD
glm1<-glm(FirstHatch.s.600~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.s.600~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.s.600~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.s.600~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
aictab(glms)

## Warning in aictab.AICglm.lm(glms): 
## Model names have been supplied automatically in the table

## 
## Model selection based on AICc :
## 
##      K    AICc Delta_AICc AICcWt Cum.Wt      LL
## Mod3 5 1100.25       0.00   0.82   0.82 -544.72
## Mod4 7 1103.71       3.46   0.15   0.97 -544.09
## Mod1 3 1107.24       6.99   0.02   0.99 -550.46
## Mod2 4 1109.38       9.13   0.01   1.00 -550.43

Anova(glm4)

## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.600
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.           11.5207  1  0.0006883 ***
## STIMULUS         11.6011  2  0.0030259 ** 
## Age.d.:STIMULUS   1.1899  2  0.5516027    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we plot the latencies for the first embryo in each tray to hatch IN CYCLES:

In order to do this, we’ll get the SEs using our summarySE function.

cycles_younger_errorstats_g8 <- summarySE(younger, measurevar="FirstHatch.cycles.", groupvars="STIMULUS", na.rm=TRUE)
cycles_younger_errorstats_g8$AgeGroup<-"younger"

cycles_older_errorstats_g8 <- summarySE(older, measurevar="FirstHatch.cycles.", groupvars="STIMULUS", na.rm=TRUE)
cycles_older_errorstats_g8$AgeGroup<-"older"

Now we can make the plot!

ggplot(data, aes(x=as.factor(AgeGroup), y=FirstHatch.cycles., colour=STIMULUS)) + 
  geom_point(data=cycles_younger_errorstats_g8 , size=3) +
  geom_errorbar(data=cycles_younger_errorstats_g8, aes(ymin=FirstHatch.cycles.-se, ymax=FirstHatch.cycles.+se), width=0.05)+ 
  geom_point(data=cycles_older_errorstats_g8, size=3) +
  geom_errorbar(data=cycles_older_errorstats_g8, aes(ymin=FirstHatch.cycles.-se, ymax=FirstHatch.cycles.+se), width=0.05)+ 
  theme_bw(20)+
  ylab("Latency to hatch (cycles)\n")+
  xlab("\n Age group")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’ve replotted part C of this figure.

Report of results

Younger embryos showed a range of stages from intact yolks to curved furrows while older embryos had straight furrows to full coils (N = 39 and 42; modes: straight furrow(1) and full coil(3); Wilcoxon test, Z=6.606223, P =1.97e-11). Embryos grew between test ages. Hatchling total length increased from 11.1 ± 0.07 to 11.6 ± 0.05 mm (mean ± SE, N = 39 and 42 trays respectively; t-test, t79 = 6.4871, P =7.018e-9; Fig. 4B). Embryos also hatched spontaneously between test ages. From trays tested, more of the older embryos had hatched spontaneously before the test period (0.9 ± 0.1 vs. 0.5 ± 0.1 embryos per tray; Wilcoxon test, Z=2.439, P = 0.0074) and relatively fewer of the trays with older eggs had sufficient individuals to attempt setup (KMW unquantified personal observation). In addition, more of the older embryos hatched during setup and acclimation (2.5 ± 0.3 vs. 0.9 ± 0.2 per tray; Wilcoxon test, Z=404.5, P = 5.085e-5), resulting in smaller numbers of test eggs per tray (Wilcoxon test, Z=4.361872, P =6.448e-6; older 11.5 ± 0.3, younger 13.5 ± 0.2, range 8–15 at both ages).

Age, stimulus, and their interaction affected the hatching response of embryos in playbacks (Fig. 6A). Older embryos hatched more than did younger ones (binomial GLM, age effect: χ2 = 107.71, df = 1, P < 0.0001), and the LF stimulus elicited the strongest hatching response (stimulus effect: χ2 = 57.57, df = 2, P < 0.0001). However, the significant age × stimulus interaction revealed that the developmental increase in hatching was not uniform across stimuli (interaction effect: χ2 = 12.55, df = 2, P = 0.0019). Younger embryos showed equally little response to both the HF and LS stimuli but a substantial hatching response to the LF stimulus (Fig. 6A, stimulus effect: χ2 = 57.57, df = 2, P < 0.0001; HF–LS contrast χ2 = 1.48, P = 0.22; (HF+LS)–LF contrast χ2 = 370 56.75, P < 0.0001). In contrast, older embryos showed similarly strong hatching responses to both LF and LS stimuli and a weaker response to the HF stimulus (stimulus effect: χ2 = 73.84, df = 2, P < 0.0001; LF–LS contrast χ2 = 2.46, P = 0.12; (LF+LS)–HF contrast χ2 = 71.61, P < 0.0001).

In trays where hatching occurred, the latency until the first embryo hatched ranged from 9–420 s. Latency varied among stimuli and decreased with age, with a marginally non-significant interaction effect (Fig. 6B,C). This pattern held whether measuring latency in seconds or in cycles of vibration (seconds: age, F1,58 = 44.07, P < 0.0001; stimulus, F2,58 = 13.88, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625; cycles: age, F1,58 = 44.05, P < 0.0001; stimulus, F2,58 = 31.78, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625). The shortest latency times were for the LF stimulus (younger: 39.5 ± 10.1 s; older 18.6 ± 1.9 s), while latencies for the LS stimulus included the fewest cycles of the vibration pattern (younger: 6.5 ± 1.6; older: 1.9 ± 0.3 cycles). If we assign a latency of 600 s to trays in which no embryos hatched, age and stimulus effects remain strong and the interaction effect is much weaker (latency in seconds: age, F1,75 = 27.98, P < 0.0001; stimulus, F2,75 = 7.77, P = 0.0009; interaction F2,75 = 0.75, P = 0.47).

Discussion

Older embryos showed lower latency to hatch, indicating less cue sampling, and more hatching overall. Their similarly high responses to two of the stimuli suggest they ceased to discriminate using slow-to-assess properties as indicators of safety; however, they showed little hatching if either frequency spectrum or a fast temporal pattern allowed rapid assessment of low risk.

Conclusion

Developmental changes in behavior due to ontogenetic adaptation of decision processes are likely to be widespread. Vibration-cued hatching allows us to use the power of playback experiments to improve our understanding of the development of adaptive embryo behavior.

DataReAnalysis

Julie Jung

November 9, 2017

A Brief Introduction

Methods, again briefly

File logistics

User Defined Functions

Preparation: Load packages

Preparation: read in data

Results

Report of results

Discussion

Conclusion