Must be submitted for Peer Commentary by 8:00 pm Tuesday, November 9th.
Peer Commentary and Final Homework DUE at 5:00 pm Wednesday, November 22nd.
Create a new GitHub repo and git-referenced RStudio Project called “AN588_Boots_BUlogin”. Within that repo, create a new .Rmd file called “BUlogin_OriginalHomeworkCode_04”. Don’t forget to add your Peer Group and instructor as collaborators, and to accept their invitations to you. Making sure to push both the markdown and knitted .html files to your repository, do the following:
You are welcome to work with your Peer Group together on this homework assignment or on your own. If you work with someone else, please include all of your names in the header information for your .Rmd file.
When we initially discussed the central limit theorem and confidence intervals, we showed how we could use bootstrapping to estimate standard errors and confidence intervals around certain parameter values, like the mean. Using bootstrapping, we could also do the same for estimating standard errors and CIs around regression parameters, such as \(\beta\) coefficients.
[1] Using the “KamilarAndCooperData.csv” dataset, run a linear regression looking at log(HomeRange_km2) in relation to log(Body_mass_female_mean) and report your \(\beta\) coefficients (slope and intercept).
[2] Then, use bootstrapping to sample from your data 1000 times with replacement, each time fitting the same model and calculating the same coefficients. This generates a sampling distribution for each \(\beta\) coefficient.
Estimate the standard error for each of your \(\beta\) coefficients as the standard deviation of the sampling distribution from your bootstrap and determine the 95% CI for each of your \(\beta\) coefficients based on the appropriate quantiles from your sampling distribution.
How does the former compare to the SE estimated from your entire dataset using the formula for standard error implemented in lm()?
How does the latter compare to the 95% CI estimated from your entire dataset?
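Steps [1] and [2] can be sketched roughly as follows. This is a minimal illustration, not a model answer. It runs on simulated stand-in data with arbitrary parameter values, so its numbers say nothing about the real dataset; only the column names HomeRange_km2 and Body_mass_female_mean come from the assignment. Object names such as boot_coefs and comparison are made up for the example. If the real file has missing values in either column, it may be worth keeping only complete rows before resampling, since lm() silently drops them.

```r
# Stand-in data: simulated, NOT the real KamilarAndCooperData.csv.
# With the real file, replace this block with something like:
#   d <- read.csv("KamilarAndCooperData.csv", stringsAsFactors = FALSE)
set.seed(588)
n_obs <- 150
Body_mass_female_mean <- exp(rnorm(n_obs, mean = 8, sd = 1.5))
HomeRange_km2 <- exp(-9 + 1 * log(Body_mass_female_mean) + rnorm(n_obs, sd = 1))
d <- data.frame(Body_mass_female_mean, HomeRange_km2)

# [1] Fit the log-log model on the entire dataset
m <- lm(log(HomeRange_km2) ~ log(Body_mass_female_mean), data = d)
coef(m)  # intercept and slope

# [2] Bootstrap: resample rows with replacement and refit the same model
n_boot <- 1000
boot_coefs <- matrix(NA_real_, nrow = n_boot, ncol = length(coef(m)),
                     dimnames = list(NULL, names(coef(m))))
for (i in seq_len(n_boot)) {
  rows <- sample(nrow(d), replace = TRUE)
  boot_coefs[i, ] <- coef(lm(formula(m), data = d[rows, ]))
}

# Bootstrap SE = SD of each sampling distribution;
# 95% CI = 2.5% and 97.5% quantiles of each sampling distribution
boot_se <- apply(boot_coefs, 2, sd)
boot_ci <- t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975)))

# Formula-based SE and CI from the full-data fit, for comparison
lm_se <- summary(m)$coefficients[, "Std. Error"]
lm_ci <- confint(m, level = 0.95)

comparison <- data.frame(
  term       = names(coef(m)),
  lm_se      = unname(lm_se),
  boot_se    = unname(boot_se),
  lm_lower   = unname(lm_ci[, 1]),
  lm_upper   = unname(lm_ci[, 2]),
  boot_lower = unname(boot_ci[, 1]),
  boot_upper = unname(boot_ci[, 2]),
  row.names  = NULL
)
print(comparison)
```

Putting the two sets of estimates side by side in one data frame makes the "former" and "latter" comparisons a matter of reading across each row.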
EXTRA CREDIT
Write a FUNCTION that takes as its arguments a dataframe, “d”, a linear model, “m” (as a character string, e.g., “logHR~logBM”), a user-defined confidence interval level, “conf.level” (with default = 0.95), and a number of bootstrap replicates, “n” (with default = 1000). Your function should return a dataframe that includes: beta coefficient names; beta coefficients, standard errors, and upper and lower CI limits for the linear model based on your entire dataset; and mean beta coefficient estimates, SEs, and CI limits for those coefficients based on your bootstrap.
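One possible shape for such a function is sketched below. The argument names d, m, conf.level, and n follow the description above; the function name boot_lm, the output column names, and the simulated test data are assumptions made for illustration. The sketch assumes every bootstrap refit returns the same set of coefficients, which holds for a simple numeric predictor but may not for models with factors.

```r
# Hypothetical helper: the name boot_lm and the output column names are illustrative.
boot_lm <- function(d, m, conf.level = 0.95, n = 1000) {
  f <- as.formula(m)            # e.g. "logHR~logBM" becomes a formula
  fit <- lm(f, data = d)        # model on the entire dataset
  k <- length(coef(fit))
  alpha <- 1 - conf.level
  probs <- c(alpha / 2, 1 - alpha / 2)

  # One row per bootstrap replicate, one column per coefficient
  boots <- matrix(NA_real_, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    rows <- sample(nrow(d), replace = TRUE)
    boots[i, ] <- coef(lm(f, data = d[rows, , drop = FALSE]))
  }

  lm_ci <- confint(fit, level = conf.level)
  boot_ci <- apply(boots, 2, quantile, probs = probs)  # 2 x k matrix

  data.frame(
    term       = names(coef(fit)),
    lm_beta    = unname(coef(fit)),
    lm_se      = unname(summary(fit)$coefficients[, "Std. Error"]),
    lm_lower   = unname(lm_ci[, 1]),
    lm_upper   = unname(lm_ci[, 2]),
    boot_mean  = colMeans(boots),
    boot_se    = apply(boots, 2, sd),
    boot_lower = boot_ci[1, ],
    boot_upper = boot_ci[2, ],
    row.names  = NULL
  )
}

# Usage on simulated stand-in data (arbitrary values, not the real dataset)
set.seed(588)
d <- data.frame(logBM = rnorm(120, mean = 8, sd = 1.5))
d$logHR <- -9 + d$logBM + rnorm(120)
res <- boot_lm(d, "logHR ~ logBM", conf.level = 0.95, n = 1000)
print(res)
```

Passing the model as a character string and converting it with as.formula() inside the function keeps the same formula available for both the full-data fit and every bootstrap refit.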
EXTRA EXTRA CREDIT
Graph each beta value from the linear model and its corresponding mean value, lower CI and upper CI from a bootstrap as a function of number of bootstraps from 10 to 200 by 10s. HINT: the beta value from the linear model will be the same for all bootstraps and the mean beta value may not differ that much!
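A rough sketch of one way to build such a graph with base R graphics is shown below, for the slope only; the intercept could be handled the same way. The data are simulated stand-ins with arbitrary values, and names such as n_values and summ are made up for the example.

```r
# Simulated stand-in data (arbitrary values, not the real dataset)
set.seed(588)
d <- data.frame(logBM = rnorm(100, mean = 8, sd = 1.5))
d$logHR <- -9 + d$logBM + rnorm(100)

# Slope from the full-data model: constant across all bootstrap sizes
fit <- lm(logHR ~ logBM, data = d)
lm_slope <- coef(fit)[["logBM"]]

# For each number of bootstraps, summarise the bootstrap slopes
n_values <- seq(10, 200, by = 10)
summ <- t(sapply(n_values, function(n) {
  slopes <- replicate(n, {
    rows <- sample(nrow(d), replace = TRUE)
    coef(lm(logHR ~ logBM, data = d[rows, ]))[["logBM"]]
  })
  c(mean  = mean(slopes),
    lower = quantile(slopes, 0.025, names = FALSE),
    upper = quantile(slopes, 0.975, names = FALSE))
}))

# Bootstrap mean and CI limits against number of bootstraps,
# with the full-data slope as a horizontal reference line
plot(n_values, summ[, "mean"], type = "b",
     ylim = range(summ, lm_slope),
     xlab = "Number of bootstraps", ylab = "Slope")
lines(n_values, summ[, "lower"], lty = 2)
lines(n_values, summ[, "upper"], lty = 2)
abline(h = lm_slope, col = "red")
legend("topright",
       legend = c("Bootstrap mean", "Bootstrap 95% CI", "lm() slope"),
       lty = c(1, 2, 1), col = c("black", "black", "red"), bty = "n")
```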
.Rproj and README file):
NOTE: If you want your homework code to look nice (beyond being very well annotated and commented), and be easy to use by others, you can check out the relatively simple example R Markdown templates in the AN588_Week_3_caschmit repo.
Please also consider consulting the following helpful guidelines on how to write effective R Markdown documents (also available at the end of Module 03), which go well beyond the simple formatting of the templates.