Fall 2021, Seminar will be held on Fridays, 2:30 - 5:15pm in WED 206
(that’s Wheelock College of Education, 2 Silber Way)
Graduate Student, Department of Anthropology
Office: Stone Science Building (STO), 675 Commonwealth Ave, Rm 251
Office Hours: Thursdays 11am - Noon (with Zoom option)
Statistical methods are the backbone of scientific research, but are often given short shrift when designing research in biological anthropology. The purpose of this seminar is two-fold: 1) to familiarize students with the use of relevant statistical programming packages (primarily R), and 2) to discuss select advances in statistical techniques from related disciplines that may help students while designing and implementing their own research projects.
Potential foci of discussion may include statistical methods that account for small sample sizes or non-normal data, the use of power analyses and preliminary statistics to justify data collection design, and the use of mixed models and information-theoretic approaches to analyze a number of different data types. Although there will be a discussion element to the seminar, students should see this course as a guided workshop or practicum in which we learn by working with both our own and previously published datasets to better understand hypothesis testing using statistical inference in biological anthropology.
This course is open to students outside of Anthropology willing to learn the methods involved. Past students include undergraduate and graduate students from Anthropology, Archaeology, Biology, and Economics.
BU HUB Learning Outcomes
This course has been accepted to the BU HUB Undergraduate General Education Curriculum as CAS AN/BI 588. It has several proposed Learning Outcomes related to its assigned Hub Capacities, including:
Scientific Inquiry II (SI2)
- Learning Outcome 1: Students in this course will learn to identify and apply appropriate methods of statistical inference to test hypotheses related to biological anthropology. This will be done in the R programming framework using published or simulated datasets from a number of scientific literatures, primarily from primate morphology and behavior. Throughout the semester, students will use these tests to appropriately frame and address established hypotheses in biological anthropology using increasingly complex statistical methods, including using t-tests to assess differences in body size between wild and anthropogenically impacted vervet monkey groups, testing hypothesized correlations between female body size and various life history traits across primates using multiple regression, and assessing shifts in male grooming attention towards periovulatory female chimpanzees using generalized linear mixed modeling, among others. This will culminate in a more advanced application of these methods by engaging in the R-based replication of a published paper with open access data in the student’s field of interest, and the development of their own teaching module demonstrating a novel statistical method and how it may be used to test a novel hypothesis in a new dataset. The replication assignment will facilitate a hands-on, critical assessment of how these authors used data processing, manipulation, analyses, and figures to reach their published conclusions. The teaching module will allow students to reflect critically on a novel method and how it might be used (or may be misused) to inform a hypothetical framework of their choosing in a novel dataset.
Quantitative Reasoning II (QR2)
- Learning Outcome 1: In this course, students will learn how to frame questions germane to biological anthropology through the explicit testing of hypotheses using statistical inference. Using the R statistical programming framework, students will explore the underlying logic and mathematics of probability, hypothesis testing, linear modeling and regression, ANOVA, multiple regression, generalized linear modeling and mixed effects modeling, and then use these methods to solve complex problems in biological anthropology with both real and simulated data.
- Learning Outcome 2: Online, R-based statistical modules guide students through the development and use of the above methods in numerous datasets drawn from studies of wild primates and museum specimens to test hypotheses central to biological anthropology and evolutionary biology. These questions range from using the Poisson distribution to predict the frequency of titi monkey morning duets, to using generalized linear modeling to assess whether male rank has an impact on fitness outcomes in a population of woolly monkeys.
- Learning Outcome 3: With every statistical module, students will be challenged to test established hypotheses using statistical inference. They are then challenged in homework assignments to investigate data structures and to formulate and test their own hypotheses using the statistical methods learned in class. This culminates in a replication assignment in which students critically evaluate the methods and conclusions of a published paper from the primary literature in their field.
- Learning Outcome 4: Through the use of GitHub-based shared repositories and the R Markdown language, students will learn to communicate via annotated coding chunks in data reports to explain their logic and choice of statistical coding options to address bi-weekly homework assignments and statistical module challenges. The Peer Commentary of homework assignments requires that students engage with each other to both improve their code and ensure that the symbolic, visual, numerical, and written accounts of their data processing and analysis are communicated effectively. The final assignments of the class, a group-based homework module teaching a novel statistical test not presented in the class and a replication of an analysis from the primary literature, both encourage students to design legible and engaging representations of their data analysis process, and to adequately communicate these methods both verbally and visually.
- Learning Outcome 5: Through an emphasis on testing the assumptions underlying the statistical tests learned in class, students will learn to recognize and articulate both the capacity and limitations of these methods. This includes modules and discussions testing data distributions and model residuals for normality, using power tests to assess the effects of sample size and variance on parameter estimation, assessing the appropriateness of link functions for different data distributions in GLMMs, and more, complete with demonstrations of what poorly analyzed data looks like, and how that may negatively influence scientific inference.
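As a rough illustration of the point in Learning Outcome 5 about sample size and variance, the sketch below approximates the power of a two-group comparison. It is not taken from the course modules, which work in R; it is written in Python using only the standard library, it relies on a normal approximation rather than an exact t-based calculation, and the function name `approx_power` and the scenario values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes equal group sizes and a common standard deviation, and
    ignores the negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    se = sd * sqrt(2 / n_per_group)
    return z.cdf(abs(effect) / se - z_crit)


# Hypothetical scenario: a true difference of 0.5 units between two groups
for sd in (1.0, 2.0):
    for n in (16, 64, 256):
        power = approx_power(0.5, sd, n)
        print(f"sd={sd:.1f}  n/group={n:3d}  power={power:.2f}")
```

Under these assumptions, power rises with the number of observations per group and falls as the standard deviation grows; with a difference of 0.5, a standard deviation of 1, and 64 observations per group, the approximation gives a power of roughly 0.8.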
CAS AN 102 or CAS BI 107/108 (for undergraduates) or graduate student standing, and/or consent of instructor. At least one semester of introductory statistics is recommended, but not required. Prior programming experience is helpful, but also not required.
This is a 4-credit seminar course. Seminar will be held once a week for a total of 3 hours. Please bring a laptop or tablet to class loaded with the appropriate software for course exercises (these can be found in Resources, above; packages required for the week will be noted in the Modules).
Performance in the class will be assessed on a gradeless basis for the semester, with only a single final grade being assigned in consultation with each student. Assessment will entail the following assignments and considerations:
- Regular attendance and class participation (10%).
- On-time completion of assigned Modules prior to seminar meetings (10%).
- Programming homework sets associated with each Module, due Tuesday at 8:00 pm to your assigned peer commentary partner(s) (20%).
- Respectful Peer Commentary on homework coding that puts into practice the teamwork and pair-programming practices discussed in class and in readings, to be submitted to me by 5:00 pm Thursday (10%).
- One individual Analysis Replication Assignment based on a published paper with a publicly available dataset, chosen in consultation with the instructor (20%).
- One group presentation and written R vignette demonstrating the use of a particular statistical method chosen in consultation with the instructor (past examples available in the Modules). Group participation will be a large part of evaluation, and must also put into practice teamwork methods discussed in readings and class (20%).
- A final Self Evaluation written by you arguing for the grade you have earned through your progress and the quality of your work in the class (5%).
Kabacoff R. 2015. R in Action, 2nd Edition. Shelter Island, NY: Manning Publications.
Davies TM. 2016. The Book of R: A First Course in Programming and Statistics. San Francisco: No Starch Press.
Davies is available in print or electronic format from No Starch Press and O’Reilly Media. Kabacoff is available in print or electronic format from Manning Publications, where the third edition is almost complete and is available; I recommend using the third edition if you can (you can also find the second edition as a PDF here). Both texts are available at Amazon.com.
Optional Texts Students Find Helpful
By the end of this course, you should:
- be familiar with key concepts and methods in applied data science for acquiring and managing data, conducting exploratory data analyses, testing statistical hypotheses, building models to classify and make predictions about data, and evaluating model performance;
- have a facility with modern tools for data analysis (e.g., the Unix command line, version control systems, the R programming environment, web APIs) and be able to apply “best practices” in data science;
- know how to interact with both local and remote data sources to store, query, process, and analyze data presented in a variety of common formats (e.g., delimited text files, structured text files, various database systems);
- be comfortable writing simple computer programs for data management, statistical analysis, visualization, and more specialized applications;
- know how to design and implement reproducible data science workflows that take a project from data acquisition to analysis to presentation, and be able to organize your work using a version control system;
- be able to accurately assess, critique, and reproduce existing published works utilizing public and open source data repositories and analytical techniques;
- be able to work as part of an effective team to solve problems and implement effective coding practices towards a group analytical goal;
- and be able to apply all of these tools to questions of interest in the natural and social sciences.