AN/BI 588: Project Design and Statistics in Biological Anthropology

Spring 2025. Seminar will be held on Tuesdays, 3:30 - 6:15 pm, in CGS 113
(that’s on the first floor of the CGS Building, 871 Commonwealth Ave)

Faculty Instructor:

Christopher A. Schmitt
Associate Professor of Anthropology, Biology, and Women’s, Gender & Sexuality Studies
Office: Stone Science Building (STO), 675 Commonwealth Ave, Rm 247E
Office Hours: Mondays 1:00 - 3:00 pm, Tuesdays 1:00 - 2:00 pm
Web: http://www.evopropinquitous.net
Email: caschmit[at]bu[dot]edu
Social Media: fuzzyatelin

Course Outline

Modules

Assignments

Resources

Policies

Insurmountable Coding Problems

Course Description

Statistical methods are the backbone of scientific research, but are often given short shrift when designing research in biological anthropology. The purpose of this seminar is two-fold: 1) to familiarize students with the use of relevant statistical programming packages (primarily R), and 2) to discuss select advances in statistical techniques from related disciplines that may help students while designing and implementing their own research projects.

Potential foci of discussion may include statistical methods for accounting for small sample sizes or non-normal data, using power analyses and preliminary statistics to justify data collection design, and the use of mixed models and information theoretic approaches to analyze a number of different data types. Although there will be a discussion element to the seminar, students should see this course as a guided workshop or practicum in which we learn by working with both our own and previously published datasets to better understand hypothesis testing using statistical inference in biological anthropology.

This course is open to students outside of Anthropology willing to learn the methods involved. Past students include undergraduate and graduate students from Anthropology, Archaeology, Biology, and Economics.

Prerequisites

CAS AN 102 or CAS BI 107/108 (for undergraduates) or graduate student standing, and/or consent of instructor. At least one semester of introductory statistics is recommended, but not required. Prior programming experience is helpful, but also not required.

Course Format

This is a 4-credit seminar course. Seminar will be held once a week for a total of 3 hours. Please bring a laptop or tablet to class, loaded with the appropriate software for course exercises (software can be found in Resources, above; packages required for the week will be noted in the Modules).

Assessment

Performance in the class will be assessed on a gradeless basis for the semester, with only a single final grade being assigned in consultation with each student. Assessment will entail the following assignments and considerations:

Regular attendance and class participation (10%).
On-time completion of assigned Modules prior to seminar meetings (10%).
Programming homework sets associated with certain Modules, due to your assigned peer commentary partner(s) several days prior to the homework due date (20%).
Respectful Peer Commentary on homework coding that puts into practice the teamwork and pair programming practices discussed in class and in readings, to be submitted to me by assignment due dates (10%).
One individual Analysis Replication Assignment based on a published paper with a publicly available dataset, chosen in consultation with the instructor (20%).
One group presentation and written R vignette demonstrating the use of a particular statistical method chosen in consultation with the instructor (past examples available in the Modules). Group participation will be a large part of evaluation, and must also put into practice teamwork methods discussed in readings and class (20%).
A final Self Evaluation written by you arguing for the grade you have earned through your progress and the quality of your work in the class (5%).

Required Texts

Kabacoff R. 2022. R in Action, 3rd Edition. New York: Manning Publications.

Davies TM. 2016. The Book of R: A First Course in Programming and Statistics. San Francisco: No Starch Press.

Davies is available in print or electronic format from No Starch Press and O’Reilly Media; Kabacoff is available in print or electronic format from Manning Publications. I recommend using the third edition of Kabacoff if you can (you can also find the second edition as a PDF here); both texts are also available at Amazon.com.

Optional Texts Students Find Helpful

Wickham H, Cetinkaya-Rundel M, Grolemund G. 2023. R for Data Science. Boston: O’Reilly.
Kuhn M, Silge J. 2023. Tidy Modeling with R. Boston: O’Reilly.
Dalgaard P. 2008. Introductory Statistics with R, 2nd Edition. New York: Springer.
Crawley MJ. 2014. Statistics: An Introduction Using R, 2nd Edition. Chichester, UK: John Wiley & Sons, Inc.

Learning Objectives

By the end of this course, you should:

be familiar with key concepts and methods in applied data science for acquiring and managing data, conducting exploratory data analyses, testing statistical hypotheses, building models to classify and make predictions about data, and evaluating model performance;
have a facility with modern tools for data analysis (e.g., the Unix command line, version control systems, the R programming environment, web APIs) and be able to apply “best practices” in data science;
know how to interact with both local and remote data sources to store, query, process, and analyze data presented in a variety of common formats (e.g., delimited text files, structured text files, various database systems);
be comfortable writing simple computer programs for data management, statistical analysis, visualization, and more specialized applications;
know how to design and implement reproducible data science workflows that take a project from data acquisition to analysis to presentation and be able to organize your work using a version control system;
be able to accurately assess, critique, and reproduce existing published works utilizing public and open source data repositories and analytical techniques;
be able to work as part of an effective team to problem solve and implement effective coding practices towards a group analytical goal;
and be able to apply all of these tools to questions of interest in the natural and social sciences.

BU HUB Learning Outcomes (for enrolled undergraduates)

This course has been accepted to the BU HUB Undergraduate General Education Curriculum as CAS AN/BI 588. It has several proposed Learning Outcomes related to its assigned Hub Capacities, including:

Scientific Inquiry II (SI2)

Learning Outcome 1: Students in this course will learn to identify and apply appropriate methods of statistical inference to test hypotheses related to biological anthropology. This will be done in the R programming framework using published or simulated datasets from a number of scientific literatures, primarily from primate morphology and behavior. Throughout the semester, students will use these tests to appropriately frame and address established hypotheses in biological anthropology using increasingly complex statistical methods, including using t-tests to assess differences in body size between wild and anthropogenically impacted vervet monkey groups, testing hypothesized correlations between female body size and various life history traits across primates using multiple regression, and assessing shifts in male grooming attention towards periovulatory female chimpanzees using generalized linear mixed modeling, among others. This will culminate in a more advanced application of these methods by engaging in the R-based replication of a published paper with open access data in the student’s field of interest, and the development of their own teaching module demonstrating a novel statistical method and how it may be used to test a novel hypothesis in a new dataset. The replication assignment will facilitate a hands-on, critical assessment of how these authors used data processing, manipulation, analyses, and figures to reach their published conclusions. The teaching module will allow students to reflect critically on a novel method and how it might be used (or may be misused) to inform a hypothetical framework of their choosing in a novel dataset.
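As a rough illustration of the simplest kind of test named above, the following R sketch compares body mass between two groups with a two-sample t-test. The data are simulated, and the group names, sample sizes, means, and standard deviations are invented for illustration; they are not drawn from the course datasets.

```r
# Simulated (hypothetical) body masses in kg; these are NOT real vervet data
set.seed(588)
wild     <- rnorm(30, mean = 4.0, sd = 0.6)
impacted <- rnorm(30, mean = 4.5, sd = 0.6)

# Welch two-sample t-test (R's default; does not assume equal variances)
result <- t.test(impacted, wild)

result$estimate  # sample means of the two groups
result$conf.int  # 95% confidence interval for the difference in means
result$p.value   # p-value for the null hypothesis of equal means
```

Using simulated data with known parameters is one way to check that a test behaves as expected before applying it to field data.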

Quantitative Reasoning II (QR2)

Learning Outcome 1: In this course, students will learn how to frame questions germane to biological anthropology through the explicit testing of hypotheses using statistical inference. Using the R statistical programming framework, students will explore the underlying logic and mathematics of probability, hypothesis testing, linear modeling and regression, ANOVA, multiple regression, generalized linear modeling and mixed effects modeling, and then use these methods to solve complex problems in biological anthropology with both real and simulated data.
Learning Outcome 2: Online, R-based statistical modules guide students through the development and use of the above methods in numerous datasets drawn from studies of wild primates and museum specimens to test hypotheses central to biological anthropology and evolutionary biology. These questions range from using the Poisson distribution to predict the frequency of titi monkey morning duets, to using generalized linear modeling to assess whether male rank has an impact on fitness outcomes in a population of woolly monkeys.
Learning Outcome 3: With every statistical module, students will be challenged to test established hypotheses using statistical inference. They are then challenged in homework assignments to investigate data structures and to formulate and test their own hypotheses using the statistical methods learned in class. This culminates in a replication assignment in which students critically evaluate the methods and conclusions of a published paper from the primary literature in their field.
Learning Outcome 4: Through the use of GitHub-based shared repositories and the R Markdown language, students will learn to communicate via annotated coding chunks in data reports to explain their logic and choice of statistical coding options to address bi-weekly homework assignments and statistical module challenges. The Peer Commentary of homework assignments requires that students engage with each other to both improve their code and ensure that the symbolic, visual, numerical, and written accounts of their data processing and analysis are communicated effectively. The final assignments of the class, a group-based homework module teaching a novel statistical test not presented in the class and a replication of an analysis from the primary literature, both encourage students to design legible and engaging representations of their data analysis process, and to adequately communicate these methods both verbally and visually.
Learning Outcome 5: Through an emphasis on testing the assumptions underlying the statistical tests learned in class, students will learn to recognize and articulate both the capacity and limitations of these methods. This includes modules and discussions testing data distributions and model residuals for normality, using power tests to assess the effects of sample size and variance on parameter estimation, assessing the appropriateness of link functions for different data distributions in GLMMs, and more, complete with demonstrations of what poorly analyzed data looks like, and how that may negatively influence scientific inference.
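To give a flavor of the methods listed in these outcomes, here is a minimal R sketch using only base R functions. All numbers are assumptions chosen for illustration (a mean of 2 events per morning, an effect size of 0.5 with a standard deviation of 1, and a simulated linear relationship); none come from the course modules or from real primate data.

```r
# Poisson probabilities for a count, assuming a hypothetical mean of 2 events per morning
lambda      <- 2
p_zero      <- dpois(0, lambda)                      # P(X = 0)
p_at_least3 <- ppois(2, lambda, lower.tail = FALSE)  # P(X >= 3) = 1 - P(X <= 2)

# Power analysis: sample size per group needed to detect a hypothetical
# difference of 0.5 (sd = 1) with 80% power in a two-sample t-test, alpha = 0.05
pw <- power.t.test(delta = 0.5, sd = 1, power = 0.80, sig.level = 0.05)
n_per_group <- ceiling(pw$n)

# Assumption check: Shapiro-Wilk test of normality on the residuals
# of a simple linear model fit to simulated data
set.seed(1)
x   <- runif(50)
y   <- 1 + 2 * x + rnorm(50, sd = 0.3)
fit <- lm(y ~ x)
sw  <- shapiro.test(residuals(fit))
sw$p.value  # a small p-value would suggest non-normal residuals
```

Increasing `sd` or decreasing `delta` in the call to `power.t.test()` shows how variance and effect size drive the sample size a study design requires.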

Intellectual Toolkit: Teamwork/Collaboration (TWC)

Learning Outcome 1: Students will engage in bi-weekly Peer Commentary assignments in which they must collaborate in shared GitHub repositories to share, annotate, and comment on each other’s coding-based homework assignments. These experiences will be supplemented by readings and small tutorials on productive peer coding and annotation practices. Students will share first drafts of homework, code commented by a Peer, and their final code. In their final code, students are encouraged to reflect on how the discussion and comments of their Peers improved their code, and what remained to be solved. Peer groups rotate throughout the course, so that students can identify and learn the traits of effective collaboration and commentary, culminating in choosing a group to develop and design an online teaching module for a novel statistical technique to be taught to the rest of the class. These comments will be critiqued and overseen in each bi-weekly session by the instructor to ensure productive teamwork.
Learning Outcome 2: In both the bi-weekly Peer Commentary and the final group Module assignment, students will develop and demonstrate an ability to use the tools and strategies of working successfully with a diverse group, based on course readings meant to foster productive peer review and coding practices. In each Module assignment, students will be guided towards explicitly assigning roles and responsibilities to individual group members based on strengths identified in the Peer Commentary process. Upon presentation of the Module, in dialogue with the instructor, the class will assess the effectiveness and clarity of the module in teaching the method in question and offer productive suggestions for revision, making the course itself an exercise in teamwork. Both the Peer Commentary and the final Module assignment ask students to reflect on their role in the collaborative process and on which practices were effective, using the readings to guide these reflections.

AN/BI 588: Project Design and Statistics in Biological Anthropology

Christopher A Schmitt
Boston University

January 15, 2025
