Pre-Module: BU Shared Computing Cluster Tutorial

September 11


Material covered: Introduction to, and tutorial on using, the Linux-based BU Shared Computing Cluster (SCC).

Readings: None.

Activities: We will create personal profiles using the SCC interface with the help of a representative from Research Computing. We’ll learn how to connect to SCC and some basic commands that will help us navigate the interface and access analytical software that will be used in the course.

Assignment: In-class worksheet based on activities will be graded.

Learning Outcomes:

Please download associated materials on Blackboard

RCS Tutorials: Research Computing Services offers many helpful (free) tutorials during the month of September that may make a huge difference in how well and how quickly you are able to learn this material. I strongly recommend the tutorials Introduction to BU’s Shared Computing Cluster and Introduction to R (although they are not required for the course); I recommend the remainder if you would like to learn more about the systems we’ll be working with:


Module 1: Accessing Human Candidate Gene Region Data: ACE2 and TMPRSS2

September 18


Material covered: Introduction to the 1000 Genomes Project dataset, and tutorial on using Ensembl to access the 1000 Genomes dataset. For illustrative purposes, we’ll focus on both the angiotensin-converting enzyme 2 (ACE2) gene and the transmembrane serine protease 2 (TMPRSS2) gene. Each codes for a host protein that the coronavirus SARS-CoV-2 uses to enter cells (ACE2 is the receptor the virus binds, and TMPRSS2 is a protease that primes the viral spike protein), leading to the disease known as COVID-19.

Readings:
Activities: We’ll learn how to use the Ensembl database to navigate our candidate genes, ACE2 and TMPRSS2, and find more information about them. Each student will be assigned a single 1000 Genomes sub-population that they will look at over the course of the modules, and we will use the Data Slicer within Ensembl to download data for each gene from those populations into our SCC accounts.

Assignment: Students must turn in a homework assignment – with questions related to ACE2 and TMPRSS2 variation in humans and related to the downloaded dataset – the following Friday. The Pre-Module Homework Assignment is due today.

Learning Outcomes:

Homework for Module 1: DUE Friday, September 25th at 5:00 pm


Module 2: ACE2/TMPRSS2 Variants and Hardy-Weinberg Equilibrium

September 25


Material covered: Using R and RStudio via the SCC to run pre-written code that will perform our analyses. Assessing allelic variation in SNPs within and across populations. Testing Hardy-Weinberg equilibrium (HWE) and understanding what it means if violated, which involves knowing the assumptions of the model. Using downloaded candidate region data from 1000 Genomes Project to assess HWE in living human populations using a Chi-Squared test. Using Ensembl to obtain genotype count information in order to use the Wigginton and Cutler method of HWE calculation on selected SNPs.

Readings:
Activities: We will use the R coding language to test HWE in the dataset on ACE2 we downloaded from Ensembl. We will assess whether or not SNPs in this genomic region are in Hardy-Weinberg equilibrium based on a chi-squared test in assigned human populations. We will then re-test selected SNPs using the “True HWE” method described in Wigginton and Cutler. Finally, we will discuss what our results mean in light of what we know about those populations, HWE, and the effects of these ACE2 variants on disease expression.
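To give a feel for what the chi-squared HWE test does, here is a minimal sketch in Python (the course itself runs pre-written R code, so this is only an illustration and not the code we will use in class). The function name `hwe_chi_squared` and the genotype counts in the usage example are made up for illustration. The sketch estimates allele frequencies from the observed genotype counts, derives the expected counts under HWE (p², 2pq, q²), and compares the two with a chi-squared statistic on 1 degree of freedom.

```python
import math


def hwe_chi_squared(n_aa, n_ab, n_bb):
    """Chi-squared test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Arguments are observed genotype counts (AA, AB, BB).
    Returns (frequency of allele A, chi-squared statistic, p-value on 1 df).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotyped individual")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    observed = (n_aa, n_ab, n_bb)
    # Expected genotype counts if the population is in HWE.
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Sum of (O - E)^2 / E, skipping classes with zero expectation.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts with too few heterozygotes for HWE.
    freq_a, chi2, p_value = hwe_chi_squared(40, 20, 40)
    print(f"p(A) = {freq_a:.2f}, chi2 = {chi2:.2f}, p-value = {p_value:.3g}")
```

A small p-value suggests the genotype counts depart from HWE expectations. The chi-squared approximation is unreliable when expected counts are small, which is one reason to re-test selected SNPs with an exact method.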

Assignment: Students must turn in a worksheet – with questions related to ACE2 variation in humans and related to the downloaded dataset – the following Friday. Module 1 Homework Assignment is due today.

Learning Outcomes:

Homework for Module 2: DUE Friday, October 2nd at 5:00 pm


Module 3: Linkage Disequilibrium (LD) in ACE2 and TMPRSS2

October 16


Material covered: In this module, we’ll be assessing linkage disequilibrium (LD) in the ACE2 genomic regions of the 1000 Genomes populations using the R coding language. We’ll also work on calculating LD by hand between two known loci in ACE2. All of this will help us work towards understanding the factors that increase LD in the human genome.
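As a rough guide to the by-hand LD calculation, here is a minimal Python sketch (again, only an illustration; the module itself uses R). It assumes we already have phased haplotype counts for two biallelic loci, with alleles A/a at the first locus and B/b at the second. The function name `ld_stats` and the counts in the usage example are hypothetical. It computes D = p(AB) - p(A)p(B), then the two common standardized measures, D′ and r².

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD between two biallelic loci from phased haplotype counts.

    Returns (D, D_prime, r_squared), where D_prime is |D| / D_max.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("need at least one haplotype")

    p_AB = n_AB / n               # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n       # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n       # frequency of allele B at locus 2

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")

    # Raw disequilibrium coefficient.
    d = p_AB - p_A * p_B

    # Largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))

    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical haplotype counts: AB, Ab, aB, ab.
    d, d_prime, r_squared = ld_stats(40, 10, 10, 40)
    print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

Both D′ and r² range from 0 (no association between the loci) to 1; r² also depends on how similar the allele frequencies at the two loci are, so the two measures can disagree.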

Readings:
Activities: We’ll assess linkage disequilibrium across the ACE2 and TMPRSS2 loci using our datasets downloaded from Ensembl, with a focus on the SNPs that are defined in Wooster et al. (2020) and Cheng et al. (2015). We will then discuss what high linkage disequilibrium could tell us about whether selection may have occurred within our populations.

Assignment: Students must turn in homework – with questions related to ACE2 and TMPRSS2 variation in humans and related to the downloaded dataset – online by the following Friday. Module 2 Homework Assignment is due today.

Learning Outcomes:

Homework for Module 3: DUE Monday, October 26th


Module 4: Introduction to Neighbor-Joining and Phylogenetics

October 30th


Material covered: In this module, we’ll be using neighbor-joining to see which individuals within our assigned 1000 Genomes populations are most closely related to each other (at least insofar as ACE2 variation indicates). We’ll also plot these phylogenetic trees to better understand the patterns of molecular variation and the amount of diversity in ACE2 within our populations.

Readings:


Activities: We’ll learn how to create a phylogenetic tree with simple neighbor-joining methods using the ape package in R. We’ll then learn how to make a tree from multiple populations, which will allow us to compare those different populations’ structures in a qualitative way. We will also use these phylogenetic trees to assess the diversity of ACE2 in each population, and discuss what that means.

Assignment: Students must turn in a worksheet – with questions related to ACE2 variation in humans and related to the downloaded dataset – in class the following Friday.

Learning Outcomes:

Homework for Module 4: DUE Friday, November 6th


Module 5: Introduction to Neutrality Statistics and Signs of Selection

November 6th


Material covered: This module is an introduction to statistical tests of neutrality that can be used in genomic studies. Tajima’s D, Fu and Li’s D and F, and iHS scores will be covered and discussed. We’ll work towards understanding how each of these tests detects selection, and what these statistics can tell us about population structure and history.
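To make one of these statistics concrete, here is a minimal Python sketch of Tajima’s D (an illustration only; in class we will rely on R packages rather than code like this). It assumes the input is a list of aligned haplotype sequences of equal length with no missing data. The function name `tajimas_d` and the toy sequences are hypothetical. Tajima’s D compares two estimates of diversity, the mean number of pairwise differences (π) and the number of segregating sites S scaled by a sample-size constant, and standardizes their difference.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for a list of aligned, equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance is zero below that)")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of sites where more than one allele is observed.
    s = sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = total_diffs / len(pairs)

    # Sample-size constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between the two diversity estimates, standardized.
    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Toy data: every variant is a singleton (an excess of rare alleles).
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

Negative values reflect an excess of rare variants (as expected after a selective sweep or population expansion), while positive values reflect an excess of intermediate-frequency variants (as expected under balancing selection or a population contraction).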

Readings:
Activities: We’ll use the packages PopGenome and pegas to calculate Fu and Li’s D and F, as well as Tajima’s D, for the severe COVID-19 susceptibility region (3p21.3) identified by The Severe COVID-19 GWAS Group in our respective 1000 Genomes populations. We’ll also look for positive selective sweeps in each of our 1000 Genomes populations using iHS scores and EHH, computed with the rehh package. Finally, we will take ample time to understand what the Fu and Li’s D and F and Tajima’s D test results tell us about how our populations are evolving, and use the example of iHS to predict whether or not our populations underwent a selective sweep in this susceptibility region.

Assignment: Students must turn in a homework assignment – with questions related to this module, its associated readings, and the downloaded dataset – in class the following Friday. Module 4 Homework Assignment is due today.

Learning Outcomes:

Homework for Module 5: DUE Friday, November 13th


Module 6: A Brief Digression from COVID-19 for Quantitative Genetics

November 20


Material covered: Quantitative genetics and partitioning variance in phenotypes into genetic and environmental components. To do this, we’ll be working with some captive vervet monkey (Chlorocebus sabaeus) data I collected at Wake Forest College of Medicine’s Vervet Research Colony (VRC). We’ll also learn about the SOLAR work environment, which is a (relatively) easy interface for doing quantitative genetics analysis in the SCC space. Through this, we’ll learn a bit about the quantitative genetics of BMI and body mass.

Readings:
Activities: There will be a brief discussion of quantitative genetics and the vervet monkey (Chlorocebus sabaeus) model, using the Almasy & Blangero terminology as implemented in SOLAR, followed by an orientation to using SOLAR in the SCC environment. We will conduct in-class exercises that will be used to answer questions in the Module 6 homework.

Assignment: Students must turn in a worksheet – with questions related to quantitative genetic variation in vervets related to BMI, body mass, and obesity – in class the following Friday. Module 5 Homework Assignment is due today.

Learning Outcomes:

Homework for Module 6: DUE Friday, November 27th


Module 7: Finding a New Locus…

November 02


Material covered: We’ll discuss the process of finding a new locus on which to conduct a population genetics study for your final project! Make sure the locus is related somehow to a trait you’re really interested in, and also preferably a trait that varies among contemporary human populations. A candidate gene for a particular trait would be great (i.e., a gene that’s been noted to perhaps be associated with a trait in a GWAS, but hasn’t really been tested across populations that vary in that trait). A trait that’s been noted to have been under selection previously would be great, but is by no means necessary.

Readings:
Activities: We will go over a brief tutorial on how to think about finding a new locus of interest to study on your own, and we will have a class discussion on final project topic ideas.

Assignment: Students must choose a gene of interest for their final project by the following Friday, November 13.

Learning Outcomes: