Module 3: Homework

Introduction to Linkage Disequilibrium (LD)

Readings:

Wooster L, Nicholson CJ, Sigurslid HH, Lino Cardenas CL, Malhotra R. Preprint accessed 31AUG20. Polymorphisms in the ACE2 locus associate with severity of COVID-19 infection. medRxiv doi:10.1101/2020.06.18.20135152

Cheng Z., et al. 2015. Identification of TMPRSS2 as a susceptibility gene for severe 2009 pandemic A(H1N1) influenza and A(H7N9) influenza. The Journal of Infectious Diseases, 212(8):1214–1221.

Slatkin M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nat Rev Genet 9: 477-485.

Zeberg H, Pääbo S. 2020. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals. Nature 2998

Claiborne Stephens J, Schneider JA, Tanguay DA, Choi J, Acharya Y, Stanley SE, Jiang R, et al. 2001. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489-93.

For our homework, we’ll be digging a little further into linkage disequilibrium (LD). Now, this is a sometimes difficult concept to grasp (even I needed a refresher in order to teach it), so we’ll be focusing on the assigned readings in this homework to hopefully working through this concept and fill any gaps in understanding left by leture and the lab.

As before, you’ll be turning in your homework via an online interface. I recommend writing your essays in a document on your laptop and only engaging with the online interface when you’re ready to turn in all of your completed answers.

Access the online interface to submit your homework answers here.

This homework will be due before class (1:25pm) on Monday, October 26th.

According to Slatkin (2008), what is the definition of linkage disequilibrium (LD)?
Slatkin (2008) differentiates between what LD can tell at the genomic scale (i.e., across the genome), and at the gene-region scale (i.e., within a gene). What different information are these two types of LD telling us?
We already know what HWE (and deviations from it) is telling us about a particular locus. According to Slatkin (2008), how is LD telling us both similar - and different - things about the loci we’re investigating?
What is the effect of genetic drift (like population bottlenecks) on LD? Why is this important to for understanding human evolutionary history?
According to the Claiborne Stephens et al. (2001) article why are rare variants (with low MAF) important to understanding recent population processes?
According to the readings, will a haplotype be a region with high or low mutual LD between all the loci that comprise it?
According to the readings, are loci that are physically close to each other (i.e., only a few base pairs away) more likely to be in LD than loci that are further away from each other on the chromosome? Why might this be?
Based on the D’ LD heatmap you created in Module 3 for your 1000 Genomes population, which SNPs are linked? Please take a screenshot or save a PNG or PDF file of your heatmap and upload it as part of your answer. NOTE: If you’re having trouble interpreting the LD heatmaps, this link can help!
Take a look at the LD matrix you created in the last portion of Module 3. List the linkage block that each SNP of interest is a part of, and which SNPs are in LD with each other among our four SNPs of interest.
In the Zeberg & Pääbo (2020) paper, they focus on a linkage block around an index SNP rs35044562, the minor allele of which has been found to be associated with severe COVID-19 symptoms. Which genomic region is associated with increased severity of COVID-19 (please give a range in Mb)? Name the genes found in this region.
According to Zeberg & Pääbo (2020), what is evolutionarily interesting about 1) the haplotype (or linkage block) containing the risk allele, and 2) the populations that have this particular haplotype?

Extra Questions for Graduate-Level Students

The following questions are for those students registered for AN 733. Students registered for AN 333 may also answer these questions for extra credit.

According to Claiborne Stephens et al. (2001), what might explain their findings regarding LD across the human genome, and how does that have bearing on the discovery of SNPs associated with various phenotypes of interest?
What is the Hill-Robertson effect? According to Slatkin (2008), why is this an important phenomenon to understand and take into account when inferring the evolution of gene function?
What effect does selection have on LD? How might you use LD to detect natural selection in a genomic locus?
Create a VCF file of SLC6A20 for your 1000 Genomes population, and re-run the linkage analysis from Module 3 with the index SNP rs35044562 in mind, using D’. Is it in linkage with our SNP of interest from the midterm, rs202158371? How about the risk allele for severe COVID-19 in Italian and Spanish populations, rs11385942?

Module 3: Homework

Christopher A Schmitt

October 22, 2020

Introduction to Linkage Disequilibrium (LD)

Access the online interface to submit your homework answers here.

Extra Questions for Graduate-Level Students