For our homework, we’ll be digging a little further into linkage disequilibrium (LD). This can be a difficult concept to grasp (even I needed a refresher in order to teach it), so this homework focuses on the assigned readings to help you work through the concept and fill any gaps in understanding left by lecture and the lab.
As before, you’ll be turning in your homework via an online interface. I recommend writing your essays in a document on your laptop and only engaging with the online interface when you’re ready to turn in all of your completed answers.
This homework will be due before class (1:25pm) on Monday, October 26th.
According to Slatkin (2008), what is the definition of linkage disequilibrium (LD)?
Slatkin (2008) differentiates between what LD can tell us at the genomic scale (i.e., across the genome) and at the gene-region scale (i.e., within a gene). What different information do these two scales of LD give us?
We already know what HWE (and deviations from it) tells us about a particular locus. According to Slatkin (2008), how does LD tell us both similar and different things about the loci we’re investigating?
What is the effect of genetic drift (like population bottlenecks) on LD? Why is this important for understanding human evolutionary history?
According to the Claiborne Stephens et al. (2001) article, why are rare variants (with low MAF) important to understanding recent population processes?
According to the readings, will a haplotype be a region with high or low mutual LD between all the loci that comprise it?
According to the readings, are loci that are physically close to each other (i.e., only a few base pairs away) more likely to be in LD than loci that are further away from each other on the chromosome? Why might this be?
Based on the D’ LD heatmap you created in Module 3 for your 1000 Genomes population, which SNPs are linked? Please take a screenshot or save a PNG or PDF file of your heatmap and upload it as part of your answer. NOTE: If you’re having trouble interpreting the LD heatmaps, this link can help!
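If it helps to see where the numbers in a D’ heatmap come from, here is a minimal sketch in Python of the standard textbook calculation for a single pair of biallelic loci. It is not the code from Module 3: the function name `ld_stats` and the example frequencies are hypothetical, chosen only for illustration, and your own heatmap was produced by the tools used in the lab.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All names and inputs here are illustrative assumptions.
    """
    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were statistically independent.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take
    # given the allele frequencies at the two loci.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0

    return d, d_prime, r2


# Hypothetical example: a rare allele B (10%) that only ever occurs
# on haplotypes that also carry the common allele A (50%).
d, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In this made-up example D’ reaches its maximum even though r² stays low, which is one reason a D’ heatmap and an r² heatmap of the same region can look quite different.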
Take a look at the LD matrix you created in the last portion of Module 3. List the linkage block that each SNP of interest is a part of, and which SNPs are in LD with each other among our four SNPs of interest.
In the Zeberg & Pääbo (2020) paper, they focus on a linkage block around an index SNP rs35044562, the minor allele of which has been found to be associated with severe COVID-19 symptoms. Which genomic region is associated with increased severity of COVID-19 (please give a range in Mb)? Name the genes found in this region.
According to Zeberg & Pääbo (2020), what is evolutionarily interesting about 1) the haplotype (or linkage block) containing the risk allele, and 2) the populations that have this particular haplotype?
The following questions are for those students registered for AN 733. Students registered for AN 333 may also answer these questions for extra credit.
According to Claiborne Stephens et al. (2001), what might explain their findings regarding LD across the human genome, and how does that have bearing on the discovery of SNPs associated with various phenotypes of interest?
What is the Hill-Robertson effect? According to Slatkin (2008), why is this an important phenomenon to understand and take into account when inferring the evolution of gene function?
What effect does selection have on LD? How might you use LD to detect natural selection in a genomic locus?
Create a VCF file of SLC6A20 for your 1000 Genomes population, and re-run the linkage analysis from Module 3 with the index SNP rs35044562 in mind, using D’. Is it in linkage with our SNP of interest from the midterm, rs202158371? How about the risk allele for severe COVID-19 in Italian and Spanish populations, rs11385942?