Introduction to Hardy-Weinberg Equilibrium (using Human Genome Data)


Hardy-Weinberg Equilibrium in Population Genetics


In modern population genetics studies, Hardy-Weinberg Equilibrium has a slightly different meaning than what we have talked about so far in class. A population that is in true Hardy-Weinberg Equilibrium has to fit five conditions:

  • the population must be extremely large (no genetic drift)
  • it must be isolated from other populations (no gene flow)
  • there must be no mutations
  • there must be random mating (no assortative mating or sexual selection)
  • there must be no natural selection

Obviously, holding a population of any species to this standard is pretty much impossible, so we will use Hardy-Weinberg differently to understand real populations (like humans) in our own population genetics study.

You will often see Hardy-Weinberg Equilibrium being used in modern population genetics to simply measure whether the observed genotype frequencies of a specific SNP or microsatellite are the same as the estimated genotype frequencies. Population geneticists will often use this measure of deviance as a “starting point,” so to speak, for their studies. They use Hardy-Weinberg as their starting point for a few reasons; to check the genotype quality of their data, to get a general sense of the amount of allelic variability and selection happening at specific loci, or, most commonly, as an assumption about the frequency of disease causing genotypes in their population. The assigned Rose et al. (2011) paper uses Hardy-Weinberg in the latter manner, as a check to make sure there was no selection happening at their two loci of interest.

In order to determine how much the observed phenotype deviates from the expected phenotype, we’ve been using a Chi-Squared Goodness-of-Fit test, the output of which is a P-value that will tell you how much the observed genotype frequency deviates from the expected frequency. However, we learned from the Wigginton reading that the Chi-Squared test has a tendency to give false positives, meaning it will tell us something is significantly different when it’s actually not; this is also called a Type I Error). In their paper, they propose a new mathematical method of calculating the “True” Hardy-Weinberg value, which fixes the problem of false positives. We will not need to get in to the specifics of the equation to calculate the “True” Hardy-Weinberg Equilibrium, but you will be calculating your population’s “True” Hardy-Weinberg Equilibrium as well as using a traditional Chi-Squared test.

Now that we have learned more about the applications of Hardy-Weinberg Equilibrium in modern population genetics, let’s get to the lab!

Learning Outcomes

  • Use the SCC and R coding language to observe and understand population differences in UCP1 variation.

  • Calculate Hardy-Weinberg Equilibrium for all UCP1 SNPs in individual populations using a traditional Chi-Squared test.

  • Perform a check on all SNPs not in Hardy-Weinberg Equilibrium by calculating “True” Hardy-Weinberg with the built-in Shiny App.

  • Research the consequence types of these SNPs in order to better understand how these SNPs might affect the genome itself, and also how they might affect subsequent phenotypes.

  • Calculate the “True” Hardy-Weinberg Equilibrium using the Shiny App for A-3826G (SNP ID rs1800592), an upstream SNP that has known phenotypic consequences, and determine what the Hardy-Weinberg Equilibrium says about the population.

Step 1: Accessing R Studio and your Data


The first thing we need to do is access R Studio within the SCC environment. If you remember from the tutorial, there are a few steps you need to do in order to log in.

module load gcc #we need this module to load R packages
module load R
rstudio & #this will launch R Studio

Step 2: Calculating Hardy-Weinberg Equilibrium Using Chi-Squared


Now that we are in R Studio and have our data in place in our working directory (from Lab 1), we can get started. The first thing we need to do here is install two packages.

In R space, packages are essentially bundles of functions that R users have written to perform analyses that do not come with the base program. In this class we will be using packages that were designed specifically for population genetics analysis. To install these packages, run these two lines of code in the console:

install.packages("vcfR")
install.packages("pegas")

We only need to install these packages once, but we need to load them in to our workspace every time we open the program. To do this, use the following code:

library(vcfR)
library(pegas)

Now that we have all our packages loaded, we can get to the analysis! In the “files” tab in the bottom left quadrant, you should see the VCF files we downloaded in your folder. We will be working with the larger VCF file today. We will need to load this file in to our workspace much like we did with our R packages. Unlike the packages, however, we need to give a name to the file. To make things simple, it would be a good idea to name this fiile in accordance with the acronym of your population. I will use the YRI population as an example. Use one of the two options below to enter your VCF data into the R space:

#If your file is zipped (e.g., ends in '.vcf.gz') then use this code:

YRI <- read.vcfR("YRI.vcf.gz", verbose = TRUE)
#If your file is unzipped (e.g., ends in just '.vcf') then use this code:

YRI <- read.vcfR("YRI.vcf"", verbose = TRUE, nrows = 206)

This will give you a file in your workspace called a vcfR file. Most genetics-based packages have their own way of formatting and working with genetic data, and this is the format R can use. The next line of code will seem a little counterintuitive, because we have to change the format of the file again to work with functions in the pegas package. Doing this will not delete the original file (as long as you name the file something different), so we can use the vcfR file again in later analyses. The type of file we have to convert to for pegas is called “genind,” so we will do that here:

pegas.YRI <- vcfR2genind(YRI, sep = "[|/]")

Finally, now that we have our data in the format that we need, we can do our Hardy-Weinberg tests. This function will test whether all of your population’s SNPs in the coding region of UCP1 are in Hardy-Weinberg Equilibrium. As an example, I will run the funtion with my pegas.YRI file. Remember to save your results by assigning them a name:

HWE <- hw.test(pegas.YRI, B = 0) #B MUST equal 0 here, changing this value will affect the result!
HWE #Running the name of your results file will print the results
##                    chi^2 df Pr(chi^2 >)
## rs577971821 0.000000e+00  0  1.00000000
## rs538981371 0.000000e+00  0  1.00000000
## rs558086468 0.000000e+00  0  1.00000000
## rs201346519 1.031052e+00  1  0.30991183
## rs79001246  1.211511e-01  1  0.72778940
## rs142885075 0.000000e+00  0  1.00000000
## rs146114327 2.336398e-03  1  0.96144822
## rs142145526 0.000000e+00  0  1.00000000
## rs543991038 0.000000e+00  0  1.00000000
## rs562565974 6.064554e-02  1  0.80547854
## rs117949489 0.000000e+00  0  1.00000000
## rs551356095 2.336398e-03  1  0.96144822
## rs145918476 0.000000e+00  0  1.00000000
## rs527960564 0.000000e+00  0  1.00000000
## rs78724285  8.816327e-02  1  0.76652530
## rs6536991   8.716426e-04  1  0.97644700
## rs538400756 0.000000e+00  0  1.00000000
## rs550015648 0.000000e+00  0  1.00000000
## rs571666957 0.000000e+00  0  1.00000000
## rs726989    3.513099e+00  1  0.06088545
## rs145573095 9.433138e-03  1  0.92262765
## rs5862487   1.070577e+01  3  0.01342810
## rs182883393 0.000000e+00  0  1.00000000
## rs534102407 0.000000e+00  0  1.00000000
## rs187220083 0.000000e+00  0  1.00000000
## rs192528115 0.000000e+00  0  1.00000000
## rs544349406 0.000000e+00  0  1.00000000
## rs148892327 0.000000e+00  0  1.00000000
## rs530434699 0.000000e+00  0  1.00000000
## rs2043124   3.513099e+00  1  0.06088545
## rs544637184 0.000000e+00  0  1.00000000
## rs537726509 0.000000e+00  0  1.00000000
## rs117858590 0.000000e+00  0  1.00000000
## rs35243591  3.844785e-02  1  0.84454659
## rs568352524 2.336398e-03  1  0.96144822
## rs549281392 0.000000e+00  0  1.00000000
## rs561587316 0.000000e+00  0  1.00000000
## rs2043123   3.513099e+00  1  0.06088545
## rs550345422 0.000000e+00  0  1.00000000
## rs571684831 0.000000e+00  0  1.00000000
## rs56220366  1.072646e+00  1  0.30034851
## rs547884746 0.000000e+00  0  1.00000000
## rs183105785 0.000000e+00  0  1.00000000
## rs143604530 0.000000e+00  0  1.00000000
## rs567550714 0.000000e+00  0  1.00000000
## rs76122323  0.000000e+00  0  1.00000000
## rs556306518 0.000000e+00  0  1.00000000
## rs147980705 0.000000e+00  0  1.00000000
## rs553417300 0.000000e+00  0  1.00000000
## rs142396002 0.000000e+00  0  1.00000000
## rs140138182 0.000000e+00  0  1.00000000
## rs2270565   3.844785e-02  1  0.84454659
## rs148598275 0.000000e+00  0  1.00000000
## rs142052143 0.000000e+00  0  1.00000000
## rs565496720 0.000000e+00  0  1.00000000
## rs532718970 0.000000e+00  0  1.00000000
## rs202197292 0.000000e+00  0  1.00000000
## rs528186894 0.000000e+00  0  1.00000000
## rs566134493 0.000000e+00  0  1.00000000
## rs144659598 0.000000e+00  0  1.00000000
## rs1494808   6.656483e-02  1  0.79640566
## rs114470895 9.433138e-03  1  0.92262765
## rs538120229 0.000000e+00  0  1.00000000
## rs556153775 0.000000e+00  0  1.00000000
## rs192632180 0.000000e+00  0  1.00000000
## rs377019499 9.433138e-03  1  0.92262765
## rs538406188 0.000000e+00  0  1.00000000
## rs553454250 0.000000e+00  0  1.00000000
## rs571804645 0.000000e+00  0  1.00000000
## rs542254778 0.000000e+00  0  1.00000000
## rs184911762 0.000000e+00  0  1.00000000
## rs138611550 0.000000e+00  0  1.00000000
## rs544110432 0.000000e+00  0  1.00000000
## rs188223067 0.000000e+00  0  1.00000000
## rs577520173 0.000000e+00  0  1.00000000
## rs200874530 0.000000e+00  0  1.00000000
## rs559928395 0.000000e+00  0  1.00000000
## rs371409039 0.000000e+00  0  1.00000000
## rs548319096 0.000000e+00  0  1.00000000
## rs192333501 0.000000e+00  0  1.00000000
## rs530774350 0.000000e+00  0  1.00000000
## rs201976256 0.000000e+00  0  1.00000000
## rs201558037 0.000000e+00  0  1.00000000
## rs538491644 0.000000e+00  0  1.00000000
## rs549815493 0.000000e+00  0  1.00000000
## rs536022160 0.000000e+00  0  1.00000000
## rs547179592 0.000000e+00  0  1.00000000
## rs565467296 0.000000e+00  0  1.00000000
## rs535964691 0.000000e+00  0  1.00000000
## rs554334922 0.000000e+00  0  1.00000000
## rs575998718 0.000000e+00  0  1.00000000
## rs536598972 0.000000e+00  0  1.00000000
## rs559250260 0.000000e+00  0  1.00000000
## rs12502572  6.656483e-02  1  0.79640566
## rs541682474 0.000000e+00  0  1.00000000
## rs559962748 0.000000e+00  0  1.00000000
## rs575055039 0.000000e+00  0  1.00000000
## rs141663630 9.433138e-03  1  0.92262765
## rs563588752 0.000000e+00  0  1.00000000
## rs530936498 0.000000e+00  0  1.00000000
## rs76031700  1.211511e-01  1  0.72778940
## rs564876296 0.000000e+00  0  1.00000000
## rs532257668 0.000000e+00  0  1.00000000
## rs150658745 0.000000e+00  0  1.00000000
## rs565553947 0.000000e+00  0  1.00000000
## rs536021363 0.000000e+00  0  1.00000000
## rs548271310 0.000000e+00  0  1.00000000
## rs569650793 0.000000e+00  0  1.00000000
## rs188940594 0.000000e+00  0  1.00000000
## rs558574177 0.000000e+00  0  1.00000000
## rs566209912 3.513099e+00  1  0.06088545
## rs577642090 0.000000e+00  0  1.00000000
## rs535359138 0.000000e+00  0  1.00000000
## rs58739151  1.072646e+00  1  0.30034851
## rs575093372 2.336398e-03  1  0.96144822
## rs115099464 2.142432e-02  1  0.88362892
## rs11932232  3.513099e+00  1  0.06088545
## rs575639599 0.000000e+00  0  1.00000000
## rs139894150 2.142432e-02  1  0.88362892
## rs564161905 2.336398e-03  1  0.96144822
## rs185601174 0.000000e+00  0  1.00000000
## rs149419054 0.000000e+00  0  1.00000000
## rs559176907 0.000000e+00  0  1.00000000
## rs1430583   3.513099e+00  1  0.06088545
## rs144354281 3.833648e+00  1  0.05023348
## rs548075978 0.000000e+00  0  1.00000000
## rs145372470 0.000000e+00  0  1.00000000
## rs530503560 0.000000e+00  0  1.00000000
## rs552271035 0.000000e+00  0  1.00000000
## rs147250793 0.000000e+00  0  1.00000000
## rs189312305 0.000000e+00  0  1.00000000
## rs13128910  0.000000e+00  0  1.00000000
## rs568656079 0.000000e+00  0  1.00000000
## rs536085043 0.000000e+00  0  1.00000000
## rs180868482 0.000000e+00  0  1.00000000
## rs140779087 2.142432e-02  1  0.88362892
## rs545998612 0.000000e+00  0  1.00000000
## rs6822807   1.335078e-02  1  0.90801270
## rs572906198 0.000000e+00  0  1.00000000
## rs540230710 0.000000e+00  0  1.00000000
## rs186534214 0.000000e+00  0  1.00000000
## rs529933317 0.000000e+00  0  1.00000000
## rs6818140   3.513099e+00  1  0.06088545
## rs563362938 0.000000e+00  0  1.00000000
## rs142354839 0.000000e+00  0  1.00000000
## rs72943535  1.072646e+00  1  0.30034851
## rs570514022 0.000000e+00  0  1.00000000
## rs528137782 0.000000e+00  0  1.00000000
## rs546609310 0.000000e+00  0  1.00000000
## rs568769182 0.000000e+00  0  1.00000000
## rs368381285 0.000000e+00  0  1.00000000
## rs560294200 0.000000e+00  0  1.00000000
## rs79246635  0.000000e+00  0  1.00000000
## rs539599064 0.000000e+00  0  1.00000000
## rs557878166 0.000000e+00  0  1.00000000
## rs527512114 0.000000e+00  0  1.00000000
## rs533967364 0.000000e+00  0  1.00000000
## rs555457768 0.000000e+00  0  1.00000000
## rs190126099 0.000000e+00  0  1.00000000
## rs2071416   9.226991e-01  1  0.33676726
## rs181061571 0.000000e+00  0  1.00000000
## rs7688743   3.531150e-01  1  0.55235508
## rs546015639 0.000000e+00  0  1.00000000
## rs564208960 2.336398e-03  1  0.96144822
## rs528178499 0.000000e+00  0  1.00000000
## rs377616609 0.000000e+00  0  1.00000000
## rs45489596  3.844785e-02  1  0.84454659
## rs200660278 0.000000e+00  0  1.00000000
## rs550995710 0.000000e+00  0  1.00000000
## rs377137285 0.000000e+00  0  1.00000000
## rs539685692 0.000000e+00  0  1.00000000
## rs143645398 0.000000e+00  0  1.00000000
## rs185921413 0.000000e+00  0  1.00000000
## rs138241273 0.000000e+00  0  1.00000000
## rs555800104 0.000000e+00  0  1.00000000
## rs79610439  0.000000e+00  0  1.00000000
## rs375694859 0.000000e+00  0  1.00000000
## rs45539933  5.090844e-01  1  0.47553547
## rs141520915 2.336398e-03  1  0.96144822
## rs200693427 0.000000e+00  0  1.00000000
## rs150886806 0.000000e+00  0  1.00000000
## rs573045664 0.000000e+00  0  1.00000000
## rs539961851 0.000000e+00  0  1.00000000
## rs114928132 6.064554e-02  1  0.80547854
## rs539493010 2.142432e-02  1  0.88362892
## rs557718796 0.000000e+00  0  1.00000000
## rs528929543 0.000000e+00  0  1.00000000
## rs550542331 0.000000e+00  0  1.00000000
## rs190380413 3.844785e-02  1  0.84454659
## rs573033816 1.540520e+00  1  0.21454049
## rs373376893 0.000000e+00  0  1.00000000
## rs551633105 0.000000e+00  0  1.00000000
## rs566741523 0.000000e+00  0  1.00000000
## rs183997397 0.000000e+00  0  1.00000000
## rs2071415   2.044174e+00  1  0.15278991
## rs567743945 0.000000e+00  0  1.00000000
## rs188981515 0.000000e+00  0  1.00000000
## rs556504882 0.000000e+00  0  1.00000000
## rs569203186 0.000000e+00  0  1.00000000
## rs539766026 0.000000e+00  0  1.00000000
## rs558049733 0.000000e+00  0  1.00000000
## rs573078239 0.000000e+00  0  1.00000000
## rs150067245 0.000000e+00  0  1.00000000
## rs555213120 0.000000e+00  0  1.00000000
## rs200045723 0.000000e+00  0  1.00000000
## rs73856656  1.494810e+00  1  0.22147171
## rs144172393 1.211511e-01  1  0.72778940

What you’ll get is this table. In your TERMINAL space, open the text editor program gedit:

gedit &

Once it’s open, go back to your R Studio space and scroll over your results table to highlight them. Copy them using ‘ctrl C’, paste them into gedit, and save them as a file named “YRI.HWE.txt” (but with your population name instead of YRI). Now, you’ll have a saved copy of the results of your HWE analysis of all the variants in UCP1 for your population in your workspace!

If you’ve got the text file saved, you can close R Studio and complete the rest of this exercise using the analysis results in the text file. You can look at it either in gedit:

gedit YRI.HWE.txt &

Or in the terminal space itself using the less command (don’t forget, to leave the less output, press ‘q’):

less YRI.HWE.txt

Now, on the right of your table is a list of P-values. Look over your P-values to see if any of them are significant (p < 0.05). If any of your P-values are significant, this means that that SNP is not in Hardy-Weinberg Equilibrium! Record the SNP ID numbers that have significant P-values, because they will be important for what we do next.

Hopefully, some of you have found that there are SNPs in your population that are not in Hardy-Weinberg Equilibrium. That’s exciting! But, think back to the Wigginton paper. There is a large Type 1 Error rate for Hardy-Weinberg tests performed by using a Chi-Squared test. To account for this, we will test our significant SNPs for Type 1 Error using the Shiny App calculator Becca De Camp created for us. Here’s how we will do that:

Step 3: Finding the “True” Hardy-Weinberg Equilibrium


No more coding will happen in this lab tutorial. Instead, you will be interacting with a user interface on this webpage that was created to calculate “true” Hardy-Weinberg Equilibrium P-values for one SNP at a time. The input for this UI will be the genotype frequencies from your population (i.e. the number of people from the population with the heterozygous genotype, the number of people with one homozygous genotype, etc.). You will find this information on Ensembl by doing the following:




Record your results, and make sure to record the SNP ID of each SNP you are testing. The last thing we’ll do is think about what the calculations you did today mean.

Step 4: What Do Your Results Mean? Discuss with a partner from class:

Think about the results you produced today in the context of your population. Here are some guiding questions to help you:

PRO-TIP: HWE Using VCF Tools

You can actually run the same analysis we just did using a single line of code in vcftools, thanks to the hardy option.

If you’d like to try, load the vcftools module:

module load vcftools

Now, enter the following code. This is essentially telling vcftools to run the “true” HWE test on each variant site, and to give the output as a file called “YRI.hwe”:

vcftools --vcf YRI.vcf --hardy --out YRI

Once it’s run, you can read the output file using the less command:

less YRI.hwe


Notice that the output is giving us both more and less information than our analysis in R Studio. Rather than the variant name, we’re given just the variant position (‘POS’) on which chromosome (‘CHR’). We’re also, however, given the observed (‘OBS(HOM1/HET/HOM2)’) and expected (‘E(HOM1/HET/HOM2)’) counts of individuals under “true” HWE with each genotype, along with the “true” HWE Chi-Squared (‘ChiSq_HWE’) and P-values (‘P_HWE’). It’s also giving us P-values regarding whether there is a significant deficit (‘P_HET_DEFICIT’) or excess (‘P_HET_EXCESS’) in heterozygotes, which can tell us a bit more about why the population is out of Hardy-Weinberg equilibrium (we’ll learn more about this in class).