1. Describe the PED and MAP file formats, as specified in plink’s documentation

  2. Use R to create a PED/MAP dataset for \(N = 500\) cases + \(N = 500\) controls and \(n = 2\) SNPs such that

    1. sex is chosen randomly
    2. each individual is randomly assigned to one of two separate subpopulations \(A\) or \(B\) (i.e. population stratification)
    3. the first SNP conforms to the Hardy-Weinberg (HW) principle in the whole population \(A \cup B\), with a single allele frequency \(f = 0.3\)
    4. the second SNP conforms to the HW principle within each separate subpopulation, with allele frequencies \(f_A = 0.3\) and \(f_B = 0.6\)
  3. Compute both the observed and the expected genotype frequencies: the Wahlund effect should manifest through a reduction in heterozygosity for the second polymorphism but not for the first one
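One possible way to approach the simulation is sketched below. It is only a starting point under stated assumptions: the alleles (C/T), the chromosome and positions, the SNP names, the seed, and the output prefix are arbitrary choices, and storing the subpopulation label in the family ID column is a convenience of this sketch rather than a requirement. Genotypes are drawn as binomial counts of one allele, which gives HW proportions for the frequency used in each draw.

```r
set.seed(42)    # arbitrary seed, only so the sketch is reproducible
n_ind <- 1000   # 500 cases + 500 controls

pheno  <- rep(c(2, 1), each = n_ind / 2)             # plink coding: 2 = case, 1 = control
sex    <- sample(1:2, n_ind, replace = TRUE)         # plink coding: 1 = male, 2 = female
subpop <- sample(c("A", "B"), n_ind, replace = TRUE)

# Genotype = number of copies (0, 1 or 2) of the allele with frequency f.
# Two independent Bernoulli(f) draws per individual give HW proportions.
g1 <- rbinom(n_ind, size = 2, prob = 0.3)                              # same f everywhere
g2 <- rbinom(n_ind, size = 2, prob = ifelse(subpop == "A", 0.3, 0.6))  # f depends on subpopulation

# Arbitrary allele labels: "T" is the allele counted above, "C" the other one.
to_alleles <- function(g) c("C C", "C T", "T T")[g + 1]

# PED columns: family ID, individual ID, father, mother, sex, phenotype, genotypes.
ped <- data.frame(FID = subpop, IID = sprintf("id%04d", seq_len(n_ind)),
                  PAT = 0, MAT = 0, SEX = sex, PHENO = pheno,
                  SNP1 = to_alleles(g1), SNP2 = to_alleles(g2))
# MAP columns: chromosome, SNP identifier, genetic distance, base-pair position.
map <- data.frame(CHR = 1, SNP = c("snp1", "snp2"), CM = 0, BP = c(1000, 2000))

prefix <- file.path(tempdir(), "sim")   # swap for a path in your project directory
write.table(ped, paste0(prefix, ".ped"), quote = FALSE,
            row.names = FALSE, col.names = FALSE)
write.table(map, paste0(prefix, ".map"), quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Observed genotype frequencies versus HW expectations from the pooled allele frequency.
geno_freqs <- function(g) {
  p   <- mean(g) / 2
  out <- rbind(observed = as.numeric(table(factor(g, levels = 0:2))) / length(g),
               expected = c((1 - p)^2, 2 * p * (1 - p), p^2))
  colnames(out) <- c("CC", "CT", "TT")
  out
}

freq_snp1 <- geno_freqs(g1)
freq_snp2 <- geno_freqs(g2)
print(freq_snp1)
print(freq_snp2)
```

With equal-sized subpopulations, the pooled frequency of the second SNP is about 0.45, so the expected heterozygosity is close to \(2 \times 0.45 \times 0.55 = 0.495\), whereas the average within-subpopulation heterozygosity is \((0.42 + 0.48)/2 = 0.45\). The CT column of the second table is where this deficit should show up, up to sampling noise.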

Install plink v1.9 either

  • on your computer (choose the relevant stable version from plink’s page) or
  • on JupyterHub (see infra)
  1. Using plink, compute the allele frequencies of the two SNPs and import the results into your R session as a variable named frq.

  2. Using plink, compute the Hardy-Weinberg test statistic and import the results into R. In your R session, filter the results to only keep the statistics regarding the whole population \(A \cup B\).
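The two plink steps can be driven from R, as in the hedged sketch below. It assumes that the plink 1.9 binary is reachable as plink on your PATH (on some systems it is installed under another name, such as plink1.9) and that a PED/MAP pair with the hypothetical prefix sim sits in the working directory; the output prefixes sim_freq and sim_hardy are arbitrary. The flags --file, --freq, --hardy and --out are standard plink 1.9 options. For a case/control phenotype, the .hwe report has one row per SNP and per TEST value (ALL, AFF, UNAFF), so keeping TEST equal to ALL retains the whole-sample statistics.

```r
prefix <- "sim"                 # hypothetical prefix of the PED/MAP pair (sim.ped, sim.map)
plink  <- Sys.which("plink")    # assumes the plink 1.9 binary is on the PATH

# Run plink on the text fileset with one analysis flag and a chosen output prefix.
run_plink <- function(flag, out) {
  system2(plink, c("--file", prefix, flag, "--out", out))
}

if (nzchar(plink) && file.exists(paste0(prefix, ".ped"))) {
  # Allele frequencies: plink writes <out>.frq
  run_plink("--freq", "sim_freq")
  frq <- read.table("sim_freq.frq", header = TRUE)
  print(frq)

  # Hardy-Weinberg statistics: plink writes <out>.hwe
  run_plink("--hardy", "sim_hardy")
  hwe <- read.table("sim_hardy.hwe", header = TRUE)

  # Keep only the rows computed on the whole sample.
  hwe_all <- subset(hwe, TEST == "ALL")
  print(hwe_all)
} else {
  message("plink or the PED/MAP files were not found; adjust 'plink' and 'prefix'.")
}
```

Note that read.table renames the O(HET) and E(HET) columns of the .hwe file to syntactically valid R names unless check.names = FALSE is passed.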

(Note that the allele labels and the chromosome location are irrelevant for this application, so make arbitrary choices.)