Describe the PED and MAP file formats, based on plink’s documentation
Use R to create a PED/MAP dataset for \(N = 500\) cases + \(N = 500\) controls and \(n = 2\) SNPs so that
- sex is chosen randomly
- each individual is randomly assigned to one of two separate subpopulations \(A\) or \(B\) (aka stratification)
- the first SNP conforms to the Hardy-Weinberg principle in the whole population \(A \cup B\) with a single allele frequency \(f = 0.3\)
- the second SNP conforms to the HW principle within each separate subpopulation, with allele frequencies \(f_A = 0.3\) and \(f_B = 0.6\)
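One possible way to build such a dataset is sketched below. It relies on the PED layout described in plink’s documentation (family ID, individual ID, paternal ID, maternal ID, sex coded 1/2, phenotype coded 1 = control and 2 = case, then two allele columns per SNP) and on the four MAP columns (chromosome, SNP identifier, genetic distance, base-pair position). The file prefix strat, the SNP names, the allele letters, the positions and the seed are all arbitrary assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

n_cases    <- 500
n_controls <- 500
n          <- n_cases + n_controls

# Individual-level variables
sex    <- sample(1:2, n, replace = TRUE)                   # PED coding: 1 = male, 2 = female
subpop <- sample(c("A", "B"), n, replace = TRUE)           # stratification label
pheno  <- rep(c(2L, 1L), times = c(n_cases, n_controls))   # PED coding: 2 = case, 1 = control

# Under Hardy-Weinberg, the number of copies of an allele is Binomial(2, f)
count1 <- rbinom(n, size = 2, prob = 0.3)                             # same f for everyone
count2 <- rbinom(n, size = 2, prob = ifelse(subpop == "A", 0.3, 0.6)) # f depends on subpopulation

# Turn allele counts (0, 1, 2) into two allele columns; the letters are arbitrary
to_alleles <- function(count, alt = "A", ref = "G") {
  cbind(ifelse(count >= 1, alt, ref),
        ifelse(count == 2, alt, ref))
}
a1 <- to_alleles(count1)
a2 <- to_alleles(count2)

ped <- data.frame(
  FID = paste0("F", seq_len(n)), IID = paste0("I", seq_len(n)),
  PAT = 0L, MAT = 0L, SEX = sex, PHENO = pheno,
  S1A = a1[, 1], S1B = a1[, 2], S2A = a2[, 1], S2B = a2[, 2]
)

# Arbitrary chromosome and positions, as allowed by the exercise
map <- data.frame(CHR = 1L, SNP = c("snp1", "snp2"), CM = 0L, BP = c(1000L, 2000L))

# PED/MAP files are whitespace-delimited and have no header line
write.table(ped, "strat.ped", quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(map, "strat.map", quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Note that the subpopulation label is kept in the R session only: the PED format has no dedicated column for it.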
Compute both the observed and the expected genotype frequencies: the Wahlund effect should manifest through a reduction in heterozygosity for the second polymorphism but not for the first one
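As a rough illustration of this comparison, the sketch below simulates allele counts directly (rather than reading them back from the PED file) and computes observed and Hardy-Weinberg-expected genotype frequencies from the pooled allele frequency. The helper genotype_freqs and the seed are assumptions introduced here. In theory, with equally sized subpopulations, the pooled frequency of the second SNP is 0.45, so the expected heterozygosity is \(2 \times 0.45 \times 0.55 = 0.495\), while the actual heterozygosity is the average of \(2 \times 0.3 \times 0.7 = 0.42\) and \(2 \times 0.6 \times 0.4 = 0.48\), that is 0.45.

```r
set.seed(1)  # arbitrary seed
n      <- 1000
subpop <- sample(c("A", "B"), n, replace = TRUE)
count1 <- rbinom(n, 2, 0.3)                                # SNP 1: one frequency overall
count2 <- rbinom(n, 2, ifelse(subpop == "A", 0.3, 0.6))    # SNP 2: frequency differs by subpopulation

# Observed genotype frequencies vs. HW expectations based on the pooled allele frequency
genotype_freqs <- function(count) {
  observed <- as.vector(table(factor(count, levels = 0:2))) / length(count)
  f        <- mean(count) / 2                              # pooled allele frequency estimate
  expected <- c((1 - f)^2, 2 * f * (1 - f), f^2)
  out <- rbind(observed = observed, expected = expected)
  colnames(out) <- c("hom_ref", "het", "hom_alt")
  out
}

genotype_freqs(count1)   # "het" should be close in both rows
genotype_freqs(count2)   # observed "het" is expected to fall below the HW value
```

Sampling noise means the simulated deficit will fluctuate around the theoretical value of 0.045.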
Install plink v1.9 either
- on your computer (choose the relevant stable version from plink’s page) or
- on JupyterHub (see the addendum below)
Using plink, compute the allele frequencies of the two SNPs and import the data into your R session as a frq variable.
Using plink, compute the Hardy-Weinberg test statistic and import the results into R. In your R session, filter the results to only keep the statistics regarding the whole population \(A \cup B\).
(Note that the alleles as well as the chromosome location are irrelevant for this application so make arbitrary choices.)
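The two plink steps can be driven from R, for instance as sketched below. This assumes that plink is reachable through PATH and that the files strat.ped and strat.map (a hypothetical prefix) sit in the working directory; run_plink is a small helper introduced here, not part of plink. The flags --file, --freq, --hardy and --out are plink 1.9 options, and the resulting files carry the .frq and .hwe extensions.

```r
prefix <- "strat"   # hypothetical prefix of the PED/MAP pair

# Thin wrapper around system2() that stops on failure
run_plink <- function(args, plink = Sys.which("plink")) {
  if (!nzchar(plink)) stop("plink not found in PATH")
  status <- system2(plink, args)
  if (status != 0) stop("plink exited with status ", status)
  invisible(status)
}

# Only run when both plink and the input files are available
if (nzchar(Sys.which("plink")) && file.exists(paste0(prefix, ".ped"))) {
  run_plink(c("--file", prefix, "--freq",  "--out", prefix))   # writes strat.frq
  run_plink(c("--file", prefix, "--hardy", "--out", prefix))   # writes strat.hwe

  # plink reports are whitespace-aligned tables with a header line
  frq <- read.table(paste0(prefix, ".frq"), header = TRUE)
  hwe <- read.table(paste0(prefix, ".hwe"), header = TRUE)

  # With a case/control phenotype, the TEST column takes the values ALL, AFF and UNAFF
  hwe_all <- subset(hwe, TEST == "ALL")
}
```

Keep in mind that plink splits the Hardy-Weinberg report by phenotype, not by subpopulation, so the rows labelled ALL are the ones covering \(A \cup B\).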
Addendum: Installing plink on JupyterHub
Copy the link to the latest stable version corresponding to Linux 64-bit and run:
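# Link copied from plink's download page
URL=http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20200219.zip
# Directory that will hold the plink binary
mkdir -p work/bin
# Download the archive and extract it into work/bin
wget "$URL" && unzip -o "$(basename "$URL")" -d work/bin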
Then make sure plink is in your PATH variable
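If plink is called from an R session, the search path can also be adjusted from within R. The sketch below assumes that the plink binary sits in work/bin under your home directory, which may differ on your JupyterHub instance; adapt plink_dir accordingly.

```r
# Assumed location of the plink binary; adjust if you unpacked it elsewhere
plink_dir <- file.path(Sys.getenv("HOME"), "work", "bin")

# Prepend the directory to PATH for this R session, unless it is already there
old_path <- Sys.getenv("PATH")
entries  <- strsplit(old_path, .Platform$path.sep, fixed = TRUE)[[1]]
if (!plink_dir %in% entries) {
  Sys.setenv(PATH = paste(plink_dir, old_path, sep = .Platform$path.sep))
}

# Returns the full path to plink, or an empty string if it is still not found
Sys.which("plink")
```

This change only lasts for the current R session; a permanent change would go into your shell configuration instead.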