Basic statistics with plink

Describe the PED and MAP file formats, based on plink’s documentation.
Use R to create a PED/MAP dataset with \(N = 500\) cases, \(N = 500\) controls, and \(n = 2\) SNPs such that:
1. sex is chosen randomly
2. each individual is randomly assigned to one of two separate subpopulations \(A\) or \(B\) (aka stratification)
3. the first SNP conforms to the Hardy-Weinberg principle in the whole population \(A \cup B\), with a single allele frequency \(f = 0.3\)
4. the second SNP conforms to the HW principle within each separate subpopulation, with allele frequencies \(f_A = 0.3\) and \(f_B = 0.6\)
Compute both the observed and the expected genotype frequencies for each SNP. The Wahlund effect should appear as a reduction in heterozygosity for the second SNP but not for the first one.
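As a rough check on what to expect, the size of the heterozygote deficit can be worked out by hand. The calculation below assumes that individuals are assigned to \(A\) or \(B\) with equal probability \(1/2\); the exercise does not state the mixing proportions, so this is an assumption. Here \(\bar f\) is the pooled allele frequency, \(H_{\text{exp}}\) is the heterozygosity expected under HW in the pooled population, and \(H_{\text{obs}}\) is the heterozygosity actually produced by mixing the two subpopulations.

\[
\bar f = \tfrac{1}{2}(f_A + f_B) = 0.45, \qquad H_{\text{exp}} = 2\bar f(1 - \bar f) = 0.495
\]
\[
H_{\text{obs}} = \tfrac{1}{2}\left[2 f_A(1 - f_A) + 2 f_B(1 - f_B)\right] = \tfrac{1}{2}(0.42 + 0.48) = 0.45
\]
\[
H_{\text{exp}} - H_{\text{obs}} = 2\,\mathrm{Var}(f) = 2 \cdot \tfrac{1}{4}(f_A - f_B)^2 = 0.045
\]

For the first SNP, \(f_A = f_B = 0.3\), so the variance term vanishes and no deficit is expected beyond sampling noise.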

Install plink v1.9.

Using plink, compute the allele frequencies of the two SNPs and import the data into your R session as a frq variable.
Using plink, compute the Hardy-Weinberg test statistic and import the results into R. In your R session, filter the results to only keep the statistics regarding the whole population \(A \cup B\).
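The two plink runs could look like the sketch below. It assumes the dataset was written as `data.ped` and `data.map`; the prefix `data` is a hypothetical name, not one given in the exercise. The `--freq` and `--hardy` flags write `data.frq` and `data.hwe` respectively. For a case/control phenotype, the `TEST` column of the `.hwe` file distinguishes `ALL`, `AFF` and `UNAFF` rows, so keeping `ALL` corresponds to the whole population. The `awk` line only previews that filter on the command line.

```shell
# Assumption: the dataset was written as data.ped / data.map (hypothetical prefix)
PREFIX=data

if command -v plink >/dev/null 2>&1 && [ -f "$PREFIX.ped" ] && [ -f "$PREFIX.map" ]; then
  # Allele frequencies -> $PREFIX.frq
  plink --file "$PREFIX" --freq --out "$PREFIX"
  # Hardy-Weinberg test -> $PREFIX.hwe
  plink --file "$PREFIX" --hardy --out "$PREFIX"
  # Keep the header and the rows for the whole sample (third column, TEST, equals ALL)
  awk 'NR == 1 || $3 == "ALL"' "$PREFIX.hwe"
else
  echo "plink or $PREFIX.ped/$PREFIX.map not found; skipping"
fi
```

Both output files are whitespace-delimited tables with a header row, so in R something like `read.table("data.frq", header = TRUE)` should be enough to import them.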

(Note that the alleles as well as the chromosome location are irrelevant for this application so make arbitrary choices.)
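To make the target layout concrete, here is a minimal hand-written sketch of a PED/MAP pair with two individuals and two SNPs. The file names `toy.ped` and `toy.map`, the IDs, the alleles and the positions are all arbitrary placeholders chosen for illustration. The final check relies on the fact that a PED row has six leading columns followed by two allele columns per SNP listed in the MAP file.

```shell
# Toy example: 2 individuals, 2 SNPs (all values are arbitrary placeholders)
# PED columns: family ID, individual ID, paternal ID, maternal ID,
#              sex (1 = male, 2 = female), phenotype (1 = control, 2 = case),
#              then two alleles per SNP
printf '%s\n' \
  'FAM1 IND1 0 0 1 2 A G G G' \
  'FAM2 IND2 0 0 2 1 A A C G' > toy.ped

# MAP columns: chromosome, SNP identifier, genetic distance (morgans), base-pair position
printf '%s\n' \
  '1 snp1 0 1000' \
  '1 snp2 0 2000' > toy.map

# Sanity check: each PED row should have 6 + 2 * (number of MAP rows) fields
n_snps=$(awk 'END { print NR }' toy.map)
awk -v n="$n_snps" 'NF != 6 + 2 * n { bad = 1 } END { exit bad }' toy.ped \
  && echo "PED/MAP dimensions agree"
```

The dataset generated in R should follow the same layout, with 1000 rows in the PED file and 2 rows in the MAP file.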

Copy the link to the latest stable version corresponding to Linux 64-bit, then download and unzip it:

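URL=http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20200219.zip
# Install into $HOME/work/bin so that the location matches the PATH entry
mkdir -p "$HOME/work/bin"
cd "$HOME/work/bin"
wget "$URL" && unzip "$(basename "$URL")"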

Then make sure plink is in your PATH variable:

export PATH="$PATH:$HOME/work/bin"