Let’s load the truncdist package (you can install it using install.packages()):

We set the random seed so that the simulations are reproducible:
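Here is a minimal sketch of this step. The original seed value is not shown, so 42 is an arbitrary, hypothetical choice. Resetting the seed reproduces exactly the same random draws:

```r
# Hypothetical seed value; any fixed integer gives reproducible draws.
set.seed(42)
x1 <- runif(3)

# Resetting the same seed replays the same random number stream.
set.seed(42)
x2 <- runif(3)

identical(x1, x2)  # TRUE
```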

Simulated genotypes

We draw \(p = 1000\) allele frequencies:

  • What does the histogram of the allele frequencies look like?
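The distribution used for the allele frequencies is not shown here. Purely for illustration, the sketch below assumes a Beta(0.5, 0.5) distribution truncated to [0.05, 0.95]. It samples by the inverse-CDF method in base R: draw u uniformly between F(0.05) and F(0.95), then map it back with the quantile function qbeta(). With truncdist loaded, rtrunc(p, "beta", a = 0.05, b = 0.95, shape1 = 0.5, shape2 = 0.5) should give an equivalent draw.

```r
set.seed(42)          # hypothetical seed
p <- 1000             # number of SNPs
lo <- 0.05; hi <- 0.95            # assumed truncation bounds
shape1 <- 0.5; shape2 <- 0.5      # assumed Beta parameters

# Inverse-CDF sampling from the truncated Beta:
# u ~ Uniform(F(lo), F(hi)), then freq = F^{-1}(u) lies in [lo, hi].
u <- runif(p, pbeta(lo, shape1, shape2), pbeta(hi, shape1, shape2))
freq <- qbeta(u, shape1, shape2)

hist(freq, breaks = 30,
     main = "Simulated ALT allele frequencies",
     xlab = "ALT allele frequency")
```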

An individual can be thought of as a list (data frame or tibble) of three vectors, with one entry per SNP:

pat mat dos
0 0 0
1 1 2
0 0 0
1 1 2
1 1 2
0 0 0

where pat and mat are the paternal and maternal alleles (0 = REF, 1 = ALT), and dos = pat + mat is the dosage of the ALT allele, which can take the values 0, 1 or 2.
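The following is a minimal sketch of one such individual. It assumes (hypothetically) Hardy-Weinberg equilibrium, meaning that at each SNP the paternal and maternal alleles are independent draws with P(ALT) equal to that SNP's allele frequency. The helper make_individual() is illustrative, and the uniform frequencies are placeholders for the ones drawn above. The sketch returns a base-R data.frame; tibble::tibble() would work the same way.

```r
# Hypothetical helper: one individual under Hardy-Weinberg equilibrium.
make_individual <- function(freq) {
  pat <- rbinom(length(freq), size = 1, prob = freq)  # paternal allele, 1 = ALT
  mat <- rbinom(length(freq), size = 1, prob = freq)  # maternal allele, 1 = ALT
  data.frame(pat = pat, mat = mat, dos = pat + mat)   # dosage = pat + mat
}

set.seed(42)
freq <- runif(1000, 0.05, 0.95)  # placeholder allele frequencies
ind <- make_individual(freq)

head(ind)
table(ind$dos)  # counts of dosage 0, 1 and 2
```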

  • Create two unrelated individuals:

    • What’s the correlation between their dosage vectors?
    • At how many SNPs do their dosage vectors agree?
    • To get a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
  • Create a family of three (two parents and one child), assuming that SNPs are independent:

    • What’s the correlation between their dosage vectors?
    • At how many SNPs do their dosage vectors agree?
    • To get a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
  • Create a family of three (two parents and one child), assuming that SNPs are in complete linkage disequilibrium (LD):

    • What’s the correlation between their dosage vectors?
    • At how many SNPs do their dosage vectors agree?
    • To get a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
  • Discuss the two models, their implications, and their most obvious limitations.
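For the first exercise (two unrelated individuals), here is a possible sketch. It computes the correlation of the two dosage vectors and the number of SNPs with equal dosage, then replicates the comparison to build histograms. The helpers make_individual() and compare() are hypothetical. The allele frequencies are uniform placeholders, and 500 replicates is an arbitrary choice.

```r
# Hypothetical helper: one individual under Hardy-Weinberg equilibrium.
make_individual <- function(freq) {
  pat <- rbinom(length(freq), size = 1, prob = freq)
  mat <- rbinom(length(freq), size = 1, prob = freq)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Correlation of dosages and number of SNPs where the dosages agree.
compare <- function(x, y) {
  c(cor = cor(x$dos, y$dos), agree = sum(x$dos == y$dos))
}

set.seed(42)
freq <- runif(1000, 0.05, 0.95)  # placeholder allele frequencies

a <- make_individual(freq)
b <- make_individual(freq)
compare(a, b)

# replicate() simplifies the named length-2 results into a 2 x n_rep matrix
# with rows "cor" and "agree".
n_rep <- 500
res <- replicate(n_rep, compare(make_individual(freq), make_individual(freq)))

par(mfrow = c(1, 2))
hist(res["cor", ], main = "Unrelated pair", xlab = "Correlation of dosages")
hist(res["agree", ], main = "Unrelated pair", xlab = "SNPs with equal dosage")
```

Even unrelated individuals get a clearly positive correlation in this setting. Both dosage vectors have mean 2f at each SNP, so they share the variation of the allele frequencies across SNPs.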
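For the family with independent SNPs, one reading (an assumption here) is Mendelian transmission with a separate fair coin flip at every SNP, deciding which of each parent's two alleles the child receives. The sketch below builds such a trio and compares the parents with each other and the father with the child. All helper names are hypothetical, and the frequencies are placeholders.

```r
make_individual <- function(freq) {
  pat <- rbinom(length(freq), size = 1, prob = freq)
  mat <- rbinom(length(freq), size = 1, prob = freq)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Independent SNPs: one coin flip per SNP picks the transmitted allele.
transmit_indep <- function(parent) {
  from_pat <- rbinom(nrow(parent), size = 1, prob = 0.5) == 1
  ifelse(from_pat, parent$pat, parent$mat)
}

make_trio <- function(freq, transmit) {
  father <- make_individual(freq)
  mother <- make_individual(freq)
  pat <- transmit(father)   # allele received from the father
  mat <- transmit(mother)   # allele received from the mother
  list(father = father, mother = mother,
       child = data.frame(pat = pat, mat = mat, dos = pat + mat))
}

trio_stats <- function(trio) {
  c(cor_parents        = cor(trio$father$dos, trio$mother$dos),
    cor_father_child   = cor(trio$father$dos, trio$child$dos),
    agree_father_child = sum(trio$father$dos == trio$child$dos))
}

set.seed(42)
freq <- runif(1000, 0.05, 0.95)  # placeholder allele frequencies
trio <- make_trio(freq, transmit_indep)
trio_stats(trio)

res <- replicate(500, trio_stats(make_trio(freq, transmit_indep)))

par(mfrow = c(1, 2))
hist(res["cor_father_child", ], main = "Father-child, independent SNPs",
     xlab = "Correlation of dosages")
hist(res["agree_father_child", ], main = "Father-child, independent SNPs",
     xlab = "SNPs with equal dosage")
```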
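For the complete-LD family, this sketch takes "complete LD" to mean no recombination: a single coin flip per parent decides which entire haplotype (pat or mat) is passed to the child. That reading is an assumption. The parents' own alleles are still drawn independently per SNP, and the helper names and placeholder frequencies are hypothetical.

```r
make_individual <- function(freq) {
  pat <- rbinom(length(freq), size = 1, prob = freq)
  mat <- rbinom(length(freq), size = 1, prob = freq)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Complete LD (no recombination): one coin flip for the whole haplotype.
transmit_ld <- function(parent) {
  if (runif(1) < 0.5) parent$pat else parent$mat
}

make_trio <- function(freq, transmit) {
  father <- make_individual(freq)
  mother <- make_individual(freq)
  pat <- transmit(father)
  mat <- transmit(mother)
  list(father = father, mother = mother,
       child = data.frame(pat = pat, mat = mat, dos = pat + mat))
}

trio_stats <- function(trio) {
  c(cor_father_child   = cor(trio$father$dos, trio$child$dos),
    agree_father_child = sum(trio$father$dos == trio$child$dos))
}

set.seed(42)
freq <- runif(1000, 0.05, 0.95)  # placeholder allele frequencies
trio <- make_trio(freq, transmit_ld)
trio_stats(trio)

res <- replicate(500, trio_stats(make_trio(freq, transmit_ld)))

par(mfrow = c(1, 2))
hist(res["cor_father_child", ], main = "Father-child, complete LD",
     xlab = "Correlation of dosages")
hist(res["agree_father_child", ], main = "Father-child, complete LD",
     xlab = "SNPs with equal dosage")
```

Comparing these histograms with the independent-SNP ones is one way to start the discussion of the two models.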