Let’s load the truncdist package (you can install it using install.packages()).
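A minimal sketch of the loading step follows. The `requireNamespace()` guard and the `has_truncdist` flag are additions for illustration, not part of the original; they only keep the script from stopping with an error when the package is missing.

```r
# Install once if needed (left commented out so the script does not
# reinstall the package on every run):
# install.packages("truncdist")

# Check whether the package is available before attaching it.
# `has_truncdist` is a hypothetical helper flag used only in this sketch.
has_truncdist <- requireNamespace("truncdist", quietly = TRUE)

if (has_truncdist) {
  library(truncdist)
} else {
  message("truncdist is not installed; run install.packages(\"truncdist\") first.")
}
```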
We set the random seed so that the simulations are reproducible.
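As a small illustration of what the seed does, the sketch below draws the same random numbers twice. The seed value 2024 is arbitrary and is not taken from the original.

```r
# Fix the state of the random number generator (the value is arbitrary).
set.seed(2024)
x1 <- runif(3)

# Resetting the same seed reproduces exactly the same draws.
set.seed(2024)
x2 <- runif(3)

identical(x1, x2)
```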
Simulated genotypes
We draw \(p = 1000\) allele frequencies:
- What does the histogram of these allele frequencies look like?
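The distribution used for the allele frequencies is not stated here, so the sketch below makes its own assumption: a Beta(0.5, 0.5) distribution truncated to [0.05, 0.95], sampled with base R by inverting the CDF. The helper `r_trunc_beta()`, the shape parameters, the truncation bounds and the seed are all hypothetical choices. With truncdist loaded, a call such as `rtrunc(p, spec = "beta", a = 0.05, b = 0.95, shape1 = 0.5, shape2 = 0.5)` should play the same role.

```r
set.seed(1)   # arbitrary seed, chosen only for this sketch
p <- 1000     # number of SNPs

# Inverse-CDF sampling from a Beta(shape1, shape2) truncated to [lower, upper]:
# draw uniforms between the CDF values at the two bounds, then map them back
# through the quantile function.
r_trunc_beta <- function(n, shape1, shape2, lower, upper) {
  u <- runif(n,
             min = pbeta(lower, shape1, shape2),
             max = pbeta(upper, shape1, shape2))
  qbeta(u, shape1, shape2)
}

# Assumed distribution and bounds (not from the original text).
freqs <- r_trunc_beta(p, shape1 = 0.5, shape2 = 0.5, lower = 0.05, upper = 0.95)

# Histogram of the simulated ALT allele frequencies.
hist(freqs, breaks = 30,
     main = "Simulated allele frequencies",
     xlab = "ALT allele frequency")
```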
An individual can be thought of as a list (data frame or tibble) of three vectors:
pat | mat | dos |
---|---|---|
0 | 0 | 0 |
1 | 1 | 2 |
0 | 0 | 0 |
1 | 1 | 2 |
1 | 1 | 2 |
0 | 0 | 0 |
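where pat and mat are the alleles inherited from the father and the mother (1 for the ALT allele, 0 otherwise), and dos = pat + mat is the dosage of the ALT allele, which can take the value 0, 1 or 2.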
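One way to build such an individual is sketched below, assuming each allele is an independent Bernoulli draw with the SNP's ALT allele frequency. The function name `simulate_individual()` is hypothetical, and the uniform frequencies are only a placeholder so that the block runs on its own.

```r
set.seed(1)                      # arbitrary seed
p <- 1000
freqs <- runif(p, 0.05, 0.95)    # placeholder allele frequencies (assumption)

# Draw the paternal and maternal alleles independently at every SNP
# (1 = ALT, 0 = REF) and add them up to obtain the dosage.
simulate_individual <- function(freqs) {
  pat <- rbinom(length(freqs), size = 1, prob = freqs)
  mat <- rbinom(length(freqs), size = 1, prob = freqs)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

ind <- simulate_individual(freqs)
head(ind)
```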
Create two individuals:
- What’s the correlation between their dosage vectors?
- At how many SNPs do their dosage vectors agree?
- To derive a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
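The three questions for two unrelated individuals could be approached along the following lines. This is a sketch under assumptions: the placeholder uniform allele frequencies, the helper names `simulate_individual()` and `compare_dosages()`, and the 500 replicates are all illustrative choices.

```r
set.seed(1)                      # arbitrary seed
p <- 1000
freqs <- runif(p, 0.05, 0.95)    # placeholder allele frequencies (assumption)

# Independent Bernoulli alleles at every SNP; dosage is their sum.
simulate_individual <- function(freqs) {
  pat <- rbinom(length(freqs), size = 1, prob = freqs)
  mat <- rbinom(length(freqs), size = 1, prob = freqs)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Correlation of the dosage vectors and number of SNPs with equal dosage.
compare_dosages <- function(x, y) {
  c(cor = cor(x$dos, y$dos), agree = sum(x$dos == y$dos))
}

# A single pair of individuals.
ind1 <- simulate_individual(freqs)
ind2 <- simulate_individual(freqs)
one_pair <- compare_dosages(ind1, ind2)
print(one_pair)

# Replicate the experiment; `res` is a 2 x n_rep matrix with rows
# "cor" and "agree".
n_rep <- 500
res <- replicate(n_rep,
                 compare_dosages(simulate_individual(freqs),
                                 simulate_individual(freqs)))

hist(res["cor", ], main = "Two unrelated individuals",
     xlab = "Correlation of dosage vectors")
hist(res["agree", ], main = "Two unrelated individuals",
     xlab = "Number of SNPs with equal dosage")
```

Note that the correlation need not be centred at zero even for unrelated individuals, because both dosage vectors are drawn from the same set of allele frequencies.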
Create a family of three: two parents and one child, assuming that SNPs are independent:
- What are the pairwise correlations between their dosage vectors?
- At how many SNPs do their dosage vectors agree?
- To derive a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
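A possible sketch of the independent-SNP family follows. It assumes that "independent" means a separate fair coin flip at every SNP decides which of a parent's two alleles is passed on. All helper names (`transmit_independent()`, `make_child()`, `simulate_family()`, `family_stats()`), the placeholder allele frequencies and the number of replicates are hypothetical.

```r
set.seed(1)                      # arbitrary seed
p <- 1000
freqs <- runif(p, 0.05, 0.95)    # placeholder allele frequencies (assumption)

simulate_individual <- function(freqs) {
  pat <- rbinom(length(freqs), size = 1, prob = freqs)
  mat <- rbinom(length(freqs), size = 1, prob = freqs)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Independent SNPs: one fair coin flip per SNP chooses the transmitted allele.
transmit_independent <- function(parent) {
  from_pat <- rbinom(nrow(parent), size = 1, prob = 0.5) == 1
  ifelse(from_pat, parent$pat, parent$mat)
}

# The child's paternal allele comes from the father, the maternal one
# from the mother.
make_child <- function(father, mother, transmit) {
  pat <- transmit(father)
  mat <- transmit(mother)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

simulate_family <- function(freqs, transmit) {
  father <- simulate_individual(freqs)
  mother <- simulate_individual(freqs)
  list(father = father, mother = mother,
       child = make_child(father, mother, transmit))
}

# Pairwise correlations and agreement counts (f = father, m = mother, c = child).
family_stats <- function(fam) {
  c(cor_fc   = cor(fam$father$dos, fam$child$dos),
    cor_mc   = cor(fam$mother$dos, fam$child$dos),
    cor_fm   = cor(fam$father$dos, fam$mother$dos),
    agree_fc = sum(fam$father$dos == fam$child$dos),
    agree_mc = sum(fam$mother$dos == fam$child$dos),
    agree_fm = sum(fam$father$dos == fam$mother$dos))
}

fam <- simulate_family(freqs, transmit_independent)
print(family_stats(fam))

# Replicate; `res` is a 6 x n_rep matrix, one row per statistic.
n_rep <- 500
res <- replicate(n_rep, family_stats(simulate_family(freqs, transmit_independent)))

hist(res["cor_fc", ], main = "Independent SNPs",
     xlab = "Father-child dosage correlation")
hist(res["agree_fc", ], main = "Independent SNPs",
     xlab = "Father-child SNPs with equal dosage")
```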
Create a family of three: two parents and one child, assuming that SNPs are in full LD:
- What are the pairwise correlations between their dosage vectors?
- At how many SNPs do their dosage vectors agree?
- To derive a statistically meaningful picture, replicate the simulation many times and plot a histogram of the results.
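The full-LD case can be sketched in the same way under one possible reading of "full LD": there is no recombination, so each parent passes on one of its two haplotypes as a whole, decided by a single coin flip. That reading, the helper names, the placeholder allele frequencies and the number of replicates are assumptions of this sketch. The parental haplotypes themselves are still generated SNP by SNP here.

```r
set.seed(1)                      # arbitrary seed
p <- 1000
freqs <- runif(p, 0.05, 0.95)    # placeholder allele frequencies (assumption)

simulate_individual <- function(freqs) {
  pat <- rbinom(length(freqs), size = 1, prob = freqs)
  mat <- rbinom(length(freqs), size = 1, prob = freqs)
  data.frame(pat = pat, mat = mat, dos = pat + mat)
}

# Full LD (assumed reading): one coin flip per parent selects an entire
# haplotype, which is transmitted without recombination.
transmit_block <- function(parent) {
  if (rbinom(1, size = 1, prob = 0.5) == 1) parent$pat else parent$mat
}

simulate_family_ld <- function(freqs) {
  father <- simulate_individual(freqs)
  mother <- simulate_individual(freqs)
  pat <- transmit_block(father)
  mat <- transmit_block(mother)
  list(father = father, mother = mother,
       child = data.frame(pat = pat, mat = mat, dos = pat + mat))
}

# Pairwise correlations and agreement counts (f = father, m = mother, c = child).
family_stats <- function(fam) {
  c(cor_fc   = cor(fam$father$dos, fam$child$dos),
    cor_mc   = cor(fam$mother$dos, fam$child$dos),
    cor_fm   = cor(fam$father$dos, fam$mother$dos),
    agree_fc = sum(fam$father$dos == fam$child$dos),
    agree_mc = sum(fam$mother$dos == fam$child$dos),
    agree_fm = sum(fam$father$dos == fam$mother$dos))
}

fam <- simulate_family_ld(freqs)
print(family_stats(fam))

# Replicate; `res` is a 6 x n_rep matrix, one row per statistic.
n_rep <- 500
res <- replicate(n_rep, family_stats(simulate_family_ld(freqs)))

hist(res["cor_fc", ], main = "SNPs in full LD",
     xlab = "Father-child dosage correlation")
hist(res["agree_fc", ], main = "SNPs in full LD",
     xlab = "Father-child SNPs with equal dosage")
```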
Discuss the two models, their implications, and their most obvious limitations.