library("tidyverse")
Download the data here and load them using:
data = read_rds("assets/genotoy.rds")
The data variable is a list containing the following objects:
- genotypes: the genotype matrix (row = individual, column = SNP)
- Q1 and Q2: two quantitative traits (vectors)
- B1 and B2: two binary traits (vectors)
Q1 and B1 correspond to highly polygenic phenotypes, whereas Q2 and B2 correspond to mildly polygenic phenotypes.
- Plot the distribution of the minor allele frequencies
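As a starting point, the minor allele frequency of each SNP can be sketched as below. This assumes that the genotype matrix stores allele counts coded 0/1/2, which the exercise does not state explicitly; `toy_genotypes` and `compute_maf` are hypothetical names, and the simulated matrix only stands in for the genotypes object.

```r
library("tidyverse")

# Hypothetical stand-in for the genotype matrix: 100 individuals x 50 SNPs,
# ASSUMED to be coded as 0/1/2 copies of the counted allele
set.seed(1)
toy_genotypes = matrix(rbinom(100 * 50, size = 2, prob = 0.3), nrow = 100)

# Frequency of the counted allele per SNP, folded so it is at most 0.5
compute_maf = function(genotypes) {
  freq = colMeans(genotypes, na.rm = TRUE) / 2
  pmin(freq, 1 - freq)
}

maf = compute_maf(toy_genotypes)

# Histogram of the minor allele frequencies
maf_plot = ggplot(tibble(maf = maf), aes(x = maf)) +
  geom_histogram(bins = 20) +
  labs(x = "Minor allele frequency", y = "Number of SNPs")
print(maf_plot)
```

With the real data, the same function would be applied to the genotypes element of the list instead of the simulated matrix.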
For each phenotype, analyse the whole dataset:
- Compute the effect sizes and the \(p\)-values
- Which polymorphisms are significant at the 5% family-wise error rate?
- Which polymorphisms are significant at the 25% FDR?
- Plot the quantile-quantile (QQ) plot for the \(p\)-values
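One possible way to approach the four steps above is sketched here on simulated data. It assumes one regression per SNP (linear for quantitative traits, logistic for binary traits), Bonferroni correction as the family-wise error rate procedure, and Benjamini-Hochberg for the FDR; none of these choices is prescribed by the exercise. The names `toy_genotypes`, `toy_q`, `toy_b`, `gwas` and `qq_points` are hypothetical.

```r
# Hypothetical toy data standing in for the genotype matrix and two traits
set.seed(2)
n = 300
m = 40
toy_genotypes = matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)
toy_q = as.numeric(0.8 * toy_genotypes[, 1] + rnorm(n))               # quantitative
toy_b = rbinom(n, size = 1, prob = plogis(toy_genotypes[, 2] - 0.6))  # binary

# One regression per SNP: gaussian family for quantitative traits,
# binomial (logistic) family for binary traits
gwas = function(genotypes, phenotype, binary = FALSE) {
  fam = if (binary) binomial() else gaussian()
  res = apply(genotypes, 2, function(snp) {
    cf = coef(summary(glm(phenotype ~ snp, family = fam)))
    if (nrow(cf) < 2) return(c(NA, NA))  # monomorphic SNP: no slope estimated
    cf[2, c(1, 4)]                       # estimate and p-value of the SNP term
  })
  data.frame(snp = seq_len(ncol(genotypes)), beta = res[1, ], p = res[2, ])
}

q_res = gwas(toy_genotypes, toy_q)
b_res = gwas(toy_genotypes, toy_b, binary = TRUE)

# Bonferroni controls the family-wise error rate, Benjamini-Hochberg the FDR
fwer_hits = q_res$snp[which(p.adjust(q_res$p, method = "bonferroni") < 0.05)]
fdr_hits = q_res$snp[which(p.adjust(q_res$p, method = "BH") < 0.25)]

# QQ plot coordinates: observed -log10(p) against the values expected
# under the null hypothesis of uniformly distributed p-values
qq_points = function(p) {
  p = sort(p[!is.na(p)])
  data.frame(expected = -log10(ppoints(length(p))), observed = -log10(p))
}

qq = qq_points(q_res$p)
plot(qq$expected, qq$observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1)  # points above this line indicate an excess of small p-values
```

Because the Bonferroni-adjusted \(p\)-values are never smaller than the Benjamini-Hochberg ones and the FDR level here is the looser of the two, every SNP in `fwer_hits` also appears in `fdr_hits`.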
For each phenotype:
- Compute a polygenic score based on the polymorphisms such that \(p < .003\), \(p < .01\), \(p < .03\), \(p < .1\) and \(p < .3\)
- Evaluate its accuracy using Pearson’s \(\rho\) for quantitative traits and the area under the ROC curve (AUC) for binary traits
- What’s the best choice of threshold for the \(p\)-value?
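The steps above could be sketched as follows on simulated data. This is only one way to set it up: it assumes that effect sizes are estimated on a training half and scores are evaluated on a held-out half (the exercise does not say how to split the data), that the score is the sum of genotypes weighted by the marginal effect sizes of the selected SNPs, and that the AUC is computed from ranks (the Mann-Whitney formulation). All names (`geno`, `marginal_stats`, `polygenic_score`, `auc`, `evaluate`) are hypothetical.

```r
# Hypothetical toy data: 5 causal SNPs out of 100
set.seed(4)
n = 400
m = 100
geno = matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n)
liability = as.numeric(geno[, 1:5] %*% rep(1, 5) + rnorm(n))
toy_q = liability                                  # quantitative trait
toy_b = as.integer(liability > median(liability))  # binary trait
train = 1:200    # ASSUMED split: estimate effects here ...
test = 201:400   # ... and evaluate the score here

# Marginal effect size and p-value of each SNP
marginal_stats = function(genotypes, phenotype, binary = FALSE) {
  fam = if (binary) binomial() else gaussian()
  res = apply(genotypes, 2, function(snp) {
    cf = coef(summary(glm(phenotype ~ snp, family = fam)))
    if (nrow(cf) < 2) return(c(NA, NA))  # monomorphic SNP
    cf[2, c(1, 4)]
  })
  data.frame(beta = res[1, ], p = res[2, ])
}

# Score = genotypes of the selected SNPs weighted by their effect sizes
polygenic_score = function(genotypes, beta, p, threshold) {
  keep = !is.na(p) & p < threshold
  as.numeric(genotypes[, keep, drop = FALSE] %*% beta[keep])
}

# Rank-based AUC: probability that a case scores higher than a control
auc = function(score, label) {
  r = rank(score)
  n1 = sum(label == 1)
  n0 = sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

thresholds = c(.003, .01, .03, .1, .3)

# Accuracy on the test half for every p-value threshold
evaluate = function(phenotype, binary = FALSE) {
  stats = marginal_stats(geno[train, ], phenotype[train], binary)
  acc = sapply(thresholds, function(th) {
    score = polygenic_score(geno[test, ], stats$beta, stats$p, th)
    if (binary) auc(score, phenotype[test]) else cor(score, phenotype[test])
  })
  setNames(acc, thresholds)
}

acc_q = evaluate(toy_q)                 # Pearson correlation per threshold
acc_b = evaluate(toy_b, binary = TRUE)  # AUC per threshold
best_q = thresholds[which.max(acc_q)]
best_b = thresholds[which.max(acc_b)]
```

Evaluating on individuals that were not used to estimate the effect sizes avoids an overly optimistic accuracy; if the score were evaluated on the same individuals, looser thresholds would tend to look better simply because more noise is fitted.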