Enrichment test as a Monte Carlo experiment

Suppose, in a transcriptomics setup, we’ve tested 1000 genes and 150 happen to be differentially expressed (DEGs). Suppose furthermore that 20 out of those 150 DEGs belong to a 100-gene set of core genes among the 1000 tested genes.

To answer the question “Is the 20/150 (13.3%) share of core genes among the DEGs an overrepresentation compared to the overall 100/1000 (10%) share of core genes?”, assume the null hypothesis \(H_0\): “Core genes are represented no less and no more among DEGs than they are in the rest of the dataset”.

Calculate a \(p\)-value using a Monte Carlo simulation.
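One possible way to set up the simulation is sketched below. Under \(H_0\), the 150 DEGs behave like a random draw without replacement from the 1000 tested genes, so we can repeatedly draw 150 genes at random and count how many core genes turn up. The seed, the number of simulations (n_sim) and all variable names are arbitrary choices made for this sketch, and the test is taken to be one-sided (overrepresentation only), which is an assumption based on how the question is phrased.

```r
set.seed(1)          # arbitrary seed, only so the run is reproducible

n_genes  <- 1000     # genes tested
n_core   <- 100      # core genes among the tested genes
n_deg    <- 150      # differentially expressed genes
observed <- 20       # core genes observed among the DEGs
n_sim    <- 10000    # number of simulated DEG lists (assumed value)

# Label every tested gene as core (TRUE) or non-core (FALSE).
is_core <- rep(c(TRUE, FALSE), times = c(n_core, n_genes - n_core))

# Under H0, draw n_deg genes without replacement and count the core genes.
sim_counts <- replicate(n_sim, sum(sample(is_core, n_deg)))

# One-sided empirical p-value: share of simulations at least as extreme
# as the observation. The +1 terms keep the estimate away from exactly 0.
p_mc <- (sum(sim_counts >= observed) + 1) / (n_sim + 1)
p_mc
```

The estimate varies slightly from run to run; increasing n_sim reduces that Monte Carlo error at the cost of run time.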

Hypergeometric distribution

Use the hypergeometric distribution functions in R to calculate the \(p\)-value.
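A minimal sketch of the exact calculation follows. In R's parameterisation, phyper(q, m, n, k) describes drawing k balls from an urn with m white and n black balls; here the white balls are the 100 core genes, the black balls the 900 other genes, and the draw is the 150 DEGs. Because phyper with lower.tail = FALSE returns \(P(X > q)\), the observed count minus one is passed to obtain \(P(X \ge 20)\). The variable names are chosen for this sketch, and the one-sided alternative is again an assumption.

```r
n_core   <- 100   # "white balls": core genes
n_other  <- 900   # "black balls": tested genes that are not core genes
n_deg    <- 150   # number of draws: the DEGs
observed <- 20    # core genes observed among the DEGs

# Upper tail P(X >= observed); phyper() gives P(X > q), hence observed - 1.
p_hyper <- phyper(observed - 1, n_core, n_other, n_deg, lower.tail = FALSE)

# Same probability obtained by summing the point probabilities directly.
p_sum <- sum(dhyper(observed:min(n_core, n_deg), n_core, n_other, n_deg))

p_hyper
```

The same upper-tail probability is what a one-sided Fisher's exact test reports for the corresponding 2x2 table of core/non-core against DEG/non-DEG counts.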

Gene Ontology

The Gene Ontology resource provides a representation of scientific knowledge about the functions of genes, coding and non-coding, from many organisms. Ontologies consist of a set of classes/terms/concepts with relations that operate between them.

GO: Hexose biosynthetic process

Three ontologies are provided: (1) Molecular Function, (2) Cellular Component and (3) Biological Process. Annotations can be downloaded from the Gene Ontology website.

Use the biomaRt package to obtain the GO terms annotated to the Ensembl genes ENSG00000008735, ENSG00000040608, ENSG00000069998, ENSG00000070010 and ENSG00000073146.
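The query could look roughly like the sketch below. The helper fetch_go_terms is a hypothetical name introduced for this example; it wraps the biomaRt functions useEnsembl and getBM. It assumes the human gene dataset (hsapiens_gene_ensembl) and the attribute names go_id, name_1006 and namespace_1003, which are the names used by the Ensembl genes mart at the time of writing and may differ between releases (listAttributes shows what is currently available).

```r
# Ensembl gene IDs from the exercise.
gene_ids <- c("ENSG00000008735", "ENSG00000040608", "ENSG00000069998",
              "ENSG00000070010", "ENSG00000073146")

# Hypothetical helper: look up GO annotations for a vector of Ensembl IDs.
fetch_go_terms <- function(ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Package 'biomaRt' is required (install it from Bioconductor).")
  }
  # Connect to the Ensembl genes mart; the human dataset is an assumption.
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "hsapiens_gene_ensembl")
  # One row per gene/GO-term pair: GO accession, term name and ontology.
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"),
    filters    = "ensembl_gene_id",
    values     = ids,
    mart       = mart
  )
}
```

Calling fetch_go_terms(gene_ids) requires an internet connection and returns a data frame in which a gene appears once for every GO term it is annotated with.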