I have a broad range of research interests in bioinformatics, computational biology and machine learning, and welcome applications from students in any of these areas. In biology, my main interest is to understand gene regulation and how it is affected by genetic variation. In other words, how does the genome determine which genes are expressed (active) in different cell types, and how do genetic differences between individuals lead to differences in gene expression and ultimately to differences in health and disease traits? My group uses machine learning approaches and large sets of genetic and molecular data to answer these questions.
Machine learning is a field at the interface of computer science and statistics that aims to identify correlations and other meaningful patterns in large data sets. Biology is an ideal area for testing and developing new machine learning algorithms, because in biology correlations alone are never enough. For instance, knowing that high cholesterol and high blood pressure are often seen together in people with diabetes or heart disease is not very useful until we establish that, say, high cholesterol in fact causes high blood pressure and should therefore be the therapeutic target. Establishing similar causal relations at the level of genes, where thousands of genes are expressed in every cell of the human body and influence each other in untold ways through complex, unknown networks of genetic interactions, is the challenge that my group and I aim to address. In short, to paraphrase a well-known saying: nothing in biology makes sense, except in the light of causal inference.
I've also contributed to some NORBIS courses:
- NORBIS Summer School 2021
- NORBIS Course Genomics for Precision Medicine (2021)
- NORBIS Course Computational Approaches in Transcriptome Analysis (2019)
Publications:
- (2022). eQTLs as causal instruments for the reconstruction of hormone linked gene networks. Frontiers in Endocrinology.
- (2022). Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders. G3: Genes, Genomes, Genetics.
- (2021). Variation in the SERPINA6/SERPINA1 locus alters morning plasma cortisol, hepatic corticosteroid binding globulin expression, gene expression in peripheral tissues, and risk of cardiovascular disease. Journal of Human Genetics. 625-636.
- (2021). Changes in the gene expression profile during spontaneous migraine attacks. Scientific Reports. 10 pages.
- (2021). A Graph Feature Auto-Encoder for the Prediction of Unobserved Node Features on Biological Networks. BMC Bioinformatics. 17 pages.
- (2020). Model-based clustering of multi-tissue gene expression data. Bioinformatics. 1807-1813.
- (2020). Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast. Molecular Biosystems.
- (2019). High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Frontiers in Genetics. 13 pages.
- (2018). Causal Transcription Regulatory Network Inference Using Enhancer Activity as a Causal Anchor. International Journal of Molecular Sciences.