Emil Michael Pedersen
Position:
Developing a multi-trait liability model for gene discovery in large biobanks
Categories:
Fellows, Postdoc Fellows 2024
Location:
National Centre for Register-based Research
Abstract:
Modern biomedical and health science research leans heavily on data science to extract meaning from large-scale biobanks such as UK Biobank, iPSYCH, and CHB/DBDS. These biobanks offer large collections of genetic and phenotypic data and are an invaluable source of insight into the human genome and the aetiology of complex disorders, as well as inspiration for methodological developments. My project aims to develop and apply a new method for estimating genetic liability from broad, complex register and biobank data. Using this (genetic) liability in genome-wide association studies (GWAS) should increase power for gene discovery and prediction.
We aim to develop multi-trait ADuLT (mADuLT), a multi-trait extension of the age-dependent liability threshold model (ADuLT) developed in my dissertation work. mADuLT will incorporate multiple genetically correlated phenotypes, age of onset, and family history simultaneously in the liability estimation. We will compare the extension to other state-of-the-art multi-trait GWAS methods in extensive simulations and develop selection criteria for when and how researchers can best utilise multiple phenotypes for gene discovery. I will apply mADuLT to several Danish biobanks and UK Biobank in a series of increasingly complex GWAS. First, we will explore mADuLT’s utility for single-trait analysis, e.g. MDD. Then we will explore mADuLT’s utility for detecting variants with shared effects, moving to disease domains, e.g. psychiatric disorders, and finally to general health, e.g. all diseases. The results from each step will be scrutinised in post-GWAS analysis with FUMA, which assesses the plausibility of results using biological and functional information. Finally, we will train polygenic scores (PGS) on the multi-trait GWAS to assess increases in predictive power and whether the correlation structure of multi-trait trained PGS is consistent with that of single-trait trained PGS.
My proposed method will allow researchers to extract even more information from modern, large-scale biobanks. My proposed applications will further our understanding of specific and shared factors in disease aetiology and could assist in identifying new drug targets or enhancing predictive models for clinical application. We will develop an open-source software package to support continued free access to research and to ensure replicability.