Psychiatric disorders are complex, influenced by genetic, environmental, and socioeconomic factors. Current risk prediction models primarily rely on clinical diagnoses and polygenic risk scores (PRS), which do not fully capture the spectrum of psychiatric liability. This PhD project aims to improve psychiatric risk assessment by developing novel risk scores that integrate familial, genetic, and socioeconomic information using data science methodologies.
The first research aim is to quantify the risk of psychiatric disorders by developing Familial Risk Scores (FRS) that combine genetic liability, family history of mental and somatic disorders, socioeconomic status, and healthcare utilization. Using administrative registers with family linkage, we will construct FRS that maximize predictive value by integrating a range of variables and accounting for time-varying factors. Dimensionality reduction and clustering techniques will be applied to aggregate these variables into family susceptibility scores, which can then serve as measures of underlying susceptibility in further epidemiologic studies. By extending existing models such as ADuLT and LT-FH++, this approach will enhance psychiatric risk estimation beyond PRS alone. Structural Equation Modeling (SEM) will be used to decompose FRS variance into genetic and environmental components, refining liability models and improving understanding of how familial environments contribute to psychiatric disorders.
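As a rough illustration of the aggregation step, the sketch below reduces a handful of family-level variables to a single score using the first principal component. It is a minimal example under stated assumptions, not the project's method: the function name `family_susceptibility_score`, the four simulated variables, and their loadings are all hypothetical, and principal component analysis stands in for whichever dimensionality reduction technique is eventually chosen. Clustering and time-varying factors are not covered.

```python
import numpy as np

def family_susceptibility_score(X):
    """Return each family's score on the first principal component,
    plus the share of total variance that component explains."""
    X = np.asarray(X, dtype=float)
    # Standardize columns so variables on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The rows of Vt are the principal axes of the standardized data.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a principal axis is arbitrary; orient it so that higher
    # scores correspond to higher values of the input variables overall.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained

# Simulated placeholder data (assumption): 500 families and 4 variables,
# e.g. affected relatives, healthcare contacts, income rank, and a PRS,
# all driven by one shared latent susceptibility plus independent noise.
rng = np.random.default_rng(0)
n_families = 500
latent = rng.normal(size=n_families)
noise = rng.normal(size=(n_families, 4))
X = latent[:, None] * np.array([0.8, 0.6, 0.7, 0.5]) + 0.6 * noise

scores, explained = family_susceptibility_score(X)
print(f"Variance explained by the first component: {explained:.2f}")
```

In this toy setting the score can be compared with the simulated latent variable, which is not possible with real register data; there, the score would instead be evaluated by how well it predicts later diagnoses.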
The second research aim is to generate synthetic data to study mental health across a broader spectrum, particularly among individuals with mild or undiagnosed symptoms. Current genetic models rely on clinical diagnoses and therefore overlook individuals who have symptoms but no diagnosis. To address this, we propose integrating self-reported symptoms from questionnaires with national register data and genetic data. Because the current overlap between these data sources is small (see Figure 2), we will generate synthetic data to test our hypotheses. The resulting inferred phenotypes will be integrated into genome-wide association studies (GWAS) within the iPSYCH cohort to generate synthetic PRS (e.g., with POP-GWAS).
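To make the idea of synthetic data concrete, the sketch below simulates a cohort under a simple liability threshold model, in which a diagnosis is recorded only when an underlying liability exceeds a cutoff, while a questionnaire symptom score measures the same liability with noise. This is an illustrative assumption rather than the project's generator: the function `simulate_cohort` and the values chosen for `h2`, `prevalence`, and `symptom_noise_sd` are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_cohort(n, h2=0.5, prevalence=0.05, symptom_noise_sd=0.5, seed=1):
    """Simulate n individuals under a liability threshold model.

    Liability is the sum of a genetic part (variance h2) and an
    environmental part (variance 1 - h2), so it has unit variance.
    """
    rng = random.Random(seed)
    # Cutoff on the standard normal liability scale that yields the
    # requested prevalence of diagnosed individuals.
    threshold = NormalDist().inv_cdf(1 - prevalence)
    cohort = []
    for _ in range(n):
        genetic = rng.gauss(0, h2 ** 0.5)
        environment = rng.gauss(0, (1 - h2) ** 0.5)
        liability = genetic + environment
        cohort.append({
            "genetic": genetic,
            "diagnosed": liability > threshold,
            # Self-reported symptoms: a noisy measurement of liability.
            "symptom_score": liability + rng.gauss(0, symptom_noise_sd),
        })
    return cohort, threshold

cohort, threshold = simulate_cohort(20000)
observed_prevalence = sum(p["diagnosed"] for p in cohort) / len(cohort)
# Individuals without a diagnosis whose symptom score is nonetheless high:
# the group that a diagnosis-only phenotype would miss.
missed = [p for p in cohort
          if not p["diagnosed"] and p["symptom_score"] > threshold]
print(f"Prevalence: {observed_prevalence:.3f}, high-symptom undiagnosed: {len(missed)}")
```

A simulation of this kind allows hypotheses about symptom-based phenotypes to be tested when few real individuals have questionnaire, register, and genetic data at once.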
By combining statistical genetics, epidemiology, and machine learning, this project will advance early detection, precision medicine, and public health strategies. Its interdisciplinary approach aligns with the DDSA’s mission to advance data-driven solutions in health research.