Project 7: Methodology for discovery and validation of omics-based predictors for follow-up data in large population-based biobanks

Anastassia Kolde

University of Tartu
Advisor: Krista Fischer

Project description

Integral parts of modern precision medicine are the ability to identify biomarkers associated with a specific phenotype of interest and the ability to translate those findings into practice. One way to do the latter is to develop personalized risk scores. This project focuses on modifying statistical methodology for time-to-event outcomes so that it enables biomarker discovery for personalized risk prediction of diseases and mortality in large population-based biobank cohorts. For that purpose, one needs to develop a framework that addresses right-censoring and left-truncation (both typical in biobank cohorts), handles possible genetic relatedness of the subjects, and accounts for the high dimensionality of the data. One also needs to account for the differing time-dependence of the layers of -omics data: while DNA data (from genotyping or WGS) remain unchanged throughout life, gene expression and methylation levels, as well as microbiome profiles, depend on the sample collection time. The modified methodology will be used to develop personalized risk prediction algorithms for type 2 diabetes and coronary artery disease based on Estonian Biobank data.
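To make the right-censoring and left-truncation mentioned above concrete, the following is a minimal sketch of a Kaplan-Meier estimate with delayed entry. It is an illustration only, not the project's methodology: the function name `km_left_truncated`, the use of age as the time scale, and the toy data in the usage note are all assumptions introduced here. The key point is that a subject counts as at risk only between recruitment into the biobank and the end of follow-up.

```python
from typing import List, Tuple


def km_left_truncated(
    entry: List[float], exit_: List[float], event: List[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival curve with left truncation (delayed entry).

    entry : time (e.g. age) at which each subject joined the cohort
    exit_ : time at which each subject had the event or was censored
    event : 1 if the event was observed at exit_, 0 if right-censored

    Hypothetical helper for illustration; assumes entry < exit_ for everyone.
    """
    # Distinct times at which at least one event was observed.
    event_times = sorted({x for x, d in zip(exit_, event) if d})

    surv = 1.0
    curve = []
    for t in event_times:
        # Left truncation: only subjects already enrolled (entry < t)
        # and still under follow-up (t <= exit) belong to the risk set.
        at_risk = sum(1 for e, x in zip(entry, exit_) if e < t <= x)
        # Right-censored subjects leave the risk set without counting as events.
        deaths = sum(1 for x, d in zip(exit_, event) if d and x == t)
        if at_risk > 0:
            surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve
```

With toy data such as `km_left_truncated([40, 50, 45, 60], [55, 65, 70, 62], [1, 0, 1, 1])`, the subject recruited at age 60 does not contribute to the risk set at age 55. Ignoring the entry times would inflate that risk set and bias the survival estimate upwards.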


  1. Scaling causality discovery to -omics data
    Host: Max Planck Institute for Intelligent Systems
    Planned date: November 2020 – January 2021 (postponed due to the COVID-19 pandemic)
  2. Scaling causality discovery to population-scale data
    Host: IBM
    Planned date: December 2021 – February 2022