Project 5: Deep representations of somatic mutations and germline variants for cancer research

Giovanni Visonà

Max Planck Institute for Intelligent Systems
Advisors: Gabriele Schweikert & Bernhard Schölkopf

Project description

A central challenge of precision oncology is the data-driven identification of disease states and the translation of this information into actionable treatment options tailored to individual patients. Tumor sequencing has generated massive inventories of somatic mutations, altered epigenomic states and aberrant expression patterns that occur within tumor cells. This wealth of data offers unique opportunities for identifying clinically relevant drug targets with machine learning methods [1]. Importantly, the effect of perturbing a gene’s activity depends on the cellular context, i.e., on the activity of other genes, because a specific phenotype arises from the interactions within a functionally connected network of genes. A prominent example of a pairwise genetic interaction is synthetic lethality, where the joint inactivity of two genes, but not the inactivity of either gene alone, is associated with cell death. Thus, if one of the two genes is inactive specifically in cancer cells, the other becomes a target for selectively eliminating those cells. Finding tumor-specific contexts therefore offers the opportunity to identify corresponding vulnerabilities for personalized treatment [2]. It also offers the possibility of overcoming the problem of intratumor heterogeneity, as the varying combinations of genetic and epigenetic alterations in different cells of the same tumor pose a particular challenge to successful treatment. To identify tumor-specific genetic and epigenetic contexts, we will explore deep representation learning techniques on tumor sequencing data from The Cancer Genome Atlas (TCGA).

In addition, we aim to identify genetic-chemical interaction maps for pharmacologically active compounds using high-throughput imaging data of cancer cell lines that harbour activating or inactivating mutations in key oncogenic signalling pathways. We will initially test a number of methods on a published data set [3]. At a later stage, we will take advantage of the Cell Painting facility in Dundee, in collaboration with Prof Jason Swedlow, to generate additional data, which will ultimately allow us to combine genomic, epigenomic, proteomic and morphological profiling data. In this context, we will explore multi-view learning techniques.

[1] Azuaje et al., Precision Oncology (2019)
[2] Parameswaran et al., Trends in Cancer (2019)
[3] Breinig et al., Mol Syst Biol (2015)


  1. Deep Learning for Integrated Network Analysis
    Host: Université de Paris
    Planned date: February 2021 – April 2021 (postponed due to COVID-19)
  2. Applying network mining to biobank-scale datasets
    Host: Qlucore
    Planned date: March 2022 – May 2022