1st MLFPM Summer School

News: The slides of most of the presentations are now available for download: https://polybox.ethz.ch/index.php/s/Qoydox1xBEkD7HI

General information

Date: September 9 to 13, 2019

Location: Room 02.S.21 at FHNW Muttenz (Hofackerstrasse 30, 4132 Muttenz)

Registration fee: CHF 350 (including lunches, coffee breaks and conference dinner)

Tentative schedule

| Time | Monday 09.09 | Tuesday 10.09 | Wednesday 11.09 | Thursday 12.09 | Friday 13.09 |
|---|---|---|---|---|---|
| 09:00-10:30 | K. Borgwardt (1) | T. El-Hay (1) | L. Milani | K. Fischer (2) | T. Heimann, V. Tresp (2) |
| 10:30-11:00 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 11:00-12:30 | K. Borgwardt (2) // H. Stockinger | T. El-Hay (2) | N. Rajewsky (tbc) | T. Heimann, V. Tresp (1) | R. Grossmann |
| 12:30-13:30 | Lunch | Lunch | Lunch | Lunch | Lunch |
| 13:30-15:00 | B. Müller-Myhsok (1) | K. Fischer (1) | M. Rodríguez Martínez | L. Maier-Hein | K. Van Steen |
| 15:00-15:30 | Coffee break | Coffee break | Coffee break | Coffee break | Coffee break |
| 15:30-17:00 | B. Müller-Myhsok (2) | B. Elger | Social Event | | |
| 17:00- | Social Event | | | | |

List of lectures and talks

  • Big data – big ethics: an introduction to bioethics
    Bernice Elger (University of Basel)

    In the 1970s, bioethics became an academic discipline in the US. Why ethics in science and medicine? Should students in science recite an oath? What does bioethics «do»? The talk will present different types of argumentation in ethics, with a particular focus on the famous book “Principles of Biomedical Ethics” by T.L. Beauchamp and J.F. Childress. The second part will provide an overview of the ethical issues raised by big data analysis, including machine learning.
  • Introduction to Causal Inference
    Tal El-Hay (IBM Research – Haifa)

    Real-world evidence is medical data on patients collected routinely from various sources, such as electronic health records, insurance claims, and medical and mobile devices. While predictive machine learning readily exploits such data in order to construct diagnostic tools, causal inference – involving treatment effect assessment – requires careful adjustments for biases in the data. In this talk, I will introduce the field of causal inference from real-world evidence and its relation to machine learning and precision medicine. I will describe potential applications, discuss sources of bias, and show under which assumptions causal inference is feasible. The second part of this talk will provide the mathematical formulation of causal inference, present estimation methods that deal with different sources of bias, and demonstrate how to deal with high-dimensional data in combination with machine learning.
  • Statistical analysis of follow-up data in large biobank cohorts
    Krista Fischer (University of Tartu)

    The first part of the lecture will introduce the basic principles of the statistical analysis of time-to-event data (also called survival or follow-up data). One of the specific features of such data is censoring – the event of interest is not observed for a part of the sample. The concepts of the survival function and the hazard function, as well as the Kaplan-Meier method to obtain and plot the estimated survival function, will be introduced. An overview of the Cox proportional hazards model and parametric models for survival analysis will also be given, with several practical examples and some guidelines for conducting the analysis in R.

    The second part will focus on the issues one needs to account for when analysing biobank data. One of these is left-truncation or “survival bias” – people joining the biobank at a given age do not form a random subset of their birth cohort, but represent the individuals who have survived to that age. We will study the implications of left-truncation and ways to correct for it by choosing a proper timescale for the analysis. Other issues, such as competing risks and covariate choice when studying the effects of -omics data, will also be discussed.
  • Digitalization and Big Data in Healthcare 
    Tobias Heimann (Siemens Healthineers), Volker Tresp (Siemens and Ludwig-Maximilians-Universität München)

    Part I “Clinical Data Intelligence” by Volker Tresp:

    In the first part of my presentation I will discuss some of the recent trends towards digitalization and large-scale data analytics in healthcare, which might eventually lead to significant advances. The visions are real: It will be possible to provide the best possible patient treatment at the most competent provider based on complete patient information. Medical research will have immediate impact on patient care, and results from clinical practice will immediately feed back into clinical research, thereby closing the loop. Disease outbreaks can be prevented by continuous health monitoring. It will be possible to provide continuous care for the elderly and the chronically ill. Affordability of healthcare will be guaranteed by value-based healthcare and population management. Finally, social and information networks will enable patients to become more active players in managing their health.

    In the second part I will focus on a core issue, i.e., how clinical decisions can be modelled and optimized with machine learning using observational data. I will discuss the integration of different data sources (structured and unstructured), and how one can deal with mixed dynamic and static data. Important issues in practice are confounding variables, explainability and validation of decision recommendations, and missing data. I will present applications to a breast cancer tumor board, to the care of nephrology patients, and to medication recommendations in ophthalmology.

    Part II “Medical Image Analytics” by Tobias Heimann:

    Imaging technology, from X-ray to Magnetic Resonance Imaging, is an essential part of clinical diagnostics. While most images are now acquired and stored digitally, automatic analysis of the data is challenging due to its inherently unstructured nature and the “signal-to-symbol gap”. In recent years, deep machine learning has pushed the field of automatic image understanding forward significantly. In healthcare, a new generation of software is appearing that supports clinicians reliably in their decisions. Integrating the output of the underlying algorithms with clinical and omics data has the potential to boost precision medicine to the next level.

    In my presentation, I will introduce the challenge of medical image understanding, present different technologies for this task, and give examples of existing and upcoming products. Finally, I will present the Digital Health Twin as a concept that integrates imaging information with additional data to predict patient outcomes for different clinical applications.
  • Clinical Trials featuring Good Clinical Practice (GCP)
    Regina Grossmann (Clinical Trials Center Zürich)

    Clinical trials, as part of the development process of therapeutic products, are conducted according to the principles of an international guideline called Good Clinical Practice (GCP). GCP is an ethical and scientific quality standard that describes the rights and duties of study participants, the responsibilities of the other parties involved in clinical trials, and the formal and content requirements for the main study documents. After a first look at the typical phases and criteria of the development process of therapeutic products and at essential aspects of the design and methodology of clinical trials, the historical development of GCP is outlined, a rough overview of its contents is given, and related guidelines are considered. Finally, remaining ethical challenges and frequent GCP violations are discussed.
  • Surgical data science
    Lena Maier-Hein (German Cancer Research Center, DKFZ)

    Surgical Data Science (SDS) is an emerging scientific field that focuses on the acquisition, modeling and analysis of data in order to improve the quality of interventional healthcare. It encompasses all clinical disciplines which involve interventions to manipulate anatomical structures with a diagnostic, therapeutic or prognostic goal. In this paradigm, data may relate to any step of patient care, may concern the patient, the caregivers, or the technology used to deliver care, and may be analyzed in the context of generic domain-specific knowledge. The unique scientific challenges in analyzing data from interventions include speed, robustness, and the heterogeneity and complexity of the procedures. This lecture will introduce basic concepts related to SDS and present state-of-the-art approaches to address the core remaining research challenges. Particular focus will be put on methods for dealing with limited annotated training data, uncertainty quantification and compensation, as well as meaningful performance assessment.
  • Personalised Medicine based on common and rare genetic variants in Estonia
    Lili Milani (Estonian Genome Center)

    The Estonian Biobank was founded in 2000 as a volunteer-based biobank. Today, it contains a collection of health and genetics data of close to 200,000 individuals, approximately 20% of the adult population of the country. The Human Genes Research Act (passed in 2000) and the broad consent form signed by all participants allow regular updating of data through linking to national electronic health databases and disease registries. This enables long-term follow-up of the cohort, including diagnoses, drug prescriptions, lab tests and medical procedures. To date, 152,000 individuals have been genotyped with Illumina’s Global Screening Array, and the genomes of 3,000 individuals and exomes of 2,500 individuals have been sequenced. This serves as a population-based imputation reference for rare and common genetic variants.

    I will present three pilot projects of personalized medicine in Estonia. By re-contacting biobank participants with specific high-risk mutations, we were able to prove the utility of a “genetics-first approach” for familial hypercholesterolemia and breast cancer and evaluate the response of participants who consented to receiving results and counseling. This has led to the next level of validation projects run by the two major hospitals in Estonia, in collaboration with primary care physicians, testing the implementation of high-risk mutations and polygenic risk scores for breast cancer and cardiovascular disease.

    We have also developed and tested algorithms for translating preexisting genotype data of biobank participants into pharmacogenetic recommendations. We compared the results obtained by genome sequencing, exome sequencing, and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. Interestingly, 99.8% of all assessed individuals had a genotype associated with an increased risk for at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants.

    Overall, the expectations for personalized medicine are very high in Estonia, and several implementation projects will be launched in the national healthcare system within the next few years.
  • Principles of Statistical Genetics 
    Bertram Müller-Myhsok (Max Planck Institute of Psychiatry München)

    In this talk I will lay out the principles of statistical genetics, a discipline of science that has enjoyed enormous success in recent years in unraveling the genetic basis of medical phenotypes and diseases, and in elucidating the genetic architecture of non-medical traits as well as traits in non-human biological systems.

    The talk will cover methods aimed at showing and describing a genetic basis for a given trait or set of traits, a realm of methodology called (complex) segregation analysis. It will then move on to family-based analyses, such as linkage analysis, which have been and still are enormously successful in monogenic genetics, but also in sequencing studies. Finally, I will discuss genome-wide association analyses from both a methodological and a historical perspective, these two being intertwined. The talk will conclude with methods working on large sets of polymorphisms simultaneously, and show how they can be used to answer some of the questions asked in, e.g., complex segregation analysis, thus closing a loop in time.
  • Single Cell Multi-omics, Machine Learning, and Human Disease Models Transform Medicine
    Nikolaus Rajewsky (Max Delbrück Center Berlin)

    I will explain recent advances, including our own contributions, in single-cell (multi)omics. I will present unpublished data and show how we can discover design principles of how gene expression drives life in (tissue-)space & time. I will argue that these approaches will transform not only basic science but also clinical pathology, diagnosis, and therapy. I will discuss the specific challenges for Machine Learning in this transformation. I will then present LifeTime, a pan-European Consortium of 90 research institutions and 80 companies that aims to improve healthcare by mapping, understanding, and targeting human cells in disease progression by integrating Machine Learning with single-cell multiomics and organoids.
  • Artificial Intelligence approaches for personalized medicine
    María Rodríguez Martínez (IBM Research – Zurich)

    In recent years, AI has become a very active field in computer science, and models with astounding performance in a broad range of applications such as computer vision, speech recognition and natural language processing have been developed. In computational biology, the recent availability of large amounts of data generated by large international consortia, together with technical developments facilitating the implementation and training of more performant models, has made possible the broad application of deep learning and machine learning approaches to a vast set of problems. In this talk, I will present current activities at the Computational Systems Biology group at IBM Research, Zurich, that illustrate the application of AI approaches to unravel disease mechanisms and develop personalized patient models. For instance, I will show how models for text ingestion can be used to automatically extract knowledge from biomedical publications and obtain comprehensive maps of molecular interactions. I will also show how multi-modal neural networks can be trained to ingest disparate data types, such as compound molecular structures, transcriptomic data and prior molecular knowledge, to predict drug sensitivity in cancer cell lines. Finally, I will illustrate how deep learning models can be adapted to characterize tumor heterogeneity in single-cell data.
  • Effective Scientific Communication Workshop 
    Kristel Van Steen (University of Liège)

    This course is a gentle introduction to scientific reading, writing and presenting. Understanding the format of a scientific paper makes it easier to browse a scientific manuscript and to get your own work published. Practical tips are provided for summarizing the key points of scientific documents with minimal effort and for evaluating them critically. In addition, several tips and tricks are given for effective oral communication and the development of excellent presentation skills. In summary, this course is all about understanding and delivering scientific messages, with impact!
  • Data Privacy and IT Security
    Heinz Stockinger (SIB Swiss Institute of Bioinformatics)

    Many researchers use patient data (i.e., confidential personal data) in their research projects. Dealing with confidential personal data requires awareness of data privacy, the relevant laws and information security. This lecture explains what should be done in practice to protect patients’ privacy when performing biomedical research on human data.

    Presentation slides for download
  • Time Series Mining for Precision Medicine
    Karsten Borgwardt (ETH Zürich)

    Mining time series for recurring patterns has a long history in Data Mining. Still, the digitalization of medical records and the widespread availability of personal digital devices are now bringing about datasets whose scale, length and annotation detail create new Data Mining challenges on time series. In my talk, I will describe these, in particular the problem of finding recurring motifs in time series and assessing their statistical significance, and its application in digital biomarker detection from intensive care unit records.

Application

Apply here to participate in the summer school. The application deadline is June 9, 2019.

After the application deadline, we will review all applications. The selected participants will receive a confirmation and an invoice for the registration fee (CHF 350 including lunches, coffee breaks and the conference dinner) by June 30, 2019. The registration is only complete and valid after timely payment of the registration fee.