Project 14: Development and application of causal inference methods to high dimensional healthcare data

Vesna Resende Barros

IBM Research – Haifa, Israel
Advisors: Michal Rosen-Zvi and Tal El-Hay

Project description

In recent years, more and more healthcare data have become available in digital form: electronic health records (EHR) of tens of millions of patients, large-scale datasets with genetic profiles and disease status, and medical imaging data for clinical analysis. Novel intersections of medical knowledge with machine learning, deep learning and causal inference have enabled the investigation of these health data to reveal new insights into disease mechanisms and therapy outcomes. Ultimately, the goal is to exploit these insights to offer better, more sustainable care.

The availability of such big data also allows the investigation of causal questions, such as estimating the effect of a treatment on disease outcome in a real-world setting and finding causal relationships between features in high-dimensional cases. This research project will explore such questions in order to create algorithms for accurate estimation of causal effects from rich, high-dimensional health data. More specifically, we hope to learn causal directions in order to develop machine learning models that are accurate and flexible enough to allow for predictions across different diseases, taking into consideration the scale of such data. Throughout the research period we will focus on cancer. We expect to understand cancer progression through model analysis and medical data, including not only clinical and imaging data but genome sequences as well.

Investigating and understanding such causal relationships can help us take action and intervene in a diseased population. Moreover, recent advances in causal discovery and in methods for assessing statistical significance in high-dimensional feature selection may hold the key to making statements about significance and causality in population-scale disease data. Understanding these concepts will progressively support the development of new strategies to treat human cancer, ultimately allowing us to deploy more efficient and impactful interventions at exactly the right moment in a patient’s care.


Planned secondments

  1. Causal inference at population-scale using kernel methods
    Host: Max Planck Institute for Intelligent Systems
    Planned date: February 2021 – April 2021 (postponed due to COVID-19)
  2. Improving disease-network discovery through causality discovery
    Host: Université de Paris
    Planned date: March 2022 – May 2022


Publications

Michal Chorev, Yoel Shoshan, Adam Spiro, Shaked Naor, Alon Hazan, Vesna Barros, Iuliana Weinstein, Esma Herzel, Varda Shalev, Michal Guindy, and Michal Rosen-Zvi. (2020). The Case of Missed Cancers: Applying AI as a Radiologist’s Safety Net. In: Martel A.L. et al. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. Lecture Notes in Computer Science, vol 12266. Springer, Cham.

Parthasarathy Suryanarayanan, Ching-Huei Tsou, Ananya Poddar, Diwakar Mahajan, Bharath Dandala, Piyush Madan, Anshul Agrawal, Charles Wachira, Osebe Mogaka Samuel, Osnat Bar-Shira, Clifton Kipchirchir, Sharon Okwako, William Ogallo, Fred Otieno, Timothy Nyota, Fiona Matu, Vesna Resende Barros, Daniel Shats, Oren Kagan, Sekou Remy, Oliver Bent, Pooja Guhan, Shilpa Mahatma, Aisha Walcott-Bryant, Divya Pathak, and Michal Rosen-Zvi. (2021). AI-assisted tracking of worldwide non-pharmaceutical interventions for COVID-19. Scientific Data 8, 94.