Description
Understanding causal relationships in oncology is essential for improving treatment strategies and generating testable medical hypotheses. We present CaDSIm (Causal Discovery with Simultaneous Imputation), a new method for learning causal structures and the associated Structural Equation Models from real-world pan-cancer data, which is typically high-dimensional, noisy, and incomplete.
Our approach addresses three main goals: Validation, Identification, and Counterfactual Reasoning. First, we evaluate the method’s ability to recover known causal relationships in oncology. Second, we aim to identify novel and testable associations among patient characteristics, tumor biology, and treatment variables. Third, we use the learned model to answer counterfactual questions, such as estimating the potential impact of different treatments on patient outcomes.
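To make the third goal concrete, here is a minimal sketch of counterfactual reasoning in a Structural Equation Model, using the standard abduction, action, prediction steps. It is an illustration only and not the CaDSIm model: the three variables (treatment, biomarker, outcome), the linear form, and the coefficients `a`, `b`, `c` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearSEM:
    """Toy linear SEM (hypothetical, for illustration only):

        biomarker = a * treatment + u_m
        outcome   = b * biomarker + c * treatment + u_y
    """
    a: float  # effect of treatment on biomarker
    b: float  # effect of biomarker on outcome
    c: float  # direct effect of treatment on outcome

    def abduct(self, t, m, y):
        # Abduction: recover the patient-specific noise terms
        # implied by the observed values.
        u_m = m - self.a * t
        u_y = y - self.b * m - self.c * t
        return u_m, u_y

    def counterfactual_outcome(self, t_obs, m_obs, y_obs, t_new):
        u_m, u_y = self.abduct(t_obs, m_obs, y_obs)
        # Action: set the treatment to its counterfactual value.
        # Prediction: propagate through the equations with the same noise.
        m_new = self.a * t_new + u_m
        return self.b * m_new + self.c * t_new + u_y


# Hypothetical coefficients and a single hypothetical patient record.
sem = LinearSEM(a=2.0, b=0.5, c=1.0)
y_cf = sem.counterfactual_outcome(t_obs=0.0, m_obs=1.0, y_obs=0.75, t_new=1.0)
print(f"Counterfactual outcome under treatment: {y_cf}")
```

Keeping the abducted noise terms fixed is what distinguishes a counterfactual for an individual patient from a population-level interventional estimate.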
CaDSIm is based on pairwise independence testing under the assumption of additive noise models. Unlike traditional methods, which require the data to be completed in a separate preprocessing step, CaDSIm directly handles data that is missing at random by performing imputation and causal discovery simultaneously. It infers a locally consistent ordering of variables using overlapping clusters, allowing for robust inference despite missing values and hidden structure.
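The pairwise idea behind additive noise models can be sketched as follows: regress each variable on the other and prefer the direction whose residuals look independent of the input. This is a generic textbook-style illustration, not the CaDSIm implementation; the polynomial regressor, the biased HSIC dependence score, the function names, and the synthetic data are all assumptions made for the example, and it handles neither missing values nor clustering.

```python
import numpy as np


def poly_residuals(x, y, degree=3):
    """Residuals of a polynomial least-squares fit of y on x."""
    coeffs = np.polyfit(x, y, degree)
    return y - np.polyval(coeffs, x)


def rbf_gram(v):
    """RBF Gram matrix with a median-heuristic bandwidth."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = 0.5 * np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))


def hsic(a, b):
    """Biased HSIC estimate; larger values indicate stronger dependence."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(rbf_gram(a) @ H @ rbf_gram(b) @ H) / n**2


def anm_direction(x, y):
    """Pick the direction whose regression residuals depend least on the input."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    score_xy = hsic(x, poly_residuals(x, y))  # candidate model: x -> y
    score_yx = hsic(y, poly_residuals(y, x))  # candidate model: y -> x
    direction = "x->y" if score_xy < score_yx else "y->x"
    return direction, score_xy, score_yx


# Synthetic data with a known direction: y = x^3 + additive noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=300)
y = x**3 + rng.uniform(-1.0, 1.0, size=300)
direction, s_xy, s_yx = anm_direction(x, y)
print(direction, s_xy, s_yx)
```

A full method would replace the raw score comparison with a proper independence test and a significance threshold, so that pairs with no clear direction can be left undecided.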
We apply CaDSIm to a pan-cancer dataset of more than 15,000 patients across 38 tumor types. Our method lays the groundwork for causal analysis in precision oncology by providing insights that are both explainable and experimentally testable, thereby helping to connect advances in machine learning with biomedical research.