From Raw Data to Measurement Results – Epistemological Problems in Data Analysis

JvF25/3-303 - Conference Room (Lamarr/RC Trust Dortmund)

Description

In physics, experimenters have been gathering and analyzing vast amounts of data for decades, employing data analysis methods such as Monte Carlo simulations and, more recently, machine learning. However, the philosophical implications of processing the recorded signals and transforming them into measured physical quantities have only recently been recognized (e.g., Beauchemin 2017, Karaca 2018, Leonelli & Tempini 2020; cf. also Falkenburg 2007). Recorded signals are inherently discrete and possess limited spatial and temporal resolution. They are subject to background contamination, noise, and measurement errors. Their relationship to physical quantities (which are commonly represented by continuous functions) depends on the specific characteristics of the detectors with which the raw data are taken.

If we knew the underlying theoretical function f(x) of a physical quantity and the response function A(x,y) that translates this function into a detectable signal by taking into account all possible physical processes occurring in the detector, we could derive the expected signal g(y) in the presence of a background b(y) by solving a Fredholm integral equation:

g(y) = ∫ A(x,y) f(x) dx + b(y)

In fact, physicists face the inverse challenge of determining the theoretical function f(x) from a recorded distribution of detected events g(y). This requires inverting the above Fredholm integral equation. From a mathematical point of view, this is an ill-posed problem: the solution f(x) is unstable under small variations of the measured distribution g(y) and of the response function A(x,y), which is never precisely known. Physicists resort to Monte Carlo simulations in order to determine the response function numerically. This involves calculating the possible particle interactions and their cross-sections in quantum field theory, and simulating individual trajectories from randomly generated initial conditions (Morik & Rhode 2023). The indispensable inclusion of Monte Carlo simulations in experimental practice and data analysis in physics raises new philosophical questions regarding the kind of knowledge gained from them (Beisbart 2012, Boge 2019) and how this affects trust in experimental results (Boge 2024). Furthermore, each component of the Fredholm integral equation poses its own epistemological challenges. Accordingly, we identify five basic epistemological problems of data analysis in contemporary physics:

1) What can be measured? What is merely derived from the raw data and to what extent is the derivation theory-laden?

2) How can we determine the background, and how can we separate it from the specific signals? How should we balance purity against sensitivity in signal-background separation?

3) What is to be included in the response function A(x,y) (e.g., theoretical knowledge about particle interactions, knowledge about the functioning and geometry of the detectors)? Which conceptual and practical difficulties arise when simulating what happens in the detector?

4) What initial assumptions do we make regarding the theoretical function f(x)? How can we ensure that these initial assumptions do not unduly bias the final results?

5) How should we deal with the fact that the inverse problem is ill-posed? How can we prevent the results from depending crucially on precise information about the response function?
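The ill-posedness named in problem (5) can be made concrete with a small numerical sketch (an illustration with assumed numbers, not an analysis from the workshop itself): discretizing the Fredholm equation as g = A f + b turns unfolding into a linear system whose naive inversion amplifies even tiny noise, while a regularized inversion remains stable.

```python
import numpy as np

# Minimal sketch: discretize g(y) = ∫ A(x,y) f(x) dx + b(y) as g = A f + b
# and compare naive matrix inversion with Tikhonov regularization.
n = 40
x = np.linspace(0.0, 1.0, n)

# Hypothetical response matrix: Gaussian smearing with resolution 0.08.
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.08**2))
A /= A.sum(axis=1, keepdims=True)              # each row sums to 1

f_true = np.exp(-((x - 0.5) ** 2) / (2 * 0.1**2))  # assumed "true" spectrum
b = np.full(n, 0.05)                                # assumed flat background
rng = np.random.default_rng(0)
g = A @ f_true + b + rng.normal(0.0, 1e-3, n)       # signal + tiny noise

# Naive inversion: the ill-posedness amplifies the noise enormously ...
f_naive = np.linalg.solve(A, g - b)

# ... whereas Tikhonov regularization keeps the solution stable.
tau = 1e-3
f_reg = np.linalg.solve(A.T @ A + tau * np.eye(n), A.T @ (g - b))

err_naive = np.linalg.norm(f_naive - f_true)
err_reg = np.linalg.norm(f_reg - f_true)
print(err_reg < err_naive)
```

The Gaussian response, the "true" spectrum, the background level, and the regularization parameter tau are all hypothetical choices; the qualitative behaviour, however, is exactly the instability that makes the inverse problem ill-posed.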

Our workshop seeks to foster interdisciplinary discussion between philosophers, physicists, and computer scientists about these issues. We welcome submissions for short presentations (a 30-minute talk plus 15 minutes of discussion).

Please submit abstracts of up to 500 words (references excluded) in PDF format. Submissions should be prepared for blind review. The deadline for submission is September 15th. We will inform you of our decision by September 20th.
Participation in the workshop will be free of charge.

The workshop is part of the research project "Data, Theories, and Scientific Explanation: The Case of Astroparticle Physics" funded by the German Research Foundation (DFG).
https://app.physik.tu-dortmund.de/en/research/philosophy-of-astroparticle-physics/
It is organized in cooperation with the Lamarr Institute for Machine Learning and Artificial Intelligence.

Organizers:
Prof. Dr. Dr. Brigitte Falkenburg, Dr. Johannes Mierau, Prof. Dr. Dr. Wolfgang Rhode

 

Literature:
Antoniou, A. (2021): What is a data model? An anatomy of data analysis in high energy physics. European Journal for Philosophy of Science 11, 101. https://doi.org/10.1007/s13194-021-00412-2
Beauchemin, P. (2017): Autopsy of measurements with the ATLAS detector at the LHC. Synthese 194, 275–312.
Beisbart, C. (2012): How can computer simulations produce new knowledge? European Journal for Philosophy of Science 2(3), 395–434.
Beisbart, C. (2018): Are computer simulations experiments? And if not, how are they related to each other? European Journal for Philosophy of Science 8, 171–204. https://doi.org/10.1007/s13194-017-0181-5
Beisbart, C. & J. Norton (2012): Why Monte Carlo Simulations Are Inferences and Not Experiments. International Studies in the Philosophy of Science 26(4), 403–422.
Boge, F.J. (2019): Why computer simulations are not inferences, and in what sense they are experiments. European Journal for Philosophy of Science 9, 13.
Boge, F.J. (2024): Why Trust a Simulation? Models, Parameters, and Robustness in Simulation-Infected Experiments. The British Journal for the Philosophy of Science 75(4), 843–870.
Falkenburg, B. (2007): Particle Metaphysics. A Critical Account of Subatomic Reality. Heidelberg: Springer.
Falkenburg, B. (2024): Computer simulation in data analysis: A case study from particle physics. Studies in History and Philosophy of Science 105, 99–108. https://www.sciencedirect.com/science/article/pii/S0039368124000530?via%3Dihub
Karaca, K. (2013): The Strong and Weak Senses of Theory-Ladenness of Experimentation: Theory-Driven versus Exploratory Experiments in the History of High-Energy Particle Physics. Science in Context 26, 93–136.
Karaca, K. (2018): Lessons from the Large Hadron Collider for model-based experimentation: the concept of a model of data acquisition and the scope of the hierarchy of models. Synthese 195, 5431–5452.
Leonelli, S. & N. Tempini (2020): Data Journeys in the Sciences. Cham: Springer.
Morik, K. & W. Rhode (2023): Machine Learning under Resource Constraints – Discovery in Physics. Berlin/Boston: De Gruyter.

Registration
  • Thursday 13 November
    • Introduction JvF25/3-303 - Conference Room

      • 1
        Welcome
        Speaker: Prof. Brigitte Falkenburg (TU Dortmund)
      • 2
        From Raw Data to Measurement Results in Astroparticle Physics
        Speaker: Prof. Wolfgang Rhode (TU Dortmund)
    • Monte Carlo Simulations JvF25/3-303 - Conference Room

      • 3
        Monte Carlo In Silico. A Travel Guide

        Monte Carlo computer simulations and methods have been widely used in physics for many decades, particularly in data analysis. Nevertheless, the growing philosophical literature on computer simulations has largely bracketed them. This talk aims to fill this lacuna. I provide an overview of various Monte Carlo techniques as they are applied in physics, covering Monte Carlo integration as well as direct and indirect Monte Carlo simulation. I then provide an epistemological analysis of direct Monte Carlo simulations with a focus on data analysis. It turns out that they do not entirely fit the idea that simulations trace the development of a system by outputting a series of state descriptions of that system. Furthermore, Monte Carlo simulations raise interesting questions about the meaning of the probabilities involved.

        Speaker: Prof. Claus Beisbart (University of Bern)
      • 4
        What is so special about Monte Carlo Simulations?

        Monte Carlo simulations (MCS) are the method of choice for the simulation chain in (astro-)particle physics. Even though there exists an extensive philosophical debate on the epistemic nature of computer simulations, the reasons for preferring MCSs have not been addressed. In my talk I claim that analyzing these reasons sheds light on this debate.
        In (astro-)particle physics, several MCSs are combined into a simulation chain to simulate the entire process from the first interaction of the primary particle to the registration event inside the detector. This produces synthetic data with complete information on the physical properties of the original particle and the secondary particles involved. The synthetic data can thus serve as labelled training and test data for machine learning methods that are used, among other things, for signal-background separation in the raw data.
        Furthermore, the Monte Carlo simulation chain can be used for an inverse analysis. With a sufficient amount of simulated data, probabilistic conclusions can be drawn about the physical processes that have resulted in a particular detector image. This twofold function enables MCSs to serve both as surrogates for experiments and as instruments of theoretical inference.

        Speaker: Johannes Mierau
    • 12:30
      Lunch break
    • The Inverse Problem JvF25/3-303 - Conference Room

      • 5
        A Machine Learning Perspective on the Inverse Problem

        Physicists aim to reconstruct the distribution of physical quantities from the vast amounts of data collected by telescopes, as a means to better understand the physical processes of the Universe. This reconstruction involves solving an inverse problem, specifically the Fredholm integral equation highlighted in the overview of this workshop. Methods for finding such a solution are studied not only in physics but also in computer science and machine learning research. In these fields, the cognitive interest lies not in the physical implications of the reconstructed distribution but in the properties and reliability of the reconstruction methods themselves. For instance: Can a given method guarantee a certain level of accuracy? What conditions must be met to ensure such guarantees? This talk invites participants to discuss the epistemological issues related to these machine learning questions.

        Speaker: Mirko Bunse (Lamarr Institute, TU Dortmund University)
      • 6
        Diagonalizing the Unfolding Problem

        Assuming that the detector response is known and linear, and that individual events are independent and identically distributed (IID) measurements, the continuous unfolding problem can be formulated as an infinite-dimensional eigenvalue problem. This representation identifies the observable features of the unknown truth and quantifies the information content of a given measurement. The talk will present the underlying mathematics and illustrate it with numerical examples, which show what can and what cannot be learned when solving inverse problems.
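        The diagonalization idea can be illustrated in a simplified, finite-dimensional setting (a sketch with assumed numbers, not the speaker's material): the singular value decomposition of a discretized response matrix plays the role of the eigenvalue decomposition and separates observable from effectively unobservable features of the truth.

```python
import numpy as np

# Discrete analogue of diagonalizing the unfolding problem: for a known,
# linear detector response A, the SVD A = U diag(s) V^T identifies which
# components of the unknown truth survive the measurement. All numbers
# here are illustrative assumptions.
n = 30
x = np.linspace(0.0, 1.0, n)
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1**2))
A /= A.sum(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(A)

# For a smearing response the singular values decay rapidly: only the
# components of f along the first right-singular vectors are transmitted
# with appreciable strength; the rest is effectively lost.
n_informative = int(np.sum(s > 1e-3 * s[0]))
print(n_informative, "of", n, "modes carry usable information")
```

        The threshold 1e-3 is an arbitrary cut for illustration; in a real analysis the noise level of the measurement determines which modes are observable.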

        Speaker: Prof. Michael Schmelling
    • 15:30
      Coffee break JvF25/3-302 - Co-Working Space

    • Observation JvF25/3-303 - Conference Room

      • 7
        The Explanation–Justification Gap in Deep Learning Astroparticle Physics

        Deep learning (DL) models are increasingly used in astroparticle physics for tasks such as gamma–hadron separation, neutrino event reconstruction, and cosmic-ray classification. While these models achieve remarkable predictive accuracy, their opacity poses a challenge to the epistemic standards of discovery. Explainable AI (XAI) techniques such as heatmaps promise insight into model reasoning, yet visualization alone cannot justify scientific claims. This paper identifies this explanation–justification gap and proposes epistemic preconditions for closing it. By situating these conditions within contemporary practices of detector-based inference, the paper clarifies when heatmaps contribute to justified knowledge in astroparticle physics.

        Speaker: Dr Koray Karaca (University of Twente)
      • 8
        Re-Assessing the Experiment/Observation Divide

        My talk reevaluates the distinction between experiment and observation. I first argue that to get clear on what role observation plays in the generation of scientific knowledge, we need to distinguish “experiential observation” as a concept closely connected to experience from “observation” in a technical sense and from “field observation”, as a concept that reasonably contrasts with “experiment.” I then argue that observation construed as field observation can enjoy systematic epistemic advantages over experiment, contrary to appearances.

        Speaker: Prof. Florian Boge (TU Dortmund)
    • 17:30
      Coffee break JvF25/3-302 - Co-Working Space

    • Observation JvF25/3-303 - Conference Room

      • 9
        Observational Data in Astronomy

        Astronomy as a field has always been strongly driven by advances in instrumentation and the wealth of new observational data obtained with ever more powerful observatories.
        To ensure the equal and lasting ability of scientists to analyze data from instruments that may only exist at one observatory, and from astronomical events that statistically may not repeat during human lifetimes, the community places a strong emphasis on efficient, well-documented, and sustainable pathways for research data management and preservation.
        In this talk, I aim to give an overview of the data flow in modern astronomy, and of key points in the chain where – sometimes irrevocable – decisions have to be taken. Furthermore, I will discuss the role of artificial intelligence methods in this picture, as well as open questions that will need to be addressed for the next generation of observatories.

        Speaker: Dominik Elsässer
    • 19:30
      Dinner
  • Friday 14 November
    • Observation JvF25/3-303 - Conference Room

      • 10
        The Causal Structure of Neutrino Observations
        Speaker: Prof. Brigitte Falkenburg (TU Dortmund)
      • 11
        The Epistemology of Multi-Messenger Observations

        I propose a general methodological framework for astrophysical observations that can accommodate both single- and multi-messenger observations. This is an eliminative inferential process aimed at identifying the source system from two directions, namely from data to phenomenon and from fundamental theory to phenomenon. That is, the process draws on both observational data and theoretical principles to justify the identification of a source system. The framework allows for a case-by-case allocation of the epistemic emphasis given to theoretical and empirical evidence in the process of justifying an observation. Furthermore, the framework addresses the problem of unconceived alternatives in the observation of astrophysical events by augmenting the meta-empirical assessments developed by Dawid (2013) to extend the notion of evidence beyond empirical detections.

        Speaker: Sarwar Ahmed (University of Wuppertal)
    • 12:30
      Lunch break
    • Panel Discussion: AI in Physics JvF25/3-303 - Conference Room
