18–20 Feb 2025
Lamarr/RC Trust Dortmund
Europe/Berlin timezone

ErUM-Data Proposal: Anomaly Detection with Dataset Shift (ADDS)

19 Feb 2025, 14:35
25m
JvF25/3-302 - Co-Working Space (Lamarr/RC Trust Dortmund)

Joseph-von-Fraunhofer-Str. 25 44227 Dortmund

Speaker

Mirko Bunse (Lamarr Institute, TU Dortmund University)

Description

Anomaly and signal detection is one of the most important use cases of machine learning (ML), in both scientific and commercial applications. Anomalous signals are measured relative to the expected behavior of the data, i.e., relative to the background or to the priors. Relevant examples of anomalies and signals in physics include an excess of gamma rays near the center of our Galaxy (a possible signature of annihilating dark matter); an excess of unassociated gamma-ray sources for a particular range of source parameters (a new class of gamma-ray sources); and an excess of events in collider experiments for a particular value of invariant mass (a signature of a new particle, i.e., a particle beyond the Standard Model).

A confident detection of such anomalies or signals has a significant impact on scientific developments. The problem is that, in many cases, the priors against which the anomalies are evaluated are themselves uncertain. In ML, the anomaly is modeled as a difference between the distribution of the training dataset and the distribution of the target dataset, which is generally referred to as dataset shift. However, this shift can stem not only from the presence of an anomaly but also from a change in the distribution of background events. Data analysis and ML methods must ensure that a mere change in the background distribution is not mistaken for an actual anomaly or signal.

In many cases, distinguishing an anomalous signal from a change in the background requires domain knowledge that constrains the possible changes in the background, so that the remaining excess events can be attributed to a signal (or anomaly) in the data. Within ML, research questions concerning such an attribution are studied extensively in the context of quantification methods. Quantification research is under active development in the ML and computer science communities but has so far seen little application in the ErUM fields.
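As a minimal illustration of what a quantification method estimates, the sketch below implements adjusted classify and count (ACC), a standard baseline from the quantification literature, on simulated one-dimensional data. It is not the method proposed in this project; the function names, the Gaussian toy distributions, the 10% signal fraction, and the threshold are all assumptions made for illustration.

```python
import numpy as np


def adjusted_classify_and_count(val_scores, val_labels, target_scores, threshold):
    """Estimate the signal fraction in an unlabelled target sample (ACC).

    The raw fraction of target events above the threshold is corrected
    with the true and false positive rates measured on labelled data.
    """
    val_pred = val_scores >= threshold
    tpr = val_pred[val_labels == 1].mean()  # signal events passing the cut
    fpr = val_pred[val_labels == 0].mean()  # background events passing the cut
    if tpr <= fpr:
        raise ValueError("the cut must favour signal over background (tpr > fpr)")
    observed = (target_scores >= threshold).mean()
    # Invert observed = alpha * tpr + (1 - alpha) * fpr for alpha.
    return float(np.clip((observed - fpr) / (tpr - fpr), 0.0, 1.0))


def simulate_and_estimate(seed=0, signal_fraction=0.1, n=20000, threshold=1.0):
    """Toy experiment (all numbers are illustrative assumptions)."""
    rng = np.random.default_rng(seed)

    # Labelled validation sample: half background N(0, 1), half signal N(2, 1).
    val_labels = np.repeat([0, 1], n // 2)
    val_scores = rng.normal(loc=2.0 * val_labels, scale=1.0)

    # Unlabelled target sample with a small, unknown signal fraction.
    n_signal = int(round(signal_fraction * n))
    target_scores = np.concatenate([
        rng.normal(0.0, 1.0, n - n_signal),  # background
        rng.normal(2.0, 1.0, n_signal),      # signal
    ])

    # Naive classify-and-count: just the fraction of events above the cut.
    naive = float((target_scores >= threshold).mean())
    adjusted = adjusted_classify_and_count(
        val_scores, val_labels, target_scores, threshold
    )
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate_and_estimate()
    print(f"naive estimate:    {naive:.3f}")
    print(f"adjusted estimate: {adjusted:.3f}")
```

The correction is valid only if the score distributions of signal and background are the same in the labelled and the target samples. That assumption is exactly what fails when the background itself shifts, in which case an estimator of this kind would attribute the change to a signal.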

The main goal of this project is to adapt existing ML methods and to further develop the methods of quantification learning for the detection and analysis of anomalies and signals in the presence of uncertain backgrounds, addressing research questions from astroparticle physics. Possible subsequent uses span several ErUM fields as well as commercial applications.
