This presentation will introduce the Hybrid ML research area, which aims to integrate deep learning with structured knowledge from mathematics and the natural and social sciences.
Hybrid ML is guided by the observation that both mathematics and the sciences can be seen as generators of compressed pattern representations. We will explain that the central goal of Hybrid ML is to align the...
The Planning & Logistics area within the Lamarr Institute focuses on transferring AI research into real-world logistics applications. Logistics offers a rich field for AI with significant impact on both society and sustainability. Key topics include scalable multi-criteria optimization for efficiency and environmental performance (such as route planning, fleet management, and navigation),...
Area presentation: "Resource-Aware Machine Learning at Lamarr: A Guided Tour"
In this talk, we offer a guided overview of the resource-aware machine learning (RAML) research taking place at the Lamarr Institute. RAML aims to make machine learning systems not only accurate, but also efficient in terms of energy, latency, and computational resources. We highlight ongoing efforts within the...
The interdisciplinary research area of physics at the Lamarr Institute leverages advanced mathematical and machine learning methods to deepen our understanding of nature. By combining simulation-based approaches with sophisticated data analysis techniques, this area addresses fundamental questions across diverse physics domains. This presentation will introduce the research area, its...
Within the Lamarr Institute, the topic of trustworthy AI is being explored across diverse application contexts and scientific disciplines. Lamarr researchers focus on areas such as developing effective certification and verification procedures for AI systems, ensuring explainability and robustness, as well as advancing trustworthy AI in domains like physics, life sciences, engineering, and...
For more than two decades, the MAGIC telescopes have continuously accumulated significant amounts of data. However, analyzing this data poses critical problems because its volume exceeds existing data curation capacities. This creates a demand for AI methods to enhance and accelerate the analysis process. Thus, MAGIC utilizes random forests for an...
This work explores the applicability of synthetic data for training deep learning models aimed at real-time classification of astronomical radio signals. Building on previous research where lightweight convolutional neural networks (CNNs) using DM-time representations showed promising performance in detecting transient signals, we now turn to the question of whether synthetic datasets can...
Tractography enables the reconstruction of white matter pathways from diffusion MRI and is a key tool for studying brain connectivity in both research and clinical contexts. Within the overall tractography pipeline, the parcellation step assigns individual streamlines to specific anatomical bundles, or discards them as false positive detections. We introduce PETParc (Parallel Efficient...
Stochastically sampling word segmentations from a subword tokeniser, also called subword regularisation, is a known way to increase robustness of language models to out-of-distribution inputs, such as text containing spelling errors. Recent work has observed that usual augmentations that make popular deterministic subword tokenisers stochastic still cause only a handful of all possible...
Multi-Agent Path Finding (MAPF) focuses on determining conflict-free paths for multiple agents navigating through a shared space to reach specified goal locations. This problem becomes computationally challenging, particularly when handling large numbers of agents, as frequently encountered in practical applications like coordinating autonomous vehicles. Quantum Computing (QC) is a promising...
Forecasting astrophysical flares in blazars presents a unique challenge due to their irregular temporal dynamics and strong variability. While deep neural networks have shown promise for modeling such complex time series, their predictions often lack alignment with established physical knowledge, limiting trust and interpretability. In this work, we propose a domain-informed deep learning...
Hyperbolic representations are effective in modeling knowledge graph data, which is widely used to facilitate multi-hop reasoning. However, a rigorous and detailed comparison of the two spaces for this task is lacking. In this paper, through a simple integration of hyperbolic representations with an encoder-decoder model, we perform a controlled and comprehensive set of experiments to...
The AI research ecosystem is a demanding, high-pressure environment that profoundly shapes the future of technology. Its effectiveness and sustainability depend not only on technical innovation but also on the people who sustain its progress. Investigating the psychosocial factors that link individual traits to work experiences and mental health is therefore essential for enabling sustainable,...
Traditional interpretability techniques, such as rule-based models and feature attribution methods, offer complementary strengths but are often applied in isolation. Rule-based approaches are intuitive and logically structured, making them easy to understand, but they often struggle to scale effectively. On the other hand, feature attribution techniques like SHAP are well-suited to...
This abstract outlines my current research for my PhD thesis, focusing specifically on creating a synthetic dataset for multi-camera multi-object tracking (MCMOT) within logistics applications.
Motivation: Tracking moving assets such as trucks, trailers, or containers in logistics yards is crucial for developing digital twins, measuring key performance indicators, and enhancing operational...
Understanding causal relationships in oncology is critical for optimizing treatment strategies and generating testable biomedical hypotheses. We present CaDSIm (Causal Discovery with Simultaneous Imputation), a novel method for learning causal structures and associated Structural Equation Models (SEMs) from real-world data.
Our approach addresses three key objectives: Validation,...
Cross-knowledge-graph (KG) learning is hindered because embeddings trained independently occupy incompatible vector spaces, while pre-merging KGs to enforce consistency is computationally infeasible at web scale. We present WHALE-embeddings, a continuously updated resource derived from Web Data Commons (~98B RDF triples across ~22M domains). By partitioning the corpus by website and training...
Dynamical systems governed by ordinary differential equations (ODEs) serve as models for a vast number of natural and social phenomena. In this work, we offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs. Specifically, we revisit ideas from amortized inference and neural operators, and propose...
The Lamarr Scientific Forum rounds off the first day with closing remarks and all the information needed on dinner plans.
We begin program day number 2 with a short look back at the previous day and ahead at today's program.
Our research in Human-Centered AI focuses on enabling domain experts to actively guide and interpret machine learning (ML) processes through interactive, knowledge-driven methods. We develop visual analytics (VA) techniques that support expert involvement in both the construction and interpretation of ML models, with the goal of improving transparency, trust, and alignment with human...
The Lamarr interdisciplinary research area "AI in Life Sciences and Health" will provide an overview of its organization and scientific focal points in the life sciences including drug discovery, medicine, and health. We will introduce research groups participating in this area and key collaborations with external partners and institutions. In addition, recent research progress will be...
This talk provides an overview of recent research in the area of Embodied AI at Lamarr. Embodied Artificial Intelligence refers to AI that is embedded in physical systems, such as robots, and can interact with the surroundings. In contrast to classic Machine Learning in robotics, embodied AI encapsulates all aspects of interacting and learning in an environment: from perception, via...
Over the past year, the Natural Language Processing (NLP) research area at the Lamarr Institute has made significant strides toward building more robust, context-aware, and aligned language technologies. This talk will provide an overview of key developments in this area and our future plans. We will highlight flagship publications, newly funded projects, strategic collaborations and...
The Industry and Production research area focuses on the integration of artificial intelligence and machine learning (ML) into production technology. The main objectives are to ensure consistent product quality while minimizing the use of resources such as machine time, tools, materials and energy. This presentation provides an overview of the main research topics of the area, which are...
Recent work on time-series forecasting increasingly leverages the high predictive power of Deep Learning models.
With this increase in model complexity, however, comes a lack of understanding of the underlying model decision process, which is problematic for high-stakes application scenarios. At the same time, simple, interpretable forecasting methods such as ARIMA still perform very...
As a major unsupervised learning method, clustering has received a lot of attention over multiple decades. The various clustering problems that have been studied intensively include, e.g., the k-means problem and the k-center problem. However, in applications, it is common that good clusterings should optimize multiple objectives (e.g., visualizing data on a map by clustering districts into...
Chirality information (i.e., information that allows distinguishing left from right) is ubiquitous for various data modes in computer vision, including images, videos, point clouds, and meshes. Contrary to symmetry, for which there has been a lot of research in the image domain, chirality information in shape analysis (point clouds and meshes) has remained underdeveloped. Although many shape...
Despite advances in conversational systems, the evaluation of such systems remains a challenging problem. Current evaluation paradigms often rely on costly homogeneous human annotators or oversimplified automated metrics, leading to a critical gap in socially aligned conversational agents, where pluralistic values (i.e., acknowledging diverse human experiences) are essential to reflect the...
We explore what it means to build a scientific "theory" of a black-box model, drawing on van Fraassen's Constructive Empiricism (CE), and demonstrate how such a theory can be used for explainable AI (XAI).
A scientific theory is more than just an explanation: it not only has value in its own right, but also serves as a robust framework for answering different questions.
According to CE, a...
Service robots operating in cluttered human environments such as homes, offices, and schools cannot rely on predefined object arrangements and must continuously update their semantic and spatial estimates while dealing with possible frequent rearrangement. Identifying all objects in cluttered, occlusion-heavy environments, such as shelves, requires selecting informative viewpoints and...
Surgical gauze is an essential part of surgical procedures and is primarily used for controlling bleeding and absorbing bodily fluids. The post-surgical retention of gauze can lead to serious complications in the patient’s health and necessitate additional surgery for gauze removal. Owing to data scarcity, research on gauze segmentation in real-world surgical data remains...
In this work, we address unsupervised temporal action segmentation, which segments a set of long, untrimmed videos into semantically meaningful segments that are consistent across videos. While recent approaches combine representation learning and clustering in a single step for this task, they do not cope with large variations within temporal segments of the same class. To address this...
Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EMA): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a...
Large Language Models (LLMs) remain vulnerable to adversarial jailbreaks, yet existing attacks rely on handcrafted priors or require white-box access for gradient propagation. We show that token-level iterative optimization can succeed without gradients and introduce RAILS (RAndom Iterative Local Search), a simple yet effective method using only model logits with a query budget comparable to...
In the healthcare domain, sensitive patient data is inherently decentralized across institutions and cannot be centralized due to strict privacy regulations. Federated learning offers collaborative model training without explicitly sharing patient data by communicating model parameters or soft labels. These approaches, however, are still vulnerable to privacy leakage and often limit model...
Social sciences define values as preferred behaviors or outcomes that motivate an individual's actions or judgments.
While LLMs often reflect biases from their training data, it remains unclear what values underlie their generation processes, and whether such internal value systems can be measured or modified.
In this paper, we investigate whether fine-tuning can steer a model’s internal...
In this article, we propose a novel quantum regression model by extending the Real-Part Quantum SVM. We apply our model to the problem of stability limit prediction in milling processes, a key component in high-precision manufacturing. To train our model, we use a custom data set acquired by an extensive series of milling experiments using different spindle speeds, enhanced with a custom...
Detecting temporal abnormal patterns over streaming data is challenging due to volatile data properties and the lack of real-time labels. The abnormal patterns are usually hidden in the temporal context, which cannot be detected by evaluating single points. Furthermore, the normal state evolves over time due to concept drifts. A single model does not fit all data over time. Autoencoders are...
While many have analyzed the resource efficiency of trained models, an important question remains: How can one be sustainable and resource-aware during AI development, or in other words, when looking for a suitable model to train on a specific learning task? AutoML can help with finding well-performing models on given data; however, these frameworks focus excessively on predictive quality and...
Pallets are one of the most important load carriers for international supply chains. Yet, continuously tracking activities such as driving, lifting or standing along their life cycle is hardly possible. As part of a preliminary project, it was shown that it is possible to develop a prediction model for pallet activities using data from inertial measurement units mounted on a pallet. A...
As part of the Institute for Science and Ethics (IWE), the Bonn Sustainable AI Lab postulates sustainable AI as AI for sustainability and sustainability of AI. It aims to measure and assess the diverse environmental impacts of AI, research ways of making AI systems more sustainable, and address AI in the context of the Sustainable Development Goals.
The Bonn Sustainable AI Lab and its...