Natural Language Processing
[lamarr-nlp] Guest Talk by Ivaxi Sheth from CISPA Helmholtz Center for Information Security | AI Safety beyond a Single Response: Persistent Memory to Open-Endedness
Description
As part of the Lamarr NLP Colloquium, we are pleased to host Ivaxi Sheth from the CISPA Helmholtz Center for Information Security. Ivaxi will give a talk on safety in the context of open-ended AI.
Title: AI Safety beyond a Single Response: Persistent Memory to Open-Endedness
Abstract:
AI systems are increasingly moving beyond static, task-bounded interaction toward agents that adapt, remember, and act across long horizons. This shift creates a new safety problem: risks are no longer confined to a single prompt or output, but can emerge from accumulated state, evolving behavior, and delayed influence over future decisions. In this talk, I connect three lines of work that frame this challenge and study it empirically. First, I discuss why open-ended AI requires safety mechanisms that anticipate unpredictability, emergent misalignment, and loss of control before deployment, rather than treating safety as a post-hoc constraint. I then focus on persistent memory as a concrete instance of this broader problem. Through PersistBench, we show that long-term memory can improve personalization but can also cause intrusive responses and memory-induced sycophancy. Finally, I present sleeper memory poisoning, a delayed attack in which adversarial content corrupts what an assistant remembers, allowing malicious influence to reappear across future conversations and agentic actions. Together, these works argue for a lifecycle view of AI safety: safe systems must govern not only what models generate, but also what they store, retrieve, retain, forget, and use as they evolve over time.
Bio:
Ivaxi Sheth is a PhD student at the CISPA Helmholtz Center for Information Security, supervised by Prof. Mario Fritz. Her research focuses on agents for scientific discovery and on the safety and ethics of self-evolving AI systems. More recently, she has been investigating persistent memory for LLMs, studying both its benefits for personalization and its potential risks, including cross-domain leakage, sycophancy, sleeper agents, and long-term behavioral manipulation. She has published at major ML and NLP venues, including NeurIPS, ICML, ACL, EMNLP, NAACL, and EACL, and has co-organized the Women in Computer Vision (WiCV) workshops at CVPR 2022 and CVPR 2023.
Looking forward to your participation.
Date: Wednesday, June 10, 2026
Time: 11:00 am - 12:00 pm (CEST)
Location: Friedrich-Hirzebruch-Allee 6, 53115 Bonn, Germany
Room: 2.122 + Zoom
Zoom: https://uni-bonn.zoom-x.de/j/63819604806?pwd=64PSGa9HyTym9j1bjy6jhcJF3eHebi.1