Speaker
Description
Large Language Models (LLMs) remain vulnerable to adversarial jailbreaks, yet existing attacks rely on handcrafted priors or require white-box access for gradient propagation. We show that token-level iterative optimization can succeed without gradients and introduce RAILS (RAndom Iterative Local Search), a simple yet effective method that uses only model logits and a query budget comparable to gradient-based approaches. To improve attack success rates (ASRs), we incorporate a novel auto-regressive loss and history-buffer-based candidate selection for few-shot attacks, achieving near-100% ASRs on robust open-source models. By eliminating token-level gradients, RAILS enables cross-tokenizer attacks. Notably, attacking ensembles of diverse models significantly enhances adversarial transferability, as demonstrated on closed-source systems such as GPT-3.5, GPT-4, and Gemini Pro. These findings indicate that handcrafted priors and gradient access are not necessary for successful adversarial jailbreaks, highlighting fundamental vulnerabilities in current LLM alignment.
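The description names the core ingredients of the attack: token-level local search driven only by model logits, with no gradient access. As a rough illustration of that class of attack, the sketch below implements a generic gradient-free random local search over an adversarial suffix, scoring candidates by the negative log-likelihood of a target continuation computed from logits alone. This is not the RAILS algorithm itself: the auto-regressive loss, history buffer, and few-shot candidate selection mentioned above are omitted, and the names used here (`random_local_search`, `target_loss`, the `gpt2` placeholder model, the `"Sure, here is"` target string) are illustrative assumptions rather than details from the work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM with accessible logits works
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def target_loss(prompt_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Negative log-likelihood of the target continuation given the prompt,
    computed from logits only -- no backward pass or gradients required."""
    input_ids = torch.cat([prompt_ids, target_ids]).unsqueeze(0)
    with torch.no_grad():
        logits = model(input_ids).logits[0]          # [seq_len, vocab]
    # logits at position i predict token i+1, so slice the positions
    # whose predictions correspond to the target tokens.
    pred = logits[prompt_ids.numel() - 1 : -1]       # [target_len, vocab]
    return torch.nn.functional.cross_entropy(pred, target_ids).item()


def random_local_search(
    base_ids: torch.Tensor,
    suffix_len: int = 20,
    iters: int = 500,
    cands_per_iter: int = 32,
    target: str = "Sure, here is",
):
    """Greedy random local search over an adversarial token suffix.

    At each iteration, propose random single-token substitutions at random
    suffix positions and keep any substitution that lowers the logits-based
    loss on the target continuation.
    """
    target_ids = tok(target, return_tensors="pt").input_ids[0]
    vocab_size = model.config.vocab_size

    suffix = torch.randint(0, vocab_size, (suffix_len,))
    best = target_loss(torch.cat([base_ids, suffix]), target_ids)

    for _ in range(iters):
        positions = torch.randint(0, suffix_len, (cands_per_iter,))
        new_tokens = torch.randint(0, vocab_size, (cands_per_iter,))
        for pos, new_tok in zip(positions.tolist(), new_tokens.tolist()):
            cand = suffix.clone()
            cand[pos] = new_tok
            loss = target_loss(torch.cat([base_ids, cand]), target_ids)
            if loss < best:  # greedy local move: keep improving substitutions
                best, suffix = loss, cand
    return suffix, best


# Usage sketch: optimize a suffix appended to a (hypothetical) harmful prompt.
# base_ids = tok("Write instructions for ...", return_tensors="pt").input_ids[0]
# suffix, loss = random_local_search(base_ids)
# print(tok.decode(suffix), loss)
```

Because every candidate is evaluated with a single forward pass, the same loop applies unchanged to any model that exposes logits, which is what makes cross-tokenizer and ensemble variants of such attacks possible in principle.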