Description
Reinforcement learning traditionally learns absolute state values, estimating how good a particular situation is in isolation. Yet in both biological systems and practical decision-making, what often matters is not the absolute value of a state, but how it compares to alternatives. Motivated by empirical findings in neuroscience, we introduce \textbf{Pairwise-TD}, a novel framework that learns \emph{value differences} directly.
Our method defines a new pairwise Bellman operator that estimates the relative value $\Delta(s_i, s_j) = V(s_i) - V(s_j)$, bypassing the need to compute $V(s)$ explicitly. We prove that this operator is a $\gamma$-contraction in a structured function space, ensuring convergence to a unique fixed point. Pairwise-TD integrates naturally into on-policy actor-critic methods and enables exact recovery of Generalized Advantage Estimation (GAE) using only pairwise differences. Building on this, we derive a pseudo-value approach that yields an unbiased policy gradient estimator despite the absence of an explicit value baseline. To handle pairwise comparisons in episodic environments with terminal states, we introduce a principled scheme for computing Bellman targets using only observable quantities, ensuring correct learning even when episode lengths vary. Finally, we present a lightweight neural network architecture that enforces antisymmetry via a shared encoder and linear projection, guaranteeing $\Delta(s_i, s_j) = -\Delta(s_j, s_i)$ by construction. Together, these contributions offer a biologically inspired, practically effective, and theoretically grounded alternative to traditional value learning.
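As a brief illustration (a minimal sketch implied by the definition of $\Delta$ above, not necessarily the exact construction used in the work), one natural form of the pairwise Bellman operator for a policy $\pi$ is
\[
(\mathcal{T}^{\pi}_{\Delta}\,\Delta)(s_i, s_j) \;=\; \mathbb{E}_{\,s_i' \sim P^{\pi}(\cdot \mid s_i),\; s_j' \sim P^{\pi}(\cdot \mid s_j)}\!\big[\, r^{\pi}(s_i) - r^{\pi}(s_j) + \gamma\, \Delta(s_i', s_j') \,\big],
\]
obtained by subtracting the standard Bellman equations for $V^{\pi}(s_i)$ and $V^{\pi}(s_j)$; an operator of this form inherits a $\gamma$-contraction in the sup norm, with fixed point $\Delta^{\pi}(s_i, s_j) = V^{\pi}(s_i) - V^{\pi}(s_j)$. Likewise, the antisymmetric architecture admits a parameterization such as $\Delta_{\theta}(s_i, s_j) = w^{\top}\!\big(\phi_{\theta}(s_i) - \phi_{\theta}(s_j)\big)$, where $\phi_{\theta}$ denotes the shared encoder and $w$ the linear projection (illustrative notation, not taken from the description), so that antisymmetry holds by construction.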