Sep 3 – 4, 2025
Hörsaalgebäude, Campus Poppelsdorf, Universität Bonn
Europe/Berlin timezone

Pairwise-TD: A Bellman Operator for Relative Value Learning

Not scheduled
1h 30m
Open Space (first floor)

Poster
Embodied AI Poster Session

Speaker

Marc Höftmann

Description

Reinforcement learning traditionally learns absolute state values, estimating how good a particular situation is in isolation. Yet in both biological systems and practical decision-making, what often matters is not the absolute value of a state, but how it compares to alternatives. Motivated by empirical findings in neuroscience, we introduce \textbf{Pairwise-TD}, a novel framework that learns \emph{value differences} directly.
Our method defines a new pairwise Bellman operator that estimates the relative value $\Delta(s_i, s_j) = V(s_i) - V(s_j)$, bypassing the need to ever compute $V(s)$ explicitly. We prove that this operator is a $\gamma$-contraction in a structured function space, ensuring convergence to a unique fixed point. Pairwise-TD integrates naturally into on-policy actor-critic methods and enables exact recovery of Generalized Advantage Estimation (GAE) using only pairwise differences. Building on this, we derive a pseudo-value approach that yields an unbiased policy gradient estimator despite the absence of an explicit value baseline. To handle pairwise comparisons in episodic environments with terminal states, we introduce a principled scheme for computing Bellman targets using only observable quantities, ensuring correct learning even when episode lengths vary. Finally, we present a lightweight neural network architecture that enforces antisymmetry of the relative value function via a shared encoder and a linear projection. Together, these contributions offer a biologically inspired, practically effective, and theoretically grounded alternative to traditional value learning.
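One way to see how a pairwise backup can arise (our reading, not necessarily the exact operator used in the work) is to subtract the standard fixed-policy Bellman equations for the two states along their respective transitions:
\[
V(s_i) = r_i + \gamma V(s_i'), \qquad V(s_j) = r_j + \gamma V(s_j')
\]
\[
\Rightarrow \quad \Delta(s_i, s_j) = V(s_i) - V(s_j) = (r_i - r_j) + \gamma\,\Delta(s_i', s_j'),
\]
taken in expectation over the transitions, which gives a TD-style target expressed purely in terms of observed rewards and the relative value of the successor pair.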
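The sketch below is a minimal illustration (not the authors' code) of the antisymmetric architecture described in the abstract, assuming a shared encoder followed by a bias-free linear projection applied to the difference of the two embeddings, together with a hypothetical pairwise TD target based on the subtraction above.

```python
# Minimal sketch: antisymmetric pairwise value network with a shared encoder
# and a linear projection, so Delta(s_i, s_j) = w^T(phi(s_i) - phi(s_j)).
import torch
import torch.nn as nn


class PairwiseValueNet(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Shared encoder phi applied to both states of a pair.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        # Bias-free linear projection keeps the output antisymmetric.
        self.proj = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, s_i: torch.Tensor, s_j: torch.Tensor) -> torch.Tensor:
        # Delta(s_i, s_j) = -Delta(s_j, s_i) and Delta(s, s) = 0 by construction.
        return self.proj(self.encoder(s_i) - self.encoder(s_j)).squeeze(-1)


# Hypothetical pairwise TD target for a non-terminal pair of transitions:
# Delta_target = (r_i - r_j) + gamma * Delta(s_i', s_j').
def pairwise_td_target(net, r_i, r_j, s_i_next, s_j_next, gamma=0.99):
    with torch.no_grad():
        return (r_i - r_j) + gamma * net(s_i_next, s_j_next)
```

With this parameterisation, swapping the two arguments flips the sign of the output exactly, which is the structural property the abstract attributes to the shared encoder and linear projection.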

Presentation materials

There are no materials yet.