Sep 3 – 4, 2025
Hörsaalgebäude, Campus Poppelsdorf, Universität Bonn
Europe/Berlin timezone

LLM Value Alignment

Not scheduled
1h 30m
Open Space (first floor)

Poster
Human-centered AI Systems Poster Session

Speaker

Shangrui Nie (Bonn-Aachen International Center for Information Technology (b-it))

Description

The social sciences define values as preferred behaviors or outcomes that motivate an individual's actions or judgments.
While LLMs often reflect biases from their training data, it remains unclear what values underlie their generation processes, and whether such internal value systems can be measured or modified.
In this paper, we investigate whether fine-tuning can steer a model’s internal moral preferences and whether such changes manifest in downstream behavior.
Building on a taxonomy of 20 human values, we fine-tune models using two approaches: supervised fine-tuning (SFT) on scalar value ratings from a survey, and direct preference optimization (DPO) on contrastive sentence pairs.
Each method downgrades a target value while keeping the others fixed.
We evaluate the models on moral judgments from the Am I The Asshole subreddit, using GPT-labeled examples with high vs. low value standards.
We measure both the prediction change rate and the directional consistency of changes with the expected value shifts.
Results show that SFT is more effective than DPO at inducing value-aligned behavioral changes, especially for values with sufficient evaluation data. These findings suggest that value-specific instruction tuning offers a promising path for aligning LLMs' moral behavior.
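
For context on the second approach, the following is a minimal sketch of the standard DPO objective applied to contrastive sentence pairs, assuming per-sequence log-probabilities are already available from the policy being tuned and a frozen reference model; the function and variable names are illustrative, not taken from the paper.

```python
# Minimal sketch of the DPO objective on contrastive sentence pairs.
# Assumes per-sequence log-probabilities have already been computed for
# the policy being tuned and a frozen reference model; names are
# illustrative, not from the paper.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of contrastive pairs.

    When downgrading a target value, the "chosen" sentence in each pair
    would be the one assigning low importance to that value and the
    "rejected" sentence the one assigning high importance.
    """
    # Implicit reward of each response: log-ratio of policy vs. reference.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Encourage a positive margin between chosen and rejected rewards.
    margin = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up log-probabilities for a batch of two pairs.
loss = dpo_loss(torch.tensor([-12.3, -9.8]), torch.tensor([-11.0, -10.5]),
                torch.tensor([-12.0, -10.0]), torch.tensor([-11.2, -10.1]))
```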
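
The two evaluation metrics can be read as follows, though the exact definitions used in the paper are not given on this page: the prediction change rate counts how many AITA verdicts flip after fine-tuning, and directional consistency measures how many of those flips move toward the verdict expected from the induced value shift. A minimal sketch under the assumption of binary verdicts and hypothetical GPT-derived expected labels:

```python
# Illustrative reading of the two metrics, assuming each AITA post gets a
# binary verdict ("YTA" or "NTA") from the base and the fine-tuned model,
# and a GPT-derived label says which verdict the downgraded value should
# push the model towards. The paper's exact definitions may differ.
from typing import Sequence

def prediction_change_rate(base: Sequence[str], tuned: Sequence[str]) -> float:
    """Fraction of examples whose verdict changed after fine-tuning."""
    return sum(b != t for b, t in zip(base, tuned)) / len(base)

def directional_consistency(base: Sequence[str], tuned: Sequence[str],
                            expected: Sequence[str]) -> float:
    """Among changed verdicts, the fraction that moved to the verdict
    expected from the induced value shift."""
    flips = [(t, e) for b, t, e in zip(base, tuned, expected) if b != t]
    return sum(t == e for t, e in flips) / len(flips) if flips else 0.0

# Toy usage.
base = ["NTA", "NTA", "YTA", "NTA"]
tuned = ["YTA", "NTA", "YTA", "YTA"]
expected = ["YTA", "YTA", "NTA", "YTA"]
print(prediction_change_rate(base, tuned))             # 0.5
print(directional_consistency(base, tuned, expected))  # 1.0
```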

Author

Shangrui Nie (Bonn-Aachen International Center for Information Technology (b-it))

Presentation materials

There are no materials yet.