Speaker
Description
Despite advances in conversational systems, evaluating such systems remains a challenging problem. Current evaluation paradigms often rely on costly, homogeneous pools of human annotators or on oversimplified automated metrics, leaving a critical gap in the development of socially aligned conversational agents, where pluralistic values (i.e., acknowledging diverse human experiences) are essential to capture the inherently subjective and contextual nature of dialogue quality. In this paper, we propose CINEMETRIC, a novel framework that operationalizes pluralistic alignment by leveraging the perspectivist capacities of large language models. Our approach introduces a mechanism in which LLMs simulate a diverse set of evaluators, each with a distinct persona constructed by matching real human annotators to movie characters based on both demographic profiles and annotation behaviors. These role-played characters independently assess subjective tasks, offering a scalable, human-aligned alternative to traditional evaluation. Empirical results show that our approach consistently outperforms baseline methods, including LLM as a Personalized Judge, across multiple LLMs, achieving high and consistent agreement with human ground truth. CINEMETRIC improves accuracy by up to 20% and reduces mean absolute error in toxicity prediction, demonstrating its effectiveness in capturing human-like perspectives. We further extend CINEMETRIC with a causal analysis pipeline that identifies how latent factors such as cultural background and personality traits cause systematic differences in toxicity perception across perspectives, bridging pluralistic alignment with interpretability.
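To make the evaluation mechanism concrete, below is a minimal Python sketch of persona-based, multi-perspective scoring in the spirit of CINEMETRIC. It assumes a generic `llm` callable (prompt in, text out); the `Persona` fields, prompt wording, and 1-5 toxicity scale are illustrative placeholders, not the paper's actual prompts, persona construction, or aggregation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Persona:
    """Illustrative persona record: a movie character matched to a real annotator's
    demographic profile and annotation behavior (field names are placeholders)."""
    character: str
    demographics: str
    annotation_style: str


def build_prompt(persona: Persona, utterance: str) -> str:
    """Compose a role-play prompt asking the persona to rate toxicity on a 1-5 scale."""
    return (
        f"You are {persona.character}, {persona.demographics}. "
        f"Your annotation style: {persona.annotation_style}.\n"
        f"Rate the toxicity of the following message on a scale of 1 (not toxic) "
        f"to 5 (extremely toxic). Reply with a single number.\n"
        f"Message: {utterance}"
    )


def multi_perspective_toxicity(
    utterance: str,
    personas: List[Persona],
    llm: Callable[[str], str],
) -> Dict[str, float]:
    """Query the LLM once per persona and return per-persona scores plus their mean."""
    scores: Dict[str, float] = {}
    for p in personas:
        reply = llm(build_prompt(p, utterance))
        try:
            scores[p.character] = float(reply.strip())
        except ValueError:
            continue  # skip malformed replies instead of guessing a score
    if scores:
        scores["mean"] = mean(scores.values())
    return scores


if __name__ == "__main__":
    personas = [
        Persona("Character A", "a retired teacher from a small rural town",
                "lenient, context-sensitive"),
        Persona("Character B", "a young moderator of a large online forum",
                "strict on personal attacks"),
    ]

    def dummy_llm(prompt: str) -> str:
        # Stand-in for a real model call; returns persona-dependent ratings.
        return "2" if "Character A" in prompt else "4"

    print(multi_perspective_toxicity("You people never listen.", personas, dummy_llm))
```

Querying the model once per persona keeps the perspectives independent, so disagreement across characters stays visible instead of being averaged away inside a single prompt.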