R3: robust rubric-agnostic reward models
A novel reward modeling framework that is rubric-agnostic, generalizable, and provides reasoned score assignments.
Reward models are essential for aligning language model outputs with human preferences, yet existing approaches often lack both controllability and interpretability. These models are typically optimized for narrow objectives, limiting their generalizability to broader downstream tasks. Moreover, their scalar outputs are difficult to interpret without contextual reasoning. To address these limitations, we introduce R3, a novel reward modeling framework that is rubric-agnostic, generalizable across evaluation dimensions, and provides interpretable, reasoned score assignments. R3 enables more transparent and flexible evaluation of language models, supporting robust alignment with diverse human values and use cases.
Latest publications
BEDTime: A unified benchmark for automatically describing time series
The first benchmark dataset to assess models on each task, comprising four datasets reformatted for these tasks.
NeurIPS
ViCrit: a verifiable reinforcement learning proxy task for visual perception in VLMs
An RL proxy task that trains VLMs to localize synthetic hallucinations injected into human-written captions.
NeurIPS
SoTA with less: MCTS-guided sample selection for data-efficient visual reasoning self-improvement
Visual reasoning models that achieve SoTA performance using an order of magnitude fewer training samples.
NeurIPS