Refusal tokens: a simple way to calibrate refusals

Refusal tokens make it possible to control a single model's refusal rates without any further fine-tuning.

COLM
October 7, 2025
