Refusal tokens: a simple way to calibrate refusals in LLMs

A simple technique using refusal tokens to control and calibrate refusal behavior in large language models.

NeurIPS
December 24, 2024
