GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

GenARM is a test-time alignment approach that uses the Autoregressive Reward Model to guide generation.

