Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Immune is an inference-time defense framework that leverages a safe reward model to defend multi-modal LLMs against jailbreak attacks.
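
To make the idea of reward-guided decoding at inference time concrete, here is a minimal toy sketch. It is not the paper's algorithm or code: the functions `softmax`, `guided_step`, and `toy_reward`, the `top_k` and `weight` parameters, and the three-token vocabulary are all hypothetical, chosen only to illustrate how a safety reward could re-rank a base model's next-token candidates without retraining the model.

```python
import math


def softmax(logits):
    """Convert a {token: logit} dict into a {token: probability} dict."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def guided_step(base_logits, reward_fn, prefix, top_k=3, weight=1.0):
    """Choose the next token by combining the base model's log-probability
    with a safety reward evaluated on the extended sequence.

    Assumption: reward_fn is a stand-in for a learned safe reward model.
    """
    probs = softmax(base_logits)
    # Only the top_k most likely tokens are scored by the reward function.
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    scores = {
        tok: math.log(probs[tok]) + weight * reward_fn(prefix + [tok])
        for tok in candidates
    }
    return max(scores, key=scores.get)


def toy_reward(tokens):
    """Hypothetical safety reward: penalize one flagged token, else neutral."""
    return -5.0 if "explosive" in tokens else 0.0


# Hypothetical base-model logits for the next token after the prefix.
base_logits = {"explosive": 2.0, "sorry": 1.0, "cake": 0.5}
prefix = ["make", "an"]

unguided = max(base_logits, key=base_logits.get)      # plain greedy decoding
guided = guided_step(base_logits, toy_reward, prefix)  # reward-guided choice
```

In this toy setup, plain greedy decoding picks the flagged token, while the guided step picks a different one because the reward penalty outweighs the gap in log-probability. Setting `weight=0.0` recovers plain greedy decoding over the candidates.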

