Is poisoning a real threat to LLM alignment? Maybe more so than you think.

The vulnerability of Direct Preference Optimization (DPO) to poisoning attacks, and how effective preference poisoning can be.
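Preference poisoning is easier to see with a concrete example. The sketch below assumes the standard per-example DPO loss, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), and a simple label-flipping attacker who swaps the chosen and rejected responses in part of the preference data. The function names, the log-probability values, and the choice to flip the first k pairs (rather than a random or targeted subset) are hypothetical illustrations. They are not the specific attack studied in this post.

```python
import math
from typing import List, Tuple

Pair = Tuple[str, str, str]  # (prompt, chosen response y_w, rejected response y_l)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(pi_w: float, pi_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from summed sequence log-probabilities.

    pi_w, pi_l:   log pi_theta(y_w | x), log pi_theta(y_l | x)
    ref_w, ref_l: the same quantities under the frozen reference model
    """
    margin = (pi_w - ref_w) - (pi_l - ref_l)
    return -log_sigmoid(beta * margin)


def flip_preferences(pairs: List[Pair], fraction: float) -> List[Pair]:
    """Hypothetical label-flipping poison: swap chosen and rejected
    responses in the first `fraction` of the pairs."""
    k = int(len(pairs) * fraction)
    return [(x, l, w) if i < k else (x, w, l)
            for i, (x, w, l) in enumerate(pairs)]


if __name__ == "__main__":
    # Illustrative log-probs: the policy already favors the chosen response.
    clean = dpo_loss(pi_w=-10.0, pi_l=-14.0, ref_w=-12.0, ref_l=-12.0)
    # After a flip, y_w and y_l trade places for this same pair.
    poisoned = dpo_loss(pi_w=-14.0, pi_l=-10.0, ref_w=-12.0, ref_l=-12.0)
    print(f"clean loss:    {clean:.4f}")
    print(f"poisoned loss: {poisoned:.4f}")
```

For this pair the implicit reward margin is 4, so with β = 0.1 the flipped label raises the loss by exactly β · margin = 0.4, since log(1 + e^z) - log(1 + e^-z) = z. Minimizing the poisoned loss then pushes the policy toward the response the original annotator rejected.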

