RainbowPO: A unified framework for combining improvements in Preference Optimization
This new framework enhances preference optimization for better AI alignment with human values.
Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern which components genuinely enhance downstream performance. In this work, we propose RainbowPO, a unified framework that demystifies the effectiveness of existing DPO methods by categorizing their key components into seven broad directions. We integrate these components into a single cohesive objective, enhancing the performance of each individual element. Through extensive experiments, we demonstrate that RainbowPO outperforms existing DPO variants. Additionally, we provide insights to guide researchers in developing new DPO methods and assist practitioners in their implementations.
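For readers unfamiliar with the baseline that these variants extend, the standard DPO loss for a single preference pair can be sketched as follows. This is the original DPO objective only, not RainbowPO's combined objective, which the abstract does not spell out. The function name, argument names, and the default `beta=0.1` are illustrative assumptions; the inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair (illustrative sketch).

    All names here are hypothetical; beta=0.1 is an assumed default.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled implicit reward margin between chosen and rejected.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one. Many DPO variants can be read as changes to how this margin is formed or weighted.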
Latest publications
Alignment-weighted DPO
A DPO that targets the most problematic parts of an output by assigning different preference weights. (ICLR)
EPSVec: Efficient and Private Synthetic Data Generation
A private text generation method that steers LLM generation using dataset vectors. (ICLR)
Your model diversity determines reasoning strategy
A framework decomposing reasoning uncertainty and deriving conditions where depth refinement outperforms parallel sampling. (ICLR)