Critique-guided distillation for robust reasoning via refinement
A training framework that decouples critique consumption from critique generation.
Supervised fine-tuning (SFT) with expert demonstrations often suffers from the imitation problem: models learn to reproduce correct responses without internalizing the underlying reasoning process. We propose Critique-Guided Distillation (CGD), a training framework that teaches models to self-correct by augmenting SFT with teacher-generated explanatory critiques. Rather than imitating teacher outputs directly, the student learns to map a triplet (prompt, initial student response, teacher critique) to a refined teacher response, thereby capturing both the cause of the error and the logic of the correction. On mathematical reasoning benchmarks, CGD achieves substantial gains across the LLaMA and Qwen model families (e.g., +15.0% on AMC23 and +12.2% on MATH-500) while avoiding the format drift and forgetting observed in prior methods. Cross-family validation on Qwen2.5-Math-7B demonstrates robustness to the choice of teacher (from Claude Sonnet 3.7 to weaker open-source models) and reaches state-of-the-art performance (50.4 average, +22.6% over the base model) that rivals complex reinforcement learning baselines while requiring 144× less compute. Notably, CGD exhibits strong out-of-distribution generalization: despite training on data containing no code, it improves zero-shot HumanEval performance by +4.88% and remains robust on general benchmarks (GPQA, TruthfulQA) where baselines suffer catastrophic forgetting (-21.3% on IFEval). These results establish CGD as a cost-effective intermediate training paradigm that can serve as a warm start before reasoning SFT or RL, offering a scalable enhancement to modern LLM training workflows.
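The core data transformation above, conditioning on (prompt, initial student response, teacher critique) and training on the refined teacher response, can be sketched as a single SFT example builder. This is a minimal illustration under stated assumptions, not the CGD implementation: the prompt template, the names CGDTriplet, to_sft_pair, and build_labels, and the choice to mask the context so that loss falls only on the refined response are all hypothetical. The value -100 is the conventional "ignore" label used by PyTorch cross-entropy.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Conventional "skip this position" label for PyTorch cross-entropy (ignore_index).
IGNORE_INDEX = -100


@dataclass
class CGDTriplet:
    """One hypothetical CGD training record."""
    prompt: str
    student_response: str   # the student's initial (possibly wrong) attempt
    critique: str           # teacher explanation of what went wrong and why
    refined_response: str   # teacher's corrected answer, used as the SFT target


# Hypothetical template; the actual formatting used by CGD is not specified here.
TEMPLATE = (
    "Problem:\n{prompt}\n\n"
    "Initial attempt:\n{student_response}\n\n"
    "Critique:\n{critique}\n\n"
    "Refined solution:\n"
)


def to_sft_pair(ex: CGDTriplet) -> Tuple[str, str]:
    """Return (conditioning context, training target) for one triplet."""
    context = TEMPLATE.format(
        prompt=ex.prompt,
        student_response=ex.student_response,
        critique=ex.critique,
    )
    return context, ex.refined_response


def build_labels(context_ids: List[int], target_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate token ids and mask the context so loss covers only the target."""
    input_ids = context_ids + target_ids
    labels = [IGNORE_INDEX] * len(context_ids) + list(target_ids)
    return input_ids, labels


# Usage with a toy arithmetic error and made-up token ids (no real tokenizer).
ex = CGDTriplet(
    prompt="What is 3 * 4 + 2?",
    student_response="3 * 4 + 2 = 3 * 6 = 18",
    critique="The attempt added before multiplying; multiplication binds tighter.",
    refined_response="3 * 4 = 12, then 12 + 2 = 14. Answer: 14",
)
context, target = to_sft_pair(ex)
input_ids, labels = build_labels([1, 2, 3], [4, 5])
```

Masking the context reflects the framing that the student learns the mapping to the refined response rather than learning to reproduce its own flawed attempt or the critique. Whether CGD also trains on other segments is not stated here.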
Latest publications
Dynamic guardian models: realtime content moderation with user-defined policies
Specialized classifiers that evaluate text based on predefined trustworthiness objectives.
ICML
LLM-SRBench: A new benchmark for scientific equation discovery with Large Language Models
A comprehensive benchmark designed to evaluate LLM-based scientific equation discovery methods.
ICML
EPSVec: Efficient and private synthetic data generation via dataset vectors
A differentially-private lightweight alternative that steers LLM generation using dataset vectors.
ICML