Optimizing reasoning efficiency through prompt difficulty prediction
A routing approach that assigns each problem to the smallest model likely to solve it, reducing compute.
Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment, and correctness-based routing matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing offers an effective strategy for cost-efficient deployment of reasoning models.
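The correctness-based routing rule described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the `Candidate` type, the `route` function, the per-model `p_correct` predictors, the relative `cost` values, and the 0.5 threshold are all assumptions made for the example. In the paper's setting, the predictors would be lightweight models trained on intermediate representations from s1.1-32B.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """One model in the routing pool (all fields are hypothetical)."""
    name: str                                      # illustrative model label
    cost: float                                    # assumed relative compute cost
    p_correct: Callable[[Sequence[float]], float]  # assumed correctness predictor


def route(features: Sequence[float],
          pool: List[Candidate],
          threshold: float = 0.5) -> str:
    """Return the cheapest model whose predicted correctness clears the threshold.

    If no model clears it, fall back to the most expensive model in the pool.
    """
    if not pool:
        raise ValueError("pool must contain at least one model")
    ordered = sorted(pool, key=lambda c: c.cost)   # cheapest first
    for candidate in ordered:
        if candidate.p_correct(features) >= threshold:
            return candidate.name
    return ordered[-1].name                        # fallback: largest model
```

Raising the threshold sends more problems to larger models, trading compute for accuracy. A suitable value would have to be tuned on held-out data.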
Latest publications
Leveraging parameter space symmetries for reasoning skill transfer in LLMs
Utilizing an alignment-first strategy to transfer advanced reasoning skills to a non-reasoning model.
NeurIPS
EconWebArena: Benchmarking autonomous agents on economic tasks in realistic web environments
A benchmark for evaluating autonomous agents on complex, multimodal economic tasks in realistic web environments.
NeurIPS
BEDTime: A unified benchmark for automatically describing time series
The first benchmark dataset to assess models on each task, comprising four datasets reformatted for these tasks.
NeurIPS