Optimizing reasoning efficiency through prompt difficulty prediction

A routing approach that assigns each problem to the smallest model likely to solve it, reducing compute.

NeurIPS

December 2, 2025

Topics:

Benchmark, Reasoning & Chain-of-thought (CoT)

Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment, and correctness-based routing matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing offers an effective strategy for cost-efficient deployment of reasoning models.
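The correctness-based routing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the model labels, relative costs, probe weights, single-feature input, and the 0.5 threshold are all hypothetical, and a per-model logistic probe stands in for whatever lightweight predictor is trained on the s1.1-32B representations. The sketch only shows the decision rule: try models from cheapest to most expensive and pick the first one predicted to answer correctly, falling back to the largest model.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical model label
    cost: float                # hypothetical relative compute cost
    weights: Sequence[float]   # hypothetical linear-probe weights
    bias: float                # hypothetical linear-probe bias


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_correct(features: Sequence[float], cand: Candidate) -> float:
    # Estimated probability that `cand` solves the problem, computed from a
    # feature vector (assumed here to come from an intermediate representation).
    z = sum(w * x for w, x in zip(cand.weights, features)) + cand.bias
    return sigmoid(z)


def route(features: Sequence[float], pool: Sequence[Candidate],
          threshold: float = 0.5) -> str:
    # Pick the cheapest model whose predicted correctness clears the threshold.
    for cand in sorted(pool, key=lambda c: c.cost):
        if predict_correct(features, cand) >= threshold:
            return cand.name
    # No model clears the threshold: fall back to the most expensive one.
    return max(pool, key=lambda c: c.cost).name


# Toy pool: one feature that acts like a difficulty score (an assumption).
POOL = [
    Candidate("small", cost=1.0, weights=[-2.0], bias=1.0),
    Candidate("medium", cost=4.0, weights=[-1.0], bias=2.0),
    Candidate("large", cost=16.0, weights=[-0.5], bias=3.0),
]

if __name__ == "__main__":
    for difficulty in (0.0, 2.0, 5.0, 10.0):
        print(difficulty, route([difficulty], POOL))
```

Raising the threshold trades compute for accuracy: more problems are pushed to larger models, so the threshold is the knob that would be tuned against a target accuracy in a setup like this.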
