Routing with generated data
A setting in which routers are trained on generated queries and answers produced from high-level task descriptions. (ACL)
Large language model (LLM) routers dynamically select the best model for a given input. Existing approaches typically assume access to ground-truth labeled data, which is often unavailable in practice, especially when user request distributions are heterogeneous and unknown. We introduce Routing with Generated Data (RGD), a challenging setting in which routers are trained exclusively on queries and answers that generator LLMs produce from high-level task descriptions. We evaluate query-answer routers (which use both queries and labels) and query-only routers across four diverse benchmarks and 12 models, finding that query-answer routers deteriorate faster than query-only routers as generator quality decreases. We further propose CASCAL, a novel query-only router that estimates model correctness through consensus voting and identifies model-specific skill niches via hierarchical clustering. CASCAL is substantially more robust to generator quality, outperforming the best query-answer router by 4.6% absolute accuracy when trained on weak generator data. Our analysis reveals two crucial characteristics of effective generators: they must accurately respond to their own questions, and their questions must produce sufficient performance differentiation among the model pool. We then show how filtering for these characteristics can improve the quality of generated data.
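To illustrate the idea of estimating model correctness by consensus voting without ground-truth labels, here is a minimal sketch. It is not the CASCAL implementation: the function name `consensus_correctness`, the input layout, and the use of a plain majority vote over exact-match answers are all assumptions made for illustration, and the hierarchical clustering step is omitted.

```python
from collections import Counter


def consensus_correctness(answers):
    """Estimate per-model correctness from agreement with the majority answer.

    `answers` is a hypothetical layout: a dict mapping a model name to a list
    of its answers, one per query, with all lists the same length.
    Returns a dict mapping each model name to the fraction of queries on
    which the model matched the majority answer of the pool.
    """
    models = list(answers)
    n_queries = len(answers[models[0]])
    agree = {m: 0 for m in models}

    for i in range(n_queries):
        # Count how many models gave each answer to query i.
        votes = Counter(answers[m][i] for m in models)
        # Ties are broken by first-seen order, an arbitrary choice here.
        majority_answer, _ = votes.most_common(1)[0]
        for m in models:
            if answers[m][i] == majority_answer:
                agree[m] += 1

    return {m: agree[m] / n_queries for m in models}


# Toy example with made-up model names and answers.
toy_answers = {
    "model_a": ["x", "y", "z"],
    "model_b": ["x", "y", "w"],
    "model_c": ["x", "q", "w"],
}
scores = consensus_correctness(toy_answers)
```

In this toy pool, `model_b` agrees with the majority on every query, so a router built on these scores would prefer it. A per-cluster version of the same computation would be one way to surface model-specific niches, though the details of how CASCAL does this are not given here.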
Latest publications
Macaron: Controlled, human-written benchmark
A template-first benchmark that factorizes reasoning type and cultural aspect across question languages. (ACL)
CommonLID: Re-evaluating language identification performance
A community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. (ACL)
Do language models understand honorific systems in Javanese?
The ability of LMs to process Javanese honorifics through classification and machine translation tasks. (ACL)