Does thinking more always help? Understanding test-time scaling in reasoning models

A study across models and benchmarks, and an alternative approach to test-time scaling.

