GPO: learning from critical steps to improve LLM reasoning

A novel fine-tuning strategy that improves the multi-step reasoning of LLMs by focusing learning on the critical steps of a reasoning process.

