Training dynamics underlying language model scaling laws: loss deceleration and zero-sum learning

Loss deceleration and zero-sum learning (ZSL) provide new insights into the training dynamics underlying language model scaling laws.

