μLO: Compute-Efficient Meta-Generalization
A simple meta-training recipe for μ-parameterized LOs (μLOs). (ICLR)
Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can struggle to optimize unseen tasks (that is, to meta-generalize), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (μP) for two state-of-the-art learned optimizer architectures and propose a simple meta-training recipe for μ-parameterized LOs (μLOs). Our empirical evaluation demonstrates that LOs meta-trained with our recipe substantially improve meta-generalization to wider unseen tasks compared to LOs trained under standard parametrization (SP) using the same compute budget. We also observe empirically that μLOs exhibit unexpectedly improved meta-generalization to deeper networks (5× the depth seen in meta-training) and surprising generalization to much longer training horizons (25× the meta-training horizon) compared to SP LOs.
Latest publications
Multi-label node classification with label influence propagation
A novel GNN-based model for multi-label node classification that propagates label influences on graphs.
ICLR
Transfer learning with deep tabular models
Research on deep learning for tabular data shows that deep tabular models help bridge the gap between decision trees and neural networks.
ICLR
Alignment-weighted DPO
A DPO variant that targets the most problematic parts of an output by assigning different preference weights.
ICLR