Searching for efficient linear layers over a continuous space of structured matrices

Searching for efficient linear operators with optimal scaling laws, which led to the development of the BTT-MoE architecture.

