SAEfarer: exploring text classification models with sparse autoencoders

Leveraging SAEs to analyze the behavior of text classification LMs.

