Academics can contribute to domain-specialized language models
Argues for an academic focus on domain-specific language models, where academics can excel beyond general-purpose leaderboards.
Commercially available models dominate academic leaderboards. While impressive, this dominance has concentrated research on creating and adapting general-purpose models to improve NLP leaderboard standings for large language models. However, leaderboards aggregate many individual tasks, and general-purpose models often underperform in specialized domains, where domain-specific or adapted models yield superior results. This focus on large general-purpose models excludes many academics and draws attention away from areas where they can make important contributions. We advocate for a renewed focus on developing and evaluating domain- and task-specific models, and highlight the unique role of academics in this endeavor.
Latest publications
GRAID: Synthetic data generation with geometric constraints and multi-agentic reflection for harmful content detection
A novel pipeline that leverages Large Language Models (LLMs) for dataset augmentation.
EMNLP
MINERS: multilingual language models as semantic retrievers
A benchmark to evaluate multilingual language models for retrieving semantic similarities across 200+ languages.
EMNLP
Re-evaluating evaluation for multilingual summarization
Standard metrics fail in non-English summarization, prompting a need for more nuanced evaluation frameworks.
EMNLP