EconWebArena: Benchmarking autonomous agents on economic tasks in realistic web environments

A benchmark for evaluating autonomous agents on complex, multimodal economic tasks in realistic web environments.

