EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments

Zefang Liu¹,*, Yinzhu Quan¹,*
¹Georgia Institute of Technology
*These authors contributed equally to this work.

EconWebArena is a benchmark for evaluating LLM agents on realistic economic tasks that require web navigation, data extraction, and reasoning. It includes 360 curated tasks from 82 real-world websites. Current models struggle with accuracy, grounding, and complex web interactions: the strongest model reported under Model Performance, o4-mini, reaches only 46.4% across all tasks. These results point to the need for more capable, domain-aware agents.

Demo Videos

The following videos demonstrate agent behavior on economic data tasks. Playback is shown at 5× speed.

Task Categories

Figure: Task categories in EconWebArena.

Website Examples

Figure: Examples of websites used in EconWebArena tasks.

Task Examples

Category | Task Description | Start URL | Answer | Domain
Government | As published by the Office for National Statistics, what was the CPIH annual inflation rate for all items (2015=100) in the United Kingdom in March 2025? Provide only the number as a decimal with one digit after the decimal point, without percent symbols or other units. | ons.gov.uk | 3.4 | ons.gov.uk
Energy | As reported by the U.S. Energy Information Administration, what was the average retail price of regular gasoline in California during the week of March 24, 2025, in dollars per gallon? Provide only the number as a decimal with three digits after the decimal point, without currency symbols, commas, or other units. | eia.gov | 4.418 | eia.gov
Markets | As reported by Cox Automotive, what was the total number of unsold used vehicles in the United States as of March 31, 2025? Provide only the number as a decimal with two digits, in millions, without commas or other units. | coxautoinc.com | 2.14 | coxautoinc.com
Banking | As reported by the Federal Reserve Bank of New York, what was the effective federal funds rate on January 10, 2025? Provide only the number as a decimal with two digits, without percent symbols or other units. | newyorkfed.org | 4.33 | newyorkfed.org
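Each task description above fixes the answer format: a bare number with a stated count of digits after the decimal point, and no percent signs, currency symbols, commas, or units. A minimal sketch of a format check follows. The helper name `matches_decimal_format` is hypothetical and is not part of the benchmark, and the benchmark's actual answer checking may differ.

```python
import re


def matches_decimal_format(answer: str, decimals: int) -> bool:
    """Return True if `answer` is a bare number with exactly `decimals`
    digits after the decimal point (no symbols, commas, or units).

    Hypothetical helper for illustration; not the benchmark's own checker.
    """
    pattern = rf"-?\d+\.\d{{{decimals}}}"
    return re.fullmatch(pattern, answer.strip()) is not None


# (answer, required decimal digits) taken from the example tasks above.
EXAMPLE_ANSWERS = [("3.4", 1), ("4.418", 3), ("2.14", 2), ("4.33", 2)]

if __name__ == "__main__":
    for answer, decimals in EXAMPLE_ANSWERS:
        print(answer, matches_decimal_format(answer, decimals))
    # Answers that break the stated format are rejected.
    print(matches_decimal_format("3.4%", 1))
    print(matches_decimal_format("$4.418", 3))
```

All four reference answers from the examples pass this check, while variants such as `3.4%` or `$4.418` do not.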

Model Performance

Category | Tasks | o4-mini | GPT-4.1 | GPT-4o | Claude-Sonnet-4 | Llama-4-Maverick
Banking | 60 | 41.7% | 23.3% | 18.3% | 38.3% | 21.7%
Finance | 21 | 33.3% | 14.3% | 14.3% | 23.8% | 9.5%
Government | 138 | 56.5% | 44.9% | 36.2% | 47.1% | 26.1%
Labor | 24 | 25.0% | 0.0% | 8.3% | 12.5% | 4.2%
Markets | 60 | 48.3% | 35.0% | 33.3% | 41.7% | 15.0%
Other* | 57 | 38.6% | 22.8% | 21.1% | 29.8% | 12.3%
All | 360 | 46.4% | 31.4% | 27.2% | 38.3% | 18.9%

*Other categories: Energy, RealEstate, Trade, Education, and Health.
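The "All" row can be reproduced from the per-category rows. The sketch below assumes each percentage is the share of tasks in that category that the model answered correctly, so the overall figure is a task-weighted average rather than a plain mean of the category percentages. The names `MODELS`, `ROWS`, and `overall_rates` are illustrative only.

```python
# Columns of the table above, in order.
MODELS = ["o4-mini", "GPT-4.1", "GPT-4o", "Claude-Sonnet-4", "Llama-4-Maverick"]

# category -> (number of tasks, percentage per model), copied from the table.
ROWS = {
    "Banking": (60, [41.7, 23.3, 18.3, 38.3, 21.7]),
    "Finance": (21, [33.3, 14.3, 14.3, 23.8, 9.5]),
    "Government": (138, [56.5, 44.9, 36.2, 47.1, 26.1]),
    "Labor": (24, [25.0, 0.0, 8.3, 12.5, 4.2]),
    "Markets": (60, [48.3, 35.0, 33.3, 41.7, 15.0]),
    "Other": (57, [38.6, 22.8, 21.1, 29.8, 12.3]),
}


def overall_rates(rows):
    """Return (total tasks, overall percentage per model).

    Assumption: each percentage is correct_tasks / tasks for that category,
    so rounding tasks * rate recovers the integer count of correct tasks.
    """
    total = sum(n for n, _ in rows.values())
    overall = []
    for i in range(len(MODELS)):
        solved = sum(round(n * rates[i] / 100) for n, rates in rows.values())
        overall.append(round(100 * solved / total, 1))
    return total, overall


if __name__ == "__main__":
    total, overall = overall_rates(ROWS)
    print(total)  # 360
    for model, rate in zip(MODELS, overall):
        print(f"{model}: {rate}%")
```

Under this assumption the computed totals match the "All" row (46.4%, 31.4%, 27.2%, 38.3%, 18.9% over 360 tasks). Government, with 138 tasks, carries the most weight in the overall figure.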

BibTeX

@article{liu2025econwebarena,
  title={EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments},
  author={Zefang Liu and Yinzhu Quan},
  year={2025}
}