Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has been an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn, open-ended generation setting.
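As a rough illustration of what evaluating an LLM as an agent in an interactive environment involves, the sketch below shows a generic multi-turn observe-act loop. The `Environment` protocol and the `query_llm` helper are hypothetical placeholders for this sketch, not AgentBench's actual interface.

```python
# Minimal sketch of an LLM-as-Agent evaluation loop.
# The Environment protocol and query_llm helper are hypothetical placeholders,
# not AgentBench's actual API.
from typing import Protocol, Tuple


class Environment(Protocol):
    """A text-based interactive environment (hypothetical interface)."""

    def reset(self) -> str: ...                                   # initial observation
    def step(self, action: str) -> Tuple[str, float, bool]: ...   # (observation, reward, done)


def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError


def run_episode(env: Environment, max_turns: int = 20) -> float:
    """Drive one multi-turn episode: observe, ask the LLM for an action, act."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_turns):
        action = query_llm(f"Observation:\n{observation}\n\nNext action:")
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Each of the benchmark's environments can be thought of as one concrete implementation of such a loop, with the episode's accumulated reward (or task success) as the score.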
- 🔥 Mar 26: Andrew Ng gave a shoutout to AutoGen in "What's next for AI agentic workflows" at Sequoia Capital's AI Ascent.
- 🔥 Mar 3: What's new in AutoGen? 📰 Blog; 📺 YouTube.
- 🔥 Mar 1: The first AutoGen multi-agent experiment on the challenging GAIA benchmark achieved the No. 1 accuracy on all three levels.
- 🎉 Jan 30: AutoGen is highlighted by Peter Lee in the Microsoft Research Forum.
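For context on the multi-agent workflows referenced above, here is a minimal two-agent sketch using the AutoGen v0.2-style Python API (`AssistantAgent`, `UserProxyAgent`, `initiate_chat`). The model choice, config details, and task message are illustrative assumptions, not the setup used in the GAIA experiment.

```python
# Minimal two-agent AutoGen sketch (v0.2-style API).
# The model, config, and task below are illustrative assumptions.
import os

from autogen import AssistantAgent, UserProxyAgent

llm_config = {
    "config_list": [
        {"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]},  # assumed model
    ]
}

# The assistant proposes solutions; the user proxy executes code and relays results.
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",              # fully automated back-and-forth
    max_consecutive_auto_reply=2,          # keep the sketch's conversation short
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# Kick off the multi-turn conversation with a task message.
user_proxy.initiate_chat(assistant, message="Count the vowels in 'agentic workflow'.")
```

The same conversation pattern generalizes to more agents and tool-backed agents, which is the style of setup behind the multi-agent GAIA experiment mentioned above.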