SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

Source: arxiv.org

4 users bookmarked this article; 1 comment.


misshiki: Paper: “SkillsBench: A benchmark for evaluating how effective agent skills are across diverse tasks”

Tag: Artificial intelligence

2026/02/19


Abstract (excerpt):

Agent Skills are structured packages of procedural knowledge that augment LLM agents at inference time. Despite rapid adoption, there is no standard way to measure whether they actually help. We present SkillsBench, a benchmark of 86 tasks across 11 domains paired with curated Skills and deterministic verifiers. Each task is evaluated under three conditions: no Skills, curated Skills, and self-gen…
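To make the evaluation design concrete, here is a minimal Python sketch of scoring tasks with deterministic verifiers under different Skill conditions. It is not the SkillsBench code: `Task`, `pass_rates`, and `toy_agent` are hypothetical names, a Skill is assumed to be plain-text guidance passed to the agent, and only the "no Skills" and "curated Skills" conditions are modeled because the excerpt above is cut off before the third condition is fully described.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical task record: a prompt, a deterministic verifier, and an optional curated Skill.
@dataclass
class Task:
    name: str
    prompt: str
    verify: Callable[[str], bool]  # deterministic: the same output always gets the same verdict
    curated_skill: Optional[str] = None

# Hypothetical agent interface: (prompt, Skill text or None) -> final output.
Agent = Callable[[str, Optional[str]], str]

def pass_rates(tasks: List[Task], agent: Agent) -> Dict[str, float]:
    """Return the fraction of tasks whose output passes its verifier, per condition."""
    if not tasks:
        raise ValueError("need at least one task")
    # Each condition decides which Skill text (if any) the agent receives for a task.
    conditions: Dict[str, Callable[[Task], Optional[str]]] = {
        "no_skills": lambda t: None,
        "curated_skills": lambda t: t.curated_skill,
    }
    rates: Dict[str, float] = {}
    for cond, pick_skill in conditions.items():
        passed = sum(t.verify(agent(t.prompt, pick_skill(t))) for t in tasks)
        rates[cond] = passed / len(tasks)
    return rates

def toy_agent(prompt: str, skill: Optional[str]) -> str:
    # Stand-in for an LLM agent: it only "knows" the procedure when a Skill supplies it.
    return skill.split(":", 1)[1].strip() if skill else "unknown"

tasks = [
    Task("sum", "Add 2 and 3", lambda out: out == "5", "answer: 5"),
    Task("upper", "Uppercase 'hi'", lambda out: out == "HI", "answer: HI"),
]
print(pass_rates(tasks, toy_agent))  # {'no_skills': 0.0, 'curated_skills': 1.0}
```

Keeping the verifier a pure function of the agent's output is what makes the comparison across conditions repeatable: any difference in pass rate comes from the agent's behavior, not from the grading.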

Users who bookmarked

misshiki 2026/02/19
tasukuchan 2026/02/17
ktykogm 2026/02/17
