How we compare model quality in Cursor · Cursor

Source: cursor.com

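misshiki: Cursor has released "CursorBench," a benchmark for evaluating AI coding agents. It measures how well agents solve problems involving real repository operations and multi-step development tasks. GPT performs strongly.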

2026/03/13

Developers are asking coding agents to take on longer, more complex tasks that span multiple files, tools, and steps. As these requests grow in scope, the evals that measure agent performance need to evolve with them. At Cursor, we use a hybrid online-offline eval process to keep our understanding of model quality aligned with what developers actually do. The offline part uses CursorBench, our int