[B! LanguageModel][benchmark] efcl's bookmarks

efcl id:efcl

efcl's bookmarks tagged LanguageModel and benchmark (2)

SWE-bench Leaderboard
We generated 50k+ task instances with SWE-smith to train SWE-agent-LM-32B (open-weight SotA on Verified). More in the paper!
efcl 2025/06/21
A coding benchmark for LLMs

LanguageModel

benchmark
Aider LLM Leaderboards
Aider excels with LLMs skilled at writing and editing code, and uses benchmarks to evaluate an LLM’s ability to follow instructions and edit code successfully without human intervention. Aider’s polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.
efcl 2025/04/29
LLM benchmarks and costs

LanguageModel

benchmark