Popular articles for "bench" (19 entries) - Hatena Bookmark

Because the tag search returned few matches, title search results are shown instead.

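There are 19 entries about "bench". Related tags include 金融 (finance), 未分類 (uncategorized), and 日本語 (Japanese). Popular entries include "EDINET-Bench, a Japanese financial LLM benchmark built on annual securities reports, released by Sakana AI / Evaluates how well AI can handle advanced financial tasks".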

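EDINET-Bench, a Japanese financial LLM benchmark built on annual securities reports, released by Sakana AI / Evaluates how well AI can handle advanced financial tasks
- 128 users
- forest.watch.impress.co.jp
- Technology
- 2025/06/09
- AI
- Read later
- llm
- 金融 (finance)
- 未分類 (uncategorized)
- 日本語 (Japanese)
- finance
- 投資 (investment)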
SWE-bench Leaderboards
- 19 users
- www.swebench.com
- Technology
- 2024/05/31
SWE-bench Bash Only uses the SWE-bench Verified dataset with the mini-SWE-agent environment for all models [Post]. SWE-bench Lite is a subset curated for less costly evaluation [Post]. SWE-bench Verified is a human-filtered subset [Post]. SWE-bench Multimodal features issues with visual elements [Post]. Each entry reports the % Resolved metric, the percentage of instances solved (out of 2294 Full,
- LLM
- Benchmark
- ai
- github
- development
- Read later
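Proof of special forces! [US military equipment] What is the Army Special Forces commemorative knife (for Green Beret members, made by Benchmade)? 0824 🇺🇸 Military US ARMY SPECIAL FORCE SOG KNIFE (BENCH MADE) 1970S - いつだってミリタリアン！
- 18 users
- www.military-spec-an.com
- Lifestyle
- 2021/07/28
This time, we examine a 1970s US Army Special Forces commemorative knife. Judging from its engravings, it appears to have been made for Green Beret members. It is probably from that period, but the details are unknown. For some reason, a Japanese (Japanese-American?) name is engraved on it. It is a used item and shows signs of use, but it is in good condition! Contents: 1. What is the US Army Special Forces commemorative knife (for Green Beret members)? 2. Overall and detail photos 3. What are its features? 4. Manufacturing and size data 5. Summary. 1. What is the US Army Special Forces commemorative knife (for Green Beret members)? Previously, we examined a Green Beret commemorative knife produced by a famous American knife maker. It was a beautiful model manufactured relatively recently. For more on the US Army special forces "Green Berets", see here. ⬇︎ US Army Special Forces Group -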
LICENSE updated to template · microsoft/grpc_bench@04c7143
- 10 users
- github.com/microsoft
- Technology
- 2021/12/26
- microsoft
- Read later

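AtCoder and Sakana AI jointly develop "ALE-Bench", a benchmark that measures AI's algorithm engineering ability on combinatorial optimization problems
- 6 users
- prtimes.jp
- Technology
- 2025/06/18
AtCoder Inc. (headquarters: Shinjuku-ku, Tokyo; President and CEO: Naohiro Takahashi; hereafter "AtCoder") has, together with Sakana AI K.K. (headquarters: Minato-ku, Tokyo; CEO: David Ha; hereafter "Sakana AI"), developed "ALE-Bench (ALgorithm Engineering Benchmark)", a new benchmark for evaluating AI's ability to develop algorithms. ALE-Bench is built from optimization problems from the "AtCoder Heuristic Contest (hereafter AHC)" hosted by AtCoder, and makes it possible to measure objectively and quantitatively the performance of optimization algorithms developed by AI, which was difficult to evaluate with existing benchmarks. In addition, ALE
- Read later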
GitHub - openai/mle-bench: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
- 5 users
- github.com/openai
- Technology
- 2024/10/17
GitHub - pfnet-research/pfgen-bench: Preferred Generation Benchmark
- 5 users
- github.com/pfnet-research
- Technology
- 2024/12/27
pfgen-benchmark is a benchmark designed to evaluate Japanese text generation specifically for pretrained models. Unlike conventional benchmarks that use templates containing instructions, this benchmark relies solely on providing numerous examples. By conveying expectations such as the question-answering nature of the task, responses of approximately 100 characters, and outputs resembling formal p
Managing kube-bench results by sending them to AWS Security Hub - Qiita
- 4 users
- qiita.com/hayao_k
- Technology
- 2020/12/14
This article is the day-14 entry of the AWS Advent Calendar 2020. Introduction: On 2020/12/4, it was announced that Aqua Security's kube-bench had been added as a third-party partner product that can be integrated with AWS Security Hub. AWS Security Hub adds open source tool integrations with Kube-bench and Cloud Custodian https://aws.amazon.com/jp/about-aws/whats-new/2020/12/aws-security-hub-adds-open-source-tool-integration-with-kube-bench-and-cloud-custodian/ With this integration, kube-bench
- aws
- Read later
GitHub - aquasecurity/chain-bench: An open-source tool for auditing your software supply chain stack for security compliance based on a new CIS Software Supply Chain benchmark.
- 4 users
- github.com/aquasecurity
- Technology
- 2022/07/07
- security
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
- 4 users
- arxiv.org
- Technology
- 2024/10/17
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering. To this end, we curate 75 ML engineering-related competitions from Kaggle, creating a diverse set of challenging tasks that test real-world ML engineering skills such as training models, preparing datasets, and running experiments. We establish human baselines for each competition using Ka
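Overview and evaluation results of "Japanese-RP-Bench", a benchmark that measures the Japanese role-play ability of LLMs
- 3 users
- zenn.dev/aratako_lm
- Technology
- 2024/09/30
Introduction: I built "Japanese-RP-Bench", a benchmark that measures the Japanese role-play ability of LLMs in multi-turn dialogue, and published it in the repository below. This article summarizes the background that led to it, an overview of the benchmark, and the evaluation results. For how to run the benchmark, see the repository. If you only want the results, go to the results section. Overview. Background: I built this benchmark for the following reasons. Demand for using LLMs for role-play is relatively high, but no Japanese benchmark currently measures performance on this task. Japanese MT-Bench has a Roleplay category, but it does not amount to much role-play. Beyond role-play tasks, open LLM benchmarks that attempt to measure something abstract such as "the enjoyment of conversation"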
BIRD-bench
- 3 users
- bird-bench.github.io
- Technology
- 2024/06/17
BIRD-SQL A Big Bench for Large-Scale Database Grounded Text-to-SQLs About BIRD BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs, 95 big databases with a total size of 33.4 GB. It also covers more t
GitHub - Danau5tin/terminal-bench-rl: GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
- 3 users
- github.com/Danau5tin
- Technology
- 2025/07/29
Multi-SWE-bench
- 3 users
- multi-swe-bench.github.io
- Technology
- 2025/04/15
Multi-SWE-bench: A Multi-Lingual GitHub Issue Resolving Benchmark
- performance
- Github
GitHub - cvilsmeier/go-sqlite-bench: Benchmarks for Golang SQLite Drivers
- 3 users
- github.com/cvilsmeier
- Technology
- 2023/12/14
For benchmarks I used the following libraries: craw, crawshaw.io/sqlite, a CGO-based solution. This is not a database/sql driver. eaton, github.com/eatonphil/gosqlite, a CGO-based solution. This is not a database/sql driver. (added by @c4rlo) glebarez, github.com/glebarez/go-sqlite, a pure Go solution. This is a newer library, based on the SQLite C code re-written in Go (added by @dcarbone). matt
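"The Bench", a game about being an old man who sits on a park bench all day, officially announced. Sit on the bench solving newspaper puzzles and feeding pigeons, or walk around and explore - AUTOMATON
- 3 users
- automaton-media.com
- Anime and Games
- 2025/01/24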
What is SWE-bench, the challenging dataset? A new standard for evaluating AI programming ability
- 3 users
- blog.asial.co.jp
- Technology
- 2024/04/20
KMeans gives slightly different result for n_jobs=1 vs. n_jobs > 1 <!-- If your issue is a usage question, submit it here instead: - StackOverflow with the scikit-learn tag: http://stackoverflow.com/questions/tagged/scikit-learn - Mailing List: https://mail.python.org/mailman/listinfo/scikit-learn For more information, see User Questions: http://scikit-learn.org/stable/support.html#user-questions
GitHub - seddonm1/sqlite-bench: Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions
- 3 users
- github.com/seddonm1
- Technology
- 2024/07/18
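Shopping back to back at BENCH and PENSHOPPE, the Philippines' two big casual brands (∩´∀｀)∩ - happykanapyのCebuライフ
- 3 users
- www.happykanapy.com
- Learning
- 2021/11/21
Good morning, everyone. Thank you for bookmarking yesterday's article. Is there shocking pink in Indonesia too? (*´艸`*) And since I have mentioned it so many times on this blog, you may immediately think of the equation "purple = ube". Now, since it is summer all year round in the Philippines, there is not much need to buy clothes. In Japan there are trends every season, as well as trends unrelated to the season, and clothes made for Japanese people, so you may find yourself wanting them. Since coming to Cebu, I buy far fewer clothes than I did in Japan. The main reason is that it is summer all year, but even when I worked at a company I did not have to wear work clothes, so I did not need many. And since COVID began, I had not bought any clothes. I probably had not bought any clothes at all for nearly two years 💦 But recently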
