Search results for "evaluation": 1 - 40 of 51 entries

Because the tag search matched too few entries, title-search results are shown instead.

There are 51 entries about "evaluation". Related tags include "LLM" and "組織マネジメント" (organizational management). Popular entries include 『エンジニア組織30人の壁を超えるための 評価システムとマネジメントのスケール / Scaling evaluation system and management』.
• エンジニア組織30人の壁を超えるための 評価システムとマネジメントのスケール / Scaling evaluation system and management

  2024夏のジンジニアMeetup! 〜みんなで学ぼう!開発組織の評価制度と運用〜 https://jinjineer.connpass.com/event/323746/

• 新米マネージャーの初めての目標設定と評価 / New manager's first goal setting and evaluation

  2024/03/01: EMゆるミートアップ vol.6 〜LT会〜 https://em-yuru-meetup.connpass.com/event/308552/ (speaker: 倉澤直弘, EM)

• 定量データと定性評価を用いた技術戦略の組織的実践 / Systematic implementation of technology strategies using quantitative data and qualitative evaluation

  CNDS2024 https://event.cloudnativedays.jp/cnds2024/

• Best Practices for LLM Evaluation of RAG Applications

• 作るだけなら簡単なLLMを“より優れたもの”にするには 「Pretraining」「Fine-Tuning」「Evaluation & Analysis」構築のポイント

  Takuya Akiba of Stability AI Japan spoke from the front lines of open LLM development about the realities of the work and the challenges it faces, presenting the key points of LLM development at the Weights & Biases user conference, the "W&B Conference." Two articles in total; this first half covers what it takes to build a better LLM. Akiba: "So we've happily gotten Fine-Tuning working too. This may surprise you: the amount of code isn't quite zero, but you can in fact build an LLM while writing almost none. You might think, 'surely that alone would only produce a garbage model,' but as long as you don't do anything unnecessary, even just this will probably get you a reasonably plausible LLM. So, relative to Professor Suzuki's (Jun Suzuki's) earlier talk…"

• Off-Policy Evaluationの基礎とZOZOTOWN大規模公開実データおよびパッケージ紹介 - ZOZO TECH BLOG

  Note: equations do not render correctly in the AMP view; please use the normal view to check them. Note: on November 7, 2020, the section "How to use Open Bandit Pipeline" was revised to bring the example code up to date with a new package version; see the corresponding release notes for details. Future updates to the dataset, package, and papers will be announced via a Google Group, so please consider following it. A new chapter, "Reception at international conference workshops," has also been added. I am Yuta Saito of Tokyo Institute of Technology, doing joint research with ZOZO Research; my work bridges the theory and application of counterfactual machine learning (see my survey article for an overview). This article covers the basics of evaluating, offline, the performance of decision making built on machine learning…

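  For readers who want to try the package the article introduces, a minimal sketch following the Open Bandit Pipeline quickstart (class and key names below come from the obp README and may have changed in later versions):

    import numpy as np
    from obp.dataset import OpenBanditDataset
    from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

    # Load the small sample of ZOZOTOWN logged bandit feedback bundled with obp.
    dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
    bandit_feedback = dataset.obtain_batch_bandit_feedback()

    # Evaluate a uniform-random counterfactual policy with the IPW estimator.
    n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions
    action_dist = np.ones((n_rounds, n_actions, dataset.len_list)) / n_actions
    ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
    print(ope.estimate_policy_values(action_dist=action_dist))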
• GitHub - yahoojapan/JGLUE: JGLUE: Japanese General Language Understanding Evaluation

• GitHub - Stability-AI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models.

• GitHub - Arize-ai/phoenix: AI Observability & Evaluation

  Phoenix provides MLOps and LLMOps insights at lightning speed with zero-config observability, in a notebook-first experience for monitoring your models and LLM applications. LLM Traces let you trace the execution of an LLM application to understand its internals and to troubleshoot problems related to things like retrieval and tool execution.

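  A minimal sketch of the notebook workflow, assuming the arize-phoenix package and its launch_app entry point (per the project README; the API may differ by version):

    import phoenix as px

    # Launch the local Phoenix app; traces emitted by instrumented LLM code
    # appear here with no further configuration.
    session = px.launch_app()
    print(session.url)  # open this URL in a browser to inspect LLM traces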
• GitHub - confident-ai/deepeval: The LLM Evaluation Framework

  DeepEval is a simple-to-use, open-source LLM evaluation framework. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs based on metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., which use LLMs and various other NLP models that run locally on your machine for evaluation. Whether your applicatio…

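  A minimal sketch of the Pytest-style workflow the snippet describes, following deepeval's quickstart (metric and class names from the project README; thresholds and details may differ by version):

    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def test_answer_relevancy():
        test_case = LLMTestCase(
            input="What if these shoes don't fit?",
            actual_output="We offer a 30-day full refund at no extra cost.",
        )
        # Fails the test if the answer's relevancy score falls below the threshold.
        assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])

  Per the README, such files are run with deepeval's test runner (e.g. deepeval test run test_example.py).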
• COVID-19 vaccine efficacy summary | Institute for Health Metrics and Evaluation

• GitHub - explodinggradients/ragas: Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

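  The repository title summarizes the tool; as a rough illustration, a usage sketch modeled on ragas' documented evaluate() entry point (dataset columns and metric names follow older ragas docs and have changed across versions):

    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import faithfulness, answer_relevancy

    # One RAG interaction: question, generated answer, retrieved contexts, reference.
    data = Dataset.from_dict({
        "question": ["When was the first Super Bowl?"],
        "answer": ["The first Super Bowl was held on January 15, 1967."],
        "contexts": [["The First AFL-NFL World Championship Game was played on January 15, 1967."]],
        "ground_truth": ["January 15, 1967"],
    })
    print(evaluate(data, metrics=[faithfulness, answer_relevancy]))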
• PMスキル・評価制度を導入し、アウトカムを生み出すプロダクトマネジメント集団へ進化する道のりの共有 / How we introduced the PM skills and evaluation system and evolved into a product management group that produces outcomes

  Slides from a pmconf2021 talk, tracing how Retty grew from an organization devoted entirely to project management into one where outcome-driven product management took root. As one concrete example of the effort to develop strong product managers, we defined PM skill…

• Estimation of total and excess mortality due to COVID-19 | Institute for Health Metrics and Evaluation

  Estimation of total and excess mortality due to COVID-19. Published October 15, 2021. This page was updated on October 15, 2021 to reflect changes in our modeling strategy; view our previous methods, published May 13, 2021, here. In our October 15 release, we introduced three major changes. First, we have very substantially updated the data and methods used to estimate excess mortality related to the…

• Evaluation of science advice during the COVID-19 pandemic in Sweden - Humanities and Social Sciences Communications

  Sweden was well equipped to prevent the pandemic of COVID-19 from becoming serious. Over 280 years of collaboration between political bodies, authorities, and the scientific community had yielded many successes in preventive medicine. Sweden’s population is literate and has a high level of trust in authorities and those in power. During 2020, however, Sweden had ten times higher COVID-19 death rat…

• GitHub - st-tech/zr-obp: Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation

• Top Evaluation Metrics for RAG Failures

  [Figure 1: Root Cause Workflows for LLM RAG Applications (flowchart created by author)] If you have been experimenting with large language models (LLMs) for search and retrieval tasks, you have likely come across retrieval augmented generation (RAG) as a technique to add relevant contextual information to LLM generated responses. By connecting an LLM to private data, RAG can enable a better response…

• 雰囲気で理解するtidy evaluation(1): tidy evaluationの導入 - Qiita

  R users, how well do you know the rlang package and tidy evaluation (tidy eval)? Version 0.4.2 of rlang was released just yesterday. It has not yet reached 1.0.0, but it has been on CRAN for more than two years, so I have decided it is time to study it in earnest. Starting with this post, over the next several articles I will offer a gentle introduction alongside my own study of rlang and tidy eval. Properly I ought to begin by defining terms and explaining the background, but here I focus on conveying the feel of it: what tidy eval makes possible and what advantages the rlang package offers. If you want the details, please read the documentation and reference materials. I, too, am still on the road…

• GitHub - pfnet-research/japanese-lm-fin-harness: Japanese Language Model Financial Evaluation Harness

• LLM Evaluation Tutorial

  Grounding and Evaluation for Large Language Models (Tutorial). With the ongoing rapid adoption of Artificial Intelligence (AI) based systems in high-stakes domains such as financial services, healthcare and life sciences, hiring and human resources, education, societal infrastructure, and national security, it is crucial to develop and deploy the underlying AI models and systems in a responsible ma…

• 論文紹介 Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems

  Slides from an internal paper-reading session. Mehrotra, Rishabh, et al. "Towards a fair marketplace: Counterfactual evaluation of the trade-off between relevance, fairness & satisfac…"

• Evaluation method of UX “The User Experience Honeycomb” | blog / bookslope

  Evaluating or reviewing a website requires a variety of perspectives, and one view, given the direction of the market, holds that a "UX" perspective is essential. For research firms that have long included the user's viewpoint among their evaluation methods this is a natural progression, but in those cases UX evaluation usually means running user tests with actual participants. Such considerations mostly arise when writing user-test scenarios, yet when thinking of UX as an evaluation method, the "UX honeycomb" seemed to me the natural foundation. The article "User Experience Design - Semantic Studios" presents "The User Experience Honeycomb"; among the elements that make up UX are Useful, Usa…

• GitHub - FreedomIntelligence/LLMZoo: ⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡

• Humanloop: Collaboration and evaluation for LLM applications

  A shared workspace where PMs, engineers, and domain experts collaborate on building AI features. Humanloop is the first platform that combines software best practices with the needs of LLMs, empowering your whole team to drive AI improvement.

• 長期の評価に最適なWindows 10/11 Enterprise Evaluationともっと長く付き合う“裏ワザ”

  山市良のうぃんどうず日記 (277). Windows 10/11 Enterprise has an "Evaluation" edition that can be evaluated free of charge for 90 days. Because it lets you test and evaluate the enterprise editions of Windows 10/11 without buying a license, the author uses it often. This article introduces a few tricks for keeping the Evaluation edition usable for as long as possible.

• Misplaced trust: When trust in science fosters belief in pseudoscience and the benefits of critical evaluation

  At a time when pseudoscience threatens the survival of communities, understanding this vulnerability, and how to reduce it, is paramount. Four preregistered experiments (N = 532, N = 472, N = 605, N = 382) with online U.S. samples introduced false claims concerning a (fictional) virus created as a bioweapon, mirroring conspiracy theories about COVID-19, and carcinogenic effects of GMOs (Geneticall…

• OpenTofu 1.8.0 is out with Early Evaluation, Provider Mocking, and a Coder-Friendly Future | OpenTofu

  July 29, 2024. Since the 1.7 release, the OpenTofu community and core team have been hard at work on much-requested features, making .tf code easier to write, reducing unnecessary boilerplate, improving performance, and more. We are happy to announce the immediate availability of OpenTofu 1.8 with the followin…

• 【ML Tech RPT. 】第11回 機械学習のモデルの評価方法 (Evaluation Metrics) を学ぶ (2) - Sansan Tech Blog

  I'm Yoshimura, a researcher at DSOC. Our company has an internal program of club-like activities called "よいこ," and I belong to its tennis club; we meet about once a month, and newly joined members have lately been bringing fresh energy. Now, continuing from last time, this article focuses on evaluation metrics for machine-learning models (as before, "model" means a machine-learning model). Last time we reviewed the perspectives and caveats involved in evaluating a model; from this installment we look at which evaluation metrics exist for each problem setting and what they mean, beginning with binary classification. At the end of the previous article I wrote that multiclass classification and regression would also be covered here, but the volume grew too large, so…

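  As a concrete illustration of the binary-classification metrics the series covers, a minimal scikit-learn sketch (toy data, not taken from the article):

    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    y_true = [0, 1, 1, 0, 1, 0, 1, 1]                      # ground-truth labels
    y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                      # hard predictions
    y_score = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]    # predicted probabilities

    print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
    print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
    print("F1:       ", f1_score(y_true, y_pred))          # harmonic mean of the two
    print("ROC AUC:  ", roc_auc_score(y_true, y_score))    # threshold-free ranking quality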
• International evaluation of an AI system for breast cancer screening - Nature

• The Generative AI Evaluation Company - Galileo

  Evaluate, observe, and protect your GenAI applications. Go beyond ‘vibe checks’ and asking GPT with the first end-to-end GenAI Stack, powered by Evaluation Foundation Models.

• Windows Server 2022 | Microsoft Evaluation Center

  In addition to your trial experience of Windows Server 2022, you can more easily add and manage languages and Features on Demand with the new Languages and Optional Features ISO. Download this ISO. This ISO is only available on Windows Server 2022, combines the previously separate Features on Demand and Language Packs ISOs, and can be used as a FOD and Language Pack repository. To learn about F…

• 「Microsoft Evaluation Center」に障害、評価版ソフトがダウンロード不能に/コミュニティサイトでダウンロードリンクを案内中

• Evaluation of Retrieval-Augmented Generation: A Survey

  Retrieval-Augmented Generation (RAG) has recently gained traction in natural language processing. Numerous studies and real-world applications are leveraging its ability to enhance generative models through external information retrieval. Evaluating these RAG systems, however, poses unique challenges due to their hybrid structure and reliance on dynamic knowledge sources. To better understand thes…

• Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

  Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there has been growing research interest in this field. There is, however, no real-world public dataset that enables the evaluation of OPE, making its experimental studies unrealistic and irreproducible. With the goal of…
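  The abstract's core task, estimating a counterfactual policy's value from logged interaction data, is commonly illustrated with the inverse propensity score (IPS) estimator; a self-contained NumPy sketch on synthetic data (not the paper's code; variable names are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_actions = 10_000, 5

    # Logged data: actions chosen by a uniform-random behavior policy.
    actions = rng.integers(n_actions, size=n)
    propensities = np.full(n, 1.0 / n_actions)        # P(action | context) under logging policy
    rewards = rng.binomial(1, 0.05 + 0.05 * actions)  # synthetic click feedback

    # Target policy to evaluate: deterministically picks action 4.
    target_action = 4

    # IPS: reweight each logged reward by target-policy probability / propensity.
    weights = (actions == target_action).astype(float) / propensities
    print("IPS estimate of target policy value:", np.mean(weights * rewards))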

• Mandoline: Model Evaluation under Distribution Shift

  Machine learning models are often deployed in different settings than they were trained and validated on, posing a challenge to practitioners who wish to predict how well the deployed model will perform on a target distribution. If an unlabeled sample from the target distribution is available, along with a labeled sample from a possibly different source distribution, standard approaches such as im…

• Terms of Evaluation

  Terms of Evaluation for HashiCorp Software. Before you download and/or use our enterprise software for evaluation purposes, you will need to agree to a special set of terms (“Agreement”), which will be applicable for your use of the HashiCorp, Inc.’s (“HashiCorp”, “we”, or “us”) enterprise software. PLEASE READ THIS AGREEMENT CAREFULLY BEFORE INSTALLING OR USING THE SOFTWARE. THESE TERMS AND CONDITI…

• 論文紹介:ChatGPT で情報抽出タスクは解けるのか? Is information extraction solved by ChatGPT? An analysis of performance, evaluation criteria, robustness and errors

• GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.

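  A minimal sketch of the harness's Python entry point (the lm_eval CLI wraps the same call); argument names follow the project README and may differ by version:

    import lm_eval

    # Zero-shot HellaSwag with a small HuggingFace model.
    results = lm_eval.simple_evaluate(
        model="hf",                                      # HuggingFace backend
        model_args="pretrained=EleutherAI/pythia-160m",  # model checkpoint to load
        tasks=["hellaswag"],                             # benchmark task(s)
        num_fewshot=0,
    )
    print(results["results"]["hellaswag"])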
• U.S. AI Safety Institute Signs Agreements Regarding AI Safety Research, Testing and Evaluation With Anthropic and OpenAI

  GAITHERSBURG, Md. — Today, the U.S. Artificial Intelligence Safety Institute at the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) announced agreements that enable formal collaboration on AI safety research, testing and evaluation with both Anthropic and OpenAI. Each company’s Memorandum of Understanding establishes the framework for the U.S. AI Safety Institut…

• CAE (Continuous Access Evaluation: 継続的アクセス評価)

  Hello, this is Kanamori from the Azure Identity team. Are you familiar with the feature called CAE (Continuous Access Evaluation)? As of November 2021 there have been announcements like the following, which many of you have probably seen: information published in the Microsoft 365 admin portal message center as MC255540 (Continuous access evaluation on by default), and an email from Microsoft Azure (azure-noreply@microsoft.com, TRACKING ID: 5T93-LTG) with the subject "Continuous access evaluation will be enabled in premium Azu…"

