GitHub - promptfoo/promptfoo: Test your prompts, models, RAGs. Evaluate and compare LLM outputs, catch regressions, and improve prompt quality. LLM evals for OpenAI/Azure GPT, Anthropic Claude, VertexAI Gemini, Ollama, Local & private models like Mistral/

Category: Technology

Source: github.com/promptfoo

promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can:

- Systematically test prompts, models, and RAGs with predefined test cases
- Evaluate quality and catch regressions by comparing LLM outputs side-by-side
- Speed up evaluations with caching and concurrency
- Score outputs automatically by defining test cases
- Use as a CLI, library, or in CI/CD
- Use OpenAI, Anthropic,
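
The workflow described above (predefined test cases, side-by-side comparison of prompts, caching, concurrency, and automatic scoring) can be illustrated with a small, self-contained Python sketch. This is not promptfoo's API or configuration format: `fake_model`, `PROMPTS`, `TEST_CASES`, `evaluate`, and `pass_rates` are hypothetical names introduced here for illustration only, and the "model" is a stub that upper-cases its input so the example runs without any provider credentials.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


# Hypothetical stand-in for an LLM call (assumption: a real setup would call
# a provider API here). lru_cache means a repeated prompt is computed once.
@lru_cache(maxsize=None)
def fake_model(prompt: str) -> str:
    return prompt.upper()


# Two prompt variants to compare side by side (illustrative only).
PROMPTS = {
    "plain": "Say hello to {name}",
    "polite": "Please greet {name} warmly",
}

# Predefined test cases: input variables plus a simple "contains" check.
TEST_CASES = [
    {"vars": {"name": "Ada"}, "must_contain": "ADA"},
    {"vars": {"name": "Bob"}, "must_contain": "HELLO"},
]


def run_case(prompt_id, template, case):
    """Render one prompt with one test case, call the model, score the output."""
    output = fake_model(template.format(**case["vars"]))
    return {
        "prompt": prompt_id,
        "vars": case["vars"],
        "output": output,
        "passed": case["must_contain"] in output,
    }


def evaluate(prompts, cases, max_workers=4):
    """Run every prompt against every test case concurrently."""
    jobs = [(pid, tpl, case) for pid, tpl in prompts.items() for case in cases]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `jobs` in its results.
        return list(pool.map(lambda job: run_case(*job), jobs))


def pass_rates(results):
    """Aggregate the fraction of passing test cases per prompt."""
    totals = {}
    for r in results:
        passed, count = totals.get(r["prompt"], (0, 0))
        totals[r["prompt"]] = (passed + int(r["passed"]), count + 1)
    return {pid: passed / count for pid, (passed, count) in totals.items()}


if __name__ == "__main__":
    results = evaluate(PROMPTS, TEST_CASES)
    for r in results:
        print(r["prompt"], r["vars"], "PASS" if r["passed"] else "FAIL")
    print(pass_rates(results))
```

In this sketch the "polite" prompt fails the second test case because its output never contains "HELLO", which is the kind of regression a side-by-side comparison is meant to surface. A real evaluation would swap the stub for actual provider calls and likely use richer checks than substring matching.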