mkusaka's bookmarks - Hatena Bookmark

mkusaka id:mkusaka

Bookmarks / alignment.anthropic.com (2)

Petri 2.0: New Scenarios, New Model Comparisons, and Improved Eval-Awareness Mitigations
mkusaka 2026/01/24
Anthropic's Petri 2.0 is an auditing framework that adds new scenarios, expands model comparisons, and introduces eval-awareness mitigations.


Anthropic

Petri

alignment

News
Findings from a Pilot Anthropic - OpenAI Alignment Evaluation Exercise
Findings from a Pilot Anthropic—OpenAI Alignment Evaluation Exercise Samuel R. Bowman, Megha Srivastava, Jon Kutasov, Rowan Wang, Trenton Bricken, Benjamin Wright, Ethan Perez, and Nicholas Carlini tl;dr In early summer 2025, Anthropic and OpenAI agreed to evaluate each other's public models using in-house misalignment-related evaluations. We are now releasing our findings in parallel. The evaluat
mkusaka 2025/08/28

AI

OpenAI

Claude

AI agents

evaluation

misalign
