と言ってもこの点数が低いのか高いのか分かりませんので、Claude 3.5 Sonnetの点数も見ていきましょう。 Claude 3.5 Sonnetの点数 現時点で最強と名高いClaude 3.5 SonnetにもELYZA-tasks-100を解いてもらいます。 単純に問題文だけを投げる形で、temperatureは0.8にしました。 import json import anthropic from datasets import load_dataset client = anthropic.Anthropic( api_key="APIキー", ) dataset = load_dataset("elyza/ELYZA-tasks-100") test_set = dataset["test"] results = {} for i, example in enumerate(t
![ELYZA-tasks-100を人間が解くと何点取れるのか?](https://cdn-ak-scissors.b.st-hatena.com/image/square/d351c48382fe5bd986347f0147497d9ca28f2101/height=288;version=1;width=512/https%3A%2F%2Fres.cloudinary.com%2Fzenn%2Fimage%2Fupload%2Fs--kRXF3u0P--%2Fc_fit%252Cg_north_west%252Cl_text%3Anotosansjp-medium.otf_55%3AELYZA-tasks-100%2525E3%252582%252592%2525E4%2525BA%2525BA%2525E9%252596%252593%2525E3%252581%25258C%2525E8%2525A7%2525A3%2525E3%252581%25258F%2525E3%252581%2525A8%2525E4%2525BD%252595%2525E7%252582%2525B9%2525E5%25258F%252596%2525E3%252582%25258C%2525E3%252582%25258B%2525E3%252581%2525AE%2525E3%252581%25258B%2525EF%2525BC%25259F%252Cw_1010%252Cx_90%252Cy_100%2Fg_south_west%252Cl_text%3Anotosansjp-medium.otf_37%3AYuki%252520Tomita%252Cx_203%252Cy_121%2Fg_south_west%252Ch_90%252Cl_fetch%3AaHR0cHM6Ly9zdG9yYWdlLmdvb2dsZWFwaXMuY29tL3plbm4tdXNlci11cGxvYWQvYXZhdGFyL2VkZGQ3ZjUxNzYuanBlZw%3D%3D%252Cr_max%252Cw_90%252Cx_87%252Cy_95%2Fv1627283836%2Fdefault%2Fog-base-w1200-v2.png)