[Hacker News] Why SWE-bench Verified no longer measures frontier coding capabilities
Summary
Why SWE-bench Verified no longer measures frontier coding capabilities
Source
This article first appeared on Hacker News.
Read the original: [Why SWE-bench Verified no longer measures frontier coding capabilities](https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/)