[Hacker News] Refusal in Language Models Is Mediated by a Single Direction
Summary
Refusal in Language Models Is Mediated by a Single Direction
Source
This article was first posted on Hacker News.
Read the original: [Refusal in Language Models Is Mediated by a Single Direction](https://arxiv.org/abs/2406.11717)