[hacker news] Refusal in Language Models Is Mediated by a Single Direction

2026-05-02T16:29:08.548Z·1 min read

Summary

Refusal in Language Models Is Mediated by a Single Direction

Source

This article first appeared on Hacker News.

Read the original: [Refusal in Language Models Is Mediated by a Single Direction](https://arxiv.org/abs/2406.11717)

↗ Original source · 2026-05-02T00:00:00.000Z