TODAY · AI Today

Anthropic publishes paper on circuit-level deception detection

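The paper detects when a model "knows it is lying" from differences in its internal activations, a substantive advance for alignment rather than a purely theoretical one.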

Published: 2026-04-26
Tags

anthropic, alignment, interpretability, research