Subliminal Learning in AIs
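Research shows that language models can implicitly learn traits, such as a preference for owls, from unrelated data, and that misalignment can spread through seemingly harmless data. The phenomenon occurs only when the teacher and student models share the same base model, underscoring the need for research into AI safety and integrity.
2025-7-25 11:10:10 Author: www.schneier.com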

Today’s freaky LLM behavior:

We study subliminal learning, a surprising phenomenon where language models learn traits from model-generated data that is semantically unrelated to those traits. For example, a “student” model learns to prefer owls when trained on sequences of numbers generated by a “teacher” model that prefers owls. This same phenomenon can transmit misalignment through data that appears completely benign. This effect only occurs when the teacher and student share the same base model.

Interesting security implications.

I am more convinced than ever that we need serious research into AI integrity if we are ever going to have trustworthy AI.


Posted on July 25, 2025 at 7:10 AM

Source: https://www.schneier.com/blog/archives/2025/07/subliminal-learning-in-ais.html