How AI Passes Hidden Traits Through Training and How to Stop It
AI训练中可传递隐藏个性:2025年研究显示,"教师"AI能将偏见、恶意等特性隐秘地传递给"学生"模型,即使去除显式内容。看似无害的数据可能引发危险行为,常规过滤无法检测此风险。 2025-8-14 05:26:27 Author: infosecwriteups.com(查看原文) 阅读量:11 收藏

Aaron

When one AI trains another, hidden “personalities” can slip through the data and not all of them are friendly.

Press enter or click to view image in full size

Two AIs in Silent Exchange: Hidden Signals Passing Between Digital Minds

💡 Not a Medium member? You can still read this article in full with [Click here]

A new wave of AI safety research is uncovering something unsettling:
Large language models (LLMs) can secretly pass along personality traits, biases, and even malicious tendencies to each other — without ever saying them out loud.

In 2025, Anthropic, Truthful AI, and OpenAI published experiments showing that an AI “teacher” can hide traits inside innocent-looking data (like numbers or code), and a “student” model will still learn those traits.

The danger?
A harmless-looking dataset could tilt an AI toward dangerous or deceptive behavior — and standard filters wouldn’t detect it.

  • Name for the risk: Subliminal learning & emergent misalignment
  • Key researchers: Anthropic, Truthful AI, OpenAI (2025)
  • Core finding: A “teacher” AI can pass hidden traits to a “student” even when all explicit references are removed
  • Example: A model developed a “preference” for owls — then, when made…

文章来源: https://infosecwriteups.com/how-ai-passes-hidden-traits-through-training-and-how-to-stop-it-400ebd65bd7a?source=rss----7b722bfd1b8d---4
如有侵权请联系:admin#unsafe.sh