Regulating AI Behavior with a Hypervisor
Guillotine is a hypervisor architecture for isolating malicious AIs. It combines virtualization techniques with new isolation mechanisms to prevent an AI from attacking the control plane, and it adds physical safety measures as a last line of defense.

Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.”

Abstract: As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models—models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, Guillotine must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support the hypervisor software, to thwart side channel leakage and more generally eliminate mechanisms for AI to exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission-critical systems. Physical fail-safes, e.g., involving electromechanical disconnection of network cables, or the flooding of a datacenter which holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.

The basic idea is that many of the AI safety policies proposed by the AI community lack robust technical enforcement mechanisms. The worry is that, as models get smarter, they will be able to evade those safety policies. The paper proposes a set of technical enforcement mechanisms that could work against these malicious AIs.
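To make the abstract's defense-in-depth point concrete: if the software sandbox is suspected to be compromised, containment escalates outward to network and then physical controls. The Python sketch below is purely illustrative and not from the paper; every hook (pause_vm, open_network_relay, cut_host_power) is a hypothetical placeholder for the hypervisor, relay, and power mechanisms the authors describe.

# Illustrative sketch only -- not the Guillotine implementation. It models the
# escalation ladder the abstract argues for: software pause, then
# electromechanical network disconnection, then power cut as a last resort.
from enum import Enum, auto

class Escalation(Enum):
    SOFTWARE_PAUSE = auto()      # freeze the sandboxed model's VM
    NETWORK_DISCONNECT = auto()  # physically open the NIC's link via a relay
    PHYSICAL_SHUTDOWN = auto()   # cut power to the host (last resort)

def pause_vm(vm_id: str) -> bool:
    """Hypothetical hook: ask the hypervisor to pause the AI's VM."""
    print(f"pausing VM {vm_id}")
    return True

def open_network_relay(vm_id: str) -> bool:
    """Hypothetical hook: trip a relay that severs the network cable."""
    print(f"opening network relay for {vm_id}")
    return True

def cut_host_power(vm_id: str) -> bool:
    """Hypothetical hook: de-energize the host holding the rogue model."""
    print(f"cutting power to host of {vm_id}")
    return True

ACTIONS = {
    Escalation.SOFTWARE_PAUSE: pause_vm,
    Escalation.NETWORK_DISCONNECT: open_network_relay,
    Escalation.PHYSICAL_SHUTDOWN: cut_host_power,
}

def contain(vm_id: str, software_isolation_intact: bool) -> Escalation:
    """Walk the fail-safe ladder until one containment step succeeds.

    If software isolation is already believed compromised, skip straight
    to the physical layers, mirroring the paper's defense-in-depth argument.
    """
    ladder = list(Escalation) if software_isolation_intact else list(Escalation)[1:]
    for step in ladder:
        if ACTIONS[step](vm_id):
            return step
    raise RuntimeError("all containment steps failed")

if __name__ == "__main__":
    # Example: software isolation is suspect, so go straight to the relay.
    print(contain("model-vm-7", software_isolation_intact=False))

The point of the ladder is that each outer layer assumes the inner ones may already be subverted, which is why the paper insists the outermost controls be physical rather than software-defined.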


Posted on April 23, 2025 at 12:02 PM



Source: https://www.schneier.com/blog/archives/2025/04/regulating-ai-behavior-with-a-hypervisor.html