A Virtualization-Based HIPS Architecture from 0 to 1 (VT Part)
2021-02-24 22:17:11 Source: www.freebuf.com

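A simplified, not strictly accurate description of a hypervisor: once the hypervisor (HV) is enabled there is a guest and a host. Certain CPU operations performed by the guest go through a structure called the VMCS (which occupies one page) and are handed to the host, which processes them and then returns control to the guest. If you have used VMware, the host is the computer you are sitting at and the guest is whatever runs inside the virtual machine you started. Since there is little material on this topic in Chinese, I am writing it down here.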

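Hardware virtualization comes in two flavors, Intel's VT and AMD's SVM. Space is limited, so this article only covers VT; SVM will be covered in a later article.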

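Since this is the first article, our end goal is:

Build a hypervisor, hook SSDT functions, and defeat the common hypervisor-detection techniques.

(Note that hyperduck in this article is the name of my project; do not confuse it with "hypervisor".)

To get the most out of this article and really learn how to build a virtual machine monitor, you need the following: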

1. C and C++

2. Basic kernel knowledge

3. The concepts of RIP and RSP

4. Enthusiasm for learning

Enough talk; let's begin.

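Determining the CPU vendor

As mentioned, SVM and VT are architecturally different: AMD and Intel CPUs enter virtualization in different ways. So the first step is to use the cpuid instruction to determine the CPU vendor: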

int get_cpu_type()
{
	_cpuid data = { 0 };
	char vendor[0x20] = { 0 };
	__cpuid((int*)&data, 0);
	*(int*)(vendor) = data.Rbx;
	*(int*)(vendor + 4) = data.Rdx;
	*(int*)(vendor + 8) = data.Rcx;
	if (memcmp(vendor, "GenuineIntel", 12) == 0)
	{
		global::cpu_type = _cpu_intel;
		return _cpu_intel;
	}
	if (memcmp(vendor, "AuthenticAMD", 12) == 0)
	{
		global::cpu_type = _cpu_amd;
		return _cpu_amd;
	}
	DebugPrint("[DebugMessage] Unknown CPU Detected! %s \n", vendor);

	global::cpu_type = _cpu_unk;
	return _cpu_unk;
}
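
The register order in get_cpu_type (EBX, then EDX, then ECX) is easy to get wrong, so here is a small user-mode sketch of just that step. The helper name vendor_from_regs is hypothetical and not part of the project; it assumes a little-endian machine, which x86 is.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original project): rebuilds the 12-byte
// vendor string from the registers returned by CPUID leaf 0.
// The bytes come from EBX, then EDX, then ECX, in that order.
std::string vendor_from_regs(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx)
{
    char vendor[12];
    std::memcpy(vendor + 0, &ebx, 4);
    std::memcpy(vendor + 4, &edx, 4);
    std::memcpy(vendor + 8, &ecx, 4);
    return std::string(vendor, sizeof(vendor));
}
```

Feeding it the register values an Intel or AMD part reports for leaf 0 yields "GenuineIntel" or "AuthenticAMD", the same strings the driver compares against with memcmp.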

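Enabling VMXON

This just means reading the IA32_FEATURE_CONTROL MSR, setting the enable-VMXON bit, and writing the value back. Note that VMXON is sometimes locked off by the firmware, in which case you have to enable VT in the BIOS setup yourself.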

bool enable_vmx_operation()
{
	_cpuid data = { 0 };

	__cpuid((int*)&data, 1);
	if ((data.Rcx & (1 << 5)) == 0)
		return false;
	
	IA32_FEATURE_CONTROL_MSR Control = { 0 };
	Control.All = __readmsr(ia32_feature_control);

	// BIOS lock check
	if (Control.Fields.Lock == 0)
	{
		Control.Fields.Lock = true;
		Control.Fields.EnableVmxon = true;
		_huoji_writemsr(ia32_feature_control, Control.All);
	}
	else if (Control.Fields.EnableVmxon == false)
	{
		DebugPrint("[%s]: VMX locked off in BIOS\n", __FUNCTION__);
		return false;
	}

	return true;
}
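
The three possible outcomes of the lock check can be written as a pure function, which makes the decision easy to reason about without touching a real MSR. This is a sketch only: the names plan_feature_control, vmx_decision and vmx_plan are made up for illustration, and the bit positions used are bit 0 (lock) and bit 2 (enable VMXON outside SMX) of IA32_FEATURE_CONTROL.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLockBit       = 1ull << 0; // IA32_FEATURE_CONTROL lock bit
constexpr std::uint64_t kVmxOutsideSmx = 1ull << 2; // enable VMXON outside SMX

enum class vmx_decision { already_enabled, must_write, locked_off };

struct vmx_plan {
    vmx_decision  decision;
    std::uint64_t value_to_write; // only meaningful for must_write
};

// Hypothetical helper: decides what to do with the current MSR value.
vmx_plan plan_feature_control(std::uint64_t msr)
{
    if ((msr & kLockBit) == 0)        // not locked yet: we may set both bits
        return { vmx_decision::must_write, msr | kLockBit | kVmxOutsideSmx };
    if ((msr & kVmxOutsideSmx) == 0)  // locked with VMX disabled: firmware decision
        return { vmx_decision::locked_off, msr };
    return { vmx_decision::already_enabled, msr };
}
```

Only the must_write case needs a WRMSR; the locked_off case is the one that forces a trip into the BIOS setup.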

Allocating the VMM context and the VM stack (we will use them later inside the VM)

void vt::allocate_vmm_context() {
	PHYSICAL_ADDRESS phys = { 0 };
	phys.QuadPart = ~0ULL;
	//global::vm_context = (_vmm_context*)ExAllocatePoolWithTag(NonPagedPool, sizeof(_vmm_context), HUOJI_TAG);
	global::vm_context = (_vmm_context*)MmAllocateContiguousMemory(sizeof(_vmm_context), phys);
	RtlSecureZeroMemory(global::vm_context, sizeof(_vmm_context));
	global::vm_context->processor_count = KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
	global::vm_context->vt_vcpu_table = (_vcpu_t**)ExAllocatePoolWithTag(NonPagedPool, sizeof(struct _vcpu_t*) * global::vm_context->processor_count, HUOJI_TAG);
	global::vm_context->kennel_base = global::kernel_base;
	global::vm_context->kennel_size = global::kernel_size;
	DebugPrint("vmm_context allocated at %p\n", global::vm_context);
	DebugPrint("vcpu_table allocated at %p\n", global::vm_context->vt_vcpu_table);
}

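Note that vm_context is allocated with MmAllocateContiguousMemory. When we enter the VM the IRQL is at or above DISPATCH_LEVEL, so the memory must be resident (non-paged): a page fault at that IRQL cannot be serviced and crashes the system. MmAllocateContiguousMemory returns non-paged, physically contiguous memory, which satisfies this. (If this is unfamiliar, review the paging chapter of an operating systems course.)

Allocate the stack used by our VMM (look up the concept of a virtual machine stack if it is unfamiliar):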

vcpu->stack = MmAllocateContiguousMemory(KERNEL_STACK_SIZE, phys);

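Fixing up CR0 and CR4

CR0 contains flags that modify basic processor operation. One such flag we will meet is the Protection Enable (PE) bit, which determines whether the processor runs in real mode or protected mode.

The VMXE bit (VMX Enable) of CR4 determines whether VMX operation can be entered.

Keep in mind that there is also CR3; we do not need it yet, but we will later.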

void vt::enable_vmx() {
	uintptr_t cr0 = _huoji_readcr0();
	uintptr_t cr4 = _huoji_readcr4();
	
	cr0 |= __readmsr(ia32_vmx_cr0_fixed0);
	cr0 &= __readmsr(ia32_vmx_cr0_fixed1);
	cr4 |= __readmsr(ia32_vmx_cr4_fixed0);
	cr4 &= __readmsr(ia32_vmx_cr4_fixed1);
	_huoji_writecr0(cr0);
	_huoji_writecr4(cr4);
}
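
The OR/AND pair in enable_vmx follows the usual reading of the FIXED0/FIXED1 MSRs: a bit set in FIXED0 must be 1, and a bit clear in FIXED1 must be 0. Below is a minimal user-mode sketch of that rule; apply_fixed_bits is a hypothetical name, and the constants in the usage note are illustrative rather than values read from real hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: applies the FIXED0/FIXED1 constraints to a control
// register value before VMXON.
std::uint64_t apply_fixed_bits(std::uint64_t value,
                               std::uint64_t fixed0,
                               std::uint64_t fixed1)
{
    value |= fixed0; // bits set in FIXED0 are forced to 1
    value &= fixed1; // bits clear in FIXED1 are forced to 0
    return value;
}
```

For instance, with an illustrative fixed0 of 0x2000 (the CR4.VMXE bit) and fixed1 of 0xffff, a starting value of 0 becomes 0x2000, and any bit above bit 15 is cleared.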

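As you can see, I use my own wrapper functions here. I build with LLVM, which did not provide these intrinsics in my setup (apparently an omission upstream), so I substituted my own functions such as the ones below. Nothing to be alarmed about.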

static unsigned __int64 _huoji_readcr0(void) {
#ifdef _llvm
	unsigned __int64 result_data = 0;
	__asm("mov %%cr0, %0" : "=r"(result_data) : : "memory");
	return result_data;
#else
	return __readcr0();
#endif
}

static unsigned __int64 _huoji_readcr4(void) {
#ifdef _llvm
	unsigned __int64 result_data = 0;
	__asm("mov %%cr4, %0" : "=r"(result_data) : : "memory");
	return result_data;
#else
	return __readcr4();
#endif
}

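Synchronizing the cores

Before entering the VM there is one important concept to understand: modern CPUs are multi-core, and VMX has to be set up on each core separately, so our code must run on every core at the same time. For that we use a DPC, via KeGenericCallDpc: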

void init_vm_dpc_callback(PRKDPC Dpc, PVOID Context, PVOID SystemArgument1, PVOID SystemArgument2)
{
	uintptr_t processor_number = KeGetCurrentProcessorNumber();
	_vcpu_t* vcpu = global::vm_context->vt_vcpu_table[processor_number];
	RtlCaptureContext(&vcpu->context_frame);
	if (vcpu->vm_status == 0) {
		vcpu->vm_status = 1;
		vt::init_logical_processor();
	}
	else if(vcpu->vm_status == 1) {
		vcpu->vm_status = 2;
		DebugPrint("[%d] vm finished! \n", processor_number);
		vm_restore_context(&vcpu->context_frame);
		DebugPrint("[%d] vm finished restore contex finished! \n", processor_number);

	}
	KeSignalCallDpcSynchronize(SystemArgument2);
	KeSignalCallDpcDone(SystemArgument1);
}
.....
KeGenericCallDpc(init_vm_dpc_callback, NULL);

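This way every core runs our init_vm_dpc_callback function at the same time.

Inside the function I use RtlCaptureContext to save the context, so that when the VM is entered the guest RIP resumes right after that call.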
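
The "code after the capture runs twice" idea can be tried in user mode with setjmp/longjmp. This is only an analogy under stated assumptions: fake_vcpu, fake_launch and run_callback are hypothetical names, setjmp stands in for RtlCaptureContext, and longjmp stands in for the VM entry that resumes at the captured guest RIP.

```cpp
#include <cassert>
#include <csetjmp>

// Stand-in for the per-core state kept by the driver.
struct fake_vcpu {
    std::jmp_buf context_frame;  // plays the role of the captured CONTEXT
    volatile int vm_status = 0;  // 0: not started, 1: launching, 2: running as guest
    volatile int passes    = 0;  // how many times the code after the capture ran
};

// Plays the role of VMLAUNCH: control resumes at the captured point.
[[noreturn]] static void fake_launch(fake_vcpu& v)
{
    std::longjmp(v.context_frame, 1);
}

int run_callback(fake_vcpu& v)
{
    setjmp(v.context_frame);     // like RtlCaptureContext; the result is ignored
    v.passes = v.passes + 1;
    if (v.vm_status == 0) {      // first pass: set up and "launch"
        v.vm_status = 1;
        fake_launch(v);
    } else if (v.vm_status == 1) {
        v.vm_status = 2;         // second pass: we came back as the "guest"
    }
    return v.vm_status;
}
```

The status field is what distinguishes the first pass from the second, exactly as vm_status does in the DPC callback.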

Do not worry if this is not clear yet; keep going and it will make sense later.

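Entering the VM

At this point we can enter the VM:

First, use the __vmx_on intrinsic to put this core into VMX operation. It takes a pointer to the physical address of the VMXON region, which we keep in vcpu->vmxon_physical: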

if (_huoji_vmx_on(&vcpu->vmxon_physical) != 0) {
......handle the failure here
}

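Next we initialize the VMCS region (look it up if the term is unfamiliar).

Before initializing it, call __vmx_vmclear to clear any stale VMCS contents and avoid conflicts, then __vmx_vmptrld to make it the current VMCS: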

if ((_huoji_vmx_vmclear(&vcpu->vmcs_physical) != vmx_success) || (_huoji_vmx_vmptrld(&vcpu->vmcs_physical) != vmx_success)) {
		__debugbreak();
	}
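
One detail the excerpts above do not show: the processor expects the VMXON region and each VMCS region to begin with the VMCS revision identifier, which is reported in bits 30:0 of the IA32_VMX_BASIC MSR. The sketch below decodes the fields of that MSR that matter here. It is an illustration, not the project's code: parse_vmx_basic and stamp_revision are hypothetical helpers, and the sample MSR value in the usage note is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct vmx_basic_info {
    std::uint32_t revision_id;   // bits 30:0
    std::uint32_t region_size;   // bits 44:32, size in bytes of VMXON/VMCS regions
    bool          true_controls; // bit 55, the "true" capability MSRs are available
};

// Hypothetical helper: decodes a raw IA32_VMX_BASIC value.
vmx_basic_info parse_vmx_basic(std::uint64_t msr)
{
    vmx_basic_info info{};
    info.revision_id   = static_cast<std::uint32_t>(msr & 0x7fffffffull);
    info.region_size   = static_cast<std::uint32_t>((msr >> 32) & 0x1fffull);
    info.true_controls = ((msr >> 55) & 1ull) != 0;
    return info;
}

// Hypothetical helper: writes the revision ID into the first 4 bytes of a
// zeroed region, which has to happen before VMXON or VMPTRLD uses it.
void stamp_revision(void* region, std::uint32_t revision_id)
{
    std::memcpy(region, &revision_id, sizeof(revision_id));
}
```

With the made-up value 0x00D8040000000004, the decoder reports revision 4, a region size of 0x400 bytes, and true controls available.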

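Next we adjust the control fields. Which events the VMM receives as VM exits (for example, which instructions cause an exit, or whether APIC virtualization is used) is governed by these controls, and the capability MSRs dictate which bits may be set or cleared. We adjust them as needed:

For example, we require the guest to run in long mode (look up real mode, protected mode, and long mode if these are unfamiliar; they are covered in any computer architecture course):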

_vt_vmx_entry_control_t entry_controls;
	entry_controls.control = 0;
	entry_controls.bits.ia32e_mode_guest = TRUE;
	vt_vmx_adjust_entry_controls(&entry_controls);

These are the controls we need to adjust for now (more will be added later):

_vt_vmx_exit_control_t exit_controls;
	exit_controls.control = 0;
	exit_controls.bits.host_address_space_size = TRUE;
	vt_vmx_adjust_exit_controls(&exit_controls);

	_vt_vmx_pinbased_control_msr_t pinbased_controls;
	pinbased_controls.control = 0;
	vt_vmx_adjust_pinbased_controls(&pinbased_controls);

	_vt_vmx_primary_processor_based_control_t primary_controls;
	primary_controls.control = 0;
	primary_controls.bits.use_msr_bitmaps = TRUE;
	primary_controls.bits.active_secondary_controls = TRUE;
	//primary_controls.bits.rdtsc_exiting = TRUE; //rdtsc
	vt_vmx_adjust_processor_based_controls(&primary_controls);

	_vt_vmx_secondary_processor_based_control_t secondary_controls;
	secondary_controls.control = 0;
	secondary_controls.bits.enable_rdtscp = TRUE;
	secondary_controls.bits.enable_xsave_xrstor = TRUE;
	secondary_controls.bits.enable_invpcid = TRUE;
	vt_vmx_adjust_secondary_controls(&secondary_controls);

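The adjustment code (the other adjust functions follow the same pattern and only differ in the registers involved; refer to the Intel manual):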

uintptr_t vt_vmx_adjust_cv(unsigned int capability_msr, unsigned int value)
{
    union _vt_vmx_true_control_settings_t cap;
    unsigned int actual;

    cap.control = __readmsr(capability_msr);
    actual = value;

    actual |= cap.allowed_0_settings;
    actual &= cap.allowed_1_settings;
    return actual;
}
void vt_vmx_adjust_entry_controls(union _vt_vmx_entry_control_t* entry_controls)
{
    unsigned int capability_msr;
    union _vt_vmx_basic_msr_t basic;

    basic.control = __readmsr(ia32_vmx_basic);
    capability_msr = (basic.bits.true_controls != FALSE) ? ia32_vmx_true_entry_ctrl : ia32_vmx_entry_ctrl;

    entry_controls->control = vt_vmx_adjust_cv(capability_msr, entry_controls->control);
    // Entry controls belong in the VM-entry controls field, not the pin-based field.
    _huoji_vmx_vmwrite(vmentry_controls, entry_controls->control);

}
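
The adjustment silently changes the requested value, which can hide the fact that a feature you asked for is not supported. The variant below also reports which bits were dropped and which were forced on. It is a sketch under the same reading of the capability MSR as vt_vmx_adjust_cv (low 32 bits are the bits that must be 1, high 32 bits are the bits that may be 1); adjusted_control and adjust_control are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct adjusted_control {
    std::uint32_t value;   // value that is safe to write to the VMCS
    std::uint32_t dropped; // requested bits the CPU does not allow to be 1
    std::uint32_t forced;  // bits the CPU requires that were not requested
};

// Hypothetical helper: same adjustment as vt_vmx_adjust_cv, plus diagnostics.
adjusted_control adjust_control(std::uint64_t capability_msr, std::uint32_t desired)
{
    const std::uint32_t allowed0 = static_cast<std::uint32_t>(capability_msr);       // must be 1
    const std::uint32_t allowed1 = static_cast<std::uint32_t>(capability_msr >> 32); // may be 1
    adjusted_control result{};
    result.value   = (desired | allowed0) & allowed1;
    result.dropped = desired & ~allowed1;
    result.forced  = allowed0 & ~desired;
    return result;
}
```

A non-zero dropped field is worth logging: it means something like enable_invpcid or enable_rdtscp was requested on a CPU that does not support it.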

Write them to the VMCS:

_huoji_vmx_vmwrite(pin_based_vm_execution_controls, pinbased_controls.control);
	_huoji_vmx_vmwrite(primary_processor_based_vm_execution_controls, primary_controls.control);
	_huoji_vmx_vmwrite(secondary_processor_based_vm_execution_controls, secondary_controls.control);
	_huoji_vmx_vmwrite(vmexit_controls, exit_controls.control);
	_huoji_vmx_vmwrite(vmentry_controls, entry_controls.control);

	_huoji_vmx_vmwrite(cr0_guest_host_mask, 0x80000021);		// Monitor PE, NE and PG flags
	_huoji_vmx_vmwrite(cr4_guest_host_mask, 0x2000);			// Monitor VMXE flags

Set up the segment registers and the rest of the guest and host state:

// Guest State Area - CS Segment
	_huoji_vmx_vmwrite(guest_cs_selector, state_p.cs.selector);
	_huoji_vmx_vmwrite(guest_cs_limit, state_p.cs.limit);
	_huoji_vmx_vmwrite(guest_cs_access_rights, vt_attrib(state_p.cs.selector, state_p.cs.attrib));
	_huoji_vmx_vmwrite(guest_cs_base, (uintptr_t)state_p.cs.base);
	// Guest State Area - DS Segment
	_huoji_vmx_vmwrite(guest_ds_selector, state_p.ds.selector);
	_huoji_vmx_vmwrite(guest_ds_limit, state_p.ds.limit);
	_huoji_vmx_vmwrite(guest_ds_access_rights, vt_attrib(state_p.ds.selector, state_p.ds.attrib));
	_huoji_vmx_vmwrite(guest_ds_base, (uintptr_t)state_p.ds.base);
	// Guest State Area - ES Segment
	_huoji_vmx_vmwrite(guest_es_selector, state_p.es.selector);
	_huoji_vmx_vmwrite(guest_es_limit, state_p.es.limit);
	_huoji_vmx_vmwrite(guest_es_access_rights, vt_attrib(state_p.es.selector, state_p.es.attrib));
	_huoji_vmx_vmwrite(guest_es_base, (uintptr_t)state_p.es.base);
	// Guest State Area - FS Segment
	_huoji_vmx_vmwrite(guest_fs_selector, state_p.fs.selector);
	_huoji_vmx_vmwrite(guest_fs_limit, state_p.fs.limit);
	_huoji_vmx_vmwrite(guest_fs_access_rights, vt_attrib(state_p.fs.selector, state_p.fs.attrib));
	_huoji_vmx_vmwrite(guest_fs_base, (uintptr_t)state_p.fs.base);
	// Guest State Area - GS Segment
	_huoji_vmx_vmwrite(guest_gs_selector, state_p.gs.selector);
	_huoji_vmx_vmwrite(guest_gs_limit, state_p.gs.limit);
	_huoji_vmx_vmwrite(guest_gs_access_rights, vt_attrib(state_p.gs.selector, state_p.gs.attrib));
	_huoji_vmx_vmwrite(guest_gs_base, (uintptr_t)state_p.gs.base);
	// Guest State Area - SS Segment
	_huoji_vmx_vmwrite(guest_ss_selector, state_p.ss.selector);
	_huoji_vmx_vmwrite(guest_ss_limit, state_p.ss.limit);
	_huoji_vmx_vmwrite(guest_ss_access_rights, vt_attrib(state_p.ss.selector, state_p.ss.attrib));
	_huoji_vmx_vmwrite(guest_ss_base, (uintptr_t)state_p.ss.base);
	// Guest State Area - Task Register
	_huoji_vmx_vmwrite(guest_tr_selector, state_p.tr.selector);
	_huoji_vmx_vmwrite(guest_tr_limit, state_p.tr.limit);
	_huoji_vmx_vmwrite(guest_tr_access_rights, vt_attrib(state_p.tr.selector, state_p.tr.attrib));
	_huoji_vmx_vmwrite(guest_tr_base, (uintptr_t)state_p.tr.base);
	// Guest State Area - Local Descriptor Table Register
	_huoji_vmx_vmwrite(guest_ldtr_selector, state_p.ldtr.selector);
	_huoji_vmx_vmwrite(guest_ldtr_limit, state_p.ldtr.limit);
	_huoji_vmx_vmwrite(guest_ldtr_access_rights, vt_attrib(state_p.ldtr.selector, state_p.ldtr.attrib));
	_huoji_vmx_vmwrite(guest_ldtr_base, (uintptr_t)state_p.ldtr.base);
	// Guest State Area - IDTR and GDTR
	_huoji_vmx_vmwrite(guest_gdtr_base, (uintptr_t)state_p.gdtr.base);
	_huoji_vmx_vmwrite(guest_idtr_base, (uintptr_t)state_p.idtr.base);
	_huoji_vmx_vmwrite(guest_gdtr_limit, state_p.gdtr.limit);
	_huoji_vmx_vmwrite(guest_idtr_limit, state_p.idtr.limit);
	// Guest State Area - Control Registers
	_huoji_vmx_vmwrite(guest_cr0, state_p.cr0);
	_huoji_vmx_vmwrite(guest_cr3, state_p.cr3);
	_huoji_vmx_vmwrite(guest_cr4, state_p.cr4);
	_huoji_vmx_vmwrite(cr0_read_shadow, state_p.cr0);
	_huoji_vmx_vmwrite(cr4_read_shadow, state_p.cr4 & ~ia32_cr4_vmxe_bit);
	// Guest State Area - Debug Controls
	_huoji_vmx_vmwrite(guest_dr7, state_p.dr7);
	_huoji_vmx_vmwrite(guest_msr_ia32_debug_ctrl, state_p.debug_ctrl);
	// VMCS Link Pointer - Essential for Accelerated VMX Nesting
	_huoji_vmx_vmwrite(vmcs_link_pointer, 0xffffffffffffffff);
// Host State Area - Segment Selectors
_huoji_vmx_vmwrite(host_cs_selector, state_p.cs.selector & selector_mask);
_huoji_vmx_vmwrite(host_ds_selector, state_p.ds.selector & selector_mask);
_huoji_vmx_vmwrite(host_es_selector, state_p.es.selector & selector_mask);
_huoji_vmx_vmwrite(host_fs_selector, state_p.fs.selector & selector_mask);
_huoji_vmx_vmwrite(host_gs_selector, state_p.gs.selector & selector_mask);
_huoji_vmx_vmwrite(host_ss_selector, state_p.ss.selector & selector_mask);
_huoji_vmx_vmwrite(host_tr_selector, state_p.tr.selector & selector_mask);
// Host State Area - Segment Bases
_huoji_vmx_vmwrite(host_fs_base, (uintptr_t)state_p.fs.base);
_huoji_vmx_vmwrite(host_gs_base, (uintptr_t)state_p.gs.base);
_huoji_vmx_vmwrite(host_tr_base, (uintptr_t)state_p.tr.base);
// Host State Area - Descriptor Tables
_huoji_vmx_vmwrite(host_gdtr_base, (uintptr_t)state_p.gdtr.base);
_huoji_vmx_vmwrite(host_idtr_base, (uintptr_t)state_p.idtr.base);
_huoji_vmx_vmwrite(host_cr0, state_p.cr0);
_huoji_vmx_vmwrite(host_cr3, __readcr3());
_huoji_vmx_vmwrite(host_cr4, state_p.cr4);
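
The guest/host masks and read shadows written above are what hide CR4.VMXE from the guest. For every bit set in the mask, a guest read returns the read-shadow bit instead of the real one, and a guest write that tries to make that bit differ from the shadow causes a VM exit. A minimal model of that rule follows; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// What the guest sees when it reads CR0 or CR4: masked bits come from the
// read shadow, all other bits come from the real register.
std::uint64_t guest_visible_cr(std::uint64_t real, std::uint64_t mask, std::uint64_t shadow)
{
    return (real & ~mask) | (shadow & mask);
}

// A guest MOV to CR exits when a masked bit would differ from the shadow.
bool write_causes_exit(std::uint64_t new_value, std::uint64_t mask, std::uint64_t shadow)
{
    return ((new_value ^ shadow) & mask) != 0;
}
```

With the real CR4 at 0x2020 (VMXE and PAE set), a mask of 0x2000, and a shadow of 0x20, the guest reads 0x20: VMXE appears clear even though it is set in hardware.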

Setting the guest RIP and RSP

This is why we called RtlCaptureContext earlier to save the context.

// Guest State Area - Flags, Stack Pointer, Instruction Pointer
	_huoji_vmx_vmwrite(guest_rsp, vcpu->context_frame.Rsp);
	_huoji_vmx_vmwrite(guest_rip, vcpu->context_frame.Rip);

Set the host stack (the one we allocated earlier)

_huoji_vmx_vmwrite(host_rsp, (ULONG_PTR)vcpu->stack + KERNEL_STACK_SIZE - sizeof(CONTEXT));

Set the host RIP:

_huoji_vmx_vmwrite(host_rip, (uintptr_t)vmm_entrypoint);

Then launch the virtual machine:

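int status = _huoji_vmx_vmlaunch();
	/*
	* Execution only reaches this point if VMLAUNCH failed; on success the
	* processor continues at GUEST_RIP instead.
	*/
	if (status != vmx_success)
	{
		// VMREAD stores a full size_t, so read into a size_t instead of
		// casting the address of a 4-byte int.
		size_t vmx_error = 0;
		_huoji_vmread(vm_instruction_error, &vmx_error);
		DebugPrint("Failed at VM-Entry, Code=%d\t Reason: %s\n", (int)vmx_error, vt_error_message[vmx_error]);
		return  false;
	}
	__debugbreak();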

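Note that after a successful __vmx_vmlaunch the code below it does not run; execution continues at the location saved by RtlCaptureContext (which is exactly what we wrote earlier with _huoji_vmx_vmwrite(guest_rip, vcpu->context_frame.Rip)).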
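
When the launch does fail, the VM-instruction error field tells you why. Indexing a message table directly with that value is risky if the table is shorter than the error number, so a bounds-checked lookup is safer. The sketch below covers only the first few error numbers as listed in the Intel SDM; vm_instruction_error_text is a hypothetical helper and says nothing about how the project's vt_error_message table is laid out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: bounds-checked description of a VM-instruction error.
const char* vm_instruction_error_text(std::uint64_t code)
{
    static const char* const table[] = {
        nullptr,                                      // 0: no error recorded
        "VMCALL executed in VMX root operation",      // 1
        "VMCLEAR with invalid physical address",      // 2
        "VMCLEAR with VMXON pointer",                 // 3
        "VMLAUNCH with non-clear VMCS",               // 4
        "VMRESUME with non-launched VMCS",            // 5
        "VMRESUME after VMXOFF",                      // 6
        "VM entry with invalid control field(s)",     // 7
        "VM entry with invalid host-state field(s)",  // 8
    };
    const std::size_t count = sizeof(table) / sizeof(table[0]);
    if (code >= count || table[code] == nullptr)
        return "unknown or unlisted error";           // never index out of range
    return table[code];
}
```

Errors 7 and 8 are the ones most often seen while bringing up a new hypervisor, since they point at a bad control field or a bad host-state field.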

vmm_entrypoint is where execution lands when a VM exit occurs. The assembly is:

vmm_entrypoint proc
    push    rcx
    lea     rcx, [rsp+8h]
    call    RtlCaptureContext
    jmp     vmexit_handler
;    RESTORE_GP
;    vmresume
vmm_entrypoint endp

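As you can see, RtlCaptureContext saves the current context here. Because rcx was overwritten with the CONTEXT pointer before the call, the captured context->Rcx is wrong, and vmexit_handler has to restore it from the value pushed on the stack: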

DECLSPEC_NORETURN EXTERN_C VOID vmexit_handler(CONTEXT* context)
{
	context->Rcx = *(PULONG64)((ULONG_PTR)context - sizeof(context->Rcx));
......
}
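
The pointer arithmetic behind host_rsp, the push, and the Rcx fix-up is easier to follow on a simulated stack. The sketch below replays the steps in user mode. It is an illustration with assumptions: fake_context is a tiny stand-in for CONTEXT, kStackSize is an assumed value standing in for KERNEL_STACK_SIZE, and simulate_vmexit_entry is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct fake_context { std::uint64_t Rax; std::uint64_t Rcx; std::uint64_t Rdx; };

constexpr std::size_t kStackSize = 0x6000; // assumed stand-in for KERNEL_STACK_SIZE

// Replays vmm_entrypoint and the first line of vmexit_handler on a fake stack
// and returns the Rcx value the handler ends up with.
std::uint64_t simulate_vmexit_entry(std::uint64_t guest_rcx)
{
    std::vector<unsigned char> stack(kStackSize, 0);
    // host_rsp = stack + size - sizeof(CONTEXT): room for one context at the top
    unsigned char* host_rsp = stack.data() + kStackSize - sizeof(fake_context);

    // push rcx: the guest's rcx lands 8 bytes below host_rsp
    unsigned char* rsp = host_rsp - sizeof(std::uint64_t);
    std::memcpy(rsp, &guest_rcx, sizeof(guest_rcx));

    // lea rcx, [rsp+8]: the context pointer is exactly host_rsp
    unsigned char* context = rsp + 8;

    // call RtlCaptureContext: rcx now holds the pointer, so Rcx is captured wrong
    fake_context captured{};
    captured.Rcx = reinterpret_cast<std::uintptr_t>(context);
    std::memcpy(context, &captured, sizeof(captured));

    // vmexit_handler: context->Rcx = *(context - sizeof(Rcx))
    std::uint64_t saved = 0;
    std::memcpy(&saved, context - sizeof(saved), sizeof(saved));
    std::memcpy(context + offsetof(fake_context, Rcx), &saved, sizeof(saved));

    fake_context result{};
    std::memcpy(&result, context, sizeof(result));
    return result.Rcx;
}
```

The pushed value sits immediately below the context, which is why subtracting sizeof(context->Rcx) from the context pointer finds it.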

At this point we have successfully entered the virtual machine.

The remaining parts will be covered in follow-up articles.


Source: https://www.freebuf.com/geek/264276.html