A brief look at Windows telemetry: CIT aka Customer Interaction Tracker
2022-4-12 22:6:46 Author: research.nccgroup.com

tl;dr

  • Windows versions up to at least Windows 7 contained a telemetry source called Customer Interaction Tracker
  • The CIT database can be parsed to aid forensic investigation
  • We have provided code to parse said database

Introduction

About 2 years ago while I was working on a large compromise assessment, I had extra time available to do a little research. For a compromise assessment, we take a forensic snapshot of everything that is in scope. This includes various log or SIEM sources, but also includes a lot of host data. This host data can vary from full disk images, such as those from virtual machines, to smaller, forensically acquired, evidence packages. During this particular compromise assessment, we had host data from about 10,000 machines. An excellent opportunity for large scale data analysis, but also a huge set of data to test new parsers on, or find less common edge cases for existing parsers! During these assignments we generally also take some time to look for new and interesting pieces of data to analyse. We don’t often have access to such a large and varied dataset, so we take advantage of it while we can.

Around this time I also happened to stumble upon the excellent blog posts from Maxim Suhanov over at dfir.ru. Something that caught my eye was his post about the CIT database in Windows. It may or may not stand for “Customer Interaction Tracker” and is one of the telemetry systems that exist within Windows, responsible for tracking interaction with the system and applications. I’d never heard of it before, and it seemed relatively unknown, as his post was just about the only reference to it I could find. This, of course, piqued my interest, as it’s more fun exploring new and unknown data sources than well documented ones. And since I now had access to about 10k hosts, it seemed like as good a time as any to see if I could explore a little bit further than he had.

While Maxim does hypothesise about the purpose of the CIT database, he doesn’t describe much about how it is structured. It’s an LZNT1 compressed blob stored in the Windows registry at HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\System that, when decompressed, has some executable paths in there. Nothing seems to be known about how to parse this binary blob. So-called “grep forensics”, while having its place, doesn’t scale, and you might be missing crucial pieces of information from the unparsed data. I’m also someone who takes comfort in knowing exactly how something is structured, without too many hypotheses and guesses.

In my large dataset I had plenty of CIT databases, so I could compare them and possibly spot patterns on how to parse this blob, so that’s exactly what I set out to do. Fast iteration with dissect.cstruct and a few hours of eyeballing hexdumps later, I came up with some structures on how I thought the data might be stored.

struct header {
uint16 unk0;
uint16 unk1;
uint32 size;
uint64 timestamp;
uint64 unk2;
uint32 num_entry1;
uint32 entry1_offset;
uint32 block1_size;
uint32 block1_offset;
uint32 num_entry2;
uint32 entry2_offset;
uint64 timestamp2;
uint64 timestamp3;
uint32 unk9;
uint32 unk10;
uint32 unk11;
uint32 unk12;
uint32 unk13;
uint32 unk14;
};
// <snip>
struct entry1 {
uint32 entry3_offset;
uint32 entry2_offset;
uint32 entry3_size;
uint32 entry2_size;
};
// <snip>
struct entry3 {
uint32 path_offset;
uint32 unk0;
uint32 unk1;
uint32 unk2;
uint32 unk3;
uint32 unk4;
uint32 unk5;
};

While still incredibly rough, I figured I had a rudimentary understanding of how the CIT was stored. However, at the time it was hardly a practical improvement over just “extracting the strings”, except perhaps that the parsing was a bit more efficient. It did scratch my initial itch of figuring out how it might be stored, but I didn’t want to spend a lot more time on it at the time. I added it as a plugin in our investigation framework called dissect, ran it over all the host data we had and used it as an additional source of information during the remainder of the compromise assessment. I figured I’d revisit it some other time.

Revisiting

Some other time turned out to be a lot farther into the future than I had anticipated. On an uneventful Friday afternoon a few weeks ago, at the time of writing, and 2 years after my initial look, I figured I’d give the CIT another shot. This time I’d go about it with my usual approach, given that I had more time available now. That approach roughly consists of finding whatever DLL, driver or part of the Windows kernel is responsible for some behaviour, reverse engineering it and writing my own implementation. This is my preferred approach if I have a bit more time available, since it leaves little room for wrong hypotheses and personal interpretation, and grounds your implementation mostly in facts.

Approach

My usual approach starts with scraping the disk of one of my virtual machines with some byte pattern, usually a string in various encodings (UTF-8 and UTF-16-LE, the default string encoding in Windows, for example) in search of files that contain those strings or byte patterns. For this we can utilize our dissect framework that, among its many capabilities, allows us to easily search for occurrences of data within any data container, such as (sparse) VMDK files or other types of disk images. We can combine this with a filesystem parser to see if a hit is within the dataruns of a file, and report which files have hits. This process only takes a few minutes and I immediately get an overview of all the files on my entire filesystem that may have a reference to something I’m looking for.

In this case, I used part of the registry path where the CIT database is stored. Using this approach, I quickly found a couple of DLLs that looked interesting, but a quick inspection revealed only one that was truly of interest: generaltel.dll. This DLL, among other things, seems to be responsible for consuming the CIT database and its records, and emitting telemetry ETW messages.
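The search step described above can be illustrated without any framework. The following is only a minimal sketch that scans an in-memory buffer for a string in both encodings; it is not how dissect does it, and `find_encoded` and the sample buffer are hypothetical names made up for this example.

```python
def find_encoded(data: bytes, needle: str, encodings=("utf-8", "utf-16-le")):
    """Yield (encoding, offset) for every occurrence of needle in data."""
    for encoding in encodings:
        pattern = needle.encode(encoding)
        start = data.find(pattern)
        while start != -1:
            yield encoding, start
            start = data.find(pattern, start + 1)


# Hypothetical buffer standing in for raw bytes read from a disk image.
needle = "AppCompatFlags\\CIT"
sample = b"\x00" * 4 + needle.encode("utf-16-le") + b"junk" + needle.encode("utf-8")
hits = list(find_encoded(sample, needle))
```

On a real image the offsets would still have to be mapped back to files through a filesystem parser, which this sketch leaves out.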

Reverse engineering

Through reverse engineering the parsing code and looking at how the ETW messages are constructed, we can create some fairly complete looking structures to parse the CIT database.

typedef struct _CIT_HEADER {
WORD MajorVersion;
WORD MinorVersion;
DWORD Size; /* Size of the entire buffer */
FILETIME CurrentTimeLocal; /* Maybe the time when the saved CIT was last updated? */
DWORD Crc32; /* Crc32 of the entire buffer, skipping this field */
DWORD EntrySize;
DWORD EntryCount;
DWORD EntryDataOffset;
DWORD SystemDataSize;
DWORD SystemDataOffset;
DWORD BaseUseDataSize;
DWORD BaseUseDataOffset;
FILETIME StartTimeLocal; /* Presumably when the aggregation started */
FILETIME PeriodStartLocal; /* Presumably the starting point of the aggregation period */
DWORD AggregationPeriodInS; /* Presumably the duration over which this data was gathered
* Always 604800 (7 days) */
DWORD BitPeriodInS; /* Presumably the amount of seconds a single bit represents
* Always 3600 (1 hour) */
DWORD SingleBitmapSize; /* This appears to be the sizes of the Stats buffers, always 21 */
DWORD _Unk0; /* Always 0x00000100? */
DWORD HeaderSize;
DWORD _Unk1; /* Always 0x00000000? */
} CIT_HEADER;
typedef struct _CIT_PERSISTED {
DWORD BitmapsOffset; /* Array of Offset and Size (DWORD, DWORD) */
DWORD BitmapsSize;
DWORD SpanStatsOffset; /* Array of Count and Duration (DWORD, DWORD) */
DWORD SpanStatsSize;
DWORD StatsOffset; /* Array of WORD */
DWORD StatsSize;
} CIT_PERSISTED;
typedef struct _CIT_ENTRY {
DWORD ProgramDataOffset; /* Offset to CIT_PROGRAM_DATA */
DWORD UseDataOffset; /* Offset to CIT_PERSISTED */
DWORD ProgramDataSize;
DWORD UseDataSize;
} CIT_ENTRY;
typedef struct _CIT_PROGRAM_DATA {
DWORD FilePathOffset; /* Offset to UTF-16-LE file path string */
DWORD FilePathSize; /* strlen of string */
DWORD CommandLineOffset; /* Offset to UTF-16-LE command line string */
DWORD CommandLineSize; /* strlen of string */
DWORD PeTimeDateStamp; /* aka Extra1 */
DWORD PeCheckSum; /* aka Extra2 */
DWORD Extra3; /* aka Extra3, some flag from PROCESSINFO struct */
} CIT_PROGRAM_DATA;

When compared against the initial guessed structures, we can immediately get a feeling for the overall format of the CIT. Decompressed, the CIT is made up of a small header, a global “system use data”, a global “use data” and a bunch of entries. Each entry has its own “use data” as well as references to a file path and optional command line string.
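For readers without dissect.cstruct at hand, the header can also be read with the standard library alone. This is a sketch based on the CIT_HEADER definition above and on the CRC check in the parser source at the end of this post; the helper names `parse_header` and `crc_matches` are made up here, and the underscore-prefixed fields are renamed because namedtuple does not allow a leading underscore.

```python
import struct
from binascii import crc32
from collections import namedtuple

# Little-endian, no padding; field order follows CIT_HEADER.
HEADER_FORMAT = "<HHIQ8IQQ6I"
CitHeader = namedtuple(
    "CitHeader",
    "MajorVersion MinorVersion Size CurrentTimeLocal Crc32 EntrySize EntryCount "
    "EntryDataOffset SystemDataSize SystemDataOffset BaseUseDataSize BaseUseDataOffset "
    "StartTimeLocal PeriodStartLocal AggregationPeriodInS BitPeriodInS SingleBitmapSize "
    "Unk0 HeaderSize Unk1",
)
CRC_OFFSET = 0x10  # position of the Crc32 field inside the header


def parse_header(buf: bytes) -> CitHeader:
    """Unpack the fixed-size header from the start of a decompressed CIT buffer."""
    return CitHeader._make(struct.unpack_from(HEADER_FORMAT, buf))


def crc_matches(buf: bytes) -> bool:
    """Recompute the CRC32 over the whole buffer, skipping the Crc32 field itself."""
    digest = crc32(buf[CRC_OFFSET + 4:], crc32(buf[:CRC_OFFSET]))
    return parse_header(buf).Crc32 == digest
```

Checking the CRC before trusting any offsets is a cheap way to detect a truncated or wrongly decompressed blob.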

Interpreting the data

Figuring out how to parse data is the easy part; interpreting this data is oftentimes much harder.

Looking at the structures we came up with, we have something called “use data” that contains some bitmaps, “stats” and “span stats”. Bitmaps are usually straightforward since there are only so many ways you can interpret those, but “stats” and “span stats” can mean just about anything. However, we still have the issue that the “system use data” has multiple bitmaps.

To more confidently interpret the data, it’s best we look at how it’s created. Further reverse engineering brings us to win32kbase.sys and win32kfull.sys for newer Windows versions (e.g. Windows 10+), and win32k.sys for older Windows versions (e.g. Windows 7, Server 2012).

In the CIT header, we can see a BitPeriodInS, SingleBitmapSize and AggregationPeriodInS. With some values from a real header, we can confirm that (BitPeriodInS * 8) * SingleBitmapSize = AggregationPeriodInS. We also have a PeriodStartLocal field which is usually a nicely rounded timestamp. From this, we can make a fairly confident assumption that every set bit in the bitmap means the application in the entry, or the system, was used within the corresponding BitPeriodInS time window. This means that the bitmaps track activity over a larger time period in slices of some period size, by default an hour. Reverse engineered code seems to support this, too. Note that all of this is in local time, not UTC.
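To make the arithmetic concrete, here is a small sketch that checks the relation with the values quoted in the header comments and turns a bitmap into the start times of the active hours. It assumes the least significant bit of each byte comes first, as in the parser source at the end of this post; `bitmap_hours` and the sample bitmap are invented for illustration.

```python
from datetime import datetime, timedelta


def bitmap_hours(bitmap: bytes, period_start: datetime, bit_period_s: int = 3600):
    """Yield the start of every bit period whose bit is set (LSB of each byte first)."""
    for index in range(len(bitmap) * 8):
        byte, bit = divmod(index, 8)
        if bitmap[byte] & (1 << bit):
            yield period_start + timedelta(seconds=index * bit_period_s)


# 3600 s per bit, 8 bits per byte, 21 bytes per bitmap: exactly 7 days.
assert (3600 * 8) * 21 == 604800

# Hypothetical bitmap with bits 0 and 15 set: activity in the 1st and 16th hour.
start = datetime(2022, 1, 3)  # naive on purpose, CIT timestamps are local time
active = list(bitmap_hours(b"\x01\x80" + b"\x00" * 19, start))
```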

For the “stats” or “span stats”, it’s not that easy. We have no indication of what these values might mean, other than their integer size. The parsing code seems to suggest they might be tuples, but that may very well be a compiler optimization. We at least know they aren’t offsets, since their values are often far larger than the size of the CIT.

Further reverse engineering win32k.sys seems to suggest that the “stats” are in fact individual counters, being incremented in functions such as CitSessionConnectChange, CitDesktopSwitch, etc. These functions get called from other relevant functions in win32k.sys, like xxxSwitchDesktop that calls CitDesktopSwitch. One of the smaller increment functions can be seen below as an example:

void __fastcall CitThreadGhostingChange(tagTHREADINFO *pti)
{
  struct _CIT_USE_DATA *UseData; // rax
  __int16 v2; // cx

  if ( g_CIT_IMPACT_CONTEXT )
  {
    if ( _bittest((const signed __int32 *)&pti->TIF_flags, 0x1Fu) )
    {
      UseData = CitpProcessGetUseData(pti->ppi);
      if ( UseData )
      {
        v2 = -1;
        if ( (unsigned __int16)(UseData->Stats.ThreadGhostingChanges + 1) >= UseData->Stats.ThreadGhostingChanges )
          v2 = UseData->Stats.ThreadGhostingChanges + 1;
        UseData->Stats.ThreadGhostingChanges = v2;
      }
    }
  }
}
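
The overflow check in the decompiled function amounts to a saturating increment: once the 16-bit counter reaches its maximum it stays there instead of wrapping around to zero. A minimal Python equivalent (the function name is made up for this sketch) would be:

```python
def saturating_increment(counter: int, limit: int = 0xFFFF) -> int:
    """Add one to a 16-bit counter, staying at the maximum instead of wrapping to 0."""
    return counter + 1 if counter < limit else limit
```

In practice this means a value of 0xFFFF in any of these counters should be read as “at least 65535”.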

The increment events are different between the system use data and the program use data. If we map these increments out to the best of our ability, we end up with the following structures:

struct _CIT_SYSTEM_DATA_STATS
{
WORD Unknown_BootIdRelated0;
WORD Unknown_BootIdRelated1;
WORD Unknown_BootIdRelated2;
WORD Unknown_BootIdRelated3;
WORD Unknown_BootIdRelated4;
WORD SessionConnects;
WORD ProcessForegroundChanges;
WORD ContextFlushes;
WORD MissingProgData;
WORD DesktopSwitches;
WORD WinlogonMessage;
WORD WinlogonLockHotkey;
WORD WinlogonLock;
WORD SessionDisconnects;
};
struct _CIT_USE_DATA_STATS
{
WORD Crashes;
WORD ThreadGhostingChanges;
WORD Input;
WORD InputKeyboard;
WORD Unknown;
WORD InputTouch;
WORD InputHid;
WORD InputMouse;
WORD MouseLeftButton;
WORD MouseRightButton;
WORD MouseMiddleButton;
WORD MouseWheel;
};

There are some interesting tracked statistics here, such as the number of times someone logged on, locked their system, or how many times they clicked or pressed a key in an application.
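
Since the “stats” buffer is just an array of WORD counters, it can be given names with very little code. The sketch below uses only the standard library and the field order of _CIT_USE_DATA_STATS above; `parse_use_data_stats` is a hypothetical helper, not part of the published parser.

```python
import struct

USE_DATA_STATS_FIELDS = (
    "Crashes", "ThreadGhostingChanges", "Input", "InputKeyboard", "Unknown",
    "InputTouch", "InputHid", "InputMouse", "MouseLeftButton",
    "MouseRightButton", "MouseMiddleButton", "MouseWheel",
)


def parse_use_data_stats(buf: bytes) -> dict:
    """Map the 12 little-endian WORD counters of a program entry to their names."""
    values = struct.unpack_from(f"<{len(USE_DATA_STATS_FIELDS)}H", buf)
    return dict(zip(USE_DATA_STATS_FIELDS, values))
```

The same approach works for the system stats by swapping in the 14 field names of _CIT_SYSTEM_DATA_STATS.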

We can see similar behaviour for “span stats”, but in this case it appears to be a pair of (count, duration). Similarly, if we map these increments out to the best of our ability, we end up with the following structures:

struct _CIT_SPAN_STAT_ITEM
{
  DWORD Count;
  DWORD Duration;
};
struct _CIT_SYSTEM_DATA_SPAN_STATS
{
  _CIT_SPAN_STAT_ITEM ContextFlushes0;
  _CIT_SPAN_STAT_ITEM Foreground0;
  _CIT_SPAN_STAT_ITEM Foreground1;
  _CIT_SPAN_STAT_ITEM DisplayPower0;
  _CIT_SPAN_STAT_ITEM DisplayRequestChange;
  _CIT_SPAN_STAT_ITEM DisplayPower1;
  _CIT_SPAN_STAT_ITEM DisplayPower2;
  _CIT_SPAN_STAT_ITEM DisplayPower3;
  _CIT_SPAN_STAT_ITEM ContextFlushes1;
  _CIT_SPAN_STAT_ITEM Foreground2;
  _CIT_SPAN_STAT_ITEM ContextFlushes2;
};
struct _CIT_USE_DATA_SPAN_STATS
{
  _CIT_SPAN_STAT_ITEM ProcessCreation0;
  _CIT_SPAN_STAT_ITEM Foreground0;
  _CIT_SPAN_STAT_ITEM Foreground1;
  _CIT_SPAN_STAT_ITEM Foreground2;
  _CIT_SPAN_STAT_ITEM ProcessSuspended;
  _CIT_SPAN_STAT_ITEM ProcessCreation1;
};
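
As a rough illustration of the pair layout, the following sketch reads the six (count, duration) items of a program entry with the standard library. The names come from _CIT_USE_DATA_SPAN_STATS above; `SpanStat` and `parse_span_stats` are invented for this example, and the unit of Duration is left uninterpreted because it is not established here.

```python
import struct
from typing import NamedTuple


class SpanStat(NamedTuple):
    count: int
    duration: int


USE_DATA_SPAN_FIELDS = (
    "ProcessCreation0", "Foreground0", "Foreground1",
    "Foreground2", "ProcessSuspended", "ProcessCreation1",
)


def parse_span_stats(buf: bytes, names=USE_DATA_SPAN_FIELDS) -> dict:
    """Read consecutive (DWORD count, DWORD duration) pairs and label them."""
    size = 8 * len(names)
    pairs = struct.iter_unpack("<II", buf[:size])
    return {name: SpanStat(*pair) for name, pair in zip(names, pairs)}
```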

Finally, when looking for all references to the bitmaps, we can identify the following bitmaps stored in the “system use data”:

struct _CIT_SYSTEM_DATA_BITMAPS
{
_CIT_BITMAP DisplayPower;
_CIT_BITMAP DisplayRequestChange;
_CIT_BITMAP Input;
_CIT_BITMAP InputTouch;
_CIT_BITMAP Unknown;
_CIT_BITMAP Foreground;
};

We can also identify that the single bitmap linked to each program entry is a bitmap of “foreground” activity for the aggregation period.

In the original source, I suspect these fields are accessed by index with an enum, but mapping them to structs makes for easier reverse engineering. You can also still see some unknowns in there, or unspecified fields such as Foreground0 and Foreground1. This is because the differentiation between these is currently unclear. For example, both counters might be incremented upon a foreground switch, but only one of them when a specific flag or condition is true. The exact condition or meaning of the flag is currently unknown.

Newer Windows versions

During the reverse engineering of the various win32k modules, I noticed something disappointing: the CIT database seems to no longer exist in the same form on newer Windows versions. Some of the same code remains and some new code was introduced, but any relation to the stored CIT database as described up until now seems to no longer exist. Maybe it’s now handled somewhere else and I couldn’t find it, but I also haven’t encountered any recent Windows host that has had CIT data stored on it.

Something else seems to have taken its place, though. We have some stored DP and PUUActive (Post Update Use Info) data instead. If the running Windows version is a “multi-session SKU”, as determined by the RtlIsMultiSessionSku API, these values are stored under the key HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon. Otherwise, they are stored under HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT.

Post Update Use Info

We can apply the same technique here as we did with the older CIT database, which is to look at how ETW messages are being created from the data. A little bit of reversing later and we get the following structure:

typedef struct _CIT_POST_UPDATE_USE_INFO {
DWORD UpdateKey;
WORD UpdateCount;
WORD CrashCount;
WORD SessionCount;
WORD LogCount;
DWORD UserActiveDurationInS;
DWORD UserOrDispActiveDurationInS;
DWORD DesktopActiveDurationInS;
WORD Version;
WORD _Unk0;
WORD BootIdMin;
WORD BootIdMax;
DWORD PMUUKey;
DWORD SessionDurationInS;
DWORD SessionUptimeInS;
DWORD UserInputInS;
DWORD MouseInputInS;
DWORD KeyboardInputInS;
DWORD TouchInputInS;
DWORD PrecisionTouchpadInputInS;
DWORD InForegroundInS;
DWORD ForegroundSwitchCount;
DWORD UserActiveTransitionCount;
DWORD _Unk1;
FILETIME LogTimeStart;
QWORD CumulativeUserActiveDurationInS;
WORD UpdateCountAccumulationStarted;
WORD _Unk2;
DWORD BuildUserActiveDurationInS;
DWORD BuildNumber;
DWORD _UnkDeltaUserOrDispActiveDurationInS;
DWORD _UnkDeltaTime;
DWORD _Unk3;
} CIT_POST_UPDATE_USE_INFO;

Looks like we lost the information for individual applications, but we still get a lot of usage data.

DP

Once again we can apply the same technique, resulting in the following:

typedef struct _CIT_DP_MEMOIZATION_ENTRY {
DWORD Unk0;
DWORD Unk1;
DWORD Unk2;
} CIT_DP_MEMOIZATION_ENTRY;
typedef struct _CIT_DP_MEMOIZATION_CONTEXT {
_CIT_DP_MEMOIZATION_ENTRY Entries[12];
} CIT_DP_MEMOIZATION_CONTEXT;
typedef struct _CIT_DP_DATA {
WORD Version;
WORD Size;
WORD LogCount;
WORD CrashCount;
DWORD SessionCount;
DWORD UpdateKey;
QWORD _Unk0;
FILETIME _UnkTime;
FILETIME LogTimeStart;
DWORD ForegroundDurations[11];
DWORD _Unk1;
_CIT_DP_MEMOIZATION_CONTEXT MemoizationContext;
} CIT_DP_DATA;

I haven’t looked too deeply into the memoization shown here, but it’s largely irrelevant when parsing the data. We see some of the same fields we also saw in the PUU data, but also a ForegroundDurations array. This appears to be an array of foreground durations in milliseconds for a couple of hardcoded applications:

  • Microsoft Internet Explorer
    • IEXPLORE.EXE
  • Microsoft Edge
    • MICROSOFTEDGE.EXE, MICROSOFTEDGECP.EXE, MICROSOFTEDGEBCHOST.EXE, MICROSOFTEDGEDEVTOOLS.EXE
  • Google Chrome
    • CHROME.EXE
  • Microsoft Word
    • WINWORD.EXE
  • Microsoft Excel
    • EXCEL.EXE
  • Mozilla Firefox
    • FIREFOX.EXE
  • Microsoft Photos
    • MICROSOFT.PHOTOS.EXE
  • Microsoft Outlook
    • OUTLOOK.EXE
  • Adobe Acrobat Reader
    • ACRORD32.EXE
  • Microsoft Skype
    • SKYPE.EXE

Each application is given an index in this array, starting from 1. Index 0 appears to be reserved for a cumulative time. It is not currently known if this list of applications changes between Windows versions. It’s also not currently known what “DP” stands for.
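
A sketch of how the array could be labelled is shown below. The order of the application names is an assumption: it simply follows the order of the list above, which the text does not confirm to be the index order. `FOREGROUND_SLOTS` and `parse_foreground_durations` are hypothetical names.

```python
import struct

# Index 0 is the cumulative slot. The order of the remaining names is an
# ASSUMPTION taken from the order of the list in the text.
FOREGROUND_SLOTS = (
    "Cumulative", "Microsoft Internet Explorer", "Microsoft Edge", "Google Chrome",
    "Microsoft Word", "Microsoft Excel", "Mozilla Firefox", "Microsoft Photos",
    "Microsoft Outlook", "Adobe Acrobat Reader", "Microsoft Skype",
)


def parse_foreground_durations(buf: bytes) -> dict:
    """Label the 11 DWORD entries of ForegroundDurations (values in milliseconds)."""
    values = struct.unpack_from("<11I", buf)
    return dict(zip(FOREGROUND_SLOTS, values))
```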

Other findings

While looking for some test CIT data, I stumbled upon two other pieces of information stored in the registry under the CIT registry key.

Telemetry answers

This information is stored at the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\win32k, under a subkey of some arbitrary version number. It contains values with the value name being the ImageFileName of the process, and the value being a flag indicating what types of messages or telemetry this application received during its lifetime. For example, the POWERBROADCAST flag is set if NtUserfnPOWERBROADCAST is called on a process, which itself is called from NtUserMessageCall. Presumably this is a system message sent when the power state of the system changes (e.g. a charger was plugged in). Currently known values are:

  • 0x10000: POWERBROADCAST
  • 0x20000: DEVICECHANGE
  • 0x40000: IME_CONTROL
  • 0x80000: WINHELP

You can discover which events a process received by masking the stored value with these values. For example, the value 0x30000 can be interpreted as POWERBROADCAST|DEVICECHANGE, meaning that a process received those events.
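
The masking can be done with a plain IntFlag. The flag values below are the ones from the TELEMETRY_ANSWERS definition in the parser source at the end of this post; `decode_answers` is a made-up helper name.

```python
from enum import IntFlag


class TelemetryAnswers(IntFlag):
    POWERBROADCAST = 0x10000
    DEVICECHANGE = 0x20000
    IME_CONTROL = 0x40000
    WINHELP = 0x80000


def decode_answers(value: int) -> list:
    """Return the names of all known flags that are set in a stored value."""
    return [flag.name for flag in TelemetryAnswers if value & flag]
```

Unknown bits, such as the values 4 and 8 mentioned below, are silently ignored by this sketch.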

This behaviour was only present in a Windows 7 win32k.sys and seems to no longer be present in more recent Windows versions. I have also seen instances where the values 4 and 8 were used, but have not been able to find a corresponding executable that produces these values. In most win32k.sys the code for this is inlined, but in some the function name AnswerTelemetryQuestion can be seen.

Modules

Another interesting registry key is HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\Module. It has subkeys for certain runtime DLLs (for example, System32/mrt100.dll or Microsoft.NET/Framework64/v4.0.30319/clr.dll), and each subkey has values for applications that have loaded this module. The name of the value is once again the ImageFileName and the value is a standard Windows timestamp of when the value was written.

These values are written by ahcache.sys, function CitmpLogUsageWorker. This function is called from CitmpLoadImageCallback, which subsequently is the callback function provided to PsSetLoadImageNotifyRoutine. The MSDN page for this function says that this call registers a “driver-supplied callback that is subsequently notified whenever an image is loaded (or mapped into memory)”. This callback checks a couple of conditions. First, it checks if the module is loaded from a system partition, by checking the DO_SYSTEM_SYSTEM_PARTITION flag of the underlying device. Then it checks if the image it’s loading is from a set of tracked modules. This list is optionally read from the registry key HKLM\System\CurrentControlSet\Control\Session Manager\AppCompatCache and value Citm, but has a default list to fall back to. The version of ahcache.sys that I analysed contained:

  • \System32\mrt100.dll
  • Microsoft.NET\Framework\v1.0.3705\mscorwks.dll
  • Microsoft.NET\Framework\v1.0.3705\mscorsvr.dll
  • Microsoft.NET\Framework\v1.1.4322\mscorwks.dll
  • Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll
  • Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
  • \Microsoft.NET\Framework\v4.0.30319\clr.dll
  • \Microsoft.NET\Framework64\v4.0.30319\clr.dll
  • \Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll

The tracked module path is concatenated to the aforementioned registry key to, for example, result in the key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\Module\Microsoft.NET/Framework/v1.0.3705/mscorwks.dll. Note the replaced path separators to not conflict with the registry path separator. It does a final check on whether there are already more than 64 values in this key, or whether the ImageFileName of the executable exceeds 520 characters. In the first case, the current system time is stored in the OverflowQuota value. In the second case, the value name OverflowValue is used.
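
The key construction and the timestamp format can be sketched as follows. Two things here are assumptions rather than findings: that a leading separator on the tracked path is dropped before joining, and that the stored timestamp is UTC. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MODULE_KEY = r"HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\Module"


def module_key(tracked_path: str) -> str:
    """Build the per-module key name, swapping backslashes for forward slashes."""
    # ASSUMPTION: a leading separator is stripped before the path is appended.
    return MODULE_KEY + "\\" + tracked_path.lstrip("\\").replace("\\", "/")


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01) to a datetime."""
    # ASSUMPTION: the value is UTC, as is usual for system time.
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=filetime // 10)
```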

So far I haven’t found anything that actually removes values from this registry key, so OverflowQuota effectively contains the timestamp of the last time an executable loaded that module while the key already held more than 64 values. If these values are indeed never removed, it unfortunately means that these registry keys only contain the first 64 executables to load these modules.

This behaviour seems to be present from Windows 10 onwards.

Summary

We showed how to parse the CIT database and provided some additional information on what it stores. The information presented may not be perfect, but this was just a couple of days’ worth of research into CIT. We hope it’s useful to some and perhaps also a showcase of a method to quickly research topics like these.

We also discovered the lack of the CIT database on newer Windows versions, and these new DP and PUUActive values. We provided some information on what these structures contain and structure definitions to easily parse them.

Finally, we also provide code to parse the CIT database yourself. It’s just the code to parse the CIT contents and doesn’t do anything to access the registry. There’s also no code to parse the other mentioned registry keys, since registry access is very implementation specific between investigation tools, and the values are quite trivial to parse out. We have implemented all of these findings into our investigation framework, which enables us to use them on all types of evidence data that we encounter.

We invite anyone curious on this topic to provide feedback and information for anything we may have missed or misinterpreted.

Source

#!/usr/bin/env python3
import array
import argparse
import io
import struct
import sys
from binascii import crc32
from datetime import datetime, timedelta, timezone

try:
    from dissect import cstruct
    from Registry import Registry
except ImportError:
    print("Missing dependencies, run:\npip install dissect.cstruct python-registry")
    sys.exit(1)

try:
    from zoneinfo import ZoneInfo

    HAS_ZONEINFO = True
except ImportError:
    HAS_ZONEINFO = False

cit_def = """
typedef QWORD FILETIME;
flag TELEMETRY_ANSWERS {
POWERBROADCAST = 0x10000,
DEVICECHANGE = 0x20000,
IME_CONTROL = 0x40000,
WINHELP = 0x80000,
};
typedef struct _CIT_HEADER {
WORD MajorVersion;
WORD MinorVersion;
DWORD Size; /* Size of the entire buffer */
FILETIME CurrentTimeLocal; /* Maybe the time when the saved CIT was last updated? */
DWORD Crc32; /* Crc32 of the entire buffer, skipping this field */
DWORD EntrySize;
DWORD EntryCount;
DWORD EntryDataOffset;
DWORD SystemDataSize;
DWORD SystemDataOffset;
DWORD BaseUseDataSize;
DWORD BaseUseDataOffset;
FILETIME StartTimeLocal; /* Presumably when the aggregation started */
FILETIME PeriodStartLocal; /* Presumably the starting point of the aggregation period */
DWORD AggregationPeriodInS; /* Presumably the duration over which this data was gathered
* Always 604800 (7 days) */
DWORD BitPeriodInS; /* Presumably the amount of seconds a single bit represents
* Always 3600 (1 hour) */
DWORD SingleBitmapSize; /* This appears to be the sizes of the Stats buffers, always 21 */
DWORD _Unk0; /* Always 0x00000100? */
DWORD HeaderSize;
DWORD _Unk1; /* Always 0x00000000? */
} CIT_HEADER;
typedef struct _CIT_PERSISTED {
DWORD BitmapsOffset; /* Array of Offset and Size (DWORD, DWORD) */
DWORD BitmapsSize;
DWORD SpanStatsOffset; /* Array of Count and Duration (DWORD, DWORD) */
DWORD SpanStatsSize;
DWORD StatsOffset; /* Array of WORD */
DWORD StatsSize;
} CIT_PERSISTED;
typedef struct _CIT_ENTRY {
DWORD ProgramDataOffset; /* Offset to CIT_PROGRAM_DATA */
DWORD UseDataOffset; /* Offset to CIT_PERSISTED */
DWORD ProgramDataSize;
DWORD UseDataSize;
} CIT_ENTRY;
typedef struct _CIT_PROGRAM_DATA {
DWORD FilePathOffset; /* Offset to UTF-16-LE file path string */
DWORD FilePathSize; /* strlen of string */
DWORD CommandLineOffset; /* Offset to UTF-16-LE command line string */
DWORD CommandLineSize; /* strlen of string */
DWORD PeTimeDateStamp; /* aka Extra1 */
DWORD PeCheckSum; /* aka Extra2 */
DWORD Extra3; /* aka Extra3, some flag from PROCESSINFO struct */
} CIT_PROGRAM_DATA;
typedef struct _CIT_BITMAP_ITEM {
DWORD Offset;
DWORD Size;
} CIT_BITMAP_ITEM;
typedef struct _CIT_SPAN_STAT_ITEM {
DWORD Count;
DWORD Duration;
} CIT_SPAN_STAT_ITEM;
typedef struct _CIT_SYSTEM_DATA_SPAN_STATS {
CIT_SPAN_STAT_ITEM ContextFlushes0;
CIT_SPAN_STAT_ITEM Foreground0;
CIT_SPAN_STAT_ITEM Foreground1;
CIT_SPAN_STAT_ITEM DisplayPower0;
CIT_SPAN_STAT_ITEM DisplayRequestChange;
CIT_SPAN_STAT_ITEM DisplayPower1;
CIT_SPAN_STAT_ITEM DisplayPower2;
CIT_SPAN_STAT_ITEM DisplayPower3;
CIT_SPAN_STAT_ITEM ContextFlushes1;
CIT_SPAN_STAT_ITEM Foreground2;
CIT_SPAN_STAT_ITEM ContextFlushes2;
} CIT_SYSTEM_DATA_SPAN_STATS;
typedef struct _CIT_USE_DATA_SPAN_STATS {
CIT_SPAN_STAT_ITEM ProcessCreation0;
CIT_SPAN_STAT_ITEM Foreground0;
CIT_SPAN_STAT_ITEM Foreground1;
CIT_SPAN_STAT_ITEM Foreground2;
CIT_SPAN_STAT_ITEM ProcessSuspended;
CIT_SPAN_STAT_ITEM ProcessCreation1;
} CIT_USE_DATA_SPAN_STATS;
typedef struct _CIT_SYSTEM_DATA_STATS {
WORD Unknown_BootIdRelated0;
WORD Unknown_BootIdRelated1;
WORD Unknown_BootIdRelated2;
WORD Unknown_BootIdRelated3;
WORD Unknown_BootIdRelated4;
WORD SessionConnects;
WORD ProcessForegroundChanges;
WORD ContextFlushes;
WORD MissingProgData;
WORD DesktopSwitches;
WORD WinlogonMessage;
WORD WinlogonLockHotkey;
WORD WinlogonLock;
WORD SessionDisconnects;
} CIT_SYSTEM_DATA_STATS;
typedef struct _CIT_USE_DATA_STATS {
WORD Crashes;
WORD ThreadGhostingChanges;
WORD Input;
WORD InputKeyboard;
WORD Unknown;
WORD InputTouch;
WORD InputHid;
WORD InputMouse;
WORD MouseLeftButton;
WORD MouseRightButton;
WORD MouseMiddleButton;
WORD MouseWheel;
} CIT_USE_DATA_STATS;
// PUU
typedef struct _CIT_POST_UPDATE_USE_INFO {
DWORD UpdateKey;
WORD UpdateCount;
WORD CrashCount;
WORD SessionCount;
WORD LogCount;
DWORD UserActiveDurationInS;
DWORD UserOrDispActiveDurationInS;
DWORD DesktopActiveDurationInS;
WORD Version;
WORD _Unk0;
WORD BootIdMin;
WORD BootIdMax;
DWORD PMUUKey;
DWORD SessionDurationInS;
DWORD SessionUptimeInS;
DWORD UserInputInS;
DWORD MouseInputInS;
DWORD KeyboardInputInS;
DWORD TouchInputInS;
DWORD PrecisionTouchpadInputInS;
DWORD InForegroundInS;
DWORD ForegroundSwitchCount;
DWORD UserActiveTransitionCount;
DWORD _Unk1;
FILETIME LogTimeStart;
QWORD CumulativeUserActiveDurationInS;
WORD UpdateCountAccumulationStarted;
WORD _Unk2;
DWORD BuildUserActiveDurationInS;
DWORD BuildNumber;
DWORD _UnkDeltaUserOrDispActiveDurationInS;
DWORD _UnkDeltaTime;
DWORD _Unk3;
} CIT_POST_UPDATE_USE_INFO;
// DP
typedef struct _CIT_DP_MEMOIZATION_ENTRY {
DWORD Unk0;
DWORD Unk1;
DWORD Unk2;
} CIT_DP_MEMOIZATION_ENTRY;
typedef struct _CIT_DP_MEMOIZATION_CONTEXT {
_CIT_DP_MEMOIZATION_ENTRY Entries[12];
} CIT_DP_MEMOIZATION_CONTEXT;
typedef struct _CIT_DP_DATA {
WORD Version;
WORD Size;
WORD LogCount;
WORD CrashCount;
DWORD SessionCount;
DWORD UpdateKey;
QWORD _Unk0;
FILETIME _UnkTime;
FILETIME LogTimeStart;
DWORD ForegroundDurations[11];
DWORD _Unk1;
_CIT_DP_MEMOIZATION_CONTEXT MemoizationContext;
} CIT_DP_DATA;
"""
c_cit = cstruct.cstruct()
c_cit.load(cit_def)
class CIT:
def __init__(self, buf):
compressed_fh = io.BytesIO(buf)
compressed_size, uncompressed_size = struct.unpack("<2I", compressed_fh.read(8))
self.buf = lznt1_decompress(compressed_fh)
self.header = c_cit.CIT_HEADER(self.buf)
if self.header.MajorVersion != 0x0A:
raise ValueError("Unsupported CIT version")
digest = crc32(self.buf[0x14:], crc32(self.buf[:0x10]))
if self.header.Crc32 != digest:
raise ValueError("Crc32 mismatch")
system_data_buf = self.data(self.header.SystemDataOffset, self.header.SystemDataSize, 0x18)
self.system_data = SystemData(self, c_cit.CIT_PERSISTED(system_data_buf))
base_use_data_buf = self.data(self.header.BaseUseDataOffset, self.header.BaseUseDataSize, 0x18)
self.base_use_data = BaseUseData(self, c_cit.CIT_PERSISTED(base_use_data_buf))
entry_data = self.buf[self.header.EntryDataOffset :]
self.entries = [Entry(self, entry) for entry in c_cit.CIT_ENTRY[self.header.EntryCount](entry_data)]
def data(self, offset, size, expected_size=None):
if expected_size and size > expected_size:
size = expected_size
data = self.buf[offset : offset + size]
if expected_size and size < expected_size:
data.ljust(expected_size, b"\x00")
return data
def iter_bitmap(self, bitmap: bytes):
bit_delta = timedelta(seconds=self.header.BitPeriodInS)
ts = wintimestamp(self.header.PeriodStartLocal)
for byte in bitmap:
if byte == b"\x00":
ts += 8 * bit_delta
else:
for bit in range(8):
if byte & (1 << bit):
yield ts
ts += bit_delta
class Entry:
def __init__(self, cit, entry):
self.cit = cit
self.entry = entry
program_buf = cit.data(entry.ProgramDataOffset, entry.ProgramDataSize, 0x1C)
self.program_data = c_cit.CIT_PROGRAM_DATA(program_buf)
use_data_buf = cit.data(entry.UseDataOffset, entry.UseDataSize, 0x18)
self.use_data = ProgramUseData(cit, c_cit.CIT_PERSISTED(use_data_buf))
self.file_path = None
self.command_line = None
if self.program_data.FilePathOffset:
file_path_buf = cit.data(self.program_data.FilePathOffset, self.program_data.FilePathSize * 2)
self.file_path = file_path_buf.decode("utf-16-le")
if self.program_data.CommandLineOffset:
command_line_buf = cit.data(self.program_data.CommandLineOffset, self.program_data.CommandLineSize * 2)
self.command_line = command_line_buf.decode("utf-16-le")
def __repr__(self):
return f"<Entry file_path={self.file_path!r} command_line={self.command_line!r}>"


class BaseUseData:
    MIN_BITMAPS_SIZE = 0x8
    MIN_SPAN_STATS_SIZE = 0x30
    MIN_STATS_SIZE = 0x18

    def __init__(self, cit, entry):
        self.cit = cit
        self.entry = entry

        bitmap_items = c_cit.CIT_BITMAP_ITEM[entry.BitmapsSize // len(c_cit.CIT_BITMAP_ITEM)](
            cit.data(entry.BitmapsOffset, entry.BitmapsSize, self.MIN_BITMAPS_SIZE)
        )
        bitmaps = [cit.data(item.Offset, item.Size) for item in bitmap_items]
        self.bitmaps = self._parse_bitmaps(bitmaps)
        self.span_stats = self._parse_span_stats(
            cit.data(entry.SpanStatsOffset, entry.SpanStatsSize, self.MIN_SPAN_STATS_SIZE)
        )
        self.stats = self._parse_stats(cit.data(entry.StatsOffset, entry.StatsSize, self.MIN_STATS_SIZE))

    def _parse_bitmaps(self, bitmaps):
        return BaseUseDataBitmaps(self.cit, bitmaps)

    def _parse_span_stats(self, span_stats_data):
        return None

    def _parse_stats(self, stats_data):
        return None


class BaseUseDataBitmaps:
    def __init__(self, cit, bitmaps):
        self.cit = cit
        self._bitmaps = bitmaps

    def _parse_bitmap(self, idx):
        return list(self.cit.iter_bitmap(self._bitmaps[idx]))


class SystemData(BaseUseData):
    MIN_BITMAPS_SIZE = 0x30
    MIN_SPAN_STATS_SIZE = 0x58
    MIN_STATS_SIZE = 0x1C

    def _parse_bitmaps(self, bitmaps):
        return SystemDataBitmaps(self.cit, bitmaps)

    def _parse_span_stats(self, span_stats_data):
        return c_cit.CIT_SYSTEM_DATA_SPAN_STATS(span_stats_data)

    def _parse_stats(self, stats_data):
        return c_cit.CIT_SYSTEM_DATA_STATS(stats_data)


class SystemDataBitmaps(BaseUseDataBitmaps):
    def __init__(self, cit, bitmaps):
        super().__init__(cit, bitmaps)
        self.display_power = self._parse_bitmap(0)
        self.display_request_change = self._parse_bitmap(1)
        self.input = self._parse_bitmap(2)
        self.input_touch = self._parse_bitmap(3)
        self.unknown = self._parse_bitmap(4)
        self.foreground = self._parse_bitmap(5)


class ProgramUseData(BaseUseData):
    def _parse_bitmaps(self, bitmaps):
        return ProgramDataBitmaps(self.cit, bitmaps)

    def _parse_span_stats(self, span_stats_data):
        return c_cit.CIT_USE_DATA_SPAN_STATS(span_stats_data)

    def _parse_stats(self, stats_data):
        return c_cit.CIT_USE_DATA_STATS(stats_data)


class ProgramDataBitmaps(BaseUseDataBitmaps):
    def __init__(self, cit, use_data):
        super().__init__(cit, use_data)
        self.foreground = self._parse_bitmap(0)


# Some inlined utility functions for the purpose of the POC
def wintimestamp(ts, tzinfo=timezone.utc):
    # This is a slower method of calculating Windows timestamps, but works on both Windows and Unix platforms
    # Performance is not an issue for this POC
    # 11644473600 is the number of seconds between 1601-01-01 (Windows epoch) and 1970-01-01 (Unix epoch)
    return datetime(1970, 1, 1, tzinfo=tzinfo) + timedelta(seconds=float(ts) * 1e-7 - 11644473600)


# LZNT1 derived from https://github.com/google/rekall/blob/master/rekall-core/rekall/plugins/filesystems/lznt1.py
def _get_displacement(offset):
    """Calculate the displacement."""
    result = 0
    while offset >= 0x10:
        offset >>= 1
        result += 1

    return result


DISPLACEMENT_TABLE = array.array("B", [_get_displacement(x) for x in range(8192)])

COMPRESSED_MASK = 1 << 15
SIGNATURE_MASK = 3 << 12
SIZE_MASK = (1 << 12) - 1
TAG_MASKS = [(1 << i) for i in range(0, 8)]


def lznt1_decompress(src):
    """LZNT1 decompress from a file-like object.

    Args:
        src: File-like object to decompress from.

    Returns:
        bytes: The decompressed bytes.
    """
    offset = src.tell()
    src.seek(0, io.SEEK_END)
    size = src.tell() - offset
    src.seek(offset)

    dst = io.BytesIO()

    while src.tell() - offset < size:
        block_offset = src.tell()
        uncompressed_chunk_offset = dst.tell()

        block_header = struct.unpack("<H", src.read(2))[0]
        if block_header & SIGNATURE_MASK != SIGNATURE_MASK:
            break

        hsize = block_header & SIZE_MASK

        block_end = block_offset + hsize + 3

        if block_header & COMPRESSED_MASK:
            while src.tell() < block_end:
                header = ord(src.read(1))
                for mask in TAG_MASKS:
                    if src.tell() >= block_end:
                        break

                    if header & mask:
                        # Back-reference: a 16-bit token split into offset and length
                        pointer = struct.unpack("<H", src.read(2))[0]
                        displacement = DISPLACEMENT_TABLE[dst.tell() - uncompressed_chunk_offset - 1]

                        symbol_offset = (pointer >> (12 - displacement)) + 1
                        symbol_length = (pointer & (0xFFF >> displacement)) + 3

                        # Seek backwards from the end of the output to the referenced data
                        dst.seek(-symbol_offset, io.SEEK_END)
                        data = dst.read(symbol_length)

                        # Pad the data to make it fit.
                        if 0 < len(data) < symbol_length:
                            data = data * (symbol_length // len(data) + 1)
                            data = data[:symbol_length]

                        dst.seek(0, io.SEEK_END)
                        dst.write(data)
                    else:
                        # Literal byte
                        data = src.read(1)
                        dst.write(data)
        else:
            # Block is not compressed
            data = src.read(hsize + 1)
            dst.write(data)

    result = dst.getvalue()
    return result


def print_bitmap(name, bitmap, indent=8):
    print(f"{' ' * indent}{name}:")
    for entry in bitmap:
        print(f"{' ' * (indent + 4)}{entry}")


def print_span_stats(span_stats, indent=8):
    for key, value in span_stats._values.items():
        print(f"{' ' * indent}{key}: {value.Count} times, {value.Duration}ms")


def print_stats(stats, indent=8):
    for key, value in stats._values.items():
        print(f"{' ' * indent}{key}: {value}")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("input", type=argparse.FileType("rb"), help="path to SOFTWARE hive file")
    parser.add_argument("--tz", default="UTC", help="timezone to use for parsing local timestamps")
    args = parser.parse_args()

    if not HAS_ZONEINFO:
        print("[!] zoneinfo module not available, falling back to UTC")
        tz = timezone.utc
    else:
        tz = ZoneInfo(args.tz)

    hive = Registry.Registry(args.input)

    try:
        cit_key = hive.open("Microsoft\\Windows NT\\CurrentVersion\\AppCompatFlags\\CIT\\System")
    except Registry.RegistryKeyNotFoundException:
        # ArgumentParser.exit takes (status, message), in that order
        parser.exit(1, "No CIT\\System key found in the hive specified!\n")

    for cit_value in cit_key.values():
        data = cit_value.value()

        # Skip values too small to hold a CIT blob
        if len(data) <= 8:
            continue

        print(f"Parsing {cit_value.name()}")
        cit = CIT(data)
        print("Period start:", wintimestamp(cit.header.PeriodStartLocal, tz))
        print("Start time:", wintimestamp(cit.header.StartTimeLocal, tz))
        print("Current time:", wintimestamp(cit.header.CurrentTimeLocal, tz))
        print("Bit period in hours:", cit.header.BitPeriodInS // 60 // 60)
        print("Aggregation period in hours:", cit.header.AggregationPeriodInS // 60 // 60)
        print()

        print("System:")
        print("    Bitmaps:")
        print_bitmap("Display power", cit.system_data.bitmaps.display_power)
        print_bitmap("Display request change", cit.system_data.bitmaps.display_request_change)
        print_bitmap("Input", cit.system_data.bitmaps.input)
        print_bitmap("Input (touch)", cit.system_data.bitmaps.input_touch)
        print_bitmap("Unknown", cit.system_data.bitmaps.unknown)
        print_bitmap("Foreground", cit.system_data.bitmaps.foreground)
        print("    Span stats:")
        print_span_stats(cit.system_data.span_stats)
        print("    Stats:")
        print_stats(cit.system_data.stats)
        print()

        for i, entry in enumerate(cit.entries):
            print(f"Entry {i}:")
            print("    File path:", entry.file_path)
            print("    Command line:", entry.command_line)
            print("    PE TimeDateStamp", datetime.fromtimestamp(entry.program_data.PeTimeDateStamp, tz=timezone.utc))
            print("    PE CheckSum", hex(entry.program_data.PeCheckSum))
            print("    Extra 3:", entry.program_data.Extra3)
            print("    Bitmaps:")
            print_bitmap("Foreground", entry.use_data.bitmaps.foreground)
            print("    Span stats:")
            print_span_stats(entry.use_data.span_stats)
            print("    Stats:")
            print_stats(entry.use_data.stats)
            print()


if __name__ == "__main__":
    main()
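
To make the bitmap decoding in the listing above easier to follow, here is a minimal standalone sketch of the same idea. It walks a bitmap least-significant bit first, as iter_bitmap does, and turns each set bit into a timestamp. The helper names filetime_to_datetime and iter_set_bits are hypothetical and not part of the original script, the sample bitmap and the one-hour bit period are made-up illustration values, and the conversion uses integer microseconds rather than the float arithmetic of wintimestamp. For simplicity the local period start is treated as UTC here.

```python
from datetime import datetime, timedelta, timezone

# Windows timestamps count 100ns ticks since 1601-01-01.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(ts):
    """Convert a Windows timestamp (100ns ticks) to an aware datetime (hypothetical helper)."""
    return FILETIME_EPOCH + timedelta(microseconds=ts // 10)


def iter_set_bits(bitmap, period_start, bit_period_s):
    """Yield the start time of every period whose bit is set, lowest bit first."""
    bit_delta = timedelta(seconds=bit_period_s)
    ts = period_start
    for byte in bitmap:  # iterating over bytes yields ints
        for bit in range(8):
            if byte & (1 << bit):
                yield ts
            ts += bit_delta


# Made-up sample: 24 one-hour periods starting at 2020-01-01 00:00 UTC.
start = filetime_to_datetime(132223104000000000)
sample = bytes([0b00000101, 0x00, 0b10000000])

for hit in iter_set_bits(sample, start, 3600):
    print(hit)
# 2020-01-01 00:00:00+00:00
# 2020-01-01 02:00:00+00:00
# 2020-01-01 23:00:00+00:00
```

With a one-hour bit period, three bytes cover a full day, so each set bit reads as "activity was recorded at some point during this hour".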

Source: https://research.nccgroup.com/2022/04/12/a-brief-look-at-windows-telemetry-cit-aka-customer-interaction-tracker/