A Guest Post by Giorgos Karantzas and Professor Constantinos Patsakis
A few years ago, a vigilante hacker under the name “Phineas Phisher” conducted a series of high-profile attacks, including hacking into “Hacking Team”, a company that, among other things, developed and sold spyware to government agencies. This was not the result of a random attack but of a well-planned and targeted one.
To achieve his goals, the hacker developed a 0-day for the SonicWall VPN appliance. After this attack, the attacker scanned the internet for such devices and found out that an offshore bank in the Cayman Islands was using the same vulnerable version. Beyond this exploit, he reported through his write-ups that he used common hacker utilities like Meterpreter and Empire and that he was not some kind of APT with custom malware writers nor did he receive significant funding and support; on the contrary, he claimed to be a humble ‘one-man army’.
The final goal of the bank hack was to access Bottomline’s SWIFT management panel and initiate transactions targeting his own accounts. Then, he uploaded the VMs used by the bank along with all the sensitive clients’ information that was stored in these systems.
The scenario is rather intriguing as, despite the impact and sensitivity of the information, it provides deep insight into an environment in which few people operate. Moreover, such environments are not well documented publicly, and their digital twins are hard to find.
We argue that emulating such an attack scenario and adapting it to current tools and methods, both offensive and defensive, can provide a good baseline for understanding the capabilities of both sides and highlight the changes that have taken place over these years. To this end, in our scenario, we have tried to follow the evolution in defensive and offensive security by rebuilding such an environment and equipping it with modern defense mechanisms.
Since most organizations are now integrating endpoint detection and response (EDR) systems into their endpoints to behaviorally detect and throttle cyber-attacks, we have equipped our endpoints accordingly. However, as shown in our previous research, EDRs are no silver bullets and have their weak points as well. In fact, Advanced Persistent Threat (APT) groups have significantly advanced their capabilities. Having access to several such defensive technologies, they study them and customize their malware accordingly to target them and minimize detection. Moreover, APTs and ransomware groups use several C2 frameworks, the most widely used being Cobalt Strike; however, there are other options that may provide different capabilities and fit better at particular stages of the cyber kill chain.
Based on the above, this work can be considered a purple teaming scenario in the financial sector. Practically, we present the blue versus red team fight, detailing, where possible, detection and bypass methods, their rationale, and gaps, where applicable, mainly through the use of C2 servers. At each step we present the attacker’s and the defender’s perspectives of the same scenario. This means that we report how an EDR would detect and/or block an action and how the attacker would try to prevent this.
Unlike common tests like MITRE ATT&CK, our threat actor is highly adapted to the target’s defenses, as any serious actor would be, instead of representing a generic baseline. Such an actor’s offensive activity ranges from basic attacks and operations performed on the network to weaponizing a series of private toolkits, from the highest-end, combat-proven solutions to lesser-known yet highly effective options.
The lab’s architecture did not emulate human traffic, thus denying the attacker typical places to hide and admittedly offering an advantage to EDR solutions, as the samples would stand out easily. Detection engineering can be considered the art of avoiding false positives, and in such low-traffic cases there are not many chances of blending in with regular user and application traffic; therefore, we tried to mimic this through static principles followed in the environment we had.
Given this fact, we decided to constantly modify the policies, ranging from production-ready to BETA features, pushing both the researchers conducting the experiment and the product. This led to highly sensitive discoveries that were kept private to ensure the safety of SentinelOne’s clients. In close collaboration with the RnD department of SentinelOne, several bugs, as well as real-world bypasses and architectural blind spots, were reported and fixed.
We weaponized the statistically more probable way of entering an organization and exploited several post-foothold TTPs through a series of ways and frameworks enabling operational and scientific diversity.
We try to keep the scenario simple yet expose a complex mindset, ideas and (at points) tooling. We therefore consider the case of a financial institution based on a recreation of Sherwood’s target network, the infrastructure of Cayman Island National Bank and Trust that Phineas Phisher penetrated, which we reconfigured and extended with new machines on a Hyper-V server. On the network level, pfSense, virtual switches, RRAS and Squid were used to enforce network segmentation and security policies. Mainstream and banking-related applications were installed on servers and workstations, a virtual Citrix XenApp instance was used, and a virtual SWIFT secure zone was emulated as well using jump servers. Various production-ready versions of Windows were used, and several security features such as Credential Guard were employed on hosts to stage a scenario with a minimum level of operational sophistication and realism. In general, the network design allows us to demonstrate several privileged and unprivileged attack vectors by giving local admin access on some endpoints, allowing the loading of kernel drivers, and performing several other actions.
During this study we will go through several tools of various levels of sophistication. Of specific interest are the C2s that the attacker may use. To this end, we provide a brief overview of each one.
Brute Ratel C4, by Dark Vortex, is one of the most ambitious attempts we have seen in the industry. It is a low-cost alternative to Cobalt Strike with less well-known indicators; it is more opsec-friendly and user-friendly, as well as adaptable. It is maintained by a highly active developer with serious red teaming experience who is adding an increasing number of features to the core.
Some of the highlights include the custom plugin called LDAP sentinel which can be used for enumeration, the customized reflective loader and the BOF files as well as the easily configurable TTPs ranging from the network communication to process injections and more.
BRC4 should be a solution capable of bringing all operations to a successful end.
Version 0.7 was the latest at the time of testing and version 0.8 was tested; however, new versions come out quickly, making it hard to keep up with them. After a comment from the developer, we replicated some tests with the latest version at the time, 0.9.
Notably, during our testing process we faced several limitations, the most important being the form of delivery.
Before continuing, we should make a note. BRC4 is a new product, which explains some stability issues related to the shellcode upon execution that would lead to a “half-beacon” that crashes immediately after execution. We attempted each attack several times and managed to successfully execute the beacon on both protected and unprotected machines. Based on the frequency of the crashes, the creation of the werfault process, and the fact that the badger was unusable, we noticed that the badger was more likely to crash on VMs protected with an in-process agent.
Cobalt Strike is the norm when it comes to C2 frameworks. It provides the core functionality needed to perform basic operations but is also fully extendable. We can see that a significant amount of the work is done by the community, judging from the plug-ins in the form of BOFs, reflective DLLs, various kits, etc. The latest version at the time of writing this section is 4.4, and it is the one we use in our experiments. In this version, among other things, a custom reflective loader capability is introduced, meaning you can replace the default one with your own, like boku7’s implementation. Moreover, you can use BOFs, pieces of code that are executed inside the local process and avoid the default fork-and-run behavior.
Cobalt Strike is not that opsec-safe anymore, yet you can always implement several security features on your own and embed them in your loader.
In our case, most of Cobalt Strike’s out-of-the-box features and kits will not be helpful, as many detections nowadays are multi-layered and generic, which means they try to target the very core of the threat and adapt to different usage scenarios from an attacker’s point of view. We will present a few POC detections from various commercial tools to support the fact that a wide range of IOCs exists, and that the framework has a high chance of being detected at some point in a highly advanced network that employs various defenses.
Therefore, a combination of TTPs may be needed, including using other C2s, customizing tradecraft, and tailoring Cobalt Strike both host-wise and network-wise to the target (e.g., sideloading into Teams and making the traffic look like legitimate Teams traffic, while retaining guardrails against both UM- and KM-based detections), something that could cost a large amount of time and effort while other tools could simply be more effective for specific tasks.
However, although some of the offensive coders out there are highly skilled, the stability of some publicly available tools is questionable, as they are mostly created in someone’s free time and not in the same way as production software maintained by a company. The most important point, though, is that Beacon will usually get detected one way or another, even when some customization takes place.
In all our tests, Cobalt Strike required us to resort to monstrosities to be able to use it, usually after some tampering based on privileged attacks or product-specific bugs, which again did not guarantee success.
Havoc represents the category of malware that is not a commercial product; rather, it was developed by an aspiring, young security researcher who is still a student. We therefore wish to demystify this kind of tool and demonstrate the capabilities of non-corporate software developed with stealth and stability in mind.
Havoc was built targeting the vast majority of endpoint solutions and therefore it had a few IOCs by design. We assisted the developer in bringing the software to a condition more suitable for this scenario by contributing slightly to the development process.
The network communication is performed over TCP sockets with AES-encrypted content, and the implant supports sleep intervals during which no command is fetched.
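From the defender’s perspective, sleep-based callbacks of this kind can sometimes be reasoned about through the timing of connections alone, since the encrypted content is opaque. The following is a minimal sketch of that idea, not a description of how Havoc or any EDR actually works: it assumes the defender already has a list of connection timestamps (in seconds) for a single host and destination, and the function names and the `max_cv` threshold are hypothetical values chosen purely for illustration.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive connections.

    Values close to 0 mean the connections are almost evenly spaced,
    which is what a fixed sleep interval would tend to produce.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0
    return pstdev(gaps) / average_gap


def looks_like_beacon(timestamps, max_cv=0.1):
    """Flag a connection series whose spacing is suspiciously regular.

    max_cv is an arbitrary illustrative threshold, not a tuned value.
    """
    return interval_regularity(timestamps) <= max_cv


if __name__ == "__main__":
    fixed_sleep = [0, 60, 120, 180, 240]   # one callback per minute
    human_like = [0, 5, 90, 95, 400]       # bursty, irregular activity
    print(looks_like_beacon(fixed_sleep))
    print(looks_like_beacon(human_like))
```

A check like this is easy to defeat by adding jitter to the sleep interval, and in a quiet lab without emulated user traffic it would likely behave very differently than in a busy production network, so it is best read as an illustration of the timing footprint rather than as a usable detection.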
Havoc has proven to be able to go through extremely hardened environments as long as the operators are willing to employ some basic op-sec techniques and modify the code base.
Nighthawk is a high-end C2 framework developed by MDSec, designed by hardcore red teamers for hardcore red teamers with stealth, configurability, and feature richness in mind. This toolkit is the tip of the spear for several reasons, as we will discuss in our experiments, and offers by design capabilities that would require serious amounts of work to adapt to existing tooling or even to create as custom tooling. As of this writing, MDSec has added even more features than those used in this work, including some stealthy injection chains that will bypass even ETWTi and spoofing strategies for mini-filter callbacks.
Nighthawk delivers a set of op-sec features that includes ROP-based system call unhooking and, later on, full DLL unhooking, which comes “by design” and therefore makes the operator’s life easier. It also includes other useful features like Thread Stack Spoofing and in-memory hiding via heap-based encryption, and it usually avoids (depending on the case) several tools that scrape through the memory of a process for abnormal indicators. The idea behind Nighthawk is to be fully malleable, which means you can control all the behavior of the C2 manually without writing any code. This enabled us to change the behavior of the implant during the operation according to our needs. This means, among other things, customized process injection methods, universal usage of system calls, and network-callback-related options to guarantee undercover beaconing.
At the time of writing, Nighthawk proved to be the most feature-rich, stable and effective solution we used.
Oyabun is a newly created tool by Red Code labs with a somewhat more generic and limited scope when it comes to its usage. It is a multi-platform, Golang-based toolkit that has proven to be highly effective against modern defenses for passing some initial barriers. As time passes, its instabilities are being reduced thanks to the development team’s commitment; although the design is solid, discrepancies may occur from time to time.
In this case, we will not test a C2 framework such as Nighthawk, BRC4 or Cobalt Strike. Those aforementioned frameworks are armed with highly advanced features related to opsec and can be used to reach objectives at all stages of the intrusion.
There are, however, cases in which a specific malware strain is deployed first in order to deploy those heavy weapons later. This malware is called “stage-0”, as it is the first malware to touch the victim network and perform callbacks to a server controlled by the attacker.
The main goals of the actor included:
The toolset that was used included both lesser-used and regarded techniques and ones that are extremely popular and trending, but with heavy adaptations to the internal mechanisms of the defensive tools. Several malicious files, including XLLs, MSIs and EXEs, were used for initial access, and even privileged attack packages that impaired the defenses by using exploitable drivers to inject into free AV software, as this was one of the few ways to totally avoid the in-process client. In some cases, DLLs with cloned exports were used to hijack legitimate applications and load our unstaged payloads.
SentinelOne demonstrated a capability of disallowing generic threats on-touch, including Cobalt Strike and BRC4, in many different ways. SentinelOne’s traps withstood attempts to spoof the format of the BRC4 code provided, including many more “exotic” and customized loading types such as Customized Phantom DLL Hollowing, commercial tools like Shellter Pro, MacroPack and more, something that surprised us. In the case of Nighthawk, malleability, bug fixes and delivery format enabled us to set foot on the network, while still leaving the defenders valuable context of the attack.
Havoc needed customization to survive in the environment, and our team collaborated with the developer, leading to upgrades to Havoc C2.
Oyabun was able to survive but only for a few limited actions and for a very specific format that could be targeted by Application Whitelisting.
Tools were deployed in various ways, including reflection, heavily monitored yet working PowerShell sessions, and CLR loading, but always under a safe context op-sec-wise to cover beaconing and tool execution as much as possible. The researchers managed to circumvent all PIC-related mitigations and run malicious post-ex tools in various formats. Admittedly, those mitigations covered 90% of the attempts made, and bypassing them needed heavy customization.
Tampering occurred in many cases using exploitable drivers during the experiments, but we were also able to find ways to completely disregard all user-mode traps with several techniques, one being Nighthawk’s ROP-based system call unhooking scheme and another being a bug in the logic that our team discovered and reported.
Lateral movement included RDP, WMI, WinRM, a WebDAV-based internal phishing toolkit called Farmer, and a credential-stealing tool used to collect credentials both in executable form and as PIC (PIC that was essentially a PE loader bootstrapped to the code, such as sRDI), and of course SOCKS proxying with an RDP client. When it came to credential dumping from processes like LSASS, we had to avoid all common techniques and had serious problems executing the attack, which led us to exploit a bug in SentinelOne’s architecture to have such an opportunity, something unlikely to happen in the wild.
The detection model employed limits the usage of common evasion techniques and injections, targeting both the technique and the shellcode itself. It took a significant amount of effort to bypass the guardrails in place, yet again kernel-based traps caught the behavior of the PIC. At this point we revealed bugs related to SentinelOne’s APC protections that were immediately fixed, raising the bar for attackers again. Our experience taught us that such attempts can easily complicate things with SentinelOne, and we avoided risking our context.
Although ransomware was not the goal of such an attacker, known variants with customized loaders performing various injections with heavily modified payload formats, including Ryuk, Ragnar and Babuk, were successfully mitigated, some at the ransomware level and some at the deployment level.
Given the sensitivity of the tests, not much can be revealed, to ensure clients are safe. SentinelOne’s team indeed took into account our proposals and our various bug and miss reports, and during testing the product was able to effectively tackle most of the generic attacks that exist out there as “out-of-the-box” solutions (always depending on the policy). What SentinelOne does is push the attacker while giving the defender valuable context that could later be exploited using STAR rules to effectively track down specific attackers. That said, a holistic model that tackles the attacker at the right points is invaluable and something that can be achieved to some extent with proper configuration. Having had the chance to BETA test some of the upcoming features, we can say that the upcoming stability could ease the defenders’ side by making customization more and more necessary, at considerable cost to the attackers.
Nowadays, true value lies in information, and goal-driven intelligence has seen an outstanding amount of development in recent years, especially following the advancements in personal, corporate, public, and critical infrastructure.
Threat actors may come from various backgrounds: government-sponsored, self-motivated by curiosity, financially motivated, hacktivists and more. However, their goals can be achieved through a common chain of actions. This chain is what modern security vendors try to tackle, producing useful alerts with minimal disruption and false positives. Defensive strategies may vary per organization and per product; however, certain limitations are posed by each operating system or network appliance and must be considered. Holistically covering as much attack surface as possible is the modern goal, increasing the chances of blocking the threat actor or, at a minimum, revealing the actor’s presence at some point.
An environment must always be re-adapted every few months or years to a certain security model on multiple levels, from very basic security-oriented network segmentation to constantly updating assets and ensuring proper privilege management. In this study, we will, at some point, face such issues on purpose.
The most important asset, however, is the endpoint, whether a server or a workstation; thus, the defensive focus is primarily oriented towards this direction, where even the internals of the operating system are at stake.
The motive is not to discourage people about EDRs but to show, by using top-notch industry products to present attack-specific examples with as much transparency as possible, that silver bullets simply do not exist and that attackers can easily investigate and adapt to most defenses. Obviously, if computers are in closed networks without the ability to plug in devices, they will be highly secure but also harder to work with. Today’s security personnel try to find a golden mean between functionality, security and performance.
The main point of this research, however, is that no matter how well structured a defense is, a sophisticated actor or a group of people, each with its own specialization, will be able to penetrate even some of the most advanced and mature networks. This assumption is based on the level of motivation and the time to be spent on a single objective, as well as the sophistication of the attacker. The study depicts this situation as we go through a threat emulation scenario in our own lab, representing a sensitive and high-value target. We see through the attacker’s eyes and understand how to think, research, experiment and tackle objectives one after the other while being in a constant race with the blue team. The attacker’s mistakes give the blue team the next move and vice versa.
From the aforementioned statement on the criticality of the endpoints, we can easily suppose that tooling that interacts with such endpoints is of extreme importance to the attackers. This is another focus of this study, demonstrating state-of-the-art malware, some of which is not easily accessible. This tooling will ease the execution and provide us with stealth and post-exploitation capabilities to spread across the target network.
During this journey, we explored all parts of an intrusion that will be exposed to both blue and red teams, explaining what is executed TTP-wise, why it is executed, the background concepts, and the footprint left.