Catching sophisticated bots requires all kinds of signals—from behavioral signals, to proxy detection, to client-side fingerprints.
Indeed, because sophisticated bots leverage proxies, mimic human behavior, and attempt to forge fingerprinting attributes, the signals collected need to be both redundant and exhaustive, so that a bot that evades one signal can still be caught by another.
When it comes to client-side browser fingerprinting, DataDome collects signals in 3 different ways:
In these various components, we collect different kinds of browser fingerprint signals discussed in other blog posts, ranging from:
While some APIs provide information about the OS and the environment the browser is running on, bot developers often modify these values to appear more human. For example, a bot running on a Linux virtual machine may lie about its OS to pretend it’s running on a Windows machine. Bots may lie not only about the OS string, but also about other attributes you’d expect to go with a particular OS, such as the type of GPU.
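To make the limits of such static values concrete, here is a minimal Python sketch of a server-side consistency check between the OS a client claims and the GPU renderer string it reports. The rule table and the function name `is_consistent` are hypothetical, chosen for illustration and not taken from DataDome’s detection logic, and the renderer substrings are assumptions about typical values.

```python
# Hypothetical rules: substrings we assume a GPU renderer string may contain
# for each claimed OS. Illustrative only; real values vary widely.
EXPECTED_RENDERER_HINTS = {
    "Windows": ("direct3d", "d3d11"),
    "macOS": ("apple", "metal"),
    "Linux": ("mesa", "llvmpipe", "opengl"),
}

def is_consistent(claimed_os: str, gpu_renderer: str) -> bool:
    """Return True if the reported renderer matches a hint for the claimed OS.

    An OS with no entry in the table is treated as inconsistent.
    """
    hints = EXPECTED_RENDERER_HINTS.get(claimed_os, ())
    renderer = gpu_renderer.lower()
    return any(hint in renderer for hint in hints)

# A bot on a Linux VM that only spoofs the OS string fails the check...
print(is_consistent("Windows", "llvmpipe (LLVM 15.0.7, 256 bits)"))  # False
# ...but one that also forges the renderer string passes it.
print(is_consistent("Windows", "ANGLE (NVIDIA GeForce GTX 1060 Direct3D11 vs_5_0 ps_5_0)"))  # True
```

Because both values are self-reported, a check of this kind only catches careless spoofing.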
To avoid relying on static APIs returning information about the OS, researchers have come up with ways to ask a browser to execute a JS challenge whose result helps determine the nature of its environment. When these tests aim to detect virtual machines, they are called red pills (a reference to the movie The Matrix).
In this blog post, we present how DataDome leverages Picasso, an approach originally conceived by Google, in our CAPTCHA and Device Check to detect bots lying about their environment.
Picasso is a device class fingerprinting protocol that enables a server to verify whether or not a device is lying about its browser, OS, or its environment in general.
Usually, when we refer to an approach like browser fingerprinting, a fingerprint is a combination of attributes that is more or less unique and stable, and can help identify an individual. In the case of device class fingerprinting, the goal is not to identify a single individual, but instead to identify a class of devices. In the case of Picasso, we aim to identify classes defined by their browser (Chrome, Firefox, Safari) and their OS (Windows, Linux, Mac, iOS, Android).
To do that, Picasso leverages the HTML canvas API, and in particular the graphics rendering system (GPU) behind it. The server sends a proof-of-work challenge to an untrusted user whose device we want to verify. Executing the Picasso challenge then captures the entropy induced by the device’s underlying hardware.
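This exchange can be pictured as a challenge-response loop. The Python below is a simplified illustration under stated assumptions, not Picasso’s actual implementation: `render` is a stand-in that fakes device-dependent pixels (a real client would draw on an HTML canvas and read the pixels back), and the names `new_challenge`, `solve`, and `verify`, the list of primitives, and the use of SHA-256 are all hypothetical choices.

```python
import hashlib
import hmac
import random
import secrets

SHAPES = ("arc", "bezier", "quadratic", "text")  # assumed primitive set

def new_challenge(rounds: int = 4) -> dict:
    """Server side: pick a fresh random seed for every challenge."""
    return {"seed": secrets.randbits(32), "rounds": rounds}

def derive_primitives(seed: int, rounds: int) -> list:
    """Expand the seed into a deterministic list of drawing instructions."""
    rng = random.Random(seed)
    return [
        (rng.choice(SHAPES), rng.randrange(300), rng.randrange(150))
        for _ in range(rounds)
    ]

def render(primitive: tuple, device_class: str) -> bytes:
    """Stand-in for canvas rendering: the same instruction yields different
    bytes per device class. A real client gets this from its graphics stack."""
    return repr((primitive, device_class)).encode()

def solve(challenge: dict, device_class: str) -> str:
    """Client side: draw each primitive and chain the hashes of the output."""
    digest = b""
    for primitive in derive_primitives(challenge["seed"], challenge["rounds"]):
        digest = hashlib.sha256(digest + render(primitive, device_class)).digest()
    return digest.hex()

def verify(challenge: dict, claimed_class: str, response: str) -> bool:
    """Server side: compare with the response expected for the claimed class.
    Here it is recomputed with the fake renderer; a real server cannot do
    that and would need reference responses obtained from genuine devices."""
    expected = solve(challenge, claimed_class)
    return hmac.compare_digest(expected, response)

challenge = new_challenge()
honest = solve(challenge, "chrome/windows")
spoofed = solve(challenge, "chrome/linux")  # bot on Linux claiming Windows
print(verify(challenge, "chrome/windows", honest))   # True
print(verify(challenge, "chrome/windows", spoofed))  # False
```

The point of the sketch is that the response depends on how the device actually renders the instructions, so changing the claimed OS string alone does not change the answer.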
Picasso succeeds in identifying the type of OS and browser because pixel rendering differs across devices in ways that are incidental yet stable. These differences stem from each device’s inherent features, both physical (graphics hardware) and software (graphics drivers, operating system) (Figure 1).
In other words, the output of a web browser’s graphics, such as an HTML5 canvas, depends on several layers: hardware (GPU), lower-level software (GPU driver, OS rendering), and higher-level software (the browser and library-provided graphics APIs). As a result, for the exact same set of instructions, the HTML5 canvas output is highly distinctive per OS/browser combination, which allows accurate differentiation between them (Figure 2).
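One way to picture how these per-class differences enable differentiation: if responses to a fixed set of instructions are collected from known devices, each output hash can be mapped back to the class that produced it. The following Python sketch assumes hypothetical sample data and helper names (`build_index`, `classify`); the hash strings are placeholders, not real canvas hashes.

```python
from collections import defaultdict

# Placeholder observations for one fixed challenge: (canvas hash, device class).
SAMPLES = [
    ("a1f3", "chrome/windows"),
    ("a1f3", "chrome/windows"),
    ("77c0", "chrome/linux"),
    ("e9b2", "safari/mac"),
]

def build_index(samples):
    """Map each observed hash to the set of device classes that produced it."""
    index = defaultdict(set)
    for canvas_hash, device_class in samples:
        index[canvas_hash].add(device_class)
    return index

def classify(index, canvas_hash):
    """Return the device class if the hash maps to exactly one, else None."""
    classes = index.get(canvas_hash, set())
    return next(iter(classes)) if len(classes) == 1 else None

index = build_index(SAMPLES)
claimed, observed_hash = "chrome/windows", "77c0"
detected = classify(index, observed_hash)
print(detected)  # chrome/linux
print(detected is not None and detected != claimed)  # True: claim contradicted
```

Returning `None` for unknown or ambiguous hashes is a deliberate choice in this sketch: a hash seen under several classes cannot contradict a claim on its own.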