Breaking Down Datadome Captcha WAF


glizzykingdreko

Let’s walk through, step by step, the latest Datadome Captcha/Geetest WAF to understand how to access sites protected by it.

Below is the response you’ll see when a request is blocked by the Datadome Captcha WAF while scraping a protected site:

<html lang="en">
<head>
<title>datadome.co</title>
<style>
...
</style>
</head>
<body style="margin:0">
<p id="cmsg">Please enable JS and disable any ad blocker</p>
<script data-cfasync="false">
var dd = {
'rt': 'c',
'cid': 'AHrlqAAAAAMAseGpalIFikoAyV2i2w==',
'hsh': '14D062F60A4BDE8CE8647DFC720349',
't': 'fe',
'qp': '',
's': 44330,
'e': 'a1dea4f437d7f7234cf4ab0017373492e7438d099a0632bc3dd4ac787bf0ffa3',
'host': 'geo.captcha-delivery.com',
'cookie': 'Y1XRfldQv9mO1z2TBDqSmbztAi_H6BR04bz14NTujEm~6G_7X6eHyn9i71KhHxsv86WUIZGokIBSyX9bulEARTINIkeWdhjKGUu4Rm7pkbnPpTfSwfC4ntcQPF6rcA5z'
}
</script>
<script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script>
</body>
</html>

Using the returned dd dictionary, the linked JavaScript generates the actual challenge URL and injects it as an iframe on the page. Once the iframe is embedded, you’ll either encounter the captcha to solve or a “You have been blocked” message. The blocked message means one of two things:

  • Your IP address is flagged as bad.
  • Your request flow or headers deviate from a normal browser flow, triggering Datadome’s bot detection.

To avoid this, always test your proxies in a real browser first, to confirm they can solve the challenge and navigate the site, and mirror the exact request sequence and headers from that browser session in your script.
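Before anything else, a script has to recognize the block page and pull the dd object out of it. Here is a minimal sketch of that step, assuming the inline script keeps the `var dd = {...}` shape shown above and that no value contains a quote character. The `parseDd` helper is a hypothetical name, and the HTML is a trimmed copy of the response.

```javascript
// Trimmed copy of the blocked response; only the inline script matters here.
const blockedHtml = `
<script data-cfasync="false">
var dd = {
'rt': 'c',
'cid': 'AHrlqAAAAAMAseGpalIFikoAyV2i2w==',
'hsh': '14D062F60A4BDE8CE8647DFC720349',
't': 'fe',
's': 44330,
'host': 'geo.captcha-delivery.com'
}
</script>`;

// Hypothetical helper: returns the dd object, or null when the page is not a block page.
function parseDd(html) {
  const match = html.match(/var dd\s*=\s*(\{[\s\S]*?\})/);
  if (!match) return null;
  // Assumption: values never contain quotes, so swapping ' for " yields valid JSON.
  return JSON.parse(match[1].replace(/'/g, '"'));
}

const dd = parseDd(blockedHtml);
console.log(dd.host, dd.hsh, dd.s);
```

Going through JSON.parse instead of evaluating the script keeps untrusted page code out of your own process, at the cost of breaking if the literal ever stops being JSON-compatible.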

The request

After solving the challenge, the following GET request is sent:

Upon successful solving, a Datadome cookie is returned. Most of the values in this request are simply parsed from the page. The only important element is the ddCaptchaEncodedPayload, which we will examine and break down later in this article. It is essentially a dictionary containing various details, referred to as signals.

“It’s not just about human or bot — it’s about intent. Our AI detects malicious vs. legitimate intent in real time, whether it is from a bot or a human.”

Their company calls itself “the leader in cyberfraud protection” and boasts blocking “billions” of attacks by processing over a trillion signals per day across 25+ global points of presence. It focuses on its high-speed AI engine: for example, an April 2025 release claims it can “identify, categorize, adapt, and respond to traffic in less than 2 milliseconds.”

“Most bot detection tools rely on static, rule-based logic that can’t keep pace with emerging threats. DataDome’s AI engine is different. It continuously learns and adapts in real time using both supervised and unsupervised models, …”

Marketing collateral often contrasts this with legacy defenses; blogs claim rule-based WAFs are “antiquated” and “no match for advanced bots.”
In one blog, DataDome even calls itself “the only bot and online fraud protection solution delivered as a service.”

Their CEO claims the machine-learning engine “assesses every request” instead of relying on static rules (Source: techcrunch.com).
They also assert they prevent content theft, account takeovers, ad fraud, and DDoS “in real time.”

Before we begin our deep analysis, let’s keep in mind that this company has been funded with a total of $81 million.

This WAF page was first introduced around five years ago during the “golden age of sneaker games.” When it appeared on sites like Slamjam and Starcow, the only way to solve it was via reCAPTCHA.

It was later replaced with a Geetest implementation, and then with a custom Geetest-style slide captcha or an audio challenge.
The audio option has never been ideal, as cookies obtained that way often receive a lower trust score, triggering a second challenge or a quick block.

Currently, Datadome is still used by some retailers, most of the European football teams’ purchasing websites, and some credit-card gift-card websites. Here are some examples:

And the list goes on…

Inside the iframe page, you’ll find a long, minified line containing the captcha’s actual JavaScript.

When beautified, this file reveals a fairly simple obfuscation built on array- and function-based techniques.

It’s quite lengthy because it actually consists of eight distinct modules/files, as shown in the attached image.

Each file serves a specific, organized purpose and contains its own set of functions.

Loop-Switch Constructs Obfuscation

One noteworthy aspect is the unique way they “obfuscate” for-loops and switch statements. It’s entirely pointless and trivial to reverse, but I hadn’t seen this technique before.

What it does
Instead of writing:

case 23

the code builds a two-dimensional lookup table s and drives the switch state via s[x][y], so every case looks like:

case s[263][471]:

Where the table generation function looks like this:


// dynamic table generation function
var s = function (e, t) {
  var a, n;
  for (t = [], e = 0; e < 128; e++) t[e] = new Array(512);
  for (a = 0; a < 512; a++) for (n = 0; n < 128; n++) t[n][a] = t[Ze(128, a, 337, 163, 349, 30, n)];
  return t[30];
}();

// and its helper
function Ze(e, t, a, n, c, i, r) {
  return (t * c ^ r * n ^ i * a) >>> 0 & e - 1;
}

Inspecting the table at runtime
If you log s[yyy][zzz] in your browser console, you won’t get a number, you’ll see an Array object (one row of that lookup matrix).

Why it’s a dumb obfuscation
By running the same table-generator and mapping functions in a sandbox (for example, Node’s vm), you can reconstruct the entire lookup table, defeating the obfuscation.

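// 1. Spin up a sandbox and rebuild the exact same table
const vm = require('vm');
const context = vm.createContext({});

vm.runInContext(`
  // dynamic table generation and its helper, pasted verbatim from the script
  var s = function (e, t) {
    var a, n;
    for (t = [], e = 0; e < 128; e++) t[e] = new Array(512);
    for (a = 0; a < 512; a++) for (n = 0; n < 128; n++) t[n][a] = t[Ze(128, a, 337, 163, 349, 30, n)];
    return t[30];
  }();
  function Ze(e, t, a, n, c, i, r) {
    return (t * c ^ r * n ^ i * a) >>> 0 & e - 1;
  }
`, context);

// 2. Give each distinct row object a numeric id. s[x][y] is an Array
//    instance, and the switch compares states by object identity.
const s = context.s;
const stateIds = new Map();
s.forEach((row) => {
  row.forEach((cell) => {
    // we key off the *exact* Array instance stored at s[x][y]
    if (!stateIds.has(cell)) stateIds.set(cell, stateIds.size);
  });
});

// 3. Helper to map a lookup back to a plain numeric state. The ids are
//    arbitrary but consistent, which is all a switch statement needs.
function stateId(x, y) {
  return stateIds.get(s[x][y]);
}

// 4. Proof it works: prints an integer between 0 and 127
console.log(stateId(263, 471));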

Cleaning it up
A simple AST traversal based on this logic can replace each case s[x][y] with the original numeric literal, restoring a normal switch statement.
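As a rough illustration of that cleanup, here is a sketch that uses a plain regex as a stand-in for the AST traversal (a real pass would use a parser such as Babel). It assumes every state reference has the literal form `s[x][y]`. The `inlineStates` helper and the sample switch are hypothetical, and the numeric ids are arbitrary but consistent.

```javascript
// Table generator, same logic as the snippet extracted from the script.
function Ze(e, t, a, n, c, i, r) {
  return (t * c ^ r * n ^ i * a) >>> 0 & e - 1;
}
const s = (function () {
  const t = [];
  for (let e = 0; e < 128; e++) t[e] = new Array(512);
  for (let a = 0; a < 512; a++) {
    for (let n = 0; n < 128; n++) t[n][a] = t[Ze(128, a, 337, 163, 349, 30, n)];
  }
  return t[30];
})();

// Give every distinct row object a stable integer id.
const ids = new Map();
for (const row of s) {
  for (const cell of row) {
    if (!ids.has(cell)) ids.set(cell, ids.size);
  }
}

// Regex stand-in for the AST pass: rewrite each `s[x][y]` into its numeric id.
function inlineStates(source) {
  return source.replace(/\bs\[(\d+)\]\[(\d+)\]/g, (_, x, y) => String(ids.get(s[+x][+y])));
}

// Hypothetical obfuscated fragment.
const sample = 'switch (state) { case s[263][471]: state = s[12][7]; break; }';
const cleaned = inlineStates(sample);
console.log(cleaned);
```

Two lookups that point at the same row object collapse to the same number, so the rewritten switch keeps the original control flow.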

Quick and easy, but a nice case study.

It’s pretty strange that a company backed by more than $80 million in funding still hasn’t introduced a fully dynamic script for each session, like most of the “big” antibot providers currently do.

Instead, what they’ve done is implement a daily rotation of the files, meaning the iframe’s script changes every day at a specific time.

This rotation was introduced about a year ago. Unless there’s been a major update or patch, the version number remains the same, with only a couple of static IDs modified to identify the exact challenge being solved.

Dynamic signals names

A few months ago, a major change was introduced — first on the Captcha challenge, then on the interstitial as well: dynamic keys for the signals. Every day, the keys in the signals dictionary change to a random six-character string. If you don’t match them correctly, the solving will be invalid.

new dynamic key system

This has been a clever technique to block most solving API providers, but someone could have stored all of the daily scripts “just to be sure” over the past five years. They could then perform a quick semantic search to identify renamed signals, new or removed entries, or other modifications. :P
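To make the idea concrete, here is a minimal sketch of pulling the daily keys out of a script. It assumes signals are registered through calls shaped like e(`key`, value) with a six-character key. The `extractKeyMap` helper is hypothetical, and so is the assumption that the value expression stays readable across days. In practice the minifier may rename those too, which is exactly why a semantic comparison between stored versions helps.

```javascript
// Hypothetical excerpt of one day's script.
const dailyScript =
  'e(`THcQWT`, i.left); e(`ds6frg`, i.right); e(`zzUlTA`, i.up); e(`uJkDwZ`, i.down);';

// Map each value expression (e.g. "i.left") to the key used for it that day.
function extractKeyMap(script) {
  const keyMap = {};
  const callPattern = /\be\(`([A-Za-z0-9]{6})`,\s*([\w.$]+)\)/g;
  for (const match of script.matchAll(callPattern)) {
    keyMap[match[2]] = match[1];
  }
  return keyMap;
}

const keyMap = extractKeyMap(dailyScript);
console.log(keyMap);
```

Running the same extraction on two different days and joining the maps on the value expression gives you the rename table for that rotation.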

some Datadome files stored

Currently, this dynamic-key mechanism has knocked most providers out of the game or forced them to switch to browser-based solutions, which in my opinion are the worst way to deliver a service due to:

  • Slow solving times and heavy resource usage
  • Lock-in to specific user agents, browsers, OS versions, timestamps, or fingerprints, forcing clients to match exactly
  • Developers often having no clue what the antibot checks, making debugging or fixing site-specific issues difficult
  • The need for new browser patches on every challenge update, resulting in instability for long-term or critical projects

All of this stems from not having figured out a scalable way to scrape and identify signal names on a daily basis.

WASM Challenge: boring_challenge

extracted call of the “boring_challenge” from original script

Recently, a WASM-based challenge was introduced as part of one of the signals. The main function, called boring_challenge, essentially forces your browser to run a tiny Rust-compiled state machine repeatedly until it spits out a number.

  1. Base64-decode a compact Wasm blob and synchronously compile it into a WebAssembly.Module.
  2. Set up the wasm-bindgen imports (grow an externref table with [undefined, null, true, false]).
  3. Instantiate the module (which zeroes some globals via its __wbindgen_start).
  4. Pick a random 32-bit seed (between 10 million and 20 million) and a concurrency hint (your CPU core count).
  5. Call boring_challenge(BigInt(seed), BigInt(concurrency)), which jumps into a massive, nested loop of bit-twiddling, XORs, shifts, rotates and magic constants—all hard-coded in an obfuscated lookup-table state machine.
  6. Exit only when that state machine finally reaches a terminal value, returning a 64-bit result that your script converts back to a Number.

There’s no real “captcha” here, just pure proof-of-work. The server can cheaply re-invoke the same function on its side to verify you did the grind, but your browser must burn CPU cycles to solve it.
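The steps above can be sketched in Node as follows. The byte array is a hand-written stand-in module, not the real blob: it exports a boring_challenge(i64, i64) function that simply adds its two arguments, which is enough to show the decode, compile, instantiate and BigInt call sequence. The real module also needs the wasm-bindgen imports from step 2, which this stand-in skips.

```javascript
const os = require('os');

// Stand-in module, NOT DataDome's blob: boring_challenge(a, b) returns a + b.
const standInBytes = Uint8Array.from([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7e, 0x7e, 0x01, 0x7e, // type: (i64, i64) -> i64
  0x03, 0x02, 0x01, 0x00,                               // one function of that type
  0x07, 0x14, 0x01, 0x10,                               // export section, 16-byte name
  0x62, 0x6f, 0x72, 0x69, 0x6e, 0x67, 0x5f, 0x63,       // "boring_c"
  0x68, 0x61, 0x6c, 0x6c, 0x65, 0x6e, 0x67, 0x65,       // "hallenge"
  0x00, 0x00,                                           // kind: function, index 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, no locals
  0x20, 0x00, 0x20, 0x01, 0x7c, 0x0b,                   // local.get 0, local.get 1, i64.add, end
]);
const blobBase64 = Buffer.from(standInBytes).toString('base64');

// Step 1: Base64-decode the blob and compile it synchronously.
const wasmModule = new WebAssembly.Module(Buffer.from(blobBase64, 'base64'));

// Steps 2-3: instantiate (the stand-in needs no imports).
const instance = new WebAssembly.Instance(wasmModule, {});

// Step 4: seed between 10 and 20 million, plus a concurrency hint.
// In a browser the hint would come from navigator.hardwareConcurrency.
const seed = 10_000_000 + Math.floor(Math.random() * 10_000_000);
const concurrency = os.cpus().length || 1;

// Steps 5-6: i64 parameters cross the boundary as BigInt; convert the result back.
const result = Number(instance.exports.boring_challenge(BigInt(seed), BigInt(concurrency)));
console.log(result);
```

Swapping the stand-in bytes for the blob extracted from the script, and passing the import object it expects, should leave the rest of the flow unchanged.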

decompiled .wat file

It’s trivial to reverse (we just decompile and run it ourselves). It is basically more of a “CPU tax” than a real fingerprint, maybe introduced to create problems for headless browsers (I assume?).

You can learn more about it in my open-sourced repo:

datadome-wasm

Dynamic hash challenge

After the WASM introduction, the latest “feature” introduced is a dynamic hash challenge, where some browser details, already provided in other signals, are assembled into a list and then hashed using a specific dynamic challenge hash.

The resulting list then feeds into a dynamic chain of operations that looks like this:

((((((inputValue[0] >>> 0 ^ 555683) >>> 0 >> 4 >>> 0)
+ (((956305 & inputValue[1] >>> 0) >>> 0 & (inputValue[2] >>> 0)
+ (inputValue[1] >>> 0) >>> 0) >>> 0) >>> 0)
+ ((2488139776 ^ (inputValue[1] >>> 0 << 1 >>> 0)
+ (inputValue[0] >>> 0 >> 6 >>> 0) >>> 0) >>> 0) >>> 0
^ (((699654 + (inputValue[0] >>> 0) >>> 0) - 1111820 >>> 0)
- ((968080 - (inputValue[0] >>> 0) >>> 0 ^ 864233) >>> 0) >>> 0)
- ((((792526 & inputValue[2] >>> 0) >>> 0
| (298058 & inputValue[1] >>> 0) >>> 0) >>> 0)
- (((inputValue[1] >>> 0) + (inputValue[1] >>> 0) >>> 0
^ 622410 - (inputValue[1] >>> 0) >>> 0) >>> 0) >>> 0) >>> 0) >>> 0
& ((11187856 + (410903 + (inputValue[1] >>> 0) >>> 0) >>> 0)
+ (((inputValue[0] >>> 0 | 0) >>> 0 ^ 706164
- (inputValue[1] >>> 0) >>> 0) >>> 0) >>> 0 << 9 >>> 0)
- (((441672 - (inputValue[2] >>> 0) >>> 0)
+ ((inputValue[0] >>> 0) - (inputValue[2] >>> 0) >>> 0) >>> 0)
+ (((inputValue[1] >>> 0) - 617963 >>> 0)
+ ((inputValue[2] >>> 0) - 798365 >>> 0) >>> 0) >>> 0 >> 8 >>> 0) >>> 0) >>> 0
^ 351641146) >>> 0

It’s an interesting approach, but unfortunately it’s quite easy to identify and extract, and because it relies solely on math-based calculations, a simple evaluation suffices.
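Here is a minimal sketch of that evaluation step, assuming the expression has already been extracted as a string and only references inputValue. The snippet uses just the first sub-term of the chain above to stay short. The `evaluateChallenge` helper and the input values are hypothetical.

```javascript
// First sub-term of the extracted chain, kept as a string exactly as it appears.
const extracted = '((inputValue[0] >>> 0 ^ 555683) >>> 0 >> 4 >>> 0)';

// Evaluate an extracted expression against a list of input values.
// Only feed this strings you extracted yourself: new Function runs arbitrary code.
function evaluateChallenge(expression, inputValue) {
  const run = new Function('inputValue', `"use strict"; return (${expression});`);
  return run(inputValue) >>> 0; // keep the result an unsigned 32-bit integer
}

// Hypothetical browser details already collected for other signals.
const inputValue = [1920, 1080, 24];
const answer = evaluateChallenge(extracted, inputValue);
console.log(answer);
```

Since the chain changes with every rotation, evaluating the extracted string is usually less work than translating it by hand each day.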

extracted dynamic challenges from different files using a simple babel traverse

Integrity checks of the script

To this day, no integrity checks have been added, making it easy to patch for debugging purposes.

Understanding more about the version changes

For learning purposes, check out my now-outdated, open-sourced deobfuscation modules for DataDome (both Interstitial and Captcha):

displayed challenge captcha

The images returned by the Datadome Captcha are quite easy and quick to solve using just a few filters and mathematical operations. I’ve already open-sourced a still-working base Python script to get started here: Datadome-GeeTest-Captcha-Solver

solved challenge captcha

Movements computation

The actual key challenge is ensuring the provided movements match what DataDome expects.
DataDome analyzes movements in two lists:

  • _initialCoordsList which captures movements from page load to the slide-button click
  • _coordsList which contains the slide movements

These two lists are then computed into 31 signals, based on curvature, length, straightness, and other metrics, to flag abnormal inputs.
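To give a feel for this kind of metric, here is a sketch of two of them, path length and straightness, computed over a list of points. The point shape ({x, y, t}), the formulas and the sample gestures are all assumptions for illustration. DataDome's actual 31 signals are computed differently.

```javascript
// Total distance travelled along the recorded points.
function pathLength(points) {
  let total = 0;
  for (let i = 1; i < points.length; i++) {
    total += Math.hypot(points[i].x - points[i - 1].x, points[i].y - points[i - 1].y);
  }
  return total;
}

// Ratio of straight-line displacement to travelled distance:
// 1 means a perfectly straight move, lower values mean a more curved path.
function straightness(points) {
  if (points.length < 2) return 1;
  const first = points[0];
  const last = points[points.length - 1];
  const travelled = pathLength(points);
  if (travelled === 0) return 1;
  return Math.hypot(last.x - first.x, last.y - first.y) / travelled;
}

// Hypothetical gestures: a robotic straight drag and a slightly curved one.
const roboticSlide = [{ x: 0, y: 0, t: 0 }, { x: 50, y: 0, t: 100 }, { x: 100, y: 0, t: 200 }];
const curvedSlide = [{ x: 0, y: 0, t: 0 }, { x: 30, y: 40, t: 120 }, { x: 100, y: 0, t: 300 }];

console.log(straightness(roboticSlide), straightness(curvedSlide));
```

A perfectly straight, evenly timed drag is the classic giveaway of a naive script, which is why metrics like these are cheap but effective.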

some computed movements signals

Honestly, their checks are strict and effective at stopping most browser-based automated scripts, though a good ML model fed with plenty of data can still do the trick :P.

To help with this part of the process, I open-sourced Datadome-Movements-Display which visualizes the raw movement lists before computation, letting you compare your generated patterns against genuine browser data.

When adding a signal to the final payload, a custom encoding method is used based on the website hash.

e(`THcQWT`, i.left)
e(`ds6frg`, i.right)
e(`zzUlTA`, i.up)
e(`uJkDwZ`, i.down);
encryption process

Initialization
The system is seeded with a hash, a client identifier (cid), and an optional salt. These parameters initialize a pseudo-random number generator (PRNG) that drives subsequent steps.

Buffer Construction
Each signal (key-value pair) is obfuscated and XOR-ed with PRNG-generated bytes. Marker bytes ({, }) and separators (:) are also XOR-obfuscated and appended to delineate structure.

XOR w/PRNG 2nd Round
The entire buffer undergoes an additional XOR pass using a second PRNG sequence derived from the cid and salt. This dual-layer approach ensures each byte’s final value depends intricately on multiple dynamic parameters.

Custom Base64-like Encoding
After XOR, the buffer is encoded into a string using a custom Base64-like algorithm. Additional XOR operations with a decrementing salt value are applied. A unique character mapping guarantees URL-safe transmission, further complicating decryption.

The decryption process simply reverses these steps once you understand them.
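To show why reversing is straightforward, here is a toy version of the same layering: two seeded XOR passes followed by a URL-safe encoding, and the mirror-image decoder. Everything specific here is a stand-in and not DataDome's algorithm. The PRNG is a plain xorshift32, the seed derivation is FNV-1a, standard base64url replaces the custom alphabet, and the per-signal markers and decrementing salt are left out.

```javascript
// Toy xorshift32 byte generator: a stand-in, NOT DataDome's actual PRNG.
function makePrng(seed) {
  let state = (seed >>> 0) || 1; // xorshift must not start at 0
  return () => {
    state = (state ^ (state << 13)) >>> 0;
    state = (state ^ (state >>> 17)) >>> 0;
    state = (state ^ (state << 5)) >>> 0;
    return state & 0xff;
  };
}

// Hypothetical seed derivation (FNV-1a over a string).
function seedFrom(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h = Math.imul(h ^ text.charCodeAt(i), 0x01000193) >>> 0;
  }
  return h;
}

// XOR every byte with the PRNG stream; applying it twice restores the input.
function xorPass(bytes, seed) {
  const next = makePrng(seed);
  return Uint8Array.from(bytes, (b) => b ^ next());
}

function encodePayload(signals, { hash, cid, salt }) {
  const plain = Buffer.from(JSON.stringify(signals), 'utf8');
  const first = xorPass(plain, seedFrom(hash + cid));  // round 1: hash + cid
  const second = xorPass(first, seedFrom(cid + salt)); // round 2: cid + salt
  return Buffer.from(second).toString('base64url');
}

function decodePayload(encoded, { hash, cid, salt }) {
  const second = Buffer.from(encoded, 'base64url');
  const first = xorPass(second, seedFrom(cid + salt)); // undo round 2
  const plain = xorPass(first, seedFrom(hash + cid));  // undo round 1
  return JSON.parse(Buffer.from(plain).toString('utf8'));
}

// hash and cid come from the block page above; the salt is a made-up value.
const params = {
  hash: '14D062F60A4BDE8CE8647DFC720349',
  cid: 'AHrlqAAAAAMAseGpalIFikoAyV2i2w==',
  salt: 'demo-salt',
};
const signals = { THcQWT: 12, ds6frg: 34 };
const encoded = encodePayload(signals, params);
console.log(encoded, decodePayload(encoded, params));
```

Because every layer is either an XOR or a reversible encoding, knowing the seeds is all it takes to peel the payload back to plain signals.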

Without going too deep into technical detail, so as not to lose uninterested readers, I’ve open-sourced a deep dive into the encryption and decryption logic here, both for NodeJS and Python:

Let’s walk through the process step by step. I’ll assume you’re using a requests-based approach, because a browser-based solution requires constant patching for each update, high resource usage, and offers little control over what’s happening.

  1. Identify and construct the challenge URL from the initial response.
  2. Load the challenge page.
  3. Parse the required details from the loaded challenge.
  4. Extract, sort, and interpret the dynamic keys.
  5. Solve the image challenge.
  6. Assemble your signals dictionary, browser details, solved challenges, and computed movements.
  7. Encrypt the payload.
  8. Send it to the DataDome endpoint.
  9. Retrieve the cookie (if the solving was successful).
  10. Reload the target page.

In conclusion, this was just a case study. I’m not saying that DataDome is a bad antibot: some of their techniques are unique and interesting, even if they’re easy to bypass. My point is rather to highlight how easy and “cheap” their WAF is, and to raise questions about how the millions they’ve received from investors have been spent.

Perhaps they should remove the article How to Bypass DataDome (And Why It’s Not That Simple)

If you need a reliable DataDome bypass solution for your project, turn to the experts who truly understand the technology. My company, TakionAPI, offers professional anti-bot bypass APIs with proven effectiveness against DataDome and other bot-defense systems.

You can check and run the example file from the video here.

No more worrying about understanding, reversing, and solving the challenge yourself, or about keeping it up to date every day. One simple API call does it all.

We provide free trials, example implementations, and setup assistance to make the entire process easy and smooth. Check our straightforward documentation here, start your trial here, or for custom development and support, contact us on Discord.

Visit TakionAPI.tech for real, high-quality anti-bot bypass solutions. We know what we’re doing.

If you enjoyed this article, follow me on GitHub and on Medium to receive notifications whenever I post or open-source something.
