Network Scanner script to automate Adblock rule generation


A Puppeteer-based tool that scans websites for third-party (or optionally first-party) network requests matching specified patterns and generates Adblock-formatted blocking rules.

  • Scan websites and detect matching third-party or first-party resources
  • Output Adblock-formatted blocking rules
  • Support for multiple filters per site
  • Grouped titles (! <url>) before each site's matches
  • Ignore unwanted domains (global and per-site)
  • Block unwanted domains during scan (simulate adblock)
  • Support Chrome, Firefox, Safari user agents (desktop or mobile)
  • Advanced fingerprint spoofing and referrer header simulation
  • Delay, timeout, reload options per site
  • Verbose and debug modes
  • Dump matched full URLs into matched_urls.log
  • Save output in normal Adblock format or localhost (127.0.0.1/0.0.0.0)
  • Subdomain handling (collapse to root or full subdomain)
  • Optionally match only first-party, third-party, or both
  • Enhanced redirect handling with JavaScript and meta refresh detection

| Argument | Description |
| --- | --- |
| -o, --output <file> | Output file for rules. If omitted, prints to console |
| --compare <file> | Remove rules that already exist in this file before output |
| --color, --colour | Enable colored console output for status messages |
| --append | Append new rules to the output file instead of overwriting (requires -o) |

| Argument | Description |
| --- | --- |
| --localhost[=IP] | Output as IP domain.com (default: 127.0.0.1). Examples: --localhost, --localhost=0.0.0.0, --localhost=192.168.1.1 |
| --plain | Output just domains (no adblock formatting) |
| --dnsmasq | Output as local=/domain.com/ (dnsmasq format) |
| --dnsmasq-old | Output as server=/domain.com/ (dnsmasq old format) |
| --unbound | Output as local-zone: "domain.com." always_null (unbound format) |
| --privoxy | Output as { +block } .domain.com (Privoxy format) |
| --pihole | Output as (^\|\\.)domain\\.com$ (Pi-hole regex format) |
| --adblock-rules | Generate adblock filter rules with resource type modifiers (requires -o) |
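
For reference, a matched domain such as ads.example.com (a placeholder) would be written roughly as follows in each output mode. The default Adblock line shown is the conventional ||domain^ form, which is an assumption here; the remaining lines follow the formats stated above.

    default (Adblock):  ||ads.example.com^
    --localhost:        127.0.0.1 ads.example.com
    --plain:            ads.example.com
    --dnsmasq:          local=/ads.example.com/
    --dnsmasq-old:      server=/ads.example.com/
    --unbound:          local-zone: "ads.example.com." always_null
    --privoxy:          { +block } .ads.example.com
    --pihole:           (^|\.)ads\.example\.com$
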

| Argument | Description |
| --- | --- |
| --verbose | Force verbose mode globally |
| --debug | Force debug mode globally |
| --silent | Suppress normal console logs |
| --titles | Add a ! <url> title before each site's group |
| --dumpurls | Dump matched URLs into matched_urls.log |
| --remove-tempfiles | Remove Chrome/Puppeteer temporary files before exit |
| --compress-logs | Compress log files with gzip (requires --dumpurls) |
| --sub-domains | Output full subdomains instead of collapsing to root |
| --no-interact | Disable page interactions globally |
| --custom-json <file> | Use a custom config JSON file instead of config.json |
| --headful | Launch browser with GUI (not headless) |
| --cdp | Enable Chrome DevTools Protocol logging (per-page if enabled) |
| --remove-dupes | Remove duplicate domains from output (only with -o) |
| --dry-run | Console output only: show matching regex, titles, whois/dig/searchstring results, and adblock rules |
| --eval-on-doc | Globally enable evaluateOnNewDocument() for Fetch/XHR interception |
| --help, -h | Show this help menu |
| --version | Show script version |
| --max-concurrent <number> | Maximum concurrent site processing (1-50, overrides config/default) |
| --cleanup-interval <number> | Browser restart interval in URLs processed (1-1000, overrides config/default) |

| Argument | Description |
| --- | --- |
| --cache-requests | Cache HTTP requests to avoid re-requesting the same URLs within a scan |
| --validate-config | Validate the config.json file and exit |
| --validate-rules [file] | Validate rule file format (uses the --output/--compare files if no file is specified) |
| --clean-rules [file] | Clean rule files by removing invalid lines and optionally duplicates (uses the --output/--compare files if no file is specified) |
| --test-validation | Run domain validation tests and exit |
| --clear-cache | Clear the persistent cache before scanning (improves fresh-start performance) |
| --ignore-cache | Bypass all smart caching functionality during scanning |

Example config.json:

{ "ignoreDomains": [ "googleapis.com", "googletagmanager.com" ], "sites": [ { "url": "https://example.com/", "userAgent": "chrome", "filterRegex": "ads|analytics", "resourceTypes": ["script", "xhr", "image"], "reload": 2, "delay": 5000, "timeout": 30000, "verbose": 1, "debug": 1, "interact": true, "fingerprint_protection": "random", "referrer_headers": { "mode": "random_search", "search_terms": ["example reviews", "best deals"] }, "custom_headers": { "X-Custom-Header": "value" }, "firstParty": 0, "thirdParty": 1, "subDomains": 0, "blocked": [ "googletagmanager.com", ".*tracking.*" ] } ] }

Per-Site Configuration Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| url | String or Array | - | Website URL(s) to scan |
| userAgent | chrome, chrome_mac, chrome_linux, firefox, firefox_mac, firefox_linux, safari | - | User agent for the page |
| filterRegex | String or Array | .* | Regex or list of regexes to match requests |
| regex_and | Boolean | false | Use AND logic for multiple filterRegex patterns: ALL patterns must match the same URL |
| comments | String or Array | - | Comments or references |
| resourceTypes | Array | ["script", "xhr", "image", "stylesheet"] | Resource types to monitor |
| reload | Integer | 1 | Number of times to reload the page |
| delay | Milliseconds | 4000 | Wait time after loading/reloading |
| timeout | Milliseconds | 30000 | Timeout for page load |
| verbose | 0 or 1 | 0 | Enable verbose output per site |
| debug | 0 or 1 | 0 | Dump matching URLs for the site |
| interact | true or false | false | Simulate user interaction (hover, click) |
| firstParty | 0 or 1 | 0 | Match first-party requests |
| thirdParty | 0 or 1 | 1 | Match third-party requests |
| subDomains | 0 or 1 | 0 | 1 = preserve subdomains in output |
| blocked | Array | - | Domains or regexes to block during scanning |
| even_blocked | Boolean | false | Add matching rules even if requests are blocked |
| bypass_cache | Boolean | false | Skip all caching for this site's URLs |
| window_cleanup | Boolean or String | false | Close old/unused browser windows/tabs after the entire URL group completes |

Window cleanup modes: false (disabled), true (conservative - closes obvious leftovers), "all" (aggressive - closes all content pages). Both active modes preserve the main Puppeteer window and wait 16 seconds before cleanup to avoid interfering with active operations.

Redirect Handling Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| follow_redirects | Boolean | true | Follow redirects to new domains |
| max_redirects | Integer | 10 | Maximum number of redirects to follow |
| js_redirect_timeout | Milliseconds | 5000 | Time to wait for JavaScript redirects |
| detect_js_patterns | Boolean | true | Analyze page source for redirect patterns |
| redirect_timeout_multiplier | Number | 1.5 | Timeout increase for redirected URLs |

When a page redirects to a new domain, first-party/third-party detection is based on the final redirected domain, and all intermediate redirect domains (like bit.ly, t.co) are automatically excluded from the generated rules.
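
As a sketch, a site entry tuning these redirect options might look like the following; the URL is a placeholder and the non-default values are purely illustrative.

    {
      "url": "https://link-shortener-landing.com",
      "filterRegex": "ads|track",
      "follow_redirects": true,
      "max_redirects": 5,
      "js_redirect_timeout": 8000,
      "detect_js_patterns": true,
      "redirect_timeout_multiplier": 2
    }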

Advanced Stealth & Fingerprinting

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| fingerprint_protection | true, false, "random" | false | Enable navigator/device spoofing |
| referrer_headers | String, Array, or Object | - | Set referrer header for realistic traffic sources |
| custom_headers | Object | - | Add custom HTTP headers to requests |

Simple formats:

"referrer_headers": "https://google.com/search?q=example" "referrer_headers": ["url1", "url2"]

Smart modes:

"referrer_headers": {"mode": "random_search", "search_terms": ["reviews"]} "referrer_headers": {"mode": "social_media"} "referrer_headers": {"mode": "direct_navigation"} "referrer_headers": {"mode": "custom", "custom": ["https://news.ycombinator.com/"]}

Cloudflare & flowProxy Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| cloudflare_phish | Boolean | false | Auto-click through Cloudflare phishing warnings |
| cloudflare_bypass | Boolean | false | Auto-solve Cloudflare "Verify you are human" challenges |
| cloudflare_parallel_detection | Boolean | true | Use parallel detection for faster Cloudflare checks |
| cloudflare_max_retries | Integer | 3 | Maximum retry attempts for Cloudflare operations |
| cloudflare_cache_ttl | Milliseconds | 300000 | TTL for the Cloudflare detection cache (5 minutes) |
| cloudflare_retry_on_error | Boolean | true | Enable retry logic for Cloudflare operations |
| flowproxy_detection | Boolean | false | Enable flowProxy protection detection and handling |
| flowproxy_page_timeout | Milliseconds | 45000 | Page timeout for flowProxy sites |
| flowproxy_nav_timeout | Milliseconds | 45000 | Navigation timeout for flowProxy sites |
| flowproxy_js_timeout | Milliseconds | 15000 | JavaScript challenge timeout |
| flowproxy_delay | Milliseconds | 30000 | Delay for rate limiting |
| flowproxy_additional_delay | Milliseconds | 5000 | Additional processing delay |
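
A sketch of a site entry enabling the Cloudflare and flowProxy handling described above; the URL is a placeholder and the non-default values are only illustrative.

    {
      "url": "https://challenge-protected-site.com",
      "filterRegex": "analytics|ads",
      "cloudflare_bypass": true,
      "cloudflare_phish": true,
      "cloudflare_max_retries": 3,
      "flowproxy_detection": true,
      "flowproxy_page_timeout": 45000,
      "flowproxy_delay": 30000
    }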

WHOIS/DNS Analysis Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| whois | Array | - | Check whois data for ALL specified terms (AND logic) |
| whois-or | Array | - | Check whois data for ANY specified term (OR logic) |
| whois_delay | Integer | 3000 | Delay between whois requests to avoid throttling |
| whois_server | String or Array | - | Custom whois server(s): a single server or a randomized list |
| whois_server_mode | String | "random" | Server selection mode: "random" or "cycle" |
| whois_max_retries | Integer | 2 | Maximum retry attempts per domain |
| whois_timeout_multiplier | Number | 1.5 | Timeout increase multiplier per retry |
| whois_use_fallback | Boolean | true | Add TLD-specific fallback servers |
| whois_retry_on_timeout | Boolean | true | Retry on timeout errors |
| whois_retry_on_error | Boolean | true | Retry on connection/other errors |
| dig | Array | - | Check dig output for ALL specified terms (AND logic) |
| dig-or | Array | - | Check dig output for ANY specified term (OR logic) |
| dig_subdomain | Boolean | false | Use subdomain for dig lookup instead of root domain |
| digRecordType | String | "A" | DNS record type for dig (A, CNAME, MX, etc.) |
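
A sketch of a site entry using the WHOIS/DNS analysis fields above; the URL, search terms, and values are placeholders, and the AND/OR semantics follow the table.

    {
      "url": "https://example-news-site.com",
      "filterRegex": "cdn|static",
      "whois": ["Example Registrar", "Privacy Protect"],
      "dig-or": ["cloudfront", "akamai"],
      "digRecordType": "CNAME",
      "whois_delay": 3000,
      "whois_server_mode": "cycle"
    }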

Content Search Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| searchstring | String or Array | - | Text to search for in response content (OR logic) |
| searchstring_and | String or Array | - | Text to search for with AND logic: ALL terms must be present |
| curl | Boolean | false | Use curl to download content for analysis |
| grep | Boolean | false | Use grep instead of JavaScript for pattern matching (requires curl=true) |

Other Per-Site Options

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| goto_options | Object | {"waitUntil": "load"} | Custom page.goto() options |
| clear_sitedata | Boolean | false | Clear all cookies, cache, and storage before each load |
| forcereload | Boolean | false | Force an additional reload after reloads |
| isBrave | Boolean | false | Spoof Brave browser detection |
| evaluateOnNewDocument | Boolean | false | Inject fetch/XHR interceptor into the page |
| cdp | Boolean | false | Enable CDP logging for this site |
| cdp_specific | Array | - | Enable CDP logging only for specific domains in the URL list |
| css_blocked | Array | - | CSS selectors to hide elements |
| source | Boolean | false | Save page source HTML after load |
| screenshot | Boolean | false | Capture screenshot on load failure |
| headful | Boolean | false | Launch browser with GUI for this site |
| localhost | String | - | Force custom IP format for this site (e.g., "127.0.0.1", "0.0.0.0", "192.168.1.1") |
| adblock_rules | Boolean | false | Generate adblock filter rules with resource types for this site |
| interact_duration | Milliseconds | 2000 | Duration of interaction simulation |
| interact_scrolling | Boolean | true | Enable scrolling simulation |
| interact_clicks | Boolean | false | Enable element clicking simulation |
| interact_typing | Boolean | false | Enable typing simulation |
| interact_intensity | String | "medium" | Interaction simulation intensity: "low", "medium", "high" |
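
A sketch of a site entry combining the interaction and debugging fields above; the URL, selectors, and values are illustrative.

    {
      "url": "https://infinite-scroll-site.com",
      "filterRegex": "ads|beacon",
      "interact": true,
      "interact_duration": 5000,
      "interact_scrolling": true,
      "interact_clicks": true,
      "interact_intensity": "high",
      "css_blocked": ["#cookie-banner", ".newsletter-popup"],
      "screenshot": true
    }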

Global Configuration Options

These options go at the root level of your config.json:

| Field | Values | Default | Description |
| --- | --- | --- | --- |
| ignoreDomains | Array | - | Domains to completely ignore (supports wildcards like *.ads.com) |
| blocked | Array | - | Global regex patterns to block requests (combined with per-site blocked) |
| whois_server_mode | String | "random" | Default server selection mode for all sites |
| ignore_similar | Boolean | true | Ignore domains similar to already-found domains |
| ignore_similar_threshold | Integer | 80 | Similarity threshold percentage for ignore_similar |
| ignore_similar_ignored_domains | Boolean | true | Ignore domains similar to the ignoreDomains list |
| max_concurrent_sites | Integer | 6 | Maximum concurrent site processing (1-50) |
| resource_cleanup_interval | Integer | 80 | Browser restart interval in URLs processed (1-1000) |
| cache_path | String | ".cache" | Directory path for persistent cache storage |
| cache_max_size | Integer | 5000 | Maximum number of entries in the cache |
| cache_autosave_minutes | Integer | 1 | Interval for automatic cache saves (minutes) |
| cache_requests | Boolean | false | Enable HTTP request response caching |
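
A sketch of a config.json root combining these global options with a minimal sites list; values match the documented defaults except where noted, and the entries themselves are illustrative.

    {
      "ignoreDomains": ["googleapis.com", "*.ads.com"],
      "blocked": [".*doubleclick.*"],
      "ignore_similar": true,
      "ignore_similar_threshold": 80,
      "max_concurrent_sites": 6,
      "resource_cleanup_interval": 80,
      "cache_path": ".cache",
      "cache_max_size": 5000,
      "cache_requests": false,
      "sites": [
        { "url": "https://example.com/", "filterRegex": "ads|analytics" }
      ]
    }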

Special Characters in searchstring

The searchstring parameter supports all characters, including special symbols. Only double quotes (and literal backslashes) need JSON escaping:

{ "searchstring": [ ")}return n}function N(n,e,r){try{\"function\"==typeof", "addEventListener(\"click\",function(){", "{\"status\":\"success\",\"data\":[", "console.log('Debug: ' + value);", "`API endpoint: ${baseUrl}/users`", "@media screen and (max-width: 768px)", "if(e&&e.preventDefault){e.preventDefault()}", "__webpack_require__(/*! ./module */ \"./src/module.js\")", "console.log('Hello world')", "#header { background-color: #ff0000; }", "$(document).ready(function() {", "completion: 85% @ $1,500 budget", "SELECT * FROM users WHERE id = *", "regex: ^[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}$", "typeof window !== 'undefined'" ] }

Character escaping rules:

  • " becomes \" (required in JSON)
  • \ becomes \\ (if searching for literal backslashes)
  • All other characters are used literally: ' ` @ # $ % * ^ [ ] { } ( ) ; = ! ? :

Usage Examples

    # Scan with default config and output to console
    node nwss.js

    # Scan and save rules to file
    node nwss.js -o blocklist.txt

    # Append new rules to existing file
    node nwss.js --append -o blocklist.txt

    # Clean existing rules and append new ones
    node nwss.js --clean-rules --append -o blocklist.txt

    # Debug mode with URL dumping and colored output
    node nwss.js --debug --dumpurls --color -o rules.txt

    # Dry run to see what would be matched
    node nwss.js --dry-run --debug

    # Validate configuration before running
    node nwss.js --validate-config

    # Clean rule files
    node nwss.js --clean-rules existing_rules.txt

    # Maximum stealth scanning
    node nwss.js --debug --color -o stealth_rules.txt

    # High-performance scanning with custom concurrency
    node nwss.js --max-concurrent 12 --cleanup-interval 300 -o rules.txt

Stealth Configuration Examples

Memory Management with Window Cleanup

{ "url": [ "https://popup-heavy-site1.com", "https://popup-heavy-site2.com", "https://popup-heavy-site3.com" ], "filterRegex": "\\.(space|website|tech)\\b", "window_cleanup": "all", "interact": true, "reload": 2, "resourceTypes": ["script", "fetch"], "comments": "Aggressive cleanup for sites that open many popups" }

Conservative Memory Management

{ "url": "https://complex-site.com", "filterRegex": "analytics|tracking", "window_cleanup": true, "interact": true, "delay": 8000, "reload": 3, "comments": [ "Conservative cleanup preserves potentially active content", "Good for sites with complex iframe structures" ] }
{ "url": "https://shopping-site.com", "userAgent": "chrome", "fingerprint_protection": "random", "referrer_headers": { "mode": "random_search", "search_terms": ["product reviews", "best deals", "price comparison"] }, "interact": true, "delay": 6000, "filterRegex": "analytics|tracking|ads" }
{ "url": "https://news-site.com", "userAgent": "firefox", "fingerprint_protection": true, "referrer_headers": {"mode": "social_media"}, "custom_headers": { "Accept-Language": "en-US,en;q=0.9" }, "filterRegex": "doubleclick|googletagmanager" }

Tech Blog with Custom Referrers

{ "url": "https://tech-blog.com", "fingerprint_protection": "random", "referrer_headers": { "mode": "custom", "custom": [ "https://news.ycombinator.com/", "https://www.reddit.com/r/programming/", "https://lobste.rs/" ] } }

The scanner includes intelligent window management to prevent memory accumulation during long scans:

  • Conservative cleanup (window_cleanup: true): Selectively closes pages that appear to be leftovers from previous scans
  • Aggressive cleanup (window_cleanup: "all"): Closes all content pages from previous operations for maximum memory recovery
  • Main window preservation: Both modes always preserve the main Puppeteer browser window to maintain stability
  • Popup window handling: Automatically detects and closes popup windows created by previous site scans
  • Timing protection: 16-second delay ensures no active operations are interrupted during cleanup
  • Active page protection: Never affects pages currently being processed by concurrent scanning operations
  • Memory reporting: Reports estimated memory freed from closed windows for performance monitoring

Use aggressive cleanup for sites that open many popups or when processing large numbers of URLs. Use conservative cleanup when you want to preserve potentially active content but still free obvious leftovers.


Install Google Chrome (Ubuntu as example). NOTE: Use Chrome, not Chromium, for best compatibility.

Add the Google Chrome signing key:

    wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg

Add the Google Chrome repository:

    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list

Install Chrome:

    sudo apt update
    sudo apt install google-chrome-stable

dig & whois (needed for network checks)

sudo apt install bind9-dnsutils whois
  • If both firstParty: 0 and thirdParty: 0 are set for a site, it will be skipped.
  • ignoreDomains applies globally across all sites.
  • ignoreDomains supports wildcards (e.g., *.ads.com matches tracker.ads.com)
  • Blocking (blocked) can match full domains or regex.
  • If a site's blocked field is missing, no extra blocking is applied.
  • --clean-rules with --append will clean existing files first, then append new rules
  • --remove-dupes works with all output modes and removes duplicates from final output
  • Validation tools help ensure rule files are properly formatted before use
  • --remove-tempfiles removes Chrome/Puppeteer temporary files before exiting, which avoids disk space issues
  • For maximum stealth, combine fingerprint_protection: "random" with appropriate referrer_headers modes
  • User agents are automatically updated to latest versions (Chrome 131, Firefox 133, Safari 18.2)
  • Referrer headers work independently of fingerprint protection; use both for best results
