Know before you scrape. Analyze any website's anti-bot protections in seconds.
Stop wasting hours building scrapers only to discover the site has Cloudflare + JavaScript rendering + CAPTCHA + rate limiting. caniscrape does reconnaissance upfront so you know exactly what you're dealing with before writing a single line of code.
caniscrape analyzes a URL and tells you:
- What protections are active (WAF, CAPTCHA, rate limits, TLS fingerprinting, honeypots)
- Difficulty score (0-10 scale: Easy → Very Hard)
- Specific recommendations on what tools/proxies you'll need
- Estimated complexity so you can decide: build it yourself or use a service

What it checks:

WAF detection (requires the external wafw00f tool):
- Identifies Web Application Firewalls (Cloudflare, Akamai, Imperva, DataDome, PerimeterX, etc.)

Rate limiting:
- Tests with burst and sustained traffic patterns
- Detects HTTP 429s, timeouts, throttling, and soft bans
- Determines the blocking threshold (requests/min)
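
These probes can be approximated with aiohttp (already a core dependency). A minimal sketch, with illustrative function and field names rather than caniscrape's actual internals:

```python
import asyncio
import aiohttp

async def burst_probe(url: str, burst_size: int = 20) -> dict:
    """Fire a quick burst of requests and tally what comes back.

    A spike in 429s or timeouts during the burst suggests rate limiting;
    repeating with larger bursts approximates the blocking threshold.
    """
    tally = {"ok": 0, "rate_limited": 0, "errors": 0}

    async def fetch(session: aiohttp.ClientSession) -> None:
        try:
            async with session.get(url) as resp:
                tally["rate_limited" if resp.status == 429 else "ok"] += 1
        except (aiohttp.ClientError, asyncio.TimeoutError):
            tally["errors"] += 1

    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        await asyncio.gather(*(fetch(session) for _ in range(burst_size)))
    return tally

# e.g. asyncio.run(burst_probe("https://example.com"))
```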
JavaScript rendering:
- Compares content with and without JS execution
- Detects SPAs (React, Vue, Angular)
- Calculates the percentage of content missing without JS
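
A rough sketch of the with/without-JS comparison, using Playwright's sync API and a plain urllib fetch; comparing HTML lengths is a crude stand-in for the tool's real content diff:

```python
from urllib.request import Request, urlopen
from playwright.sync_api import sync_playwright

def js_only_fraction(url: str) -> float:
    """Estimate what fraction of the page only exists after JS runs."""
    # Static fetch: no JavaScript is executed.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    static_html = urlopen(req, timeout=15).read().decode("utf-8", errors="ignore")

    # Rendered fetch: headless Chromium executes the page's scripts.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        rendered_html = page.content()
        browser.close()

    if not rendered_html:
        return 0.0
    return max(len(rendered_html) - len(static_html), 0) / len(rendered_html)
```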
CAPTCHA detection:
- Scans for reCAPTCHA, hCaptcha, and Cloudflare Turnstile
- Tests whether a CAPTCHA appears on page load or only after rate limiting
- Monitors network traffic for challenge endpoints
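
A simplified static scan with beautifulsoup4; the marker strings below are illustrative guesses at common fingerprints, and the real check also has to watch network traffic for challenge endpoints:

```python
from bs4 import BeautifulSoup

# Illustrative fingerprints, not an exhaustive list.
CAPTCHA_MARKERS = {
    "reCAPTCHA": ("google.com/recaptcha", "g-recaptcha"),
    "hCaptcha": ("hcaptcha.com", "h-captcha"),
    "Cloudflare Turnstile": ("challenges.cloudflare.com", "cf-turnstile"),
}

def detect_captchas(html: str) -> list[str]:
    """Return providers whose script URLs or widget classes appear in the page."""
    soup = BeautifulSoup(html, "html.parser")
    script_srcs = " ".join(tag.get("src", "") for tag in soup.find_all("script"))
    class_names = " ".join(" ".join(tag.get("class", [])) for tag in soup.find_all(class_=True))
    haystack = f"{script_srcs} {class_names} {html}".lower()
    return [name for name, needles in CAPTCHA_MARKERS.items()
            if any(n in haystack for n in needles)]
```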
TLS fingerprinting:
- Compares standard Python clients against browser-like clients
- Detects whether the site blocks based on TLS handshake signatures
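
A sketch of that comparison heuristic, assuming curl_cffi's Chrome impersonation (the exact impersonation target string depends on your curl_cffi version); the status-code logic is illustrative only:

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen
from curl_cffi import requests as curl_requests

def tls_fingerprinting_suspected(url: str) -> bool:
    """Heuristic: a plain Python client gets blocked while a client that
    mimics a browser's TLS handshake gets through."""
    headers = {"User-Agent": "Mozilla/5.0"}

    try:
        plain_status = urlopen(Request(url, headers=headers), timeout=15).status
    except HTTPError as exc:
        plain_status = exc.code
    except URLError:
        plain_status = None  # handshake failure / connection reset

    # impersonate="chrome" asks curl_cffi for a recent Chrome TLS profile.
    browser_status = curl_requests.get(url, impersonate="chrome", timeout=15).status_code

    plain_blocked = plain_status is None or plain_status in (403, 429, 503)
    return plain_blocked and browser_status == 200
```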
Honeypots and behavior tracking:
- Scans for invisible "honeypot" links (bot traps)
- Detects whether the site monitors mouse/scroll behavior
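
A minimal honeypot scan over inline styles with beautifulsoup4; links hidden through external CSS need a rendered-page check, which this sketch skips:

```python
from bs4 import BeautifulSoup

def find_honeypot_links(html: str) -> list[str]:
    """Return hrefs of links that look deliberately hidden from human visitors."""
    hidden_markers = ("display:none", "display: none",
                      "visibility:hidden", "visibility: hidden")
    soup = BeautifulSoup(html, "html.parser")
    suspects = []
    for link in soup.find_all("a", href=True):
        style = (link.get("style") or "").lower()
        if link.get("hidden") is not None or any(m in style for m in hidden_markers):
            suspects.append(link["href"])
    return suspects
```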
robots.txt:
- Checks scraping permissions
- Extracts the recommended crawl-delay
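
This check can be approximated with the standard library alone; a small sketch:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def robots_summary(url: str, user_agent: str = "*") -> dict:
    """Report whether the URL may be fetched and any advertised crawl-delay."""
    parser = RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    parser.read()
    return {
        "can_fetch": parser.can_fetch(user_agent, url),
        "crawl_delay": parser.crawl_delay(user_agent),    # None if not declared
        "request_rate": parser.request_rate(user_agent),  # None if not declared
    }
```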
The tool calculates a 0-10 difficulty score based on:

| Protection detected | Points |
|---|---|
| CAPTCHA on page load | +5 |
| CAPTCHA after rate limit | +4 |
| DataDome/PerimeterX WAF | +4 |
| Akamai/Imperva WAF | +3 |
| Aggressive rate limiting | +3 |
| Cloudflare WAF | +2 |
| Honeypot traps detected | +2 |
| TLS fingerprinting active | +1 |
Score interpretation:
- 0-2: Easy (basic scraping will work)
- 3-4: Medium (need some precautions)
- 5-7: Hard (requires advanced techniques)
- 8-10: Very Hard (consider using a service)
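
For illustration, the table maps directly onto a small scoring helper; the signal names and the cap at 10 are assumptions made for this sketch, not caniscrape's internal API:

```python
# Weights mirror the scoring table above; signal names are made up for this sketch.
SCORE_WEIGHTS = {
    "captcha_on_load": 5,
    "captcha_after_rate_limit": 4,
    "waf_datadome_perimeterx": 4,
    "waf_akamai_imperva": 3,
    "aggressive_rate_limiting": 3,
    "waf_cloudflare": 2,
    "honeypots_detected": 2,
    "tls_fingerprinting": 1,
}

DIFFICULTY_LABELS = [(2, "Easy"), (4, "Medium"), (7, "Hard"), (10, "Very Hard")]

def difficulty(signals: set[str]) -> tuple[int, str]:
    """Sum the weights of detected signals (capped at 10, an assumption) and label the result."""
    score = min(sum(SCORE_WEIGHTS.get(s, 0) for s in signals), 10)
    label = next(name for ceiling, name in DIFFICULTY_LABELS if score <= ceiling)
    return score, label

# difficulty({"waf_cloudflare", "aggressive_rate_limiting"}) -> (5, "Hard")
```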

Requirements:
- Python 3.9+
- pip or pipx
Core dependencies (installed automatically):
- click - CLI framework
- rich - Terminal formatting
- aiohttp - Async HTTP requests
- beautifulsoup4 - HTML parsing
- playwright - Headless browser automation
- curl_cffi - Browser impersonation
External tools (install separately):
- wafw00f - WAF detection

Use cases:
- Before building a scraper: Check whether it's even feasible
- Debugging scraper issues: Identify what protection broke your scraper
- Client estimates: Give accurate time/cost estimates for scraping projects
- Pipeline planning: Know what infrastructure you'll need (proxies, CAPTCHA solvers)
- Cost estimation: Calculate proxy/CAPTCHA costs before committing to a data source
- Site selection: Find the easiest data sources for your research
- Compliance: Check robots.txt before scraping

What it may miss:
- Dynamic protections: Some sites only trigger defenses under specific conditions
- Behavioral AI: Advanced ML-based bot detection that adapts in real-time
- Account-based restrictions: Protections that only activate for logged-in users

Responsible use:
- This tool is for reconnaissance only - it does not bypass protections
- Always respect robots.txt and terms of service
- Some sites may consider aggressive scanning hostile - use --find-all and --deep sparingly
- You are responsible for how you use this tool and any scrapers you build

Caveats:
- Analysis takes 30-60 seconds per URL
- Some checks require making multiple requests (may trigger rate limits)
- Results are a snapshot - protections can change over time
Found a bug? Have a feature request? Contributions are welcome!
- Fork the repo
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - see LICENSE file for details
Built on top of:
- wafw00f - WAF detection
- Playwright - Browser automation
- curl_cffi - Browser impersonation
Questions? Feedback? Open an issue on GitHub.
Remember: This tool tells you HOW HARD it will be to scrape. It doesn't do the scraping for you. Use it to make informed decisions before you start building.