Skill Seeker is an automated tool that transforms documentation websites, GitHub repositories, and PDF files into production-ready Claude AI skills. Instead of making you read and summarize documentation by hand, Skill Seeker does the heavy lifting:
✅ Large Documentation Support - Handle 10K-40K+ page docs with intelligent splitting
✅ Router/Hub Skills - Intelligent routing to specialized sub-skills
✅ Parallel Scraping - Process multiple skills simultaneously
✅ Checkpoint/Resume - Never lose progress on long scrapes
✅ Caching System - Scrape once, rebuild instantly
✅ Fully Tested - 299 tests with 100% pass rate
Option 1: Use from Claude Code (Recommended)
# One-time setup (5 minutes)
./setup_mcp.sh
# Then in Claude Code, just ask:
#   "Generate a React skill from https://react.dev/"
#   "Scrape PDF at docs/manual.pdf and create skill"
Option 5: Unified Multi-Source Scraping (NEW - v2.0.0)
The Problem: Documentation and code often drift apart. Docs might be outdated, missing features that exist in code, or documenting features that were removed.
The Solution: Combine documentation + GitHub + PDF into one unified skill that shows BOTH what's documented AND what actually exists, with clear warnings about discrepancies.
✅ **Advantages:**
- **Identifies documentation gaps** - Find outdated or missing docs automatically
- **Catches code changes** - Know when APIs change without docs being updated
- **Single source of truth** - One skill showing intent (docs) AND reality (code)
- **Actionable insights** - Get suggestions for fixing each conflict
- **Development aid** - See what's actually in the codebase vs what's documented
**Example Unified Configs:**
- `configs/react_unified.json` - React docs + GitHub repo
- `configs/django_unified.json` - Django docs + GitHub repo
- `configs/fastapi_unified.json` - FastAPI docs + GitHub repo
**Full Guide:** See [docs/UNIFIED_SCRAPING.md](docs/UNIFIED_SCRAPING.md) for complete documentation.
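Conceptually, the conflict detection boils down to diffing the API surface found in the docs against the one found in the code. A minimal sketch of that idea (the function and field names here are hypothetical, not Skill Seeker's actual internals):

```python
# Minimal sketch of doc-vs-code conflict detection (hypothetical names,
# not Skill Seeker's actual internals).

def find_conflicts(documented: dict, implemented: dict) -> list[str]:
    """Compare API symbols extracted from scraped docs (documented)
    against symbols extracted from the GitHub repo (implemented)."""
    warnings = []
    for name in documented.keys() - implemented.keys():
        warnings.append(f"'{name}' is documented but missing from code (outdated docs?)")
    for name in implemented.keys() - documented.keys():
        warnings.append(f"'{name}' exists in code but is undocumented (docs gap)")
    for name in documented.keys() & implemented.keys():
        if documented[name] != implemented[name]:
            warnings.append(f"'{name}' differs: docs say {documented[name]}, "
                            f"code says {implemented[name]}")
    return warnings

print("\n".join(find_conflicts(
    {"render": "render(el)", "mount": "mount(el, target)"},
    {"render": "render(el, opts)", "hydrate": "hydrate(el)"},
)))
```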
## How It Works
```mermaid
graph LR
A[Documentation Website] --> B[Skill Seeker]
B --> C[Scraper]
B --> D[AI Enhancement]
B --> E[Packager]
C --> F[Organized References]
D --> F
F --> E
E --> G[Claude Skill .zip]
G --> H[Upload to Claude AI]
```
1. **Detect llms.txt** - Checks for llms-full.txt, llms.txt, or llms-small.txt first
2. **Scrape** - Extracts all pages from the documentation
3. **Categorize** - Organizes content into topics (API, guides, tutorials, etc.)
4. **Enhance** - AI analyzes the docs and creates a comprehensive SKILL.md with examples
5. **Package** - Bundles everything into a Claude-ready .zip file
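Step 1 is worth a note: if the site publishes a pre-flattened llms.txt variant, fetching it is far cheaper than crawling page by page. A hedged sketch of that probe (the helper name is invented, not the scraper's actual code):

```python
# Sketch of step 1: probe for a pre-flattened llms.txt before crawling,
# in the order the pipeline describes. Hypothetical helper.
import requests

def find_llms_txt(base_url: str) -> str | None:
    for name in ("llms-full.txt", "llms.txt", "llms-small.txt"):
        url = f"{base_url.rstrip('/')}/{name}"
        try:
            resp = requests.head(url, timeout=10, allow_redirects=True)
            if resp.status_code == 200:
                return url  # use this instead of crawling page by page
        except requests.RequestException:
            continue
    return None  # fall back to a full scrape

print(find_llms_txt("https://docs.example.com"))
```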
Before you start, make sure you have:
**Python 3.10 or higher** - [Download](https://www.python.org/downloads/) | Check: `python3 --version`
This guide walks you through EVERYTHING step-by-step (Python install, git clone, first skill creation).
Method 1: MCP Server for Claude Code (Easiest)
Use Skill Seeker directly from Claude Code with natural language!
# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# One-time setup (5 minutes)
./setup_mcp.sh
# Restart Claude Code, then just ask:
In Claude Code:
List all available configs
Generate config for Tailwind at https://tailwindcss.com/docs
Scrape docs using configs/react.json
Package skill at output/react/
Benefits:
✅ No manual CLI commands
✅ Natural language interface
✅ Integrated with your workflow
✅ 9 tools available instantly (includes automatic upload!)
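For the curious, an MCP integration like this amounts to wrapping the CLI in tools that Claude Code can call. A rough sketch using the `mcp` Python SDK's FastMCP, illustrative only, not the server that setup_mcp.sh actually installs:

```python
# Rough sketch of an MCP server exposing a scrape tool via the `mcp`
# Python SDK (FastMCP). Illustrative only; the real server ships with
# the repo and exposes 9 tools.
from mcp.server.fastmcp import FastMCP
import subprocess

mcp = FastMCP("skill-seeker")

@mcp.tool()
def scrape_docs(config_path: str) -> str:
    """Scrape documentation using a config file from configs/."""
    result = subprocess.run(
        ["python3", "cli/doc_scraper.py", "--config", config_path],
        capture_output=True, text=True,
    )
    return result.stdout or result.stderr

if __name__ == "__main__":
    mcp.run()
```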
Method 2: Manual Installation
# Clone repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
source venv/bin/activate  # macOS/Linux
# OR on Windows: venv\Scripts\activate

# Install dependencies
pip install requests beautifulsoup4 pytest
# Save dependencies
pip freeze > requirements.txt
# Optional: Install anthropic for API-based enhancement (not needed for LOCAL enhancement)
# pip install anthropic
Always activate the virtual environment before using Skill Seeker:
source venv/bin/activate # Run this each time you start a new terminal session
# Make sure venv is activated (you should see (venv) in your prompt)
source venv/bin/activate
# Optional: Estimate pages first (fast, 1-2 minutes)
python3 cli/estimate_pages.py configs/godot.json
# Use Godot preset
python3 cli/doc_scraper.py --config configs/godot.json
# Use React preset
python3 cli/doc_scraper.py --config configs/react.json
# See all presets
ls configs/
Once your skill is packaged, you need to upload it to Claude:
Option 1: Automatic Upload (API-based)
# Set your API key (one-time)
export ANTHROPIC_API_KEY=sk-ant-...
# Package and upload automatically
python3 cli/package_skill.py output/react/ --upload
# OR upload existing .zip
python3 cli/upload_skill.py output/react.zip
Option 2: Manual Upload
# Package skill
python3 cli/package_skill.py output/react/
# This will:
# 1. Create output/react.zip
# 2. Open the output/ folder automatically
# 3. Show upload instructions

# Then manually upload:
# - Go to https://claude.ai/skills
# - Click "Upload Skill"
# - Select output/react.zip
# - Done!
Benefits:
✅ No API key needed
✅ Works for everyone
✅ Folder opens automatically
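Under the hood, packaging is essentially zipping the skill folder. A minimal sketch of the core of what package_skill.py produces (a sketch, assuming the default output layout; the real script also validates and opens the folder):

```python
# Minimal sketch of the packaging step: turn output/react/ into
# output/react.zip. package_skill.py does more (validation, opening
# the folder), but the core is just an archive.
import shutil

shutil.make_archive("output/react", "zip", root_dir="output/react")
```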
Option 3: Claude Code (MCP) - Smart & Automatic
In Claude Code, just ask:
"Package and upload the React skill"
# With API key set:
# - Packages the skill
# - Uploads to Claude automatically
# - Done! ✅
# Without API key:
# - Packages the skill
# - Shows where to find the .zip
# - Provides manual upload instructions
Benefits:
✅ Natural language
✅ Smart auto-detection (uploads if API key available)
python3 cli/doc_scraper.py --config configs/godot.json
# If data exists:
✓ Found existing data: 245 pages
Use existing data? (y/n): y
⏭️ Skipping scrape, using existing data
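The cache check itself is simple: if scraped data already exists on disk, offer to reuse it. A sketch, assuming a hypothetical data directory layout:

```python
# Sketch of the cache check: reuse previously scraped data when present.
# Directory layout is an assumption for illustration.
import os

data_dir = "output/godot_data"
if os.path.isdir(data_dir):
    pages = len(os.listdir(data_dir))
    if input(f"Found existing data: {pages} pages. Use it? (y/n): ") == "y":
        print("Skipping scrape, using existing data")
```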
**Automatic pattern extraction:**
- Extracts common code patterns from docs
- Detects the programming language
- Creates a quick reference with real examples
- Smarter categorization with scoring
**Enhanced SKILL.md:**
- Real code examples from the documentation
- Language-annotated code blocks
- Common patterns section
- Quick reference built from actual usage examples
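Pattern extraction can be pictured as pulling fenced code blocks out of the scraped pages and keeping the recurring ones. A sketch of that idea (hypothetical helper, not the enhancer's actual code):

```python
# Sketch of pattern extraction: pull fenced code blocks out of scraped
# markdown and surface the most common ones. Hypothetical helper.
import re
from collections import Counter

FENCE = re.compile(r"`{3}(\w*)\n(.*?)`{3}", re.DOTALL)

def common_patterns(pages: list[str], top: int = 5) -> list[tuple[str, int]]:
    snippets = Counter()
    for page in pages:
        for _lang, code in FENCE.findall(page):
            snippets[code.strip()] += 1
    return snippets.most_common(top)
```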
**Smarter categorization** automatically infers categories from:
- URL structure
- Page titles
- Content keywords

with scoring for better accuracy.
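Scored categorization can be sketched as tallying keyword hits across those three signals, weighting URL matches highest. The categories and weights below are invented for illustration, not Skill Seeker's shipped config:

```python
# Sketch of scored categorization: tally keyword hits across URL,
# title, and content. Keywords and weights are illustrative.
CATEGORIES = {
    "api": ["api", "reference", "class", "method"],
    "guides": ["guide", "how-to", "tutorial"],
}

def categorize(url: str, title: str, content: str) -> str:
    scores = {}
    for category, keywords in CATEGORIES.items():
        score = 0
        for kw in keywords:
            score += 3 * url.lower().count(kw)    # URL structure: strongest signal
            score += 2 * title.lower().count(kw)  # page title
            score += content.lower().count(kw)    # body keywords
        scores[category] = score
    return max(scores, key=scores.get)

print(categorize("https://docs.example.com/api/widget", "Widget API", "..."))
```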
**Code Language Detection:**
# Automatically detects:
#   - Python (def, import, from)
#   - JavaScript (const, let, =>)
#   - GDScript (func, var, extends)
#   - C++ (#include, int main)
#   - And more...
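That detection can be approximated by matching each snippet against the keyword cues listed above. A simplified sketch; the real detector may weigh matches differently:

```python
# Sketch of keyword-based language detection, using the cues listed
# above. Simplified illustration only.
SIGNATURES = {
    "python":     ["def ", "import ", "from "],
    "javascript": ["const ", "let ", "=>"],
    "gdscript":   ["func ", "var ", "extends "],
    "cpp":        ["#include", "int main"],
}

def detect_language(snippet: str) -> str:
    best, best_hits = "unknown", 0
    for lang, cues in SIGNATURES.items():
        hits = sum(cue in snippet for cue in cues)
        if hits > best_hits:
            best, best_hits = lang, hits
    return best

print(detect_language("func _ready():\n\tvar x = 1"))  # -> gdscript
```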
# Scrape once
python3 cli/doc_scraper.py --config configs/react.json
# Later, just rebuild (instant)
python3 cli/doc_scraper.py --config configs/react.json --skip-scrape
**Async Mode for Faster Scraping (2-3x Speed!)**
# Enable async mode with 8 workers (recommended for large docs)
python3 cli/doc_scraper.py --config configs/react.json --async --workers 8
# Small docs (~100-500 pages)
python3 cli/doc_scraper.py --config configs/mydocs.json --async --workers 4
# Large docs (2000+ pages) with no rate limiting
python3 cli/doc_scraper.py --config configs/largedocs.json --async --workers 8 --no-rate-limit
Performance Comparison:
Sync mode (threads): ~18 pages/sec, 120 MB memory
Async mode: ~55 pages/sec, 40 MB memory
Result: 3x faster, 66% less memory!
When to use:
✅ Large documentation (500+ pages)
✅ Network latency is high
✅ Memory is constrained
❌ Small docs (< 100 pages) - overhead not worth it
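The speedup comes from overlapping network waits instead of dedicating a thread to each request. A minimal sketch of the approach with asyncio and aiohttp, capping concurrency with a semaphore (the --workers knob); illustrative, not the scraper's actual code:

```python
# Sketch of async scraping: overlap network waits with asyncio and cap
# concurrency with a semaphore. Illustrative only. (pip install aiohttp)
import asyncio
import aiohttp

async def fetch_all(urls: list[str], workers: int = 8) -> list[str]:
    sem = asyncio.Semaphore(workers)

    async def fetch(session: aiohttp.ClientSession, url: str) -> str:
        async with sem:  # at most `workers` requests in flight
            async with session.get(url) as resp:
                return await resp.text()

    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

pages = asyncio.run(fetch_all(["https://docs.example.com/a",
                               "https://docs.example.com/b"], workers=4))
```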
# Enable in config
{
"checkpoint": {
"enabled": true,
"interval": 1000 // Save every 1000 pages
}
}
# If the scrape is interrupted (Ctrl+C or crash), resume from the last checkpoint
python3 cli/doc_scraper.py --config configs/godot.json --resume
✅ Resuming from checkpoint (12,450 pages scraped)
⏭️ Skipping 12,450 already-scraped pages
🔄 Continuing from where we left off...
# Start fresh (clear checkpoint)
python3 cli/doc_scraper.py --config configs/godot.json --fresh
Benefits:
✅ Auto-saves every 1000 pages (configurable)
✅ Saves on interruption (Ctrl+C)
✅ Resume with --resume flag
✅ Never lose hours of scraping progress
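Checkpointing reduces to periodically persisting the set of scraped URLs so a resumed run can skip them. A sketch under an assumed file layout (the actual checkpoint format is internal to the scraper):

```python
# Sketch of checkpoint/resume: persist scraped URLs every `interval`
# pages; on resume, reload and skip them. File layout is hypothetical.
import json, os

CHECKPOINT = "output/godot_data/checkpoint.json"

def load_checkpoint() -> set[str]:
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def scrape_with_checkpoints(urls: list[str], interval: int = 1000):
    done = load_checkpoint()
    for i, url in enumerate(u for u in urls if u not in done):
        # ... fetch and save the page here ...
        done.add(url)
        if (i + 1) % interval == 0:  # auto-save every N pages
            with open(CHECKPOINT, "w") as f:
                json.dump(sorted(done), f)
```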
First Time (With Scraping + Enhancement)
# 1. Scrape + Build + AI Enhancement (LOCAL, no API key)
python3 cli/doc_scraper.py --config configs/godot.json --enhance-local
# 2. Wait for the new terminal to close (enhancement completes)
# Check the enhanced SKILL.md:
cat output/godot/SKILL.md
# 3. Package
python3 cli/package_skill.py output/godot/
# 4. Done! You have godot.zip with excellent SKILL.md
python3 cli/doc_scraper.py --interactive
# Follow prompts, it will create the config for you
# Copy a preset
cp configs/react.json configs/myframework.json
# Edit it
nano configs/myframework.json
# Use it
python3 cli/doc_scraper.py --config configs/myframework.json
# Test in Python
from bs4 import BeautifulSoup
import requests

url = "https://docs.example.com/page"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Try different selectors
print(soup.select_one('article'))
print(soup.select_one('main'))
print(soup.select_one('div[role="main"]'))
# After building, check:
cat output/godot/SKILL.md # Should have real examples
cat output/godot/references/index.md # Categories
Check your main_content selector
Try: `article`, `main`, `div[role="main"]`
Data Exists But Won't Use It?
# Force re-scrape
rm -rf output/myframework_data/
python3 cli/doc_scraper.py --config configs/myframework.json
Categories Look Wrong?
Edit the config's `categories` section with better keywords.
# Delete old data
rm -rf output/godot_data/
# Re-scrape
python3 cli/doc_scraper.py --config configs/godot.json