Show HN: m(ctf)p – A semi-automated environment for solving CTF challenges


#m(ctf)p

m(ctf)p is a semi-automated system for solving capture the flag (CTF) challenges. It has two main parts:

  • mctfp-server - The MCP server used for integrating with the CTFd environment
  • A Kali Linux-based environment - Basically a Kali Linux Docker image with Claude Code, plus some scripts for quickly spinning up new containers.

Here's an example of downloading, solving, and submitting a simple challenge in a recent CTF:

Asciinema recording of the end-to-end flow, from creating the Docker container instance, to submitting the correct solution

#How well does it work?

Decently well! It can zero-shot simpler challenges: run /attempt_challenge <id>, wait a bit, confirm a few prompts, and the challenge turns green on CTFd. Even when it goes totally haywire, it generally figures out at least a few useful pieces of information along the way. You can pick up from there, then feed what you learn back in later, and so on.

Generally, I think the instructions/Docker image can be improved quite a bit, to do things like:

  • Know that PyCryptodome is imported with import Cryptodome
  • Tell it to not flail helplessly and attempt to bruteforce things with generic passwords
    • hashcat and john can be used when cracking is actually the solution, but it's usually pretty clear when something is supposed to be cracked vs not
    • It tried this on basically every challenge when it ran out of ideas, probably spent $10 in tokens on these silly dead-ends
  • Tune how it uses the notes.txt files
    • It wasn't very effective with these; the instructions need to be more specific about what types of things it should record
  • Actually have some headless decompiler installed
    • It was totally unable to use Ghidra headless
    • It did a pretty decent job using radare2 non-interactively, but usually got the first few invocations wrong
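
As a rough illustration of the PyCryptodome point above: depending on how the library was installed, it is exposed under either the Cryptodome or the Crypto top-level package, so a script (or an instruction to the model) can check which one is present before importing. The helper names below are hypothetical and are not part of this repo.

```python
import importlib
import importlib.util


def find_pycryptodome_namespace():
    """Return the first importable top-level package name, or None.

    "Cryptodome" is the standalone namespace; "Crypto" is the drop-in one.
    Note that the legacy PyCrypto library also uses "Crypto".
    """
    for name in ("Cryptodome", "Crypto"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_aes():
    """Import the AES module from whichever namespace is installed."""
    namespace = find_pycryptodome_namespace()
    if namespace is None:
        raise ImportError("neither Cryptodome nor Crypto is installed")
    return importlib.import_module(f"{namespace}.Cipher.AES")


if __name__ == "__main__":
    print("PyCryptodome namespace:", find_pycryptodome_namespace())
```

Putting the result of a check like this into the instructions would save the model from guessing the import name on every challenge.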

#The MCP Server

#Current Tools

  • CTFd (via their API)
    • Full Swagger definitions are available, but we only want:
      • Loading challenges: GET /challenges and GET /challenges/{challenge_id}
        • And probably GET /challenges/{challenge_id}/files for downloading files
      • Submitting answers: POST /challenges/attempt
        • Body should be {"submission": "<flag value>", "challenge_id": <the challenge id>}
  • Notes
    • Mostly just a persisted scratchpad for the model to think in
  • VirusTotal (via their API)
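
The CTFd calls listed above can be sketched roughly as follows. This is a minimal illustration, not the code in mctfp-server: the /api/v1 prefix, the Token authorization header, and all function names are assumptions about a typical CTFd deployment. The functions only build requests, so nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

API_PREFIX = "/api/v1"  # assumption: the usual CTFd REST API prefix


def build_request(base_url, token, method, path, body=None):
    """Build (but do not send) an authenticated CTFd API request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + API_PREFIX + path,
        data=data,
        method=method,
        headers={
            # assumption: access-token auth via "Authorization: Token ..."
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


def list_challenges_request(base_url, token):
    """GET /challenges"""
    return build_request(base_url, token, "GET", "/challenges")


def get_challenge_request(base_url, token, challenge_id):
    """GET /challenges/{challenge_id}"""
    return build_request(base_url, token, "GET", f"/challenges/{challenge_id}")


def attempt_request(base_url, token, challenge_id, flag):
    """POST /challenges/attempt with the body shape described above."""
    body = {"submission": flag, "challenge_id": challenge_id}
    return build_request(base_url, token, "POST", "/challenges/attempt", body)
```

To actually submit, pass the result to urllib.request.urlopen and decode the JSON response. Building the request separately from sending it makes the payload easy to inspect before anything reaches the scoreboard.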

#Usage

To build the Docker image/environment, run docker build -t ctf . from the repository root. If you change the tag, make sure to update it in scripts/new_env.sh as well.

Copy scripts/env.sh.example to scripts/env.sh and replace all the TODOs with the relevant secrets, using whatever secret mechanism you prefer. I usually use pass.

Then, run ./scripts/new_env.sh <env name> [opus] and you'll be dropped into a Dockerized Kali environment with Claude installed. Run claude to open Claude Code, hit Enter three times to get through the prompts, then run /attempt_challenge <challenge_id> to have Claude download and work on the challenge.

#TODO

  • [ ] Figure out what other tools should be added
    • Both via API to the MCP server, and in the Docker image
  • [ ] Test out the VirusTotal API integration
    • Make sure the API integration looks good first
  • [x] Add the ability to choose the strong/weak models when starting a new env
  • [x] Test it with Claude
    • And CTFd
  • [x] Make a Docker container
    • Should use Kali as a base
    • Add in Claude Code (npm) + the MCP server (build the binary?)
  • [x] Add scripts and tooling to create new parallel invocations
  • [x] Figure out how/where human + Claude notes should go
    • But maybe write to a file, they'll get lost as is
  • [x] Make sure Claude has access to a Python environment
    • And give it common libraries that make sense
    • Basically make it CyberChef
    • And give instructions for how to use it
  • [x] Add a custom slash command for starting a challenge
    • Tell it to download from CTFd
    • Tell it to go as far as it can without human intervention
  • [x] Read this Claude Code: Best practices for agentic coding
    • Make sure we understand how hierarchical CLAUDE.md files work
    • Make sure we understand slash commands
    • Look for other tips and tricks (e.g. running in Docker, MCP configs, etc)