Show HN: m(ctf)p – A semi-automated environment for solving CTF challenges

# m(ctf)p

m(ctf)p is a semi-automated system for solving capture the flag (CTF) challenges. It has two main parts:

- `mctfp-server`: the MCP server used for integrating with the CTFd environment
  - It's written in Go and uses the mark3labs/mcp-go library
- A Kali Linux-based environment: basically a Kali Linux Docker image with Claude Code, plus some scripts for spinning up new ones quickly

Here's an example of downloading, solving, and submitting a simple challenge from a recent CTF:

Asciinema recording of the end-to-end flow, from creating the Docker container instance to submitting the correct solution

# How well does it work?

Decently well! It can zero-shot simpler challenges (e.g. run `/attempt_challenge <id>`, wait a bit and confirm prompts, and the challenge turns green on CTFd). Even when it goes totally haywire, it generally figures out at least a few useful pieces of information along the way, so you can pick up from there, feed what you learn back in later, and so on.

Generally, I think the instructions and Docker image could be improved quite a bit, for example to:

- Know that PyCryptodome is imported with `import Cryptodome`
- Tell it not to flail helplessly and attempt to brute-force things with generic passwords
  - `hashcat` and `john` can be used when cracking is actually the solution, but it's usually pretty clear when something is supposed to be cracked vs. not
  - It tried this on basically every challenge when it ran out of ideas, and probably spent $10 in tokens on these silly dead ends
- Tune how it uses the `notes.txt` files
  - It wasn't very effective with these; the instructions need to be more specific about what types of things it should record
- Actually have a headless decompiler installed
  - It was totally unable to use Ghidra headless
  - It did a pretty decent job using radare2 non-interactively, but usually got the first few invocations wrong
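The radare2 point above mostly comes down to getting the non-interactive flags right on the first try. A minimal sketch, assuming `r2` is on `PATH` inside the image: `-q` makes radare2 exit after running the commands passed with `-c` instead of dropping to its prompt, and multiple commands are separated with `;`. The helper names (`r2_argv`, `run_r2`) are hypothetical and not part of m(ctf)p.

```python
import shutil
import subprocess


def r2_argv(binary: str, commands: list[str]) -> list[str]:
    """Build an argv that runs radare2 once, non-interactively (hypothetical helper).

    -q: quiet mode; radare2 exits after the -c commands instead of opening a prompt
    -c: the radare2 commands to run, separated by ';'
    """
    return ["r2", "-q", "-c", ";".join(commands), binary]


def run_r2(binary: str, commands: list[str], timeout: int = 120) -> str:
    """Run the commands against `binary` and return radare2's stdout."""
    if shutil.which("r2") is None:
        raise FileNotFoundError("radare2 (r2) is not on PATH")
    result = subprocess.run(
        r2_argv(binary, commands),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        timeout=timeout,  # full analysis ("aaa") on large binaries can be slow
        check=False,
    )
    return result.stdout
```

For example, `run_r2("./chal", ["aaa", "afl", "pdf @ main"])` analyzes the binary, lists the discovered functions, and disassembles `main` in a single call, which is the kind of known-good invocation the instructions could hand the model up front.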

# The MCP Server

# Current Tools

- CTFd (via their API)
  - Full Swagger definitions, but we only want:
    - Loading challenges: `GET /challenges` and `GET /challenges/{challenge_id}`
      - And probably `GET /challenges/{challenge_id}/files` for downloading files
    - Submitting answers: `POST /challenges/attempt`
      - Body should be `{"submission": "<flag value>", "challenge_id": <the challenge id>}`
- Notes
  - Mostly just a persisted scratchpad for the model to think in
- VirusTotal (via their API)
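The CTFd and VirusTotal tools above boil down to a few authenticated JSON requests. The sketch below is in Python rather than the server's Go and uses only the standard library. It assumes CTFd's usual `/api/v1` prefix with an `Authorization: Token <token>` header, and VirusTotal's v3 `GET /api/v3/files/{hash}` endpoint with an `x-apikey` header. All function names are hypothetical; the builders return unsent `Request` objects so they can be checked offline, and `send` performs the actual call.

```python
import hashlib
import json
import urllib.request
from typing import Any, Optional


def ctfd_request(base_url: str, token: str, method: str, path: str,
                 payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but don't send) an authenticated CTFd API request (assumed /api/v1 prefix)."""
    url = base_url.rstrip("/") + "/api/v1" + path
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Token {token}")
    req.add_header("Content-Type", "application/json")
    return req


def list_challenges(base_url: str, token: str) -> urllib.request.Request:
    return ctfd_request(base_url, token, "GET", "/challenges")


def get_challenge(base_url: str, token: str, challenge_id: int) -> urllib.request.Request:
    return ctfd_request(base_url, token, "GET", f"/challenges/{challenge_id}")


def submit_flag(base_url: str, token: str, challenge_id: int, flag: str) -> urllib.request.Request:
    # Body shape taken from the endpoint list above.
    body = {"submission": flag, "challenge_id": challenge_id}
    return ctfd_request(base_url, token, "POST", "/challenges/attempt", body)


def virustotal_file_report(api_key: str, path: str) -> urllib.request.Request:
    """Look up a local file on VirusTotal by its SHA-256 hash (no upload)."""
    with open(path, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    req = urllib.request.Request(f"https://www.virustotal.com/api/v3/files/{sha256}")
    req.add_header("x-apikey", api_key)
    return req


def send(req: urllib.request.Request, timeout: float = 30) -> Any:
    """Perform the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(submit_flag("https://ctf.example.com", token, 7, "flag{...}"))` should return CTFd's JSON envelope, whose `data.status` field is typically `"correct"` or `"incorrect"`; `ctf.example.com` and the challenge ID are placeholders.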

# Usage

Build the Docker image/environment with `docker build -t ctf .`. If you change the tag, make sure to update it in `scripts/new_env.sh` as well.

Copy `scripts/env.sh.example` to `scripts/env.sh` and replace all the TODOs with the relevant secrets, using whatever secret mechanism you prefer. I usually use `pass`.

Then run `./scripts/new_env.sh <env name> [opus]` and you'll be dropped into a Dockerized Kali environment with Claude Code installed. Run `claude` to open Claude Code, hit Enter three times to get through the prompts, then run `/attempt_challenge <challenge_id>` to have Claude download and work on the challenge.

# TODO

- [ ] Figure out what other tools should be added
  - Both via API to the MCP server, and in the Docker image
- [ ] Test out the VirusTotal API integration
  - Make sure the API integration looks good first
- [x] Add the ability to choose the strong/weak models when starting a new env
- [x] Test it with Claude
  - And CTFd
- [x] Make a Docker container
  - Should use Kali as a base
  - Add in Claude Code (npm) + the MCP server (build the binary?)
- [x] Add scripts and tooling to create new parallel invocations
- [x] Figure out how/where human + Claude notes should go
  - But maybe write them to a file; they'll get lost as is
- [x] Make sure Claude has access to a Python environment
  - And give it common libraries that make sense
  - Basically make it CyberChef
  - And give instructions for how to use it
- [x] Add a custom slash command for starting a challenge
  - Tell it to download from CTFd
  - Tell it to go as far as it can without human intervention
- [x] Read *Claude Code: Best practices for agentic coding*
  - Make sure we understand how hierarchical CLAUDE.md files work
  - Make sure we understand slash commands
  - Look for other tips and tricks (e.g. running in Docker, MCP configs, etc.)
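The "basically make it CyberChef" item could look something like the following: small bytes-to-bytes operations that chain into a recipe, using only the standard library. This is a hypothetical sketch of what such a helper module might contain, not something the image is known to ship; the operation names only mimic CyberChef's.

```python
import base64
import codecs
from functools import reduce
from typing import Callable

# Every operation maps bytes to bytes, so operations compose freely.
Op = Callable[[bytes], bytes]


def from_base64(data: bytes) -> bytes:
    return base64.b64decode(data)


def from_hex(data: bytes) -> bytes:
    # bytes.fromhex ignores whitespace between byte pairs
    return bytes.fromhex(data.decode())


def xor(key: bytes) -> Op:
    """Return an operation that XORs the input with a repeating key."""
    if not key:
        raise ValueError("XOR key must not be empty")

    def op(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    return op


def rot13(data: bytes) -> bytes:
    return codecs.encode(data.decode(), "rot13").encode()


def recipe(*ops: Op) -> Op:
    """Chain operations left to right, like a CyberChef recipe."""
    return lambda data: reduce(lambda acc, op: op(acc), ops, data)
```

For example, `recipe(from_base64, from_hex, xor(b"k"))` undoes a flag that was XORed with `k`, hex-encoded, and then Base64-encoded. Keeping every step as a plain function means the model can mix these with PyCryptodome or pwntools calls without learning a new interface.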