Repo rule-files standards for AI Agents: chaos or convergence?

10 hours ago 1

Over the past year every major AI player has slipped a rules or memory file into its workflow: Cursor has .cursor/rules, Windsurf has .windsurf/rules, GitHub is testing copilot-instructions.md, and indie tools are inventing their own paths just to keep up.

Different names for the same idea: a repo-local file that tells the agent how to behave.

LLM agents tend to forget faster than they learn, and shoving repeat-worthy facts, coding style, or project policy into every prompt wastes context and burns tokens. A single, discoverable file lets the tool ingest persistent instructions on load, then prepend or retrieve them as needed. Think of it as a lightweight, git-versioned knowledge base on plain text.

These files also solve privacy and workflow friction in one shot. Preferences stay with the codebase rather than a cloud dashboard; collaborators audit changes through normal pull requests; CI can lint them to keep the schema honest.

Result: consistent agent behaviour across machines without leaking private prompts or relying on yet another SaaS knob.

Directory of markdown files called .cursor/rules; every open tab gets these lines prepended. Older single-file form is .cursorrules.

Each rule file is written in MDC (.mdc), a lightweight format that supports metadata and content in a single file. Rules supports the following types:

Always: Always included in the model context.
Auto Attached: Included when files matching a glob pattern are referenced.
Agent Requested: Rule is available to the AI, which decides whether to include it. Must provide a description.
ManualOnly: included when explicitly mentioned using @ruleName.

Official docs can be found here.

The file global_rules.md applies to all workspaces. The directory .windsurf/rules stores repo-specific rules. There’s no format as such, the rules are plain text, although XML can be used:

<coding_guidelines> - My project's programming language is python - Use early returns when possible - Always add documentation when creating new functions and classes </coding_guidelines>

Similar to MDC, there are several activation modes:

Manual: This rule can be manually activated via @mention in Cascade’s input box.

Always On: This rule will always be applied.

Model Decision: Based on a natural language description of the rule the user defines, the model decides whether to apply the rule.

Glob: Based on the glob pattern that the user defines (e.g. .js, src/**/.ts), this rule will be applied to all files that match the pattern.

Official docs can be found here, and some examples live in the Windsurf rules directory.

The docs don’t specify this anymore, since the link is broken, but there’s a file called sweep.yaml which is the main config. Among other options, such as blocking directories, you can define rules there.

There’s an example in the GitHub repo and it’s widely commented in their Discord server.

The .clinerules/ directory stores a set of plain text constraint files with the desired policies. The files support simple section headers (## guidelines, ## forbidden) and key-value overrides (max_tokens=4096).

For projects with multiple contexts, they provide the option of a bank of rules.

Official docs can be found here.

They use CLAUDE.md, an informal markdown Anthropic convention. There are two flavours: at repo root for project-specific instructions, and at ~/.claude/CLAUDE.md for user preferences for all projects. It is also possible to reference other markdown files:

See @README for project overview and @package.json for available npm commands for this project. # Additional Instructions - git workflow @docs/git-instructions.md

Anything inside the file or the extended paths is auto-prepended when you chat with Claude Code.

Official docs can be found here.

Amp has publicly stated they want AGENT.md to become the standard, and they offer a converter from other vendor’s files.

Amp now looks in the AGENT.md file at the root of your project for guidance on project structure, build & test steps, conventions, and avoiding common mistakes.

Amp will offer to generate this file by reading your project and other agents' files (.cursorrules, .cursor/rules, .windsurfrules, .clinerules, CLAUDE.md, and .github/copilot-instructions.md).

We chose AGENT.md as a naming standard to avoid the proliferation of agent-specific files in your repositories. We hope other agents will follow this convention.

Currently they provide a single file, although they’re working on adding support for a more granular guidance.

Plain markdown file .github/copilot-instructions.md: repo-level custom instructions. Once saved it is instantly available to Copilot Chat & inline chat.

Official docs are here. Note that the only stable version is the VSCode one; any other states that “this feature is currently in public preview and is subject to change”.

This one’s tricky because Autogen is not quite like the other tools here. However, you can define rules for a CodeExecutorAgent using the attribute system_message:

system_message (str, optional) – The system message for the model. If provided, it will be prepended to the messages in the model context when making an inference. Set to None to disable. Defaults to DEFAULT_SYSTEM_MESSAGE. This is only used if model_client is provided.

The default message can be found here:

DEFAULT_SYSTEM_MESSAGE = 'You are a Code Execution Agent. Your role is to generate and execute Python code based on user instructions, ensuring correctness, efficiency, and minimal errors. Handle edge cases gracefully.'

Based on the documentation, you can define general rules in a few ways:

In Playbooks, you can create a "Forbidden Actions" section that lists actions Devin should not take, like:

## Forbidden Actions - Do NOT touch any Kotlin code - Do NOT push directly to the main branch - Do NOT work on the main branch - Do NOT commit changes to yarn.lock or package-lock.json unless explicitly asked

It is also possible to add rules to Devin's Knowledge in Settings > Devin's Settings > Knowledge that will persist across all future sessions and can be pinned.

Not currently supported as per this Reddit thread.

Not currently supported but working on it, as per this Discord comment.

There are of course other options; to each its own. A quick search in GitHub or Google shows tons of different JSON manifests holding tool lists, memory knobs, and model params ("reflection": true, "vector_db": "chroma").

Format varies by project; should be treated as project-specific until a real spec lands.

Format: Nearly every vendor settled on plain-text—usually Markdown—because it travels well through Git: you can review changes in a pull-request, leave line comments, and blame regressions with git log or similar. Nothing proprietary, no binary diffs.
Location: Files live either in the repository root (CLAUDE.md, AGENT.md) or in a predictable sub-directory such as .cursor/ or .windsurf/. That convention lets CI pipelines locate them automatically for linting, signing, or policy scans without extra configuration.
Semantics: At their core the files carry “always-obey” instructions—style guides, security constraints, onboarding blurbs—plus a thin layer of optional metadata (tags:, scope:, enabled:). They are meant to be read-only for the human contributor; the agent simply ingests and obeys.
Runtime use: The ingestion path is straightforward: the tool reads the file, chunks it to stay under token limits, and prepends the text to each prompt (or attaches it selectively based on scope rules). In the end it’s just deterministic context injection, without any fine tuning.

File-discovery rules: Some tools look for an exact filename; others accept a glob pattern such as .cursor/rules/*.mdc. That choice determines whether you can organise rules into multiple files or must keep a single monolith.
Single file vs directory bundle: Cursor and Windsurf allow a directory of rule files, enabling granular enable/disable flags. Copilot and Claude expect one file, which simplifies discovery but can become unwieldy in large projects.
Metadata schema: YAML front-matter (---\nname: foo) gives you typed keys but requires a parser; inline markers (## Always) are human-friendly but less precise; some formats include no metadata at all and rely on naming conventions. Mixing styles complicates automated linting.
Scope controls: A few systems apply rules to the entire repository. Others let you bind rules to specific file patterns (*.py, docs/**) or even to individual tools. Scope flexibility is powerful but introduces fragmentation when each vendor invents its own key names (scope, globs, paths).

xkcd predicted this:

Interoperability. A common schema would let any agent-aware tool read the same file without adapters, the way every editor understands README.md or package.json. One set of linters, one mental model.
Ecosystem leverage. Shared conventions invite third-party addons—VS Code extensions, CI checks, security scanners—that work everywhere instead of being vendor-locked.
Security and auditability. A predictable location plus a typed schema makes it easier to diff, sign, or lint your long-term context. Random plaintext blobs are harder to monitor.

The surface hasn’t settled. Some files store static rules, others capture evolving memory, others mix both. Locking that into a single spec today may freeze a half-baked design.
The key point: translation is trivial. These files are plain text. Converting .cursor/rules to CLAUDE.md is a regex, not a moon shot. Adapters cost far less than committees.
Innovation tax. A standard invites gatekeeping and slows down wild experiments, which is exactly what we all hate in the wild agent world.

Treat “standards” as interfaces, not treaties. Keep experimenting, but converge on a minimal front-matter block—name, scope, version, enabled—that any tool can ignore or honor. The body stays free-form Markdown. When the ecosystem naturally coalesces around a few patterns, formalize the rest. Until then, thin adapters beat thick bureaucracies.

Pick one filename and stick to it. Your tooling can translate later, but human contributors need a single place to look. .cursor/rules or AGENT.md, or something else, doesn’t matter; choose and document it.
Keep the schema thin. Plain markdown body plus a short YAML-front-matter header. Avoid exotic keys and avoid nested data.
Generate, don’t hand-edit. Write a tiny script that assembles the file from smaller snippets: style guide, onboarding blurb, current TODOs. This enforces consistency and keeps merge conflicts low.
Lint in CI. A simple regex or json-schema check prevents malformed front-matter and keeps the agent from choking on bad tokens.
Treat it like code, not config. Commit every change, add PR context, and tag releases when the ruleset materially shifts. The git history becomes your agent’s audit log.
Encrypt secrets elsewhere. These files live in the repo; don’t embed API keys or proprietary data. If the model needs them, pull at runtime from a vault.

Follow those steps and you’ll get 90 % of the “standard” benefits today without waiting for the industry to agree on a file extension.

The current sprawl looks chaotic, but under the hood these files do the same two things: stash static instructions and feed them to a prompt. Translation is easy: just a regex, not a standards body. Spend energy on writing good rules, not lobbying for a universal filename; once the ecosystem stabilizes, a shared manifest (think package.json for agents) will emerge naturally.

Until then, thin adapters beat committees.