Forge is my answer to the moment when a codebase gets ahead of the human holding the keyboard. It is a Python-based project management and scaffold framework with an MCP server on top, so agents and editors can ask structured questions instead of grepping blindly.

You declare a manifest graph — nodes, edges, contracts — before you pretend the work is “just files.” That graph drives scaffolds, audits, and denv-style environment overlays so every repo knows what it is supposed to be. Status today: active, dogfooded daily on dimstack and the workout app; rough edges still, but the loop is real: plan in Forge, generate or verify, ship.

Why it exists: carpenters measure twice because waste is expensive. Software pretends that step is optional. Forge is the measure-twice layer, the jig you build before you cut the parts, so when you change something you already know the blast radius.

Title: Forge: A Compiler for Architectural Intent

Tags: forge mcp python tooling meta building

Read time: 12 min

I’ve been building Forge for about 14 months. For the first six of those, I kept describing it wrong — to other people, and to myself. I called it a project management framework, then a scaffold engine, then a manifest system. None of those were wrong, exactly, but none of them got at what it actually does.

The clearest description I’ve found: Forge is a compiler for architectural intent.

It takes the structure you meant to build, makes it explicit and verifiable, and then — critically — gives that verified structure to your AI agents so they’re working with what you intended, not guessing from what happens to be in the files right now.

This is the post I wish existed when I started.

The problem it solves

When you use AI coding agents seriously, you run into the same wall eventually. The agent is good at writing code. It’s not good at knowing what the code is for, how it relates to everything else, or what constraints should hold across the system. So it produces code that compiles, tests that pass, and architecture that quietly drifts from what you designed.

There’s no feedback mechanism. The agent doesn’t know what it doesn’t know. It looks at files, infers patterns, and makes decisions that are locally reasonable but globally wrong. It adds logic to a screen that was supposed to be a router. It imports directly across a domain boundary that was supposed to be isolated. It doesn’t delete the node you meant to delete because it can’t see the twelve things downstream of it.

The standard answer is “write better prompts” or “include more context.” But that’s a behavioral solution to a structural problem. You’re re-explaining your architecture at the start of every session because there’s nowhere for it to live between sessions.

Forge is the place it lives.

What Forge actually does

Forge maintains three simultaneous graph representations of your project:

The Declaration Graph — what you designed. Every module, screen, service, and controller declared in a manifest. Nodes are architectural units. Edges are declared dependencies. forge_verify checks this graph for constraint violations.

The Execution Graph — what actually runs. Built from runtime traces. Call paths, feature flags, variant slots. When you ask “if I change this controller, what breaks at runtime?” — this is the graph that answers.

The Intent Graph — why it exists. Structured annotations linked to design decisions, ADRs, the reasoning behind every major choice. This is institutional memory. It has a measurable decay rate.

The system continuously diffs these three graphs and surfaces divergence: declared but not built, built but not declared, reachable but undocumented. These are the dark matter states, the things that will hurt you, now made visible.

Every node carries four boolean flags: declared, implemented, reachable, intended. The combination tells you the complete health story:

declared=true, implemented=false → planned, not built yet (valid)
implemented=true, declared=false → exists but was never designed (dark matter)
declared=true, implemented=true, reachable=false → dead code
All four true → healthy
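
To make the flag combinations concrete, here’s a rough sketch in plain Python. It isn’t Forge’s implementation: the NodeFlags class, the classify function, and the "unclassified" fallback are names I’m assuming for illustration. Only the four flags and the four listed states come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeFlags:
    # The four flags named in the post; the class itself is hypothetical.
    declared: bool
    implemented: bool
    reachable: bool
    intended: bool


def classify(flags: NodeFlags) -> str:
    """Map a flag combination to one of the states listed above."""
    if flags.implemented and not flags.declared:
        return "dark matter"
    if flags.declared and not flags.implemented:
        return "planned"
    if flags.declared and flags.implemented and not flags.reachable:
        return "dead code"
    if all((flags.declared, flags.implemented, flags.reachable, flags.intended)):
        return "healthy"
    # Combinations the post does not name (e.g. everything true except intended).
    return "unclassified"


print(classify(NodeFlags(True, False, False, True)))   # planned
print(classify(NodeFlags(False, True, True, False)))   # dark matter
```

The order of the checks matters in this sketch: the dark matter test runs first, so an undeclared node is never reported as anything else.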

A project with 264 nodes is queryable down to this level. For every node, at any time.

The MCP layer

Forge runs as an MCP server. This is where it becomes something you actually use, not just something you think about.

When your AI agent in Cursor queries Forge, it doesn’t get a file dump. It gets a verified, structured graph. Instead of inferring your architecture from file names and import statements, the agent reads a compiled representation of what you declared. It knows which nodes exist, which constraints hold, which edges are forbidden.

The blast radius tool is the clearest example of this. Before any refactor, the agent runs forge_blast_radius on the node it’s about to change. It gets back every downstream node that would be affected, across the full declaration graph. Tracing that by hand takes 20-40 minutes. With Forge it takes one tool call.
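
The underlying idea is a graph traversal. A minimal sketch, assuming edges are stored as (dependent, dependency) pairs and that "downstream" means everything that depends on the changed node, directly or transitively. The node names and the blast_radius function here are made up for the example; this is not the forge_blast_radius tool itself.

```python
from collections import deque

# Hypothetical declared edges: (dependent, dependency).
EDGES = [
    ("checkout_screen", "cart_controller"),
    ("cart_controller", "pricing_service"),
    ("invoice_job", "pricing_service"),
    ("settings_screen", "profile_service"),
]


def blast_radius(edges, changed):
    """Return every node that depends on `changed`, directly or transitively."""
    # Invert the edge list so we can walk from a dependency to its dependents.
    dependents = {}
    for dependent, dependency in edges:
        dependents.setdefault(dependency, set()).add(dependent)

    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(changed)  # a cycle could lead back to the start node
    return seen


print(sorted(blast_radius(EDGES, "pricing_service")))
# ['cart_controller', 'checkout_screen', 'invoice_job']
```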

The difference between “Cursor guessing at your architecture” and “Cursor reading your declaration graph” is the difference between an agent that produces locally reasonable code and an agent that produces structurally correct code. That gap compounds fast on a real project.

The protocol layer

Over 14 months of building, Phase 7 turned into Phase 8, which turned into what’s now a six-epoch platform. The thing that changed the trajectory was making Forge extensible without touching core.

There are four protocols:

GloveProtocol — reads source code in a specific stack and produces universal nodes and edges. Three methods: scan(), watch(), resolve(). Drop a directory in .forge/gloves/ with a glove.yaml and Forge understands a new language.

CheckerProtocol — tests infrastructure health. Two methods. Drop a checker in .forge/checkers/ to make Forge understand a new infrastructure kind.

RendererProtocol — transforms graph data into output formats. The format enum is permanently open. Any registered renderer adds a valid format. Adding Mermaid support doesn’t touch core.

SubscriberProtocol — reacts to Forge events. This is the most important one. When forge_scaffold runs, it emits a scaffold_completed event. The ConventionInjector subscriber listens for that event and injects ADR reference comment blocks into the generated file. The scaffold engine doesn’t know the injector exists. The injector doesn’t modify the scaffold engine. That decoupling is what makes the system composable.
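
The decoupling is ordinary publish/subscribe. Here’s a small sketch of the pattern, assuming an in-memory event bus and a dict standing in for the filesystem. EventBus, scaffold, convention_injector, and the ADR id are all hypothetical. Only the scaffold_completed event name and the injector’s job come from the description above.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub; a stand-in, not Forge's real event system."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def emit(self, event_name, payload):
        for handler in self._handlers[event_name]:
            handler(payload)


def scaffold(bus, files, path):
    """Write a stub file, then announce it. Knows nothing about subscribers."""
    files[path] = "class CartController:\n    pass\n"
    bus.emit("scaffold_completed", {"path": path, "files": files})


def convention_injector(payload):
    """Prepend an ADR reference comment (the ADR id is made up for the example)."""
    files, path = payload["files"], payload["path"]
    files[path] = "# ADR-0001: see decision log\n" + files[path]


bus = EventBus()
bus.subscribe("scaffold_completed", convention_injector)

files = {}  # in-memory stand-in for the filesystem
scaffold(bus, files, "cart_controller.py")
print(files["cart_controller.py"].splitlines()[0])  # # ADR-0001: see decision log
```

Remove the subscribe call and scaffold still works unchanged, which is the property the protocol is after.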

Every protocol implementation is discovered from the filesystem. Drop it in the right directory, it’s live. No registration, no configuration, no changes to Forge itself.
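
Filesystem discovery can be as simple as a directory scan. A minimal sketch for the glove case, assuming a glove is any subdirectory of .forge/gloves/ that contains a glove.yaml. The discover_gloves function is a hypothetical name, and the sketch only lists names. It does not parse the YAML or load any code.

```python
import tempfile
from pathlib import Path


def discover_gloves(project_root):
    """List glove names: subdirectories of .forge/gloves/ that hold a glove.yaml."""
    gloves_dir = Path(project_root) / ".forge" / "gloves"
    if not gloves_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in gloves_dir.iterdir()
        if child.is_dir() and (child / "glove.yaml").is_file()
    )


# Demonstrate against a throwaway project tree.
with tempfile.TemporaryDirectory() as root:
    for name in ("python", "dart"):
        glove = Path(root) / ".forge" / "gloves" / name
        glove.mkdir(parents=True)
        (glove / "glove.yaml").write_text("name: " + name + "\n")
    # A directory without glove.yaml is ignored.
    (Path(root) / ".forge" / "gloves" / "scratch").mkdir()
    found = discover_gloves(root)

print(found)  # ['dart', 'python']
```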

The metrics

The health vector isn’t a single score. It’s six independently queryable dimensions. Four of them are described here:

ADI (Architectural Debt Index) — ghost nodes, dark matter, and stale annotations combined into a rolling drift score. The credit score of your codebase. 0 = clean, 100 = maximum debt. The key property: you can’t compute it retroactively. You have to have been running Forge for the data to exist. That history is the moat.

BRC (Blast Radius Coverage) — percentage of nodes with full blast-radius traceability. A node is “covered” if it has at least one declared edge. Isolated nodes are unknown impact zones. This becomes a compliance requirement at scale.

IDR (Intent Decay Rate) — how many nodes have annotations older than the threshold. Measures the half-life of architectural knowledge. High IDR means decisions are being made without being recorded. The understanding of the codebase is evaporating.

POS (Predictive Obsolescence Score) — which nodes are likely to be deleted in the next 30-90 days. Based on four signals: no recent annotation, not implemented, no declared edges, explicitly deprecated. Architectural foresight.
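
Two of these are simple enough to sketch from their definitions. The code below is an illustration, not Forge’s scoring code: the function names are hypothetical, expressing IDR as a percentage is my assumption, and so is counting a node with no annotation at all as stale. ADI and POS are left out because their weighting isn’t specified here.

```python
from datetime import date, timedelta


def blast_radius_coverage(node_ids, edges):
    """Percent of nodes that appear in at least one declared edge."""
    if not node_ids:
        return 0.0
    connected = {end for edge in edges for end in edge}
    covered = sum(1 for node in node_ids if node in connected)
    return 100.0 * covered / len(node_ids)


def intent_decay_rate(last_annotated, today, max_age_days):
    """Percent of nodes whose newest annotation is older than the threshold.

    Nodes with no annotation at all (None) count as stale; that is a choice
    made for this sketch.
    """
    if not last_annotated:
        return 0.0
    cutoff = today - timedelta(days=max_age_days)
    stale = sum(
        1 for stamp in last_annotated.values() if stamp is None or stamp < cutoff
    )
    return 100.0 * stale / len(last_annotated)


nodes = ["router", "cart", "pricing", "orphan"]
edges = [("router", "cart"), ("cart", "pricing")]
print(blast_radius_coverage(nodes, edges))  # 75.0

today = date(2025, 1, 31)
annotations = {
    "router": date(2025, 1, 20),
    "cart": date(2024, 6, 1),
    "pricing": None,
    "orphan": date(2025, 1, 30),
}
print(intent_decay_rate(annotations, today, max_age_days=90))  # 50.0
```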

These metrics compound over time. They’re meaningful after a week. They’re genuinely useful after a month. They’re irreplaceable after a year. That’s by design.

The self-description principle

The thing I’m most proud of architecturally: Forge describes itself using its own primitives.

The schema is a graph — node kinds are themselves schema_kind nodes in the declaration graph. forge_query(kind=schema_kind) returns all valid node kinds. Adding a new kind is adding a node, not editing code.

The workspace is a graph — projects are nodes, dependencies are edges. Cross-project blast radius traverses the workspace graph first (topology), then the project graph (detail).

The protocol implementations are graph nodes — every registered glove, checker, renderer, and subscriber appears in the workspace graph. forge_query(kind=glove) returns all active gloves. The system is self-navigable.
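
A toy version of the idea, assuming nodes are plain dicts with an id and a kind. The query and add_node helpers and every node name are hypothetical, and the real forge_query surely does more. The point is only that the list of valid kinds lives in the same graph it constrains.

```python
# Hypothetical in-memory graph: each node is a dict with an id and a kind.
graph = [
    {"id": "schema_kind", "kind": "schema_kind"},  # the kind that describes kinds
    {"id": "controller", "kind": "schema_kind"},
    {"id": "glove", "kind": "schema_kind"},
    {"id": "cart_controller", "kind": "controller"},
    {"id": "python", "kind": "glove"},
]


def query(nodes, kind):
    """Return ids of all nodes with the given kind."""
    return [node["id"] for node in nodes if node["kind"] == kind]


def add_node(nodes, node_id, kind):
    """Add a node, rejecting kinds that are not themselves declared as nodes."""
    if kind not in query(nodes, "schema_kind"):
        raise ValueError("unknown kind: " + kind)
    nodes.append({"id": node_id, "kind": kind})


# Registering a new kind is just another node; no code changes.
add_node(graph, "checker", "schema_kind")
add_node(graph, "postgres_health", "checker")

print(query(graph, "schema_kind"))  # ['schema_kind', 'controller', 'glove', 'checker']
print(query(graph, "glove"))        # ['python']
```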

This isn’t an aesthetic choice. It’s what keeps Forge from being outgrown. When every new capability is an expression of existing primitives rather than an addition to them, the system stays coherent regardless of how large it gets.

What it looks like in practice

The YC demo runs in under a second:

==================================================
  1. FORGE MANAGES ITSELF
==================================================
  ✓ Workspace projects
    2 registered: ['forge', 'forge-context-protocol']

==================================================
  2. HEALTH VECTOR
==================================================
  ✓ Graph nodes loaded
    264 nodes
  ✓ Health vector
    overall=66 | ADI=100 | BRC=100 | IDR=0 | healthy=False

==================================================
  3. SCHEMA REGISTRY
==================================================
  ✓ Registered kinds
    18 kinds: checker, command, controller, forge-extension...

==================================================
  4. EXPORT
==================================================
  ✓ Public snapshot
    node_count=264 | overall=66
  ✓ Health badge
    SVG (978 chars)

==================================================
  5. PROTOCOL LAYER
==================================================
  ✓ Gloves
    ['python', 'dart', 'react', 'sql']
  ✓ Subscribers
    ['ConventionInjector', 'ADITracker']

==================================================
  DEMO COMPLETE
==================================================
  Total: 0.8s
  Tests passing: 1008
  This is Forge managing itself.

The ADI=100 is honest. It reflects that most nodes are declared-only — the project is in active construction. Those scores improve as the implementation catches up to the declaration. That’s the point: the gap is visible and measurable, not hidden.

Where it’s going

The next milestone is the external surface — making Forge usable by developers who didn’t build it. That means:

pip install forge-protocols for the protocol SDK. External glove authors shouldn’t need to clone the whole repo to build a Swift glove.

/tools/forge on this site — a live dashboard of Forge managing itself. Not documentation. A live demo. The site is the proof.

Forge Cloud for teams — the shared workspace graph, health vector trend history, PR blast-radius reports as automated comments. The local tool stays free forever. The team intelligence layer is the product.

And the governance question: at what point does forge-core get donated to a foundation? The signal is when third parties start building gloves and checkers without being asked. That’s when the ecosystem is self-sustaining. That’s when you donate the standard and focus on the platform.

Why this matters now

AI coding agents are becoming the default. The question isn’t whether agents will write significant portions of your codebase — they already do. The question is whether those agents are working with structure or guessing from files.

Forge is the infrastructure layer that answers that question. Not for one project. Not for one stack. For any project, any stack, through open protocols that anyone can extend.

The worst outcome would be for Forge to be a tool I use to build my apps. The best outcome is for it to be the standard for how AI agents understand project structure — the way MCP became the standard for AI tool connectivity.

That’s what I’m building toward.

If you’re a developer building seriously with AI agents and hitting the same walls — the context reset at the start of every session, the refactor that broke six things nobody knew were connected, the agent that didn’t know what the code was for — Forge is worth watching.

The repo is public. The protocol spec is open. The first glove for your stack is a weekend project.

Forge is open source. The protocol layer is MIT licensed. The issues tracker is the best place to start if you want to contribute or have questions about extending it.

Test suite: 1008 passing. Current epoch: External Surface. Next: Forge Cloud.

Posts about this project

Forge Gets a Nervous System: Full MCP Standardization

dimstack is live

Building with Forge — scaffolding apps without the guesswork