Forge Gets a Nervous System: Full MCP Standardization

There’s a difference between a tool that works and a tool that’s correct. For the past several months, Forge’s MCP server has been in the first category — functional, useful in daily Cursor sessions, but built on improvised substrate rather than proper protocol contracts. Today was the day I closed that gap.

What Forge is (quick version)

Forge is a Python-based project compiler I built to manage all my projects from one place. It maintains a declaration graph of every module, controller, screen, and service across a portfolio of apps, verifies that graph against the actual codebase, and exposes everything through an MCP server so AI agents in Cursor can reason about project structure without needing to read source files directly. It’s the jig before you start cutting parts.

What today was actually about

The MCP protocol has three native primitives: tools (callable functions), resources (addressable data), and prompts (behavior contracts). Forge had built its own versions of all three — tools doing resource work, prompts that were execution scripts instead of contracts, resources declared but never actually fetched. Today’s session was about peeling off the improvised layer and connecting properly to what the protocol already provides.

The analogy that kept coming up: Forge had built its own water system, electrical grid, and road network — and then discovered the municipal infrastructure was there the whole time, waiting for connection.

What actually got built

A canonical schema layer. forge/core/schemas.py is new — a single file of Pydantic models that define the shape of everything: tool return types, session state, annotation extensions, intent records, UI components for elicitation, progress events. Before today, these shapes lived implicitly in handler code and CLI output format strings. Now they’re typed contracts.

Protocol-compliant tools. All 14 visible tools got outputSchema declarations and ToolAnnotations (readOnly, destructive, idempotent hints). The MCP client now knows which tools are safe to call autonomously and which require human confirmation before proceeding. This is the protocol-native version of the hidden tool mechanism that existed before.

Dynamic prompt hydration. This one mattered most for token efficiency. The forge/session_start prompt used to tell the model to call forge_orient as its first step — a tool call that injects context into the conversation. Now, when you provide a project path, the prompt renders with a live resource URI embedded directly. The client fetches forge://project/forge/context at render time. No tool call. No context injection. The session bootstrap went from a two-step chain to a single operation.

Session lifecycle typing. ScratchStore is a real class now, with initialize() on session start and clear() on handoff. Session state is a Pydantic model written to .forge/session.json. Per-project scratch files with proper lifecycle rather than a global accumulating file.

Intent graph activated. .forge/intent/ directory structure is live. forge_intent is a new visible tool with an elicitation gate — when you add an intent record, the tool pauses and asks you to confirm the description before writing it to the graph. When a scaffold completes on a node that has a matching intent record, the tool asks whether the implementation satisfied the intent. If not, it files a risk annotation automatically.

Unified _meta channel. MetaChannel in the server wrapper now handles _meta injection consistently across all tool responses — hints from orient, resource links from scan and scaffold, logging behind the debug flag. Before, three tools had manually wired _meta channels; now it’s ambient.

The debugging that took most of the day

I’ll be honest: about half of today was spent on a cascading series of bugs introduced by the changes themselves.

The MCP spec (2025-06-18+) requires that any tool declaring outputSchema must return structuredContent in every response. Forge’s tools declare outputSchema now — but many of them wrap CLI subprocesses whose output isn’t always valid JSON. The make_tool_result() helper parses CLI stdout as JSON to populate structuredContent, but if parsing fails, structuredContent is None, and compliant MCP clients throw RuntimeError: Tool has an output schema but did not return structured content.

That manifested as forge_annotate, forge_audit_scan, and forge_impact_check silently returning “tool execution failed” — while forge_orient, forge_manifest, and forge_session worked fine. The split was exact and reproducible across four MCP restarts.

The fix was Option A: remove outputSchema from tools whose CLI output isn’t guaranteed JSON. The Pydantic models stay — they’re correct. The protocol declaration comes back once the CLI commands get reliable --format json modes. That’s the restore sequence tracked in the annotation system.

The other bug that surprised me: OutputResolutionPolicy was injecting dry_run, policy_mode, strict, and debug into every handler’s merged args — but CLI tools have different accepted flags. forge_impact_check was crashing with --dry-run: No such option. The fix is an explicit allowlist in each handler of which CLI flags it actually accepts. forge_impact_check has it now; 13 others don’t yet.

Both of these are the same underlying issue: untyped boundaries between the MCP layer and the CLI layer. The Pydantic schema layer exists now, but the handlers aren’t fully wired to use it as their interface contract. That’s the real debt item — a ForgeToolResponse typed union that every handler returns and every server wrapper operates on, eliminating duck-typing entirely. It’s tracked. It’ll get fixed.

Where Forge stands now

Graph: 238 nodes, 558 edges, 12 domains. Structural health: clean. All tools working as verified by live MCP protocol test and confirmed in my Claude session after the final restart.

On main. That matters — this was all on a feature branch for too long. The lesson: merge after every session that reaches a working state, not at the end of a multi-day effort. Branches should live for hours.

What’s next

The execution graph feedback loop is still the biggest open item — the trace→graph→verify cycle isn’t closed yet. forge_trace writes to .forge/traces/ but the execution graph doesn’t read it back. That’s Phase 4 work.

After that: getting the 8 tools that lost outputSchema back to reliable structured output (add --format json to their CLI commands), wiring the ForgeToolResponse typed union, and starting forge-denv Phase 1 so the HTTP API surface for the local dashboard exists.

Today was the infrastructure session. The kind where nothing visible ships but the foundation gets right. Those days are necessary, even when they’re mostly debugging.