mcpai-toolingforgeprotocol-designdeveloper-tools

What Happens When You Actually Implement the MCP Protocol Instead of Wiring Around It


Most MCP server implementations I’ve seen treat the protocol like a thin wrapper around function calls. You define tools, you handle requests, you return text. The resources and prompts surfaces exist, but they’re either empty or filled with static content nobody fetches.

That’s not a criticism — it’s how you start. It’s also where we started with Forge, an internal meta-framework and MCP server I’ve been building for managing multiple software projects. What happened when we actually committed to the full protocol spec is the subject of this post.

The problem with tools doing everything

Forge exposes tools for graph-based context retrieval, manifest-driven scaffolding, audit annotation, and session lifecycle management. For a long time, every tool call was self-contained: the tool resolved its own context, injected what it needed into the response, and returned a text blob that the model parsed.

The architectural inversion was subtle but costly. forge_orient — the session start tool — was returning 400-600 tokens of context on every call because it was pushing the context into the conversation. The model would read it, narrate it back, and proceed. Every scaffold session started with this ritual regardless of whether the model actually needed the full context or just the next action.

This is the difference between a librarian reading you the book versus pointing you at the shelf. Both work. One scales.

Resources as lazy context delivery

The MCP spec’s resource layer exists precisely for this. A resource has a URI, a MIME type, and content. The client fetches it when it needs it — not when the server decides to push it.

Forge now has 12 concrete resources and 6 URI templates:

forge://project/{id}/context      — live project state
forge://project/{id}/manifest     — declaration graph
forge://project/{id}/annotations  — persisted findings
forge://glove/{kind}/{stack}      — stack adapter contracts
forge://schema/manifest/v1        — manifest node schema
forge://schema/errors             — typed error registry
forge://schema/taxonomy           — full primitive hierarchy
forge://pipeline/{id}             — executable pipeline descriptors
forge://mcp/surface               — server self-description
forge://mcp/roots                 — workspace boundaries

The key change: tools now return resource_link objects pointing at these URIs instead of inlining the full content. forge_orient(response_tier=L0) returns a 20-token status payload with a context_link field. If the model needs more, it fetches forge://project/forge/context. If it doesn’t, it didn’t pay for tokens it won’t use.

The session bootstrap went from prompt → tool call → 600 token injection to prompt → resource fetch → proceed. The tool call step was eliminated entirely.

Prompts as behavior contracts, not instruction sequences

This is where the most significant conceptual shift happened.

The original Forge prompt set had 7 prompts. Each was an instruction sequence: “Call forge_orient first. Then read the glove resource. Then assemble the packet. Then confirm with the user. Then scaffold.” Step by step. The model was being used as a process executor by describing the process in prose.

The problem: prose instructions drift. They get interpreted differently in different contexts. They require the model to remember the sequence. They don’t compose.

We rebuilt the prompt set to 2 prompts:

  • forge/start — session contract declaration with embedded resource reference
  • forge/audit — audit posture with pipeline pointer

Neither contains step lists. Both contain a session contract (tier defaults, policy mode, active _meta channels, hidden tools) and a pointer to the relevant pipeline resource. The behavior text for forge/start with a resolved project path looks like:

Session context loaded above. No need to call forge_orient.
Session contract: response_tier=L0, policy_mode=advisory, dry_run=true.
Pipeline execution available via forge://pipeline/{id}.
Fetch the descriptor and walk its steps.

The session bootstrap is now: fetch forge/start with project_path → resource block for forge://project/{id}/context embeds automatically → behavior text declares the contract. One operation. The model has full context without a single tool call.

Pipelines as first-class protocol objects

This is the piece that took longest to design correctly.

Forge has workflows that span multiple tools: scaffold requires orient → contract → packet assembly → confirmation → file emission → verification. The naive approach encodes this sequence in prompt instructions or hardcodes it in a handler that calls other handlers.

Both approaches have the same failure mode: the sequence is opaque and non-composable. You can’t inspect it. You can’t modify it without changing code. You can’t query which pipelines reference a given tool.

The solution was PipelineDescriptor — a typed schema stored as a YAML resource under forge://pipeline/{id}:

pipeline_id: scaffold
steps:
  - step: orient
    kind: resource
    resource: "forge://project/{project_id}/context"
    required: true

  - step: contract
    kind: resource
    resource: "forge://glove/{node_kind}/{stack}"
    input_map:
      node_kind: "intent.inferred_nodes[0].kind"
      stack: "init.detected_stack"

  - step: confirm
    kind: elicitation
    elicitation_schema: ScaffoldConfirmation
    gate: true

  - step: emit
    kind: tool
    tool: forge_scaffold
    input_map:
      intent_id: "declare_intent.record_id"

  - step: write_output
    kind: fs_write
    tool: fs_write_file
    input_map:
      path: "emit.output_path"
      content: "emit.file_content"

  - step: verify
    kind: tool
    tool: forge_verify
    input_map:
      project_path: "context.project_path"

The input_map field is what makes this composable. It’s C/Java-style parameter composition: fs_write_file(path=forge_scaffold(...).output_path, content=forge_scaffold(...).file_content). The output of step N is automatically piped as input to step N+1 without a human relay step.

The PipelineExecutor worker walks this descriptor, resolves the input maps, fetches resources lazily, invokes tools at emit steps, and pauses at gate: true steps for elicitation. The pipeline is data. The executor is the interpreter. Changing a pipeline means editing a YAML file, not touching handler code.

The taxonomy: primitives all the way down

One structural decision that proved unexpectedly valuable was formalizing a NodeTier system for every primitive in the framework:

TierWhat it is
surfaceMCP-visible: tools, resources, prompts
workerInternal pipeline helpers, zero surface cost
gloveStack adapters implementing interface contracts
gateElicitation/confirmation boundary schemas
schemaTyped contracts: Pydantic models, YAML schemas
registryCentral lookup tables
pipelineExecutable step descriptors

Every manifest node carries its tier. The hierarchy is declared — glove is a subtype of worker, registry is a subtype of schema — so forge_query(tier=worker) returns workers, gloves, and gates without listing them explicitly.

What this unlocks: type-first search. Instead of “find forge/core/impact.py and read its edges,” you ask “show me all workers in CORE sorted by criticality” and get the full set with their contracts. No prior knowledge of internals required.

The same taxonomy powers the tier constraint rule system. Surface nodes may import surface, schema, and worker nodes — but a worker directly importing a surface node is a structural violation caught by forge verify at exit_code=2. Architecture constraints are enforced by the graph, not by convention.

_meta channels: out-of-band signals that don’t bloat context

The MCP spec supports metadata channels that travel alongside tool responses without entering the model’s context window as injected text. Forge uses these for:

  • _meta.hints on orient — mirrors suggested_workflow for client consumption
  • _meta.resource_links on 5 tools — points at the relevant forge:// URI for lazy fetch
  • _meta.annotations on scan/verify — auto-annotated finding counts
  • _meta.ui on mutation tools — typed UIComponent payloads for elicitation rendering

The _meta.ui channel is worth calling out specifically. Instead of elicitation being a bare string prompt, it’s a UIComponent(kind=CONFIRMATION, title=..., data=..., actions=[...], requires_response=True). The same typed component that a future dashboard would render is what drives the mid-session confirmation gate. One schema, multiple rendering targets.

The error system: typed, graph-linked, debug-gated

The original MCP server had a common failure mode: any unhandled exception in a handler would produce a generic “Tool execution failed” string. No code, no context, no clue what broke.

The replacement is ForgeError — a typed schema with code, domain, message, node_id, dependency_ids, source_tool, check_class, and context. Every handler is wrapped in _wrap_mcp_tool_handler which catches unhandled exceptions and returns ForgeError(code="ERR-001", source_tool=<tool>, context={"exception": str(e)}). The context field only populates when debug=True in session policy — no accidental token waste in normal operation.

The node_id and dependency_ids fields are what make errors graph-linked. When forge_verify emits VERIFY-001 for a broken node, it populates node_id with the failing node’s manifest ID and dependency_ids with its direct incoming edges. The error isn’t just “something broke” — it’s “this node broke, these dependencies contributed.” The full impact chain is traversable from the error object.

The error registry itself is a forge://schema/errors resource. There’s no ERROR_CODES.md. The registry IS the documentation, and it’s always current because it’s code.

What 254 nodes and 583 edges looks like in practice

The forge manifest currently tracks 254 nodes across 13 domains with 583 declared edges. Every new file added to the codebase gets a manifest node with tier inferred from kind — no manual specification required. New gloves slot in by implementing the IStackGlove interface and registering at forge://glove/{kind}/{stack}. New pipelines are YAML files that immediately appear at forge://pipeline/{id}.

The graph is verifiable at any point: forge_verify runs parity checks, tier constraint rules, VERIFY-002 for unresolved proposed dependencies, and VERIFY-003 for unresolved proposed contracts. Exit code 0 means the architecture is structurally consistent. The codebase enforces its own design.

The thing that took the longest to internalize: the MCP protocol primitives — tools, resources, prompts, elicitation, progressToken, roots — aren’t wrappers around function calls. They’re a vocabulary for a specific kind of human-AI collaboration where context is expensive, mutation is risky, and the system should be able to describe itself without a human intermediary. Forge is still getting there. But the scaffolding is now correct.


Forge is an open development meta-framework. The MCP server, glove system, and pipeline executor are all part of the same codebase. If you’re building something similar or have thoughts on the protocol design decisions described here, I’m interested to hear them.

← All notes