Hero image generated by ChatGPT
This is a personal blog and all content herein is my personal opinion and not that of my employer.
The Procurement TL;DR
If you are buying or integrating an AI Agent platform today, ignore the “magic” demos. Ask these three questions:
- How is my identity converted into a capability? (The Bridge)
- Can the transport channel carry control commands? (The Control Plane)
- What binds a session token to my specific device? (The Context)
This is Part 1 of a 3-part series.
| Part | Title |
|---|---|
| Part 1 (this post) | The Leak, the Context, and the Framework |
| Part 2 | Mapping the Trust Boundaries and the Attack Tree |
| Part 3 | Defending Against Runtime Abuse |
Prologue: this started with a leak
This analysis did not begin with a whitepaper, a vendor keynote, or a carefully sanded marketing diagram about “AI agents transforming productivity”. It started with an old and embarrassingly familiar class of failure: shipping more of the product than you intended to ship.
The spark was the public disclosure on X showing that Anthropic had leaked internal Claude Code source via an npm map file:
> Claude code source code has been leaked via a map file in their npm registry!
>
> Code: https://t.co/jBiMoOzt8G pic.twitter.com/rYo5hbvEj8
>
> — Chaofan Shou (@Fried_rice) March 31, 2026
That alone was enough to make the story interesting. Source map leaks are one of those issues that feel almost quaint at this point. We all know better. Bundlers have done this sort of thing for years. CI/CD pipelines should catch it. Package contents should be validated before publication. Sensitive release hygiene should not depend on everyone remembering to squint at `npm pack` output at two in the morning. And yet here we are.
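This kind of check is straightforward to automate as a pre-publish gate. A minimal sketch, assuming a CI step that feeds it the file list from `npm pack --dry-run --json` (all function and pattern names here are hypothetical, not taken from any real pipeline):

```typescript
// Hypothetical pre-publish guard: flag files that should never ship in a
// published npm package. In CI you would feed it the "files" array from
// `npm pack --dry-run --json` output.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /\.map$/,          // source maps reconstruct the original source tree
  /\.env(\..+)?$/,   // environment files
  /\.pem$|\.key$/,   // private key material
];

function findForbiddenArtifacts(packedFiles: string[]): string[] {
  return packedFiles.filter((f) =>
    FORBIDDEN_PATTERNS.some((p) => p.test(f))
  );
}

// Example: fail the release if anything matches.
const files = ["package.json", "dist/cli.js", "dist/cli.js.map"];
const leaks = findForbiddenArtifacts(files);
if (leaks.length > 0) {
  console.error(`refusing to publish, forbidden files: ${leaks.join(", ")}`);
  // a real CI step would call process.exit(1) here
}
```

The point is not the regexes; it is that the check runs mechanically on every publish, so release hygiene stops depending on a tired human.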
But once I had the source in hand, the interesting question was not “what spicy strings are in here?” It was “what does this reveal about how modern agent systems actually work?” So I deliberately started by analysing the leak itself. No commentary. No summaries. No hot takes. Just the code.
Only after building a view from the source did I go and read Sathwick’s article:
https://sathwick.xyz/blog/claude-hidden.html
That post makes an important observation: most people rushed to catalogue hidden features, internal prompts, half-finished capabilities, or names that sound like they escaped from a product roadmap meeting. That is understandable. It is also the wrong level of analysis if you care about security. The important thing was not what features were present. The important thing was how the system had been assembled.
The strongest line of thinking in that article is the claim that the moat is not the model; it is the harness around the model. I agree. But I want to take it one step further, because that framing is still missing the security punchline:
If the harness is the moat, the harness is also the attack surface.
That matters because modern agent systems are not just “LLM plus chat UI”. They are distributed runtimes. They have identity, state, token minting, transport layers, policy enforcement, memory, tool mediation, and control channels. They are full systems. And full systems break at boundaries.
This series is the result of a three-phase analysis of the leaked source code, followed by structured threat modelling and an attack tree. The aim is not to pretend we discovered a copy-paste remote code execution bug hiding in plain sight. The aim is more useful than that. It is to identify the real security shape of a modern agent runtime and to show why the model itself is often not the most interesting thing in the room.
The “Harness” mental model
Before diving into the code, we need to fix the industry’s mental model. We tend to think of AI security as a battle over the “brain” (the Model). In reality, the Model is just an engine; the Harness is the entire vehicle: the steering, the brakes, the ignition, and the locks.
If the harness is the moat, the harness is also the attack surface.
Why this matters beyond one leak
It would be easy to treat this as an amusing vendor embarrassment: “ha, they leaked some TypeScript.” That would miss the point badly.
The architectural pattern exposed here is not peculiar to Anthropic. It is converging across the industry. Whether the badge on the front says OpenAI, Anthropic, Microsoft, Google, or some “agent platform” startup with a logo that looks like it was generated by the very thing it sells, the shape is becoming familiar:
- A user authenticates.
- The system mints or exchanges credentials for a runtime session.
- That session gains access to some transport channel.
- The transport carries both data and, often, control.
- The runtime mediates tools, permissions, state, and context.
- The resulting actions can reach other systems, sometimes many of them.
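The shape above reduces, in code terms, to a chain of conversions. A deliberately simplified sketch (every name here is hypothetical and illustrative, not taken from the leaked source):

```typescript
// Each function converts one form of authority into another. Every arrow
// in the chain is a trust boundary; every function here is somewhere a
// bug can live.
type UserIdentity = { userId: string };
type SessionToken = { userId: string; scopes: string[]; expiresAt: number };
type Channel = {
  send(msg: { kind: "data" | "control"; payload: string }): void;
};

// Steps 1-2: a user authenticates, and credentials are exchanged for a
// runtime session carrying concrete capabilities.
function mintSession(user: UserIdentity): SessionToken {
  return {
    userId: user.userId,
    scopes: ["tools:read", "tools:exec"],
    expiresAt: Date.now() + 3600_000, // one hour
  };
}

// Steps 3-4: the session gains a transport channel that carries both
// data and control frames. Control frames can reconfigure the agent,
// which is why the channel itself is part of the attack surface.
function openChannel(session: SessionToken): Channel {
  return {
    send(msg) {
      console.log(`[${session.userId}] ${msg.kind}: ${msg.payload}`);
    },
  };
}

// Steps 5-6: the runtime mediates tool calls against the session's
// scopes before actions reach other systems.
function invokeTool(session: SessionToken, _tool: string): boolean {
  return session.scopes.includes("tools:exec") && Date.now() < session.expiresAt;
}
```

Nothing in this sketch is exotic; the risk lives in how carefully each conversion is checked, bound, and revoked.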
That means the dominant security problems increasingly look like:
- identity translation flaws
- capability leakage
- state desynchronisation
- confused deputy behaviour
- transport binding mistakes
- policy inconsistency across modes or recovery paths
In other words, old problems. Just wearing a tasteful new AI-themed blazer.
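The confused deputy case deserves a concrete shape, because it is the one agent runtimes reinvent most often. A minimal sketch of the bug class (hypothetical names throughout): a tool mediator that authorises against its own ambient credential instead of the requesting session’s.

```typescript
// Confused deputy sketch: the runtime holds a powerful ambient
// credential, and a buggy mediator consults it for authorisation
// instead of consulting the caller's session.
type Scope = "files:read" | "files:write";

// The deputy's own authority, granted to the runtime at startup.
const RUNTIME_SCOPES: Scope[] = ["files:read", "files:write"];

interface Session {
  scopes: Scope[];
}

// BUGGY: checks the runtime's scopes, so every session can write files.
function canWriteBuggy(_session: Session): boolean {
  return RUNTIME_SCOPES.includes("files:write");
}

// FIXED: authority is checked against the requesting session.
function canWriteFixed(session: Session): boolean {
  return session.scopes.includes("files:write");
}

const readOnlySession: Session = { scopes: ["files:read"] };
// canWriteBuggy(readOnlySession) → true  (privilege leaks through the deputy)
// canWriteFixed(readOnlySession) → false
```

The diff between the two functions is one identifier, which is exactly why this class of bug survives code review in systems with many mediation layers.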
Methodology
I approached the leaked source in three phases.
Phase 1: Recon. Establish whether the leak included obvious embedded secrets or immediately exploitable artefacts, and identify the broad shape of the codebase.
Phase 2: Boundary mapping. Identify the places where trust changes form: identity becoming capability, capability becoming session authority, session state becoming transport control, and so on.
Phase 3: Abuse modelling. Turn those boundaries into plausible failure modes and rank the most interesting hypotheses.
Only after those three phases did I convert the result into an attack tree. That sequencing is intentional. It is the same basic reason threat modelling should start with architecture, not with a list of fashionable attack names. If you do not understand where authority enters, changes, or propagates, you are not modelling the system. You are just naming fears.
Phase 1: Recon — what was actually in the code?
The first question was the boring one, because boring questions are often the most useful:
Did this leak contain obvious, directly reusable secrets?
The quick answer: no. No dramatic “production root keys.” No glorious hardcoded signing key. No `sk-`-style API key pasted into a helper. That is good, and it would be irresponsible to imply otherwise.
But that does not mean the leak was low value. It simply means the value was architectural rather than immediately credential-centric.
Even a quick pass surfaced a highly structured runtime split across clearly recognisable concerns:
- query and orchestration logic
- tool abstractions
- session creation and bridge-related code
- JWT and token lifecycle code
- trusted device logic
- session ingress and transport handling
- API wrappers for session creation and refresh
File names alone were suggestive. They pointed toward a system that clearly did more than just send prompts to a model and display text. There was a runtime here. There was session state. There were modes. There were policy knobs. There was a bridge.
What the absence of secrets still tells you
A common mistake in incident discussions is to assume that if no credentials were embedded, the leak was mostly embarrassing rather than strategically important. That is backwards.
Embedded credentials can absolutely be severe. But architecture leaks create a different kind of leverage: they reduce uncertainty.
Uncertainty is expensive for defenders and attackers alike. A leaked runtime reveals:
- the token types that exist and where they appear
- how they are refreshed
- how state is represented
- where transport boundaries are
- which bits of policy are local versus remote
- where “trusted device” controls slot in
- how the system handles reconnect, replay, or recovery
That is not trivia. That is a map.
The first meaningful conclusion
This was not primarily a secret leak. It was a trust model leak.
The most valuable thing exposed was the conversion machinery: how identity becomes capability, how capability becomes action, and what extra conditions or policy layers are applied in between.
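One concrete example of such an in-between condition is device binding, the “Context” question from the TL;DR. A hedged sketch of what binding a session token to a specific device can look like (hypothetical names and scheme; the leaked code’s actual mechanism may differ):

```typescript
import { createHash } from "node:crypto";

// Sketch: a session token carries a hash of the device fingerprint it
// was minted for. Verification fails when the token is presented from
// a different device, so theft of the bare token is not enough.
type BoundToken = { userId: string; deviceHash: string };

function fingerprintHash(fingerprint: string): string {
  return createHash("sha256").update(fingerprint).digest("hex");
}

function mintBoundToken(userId: string, deviceFingerprint: string): BoundToken {
  return { userId, deviceHash: fingerprintHash(deviceFingerprint) };
}

function verifyBinding(token: BoundToken, presentedFingerprint: string): boolean {
  // Without this check, a stolen token works from any machine.
  return token.deviceHash === fingerprintHash(presentedFingerprint);
}
```

Whether a runtime performs a check like this, and on which paths (initial connect only, or also reconnect and recovery), is precisely the kind of policy detail an architecture leak exposes.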
Continue reading: Part 2 — Mapping the Trust Boundaries and the Attack Tree
As ever, thanks for reading and feel free to leave comments down below!