Hero image generated by ChatGPT
This is a personal blog and all content herein is my personal opinion and not that of my employer.
The Procurement TL;DR
If you are buying or integrating an AI Agent platform today, ignore the “magic” demos. Ask these three questions:
- How is my identity converted into a capability? (The Bridge)
- Can the transport channel carry control commands? (The Control Plane)
- What binds a session token to my specific device? (The Context)
This is Part 1 of a 3-part series.
| Part | Title |
|---|---|
| Part 1 (this post) | The Leak, the Context, and the Framework |
| Part 2 | Mapping the Trust Boundaries and the Attack Tree |
| Part 3 | Defending Against Runtime Abuse |
Prologue: this started with a leak
This analysis did not begin with a whitepaper, a vendor keynote, or a carefully sanded marketing diagram about “AI agents transforming productivity”. It started with an old and embarrassingly familiar class of failure: shipping more of the product than you intended to ship.
The spark was the public disclosure on X showing that Anthropic had leaked internal Claude Code source via an npm map file:
> Claude code source code has been leaked via a map file in their npm registry!
>
> Code: https://t.co/jBiMoOzt8G pic.twitter.com/rYo5hbvEj8
>
> — Chaofan Shou (@Fried_rice) March 31, 2026
That alone was enough to make the story interesting. Source map leaks are one of those issues that feel almost quaint at this point. We all know better. Bundlers have done this sort of thing for years. CI/CD pipelines should catch it. Package contents should be validated before publication. Sensitive release hygiene should not depend on everyone remembering to squint at `npm pack` output at two in the morning. And yet here we are.
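This kind of check is straightforward to automate as a pre-publish gate. A minimal sketch, assuming a CI step that feeds it the file list from `npm pack --dry-run --json` (all function and pattern names here are hypothetical, not taken from any real pipeline):

```typescript
// Hypothetical pre-publish guard: flag files that should never ship in a
// published npm package. In CI you would feed it the "files" array from
// `npm pack --dry-run --json` output.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /\.map$/,          // source maps reconstruct the original source tree
  /\.env(\..+)?$/,   // environment files
  /\.pem$|\.key$/,   // private key material
];

function findForbiddenArtifacts(packedFiles: string[]): string[] {
  return packedFiles.filter((f) =>
    FORBIDDEN_PATTERNS.some((p) => p.test(f))
  );
}

// Example: fail the release if anything matches.
const files = ["package.json", "dist/cli.js", "dist/cli.js.map"];
const leaks = findForbiddenArtifacts(files);
if (leaks.length > 0) {
  console.error(`refusing to publish, forbidden files: ${leaks.join(", ")}`);
  // a real CI step would call process.exit(1) here
}
```

The point is not the regexes; it is that the check runs mechanically on every publish, so release hygiene stops depending on a tired human.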
But once I had the source in hand, the interesting question was not “what spicy strings are in here?” It was “what does this reveal about how modern agent systems actually work?” So I deliberately started by analysing the leak itself. No commentary. No summaries. No hot takes. Just the code.
Only after building a view from the source did I go and read Sathwick’s article:
https://sathwick.xyz/blog/claude-hidden.html
That post makes an important observation: most people rushed to catalogue hidden features, internal prompts, half-finished capabilities, or names that sound like they escaped from a product roadmap meeting. That is understandable. It is also the wrong level of analysis if you care about security. The important thing was not what features were present. The important thing was how the system had been assembled.
The strongest line of thinking in that article is the claim that the moat is not the model; it is the harness around the model. I agree. But I want to take it one step further, because that framing is still missing the security punchline:
If the harness is the moat, the harness is also the attack surface.
That matters because modern agent systems are not just “LLM plus chat UI”. They are distributed runtimes. They have identity, state, token minting, transport layers, policy enforcement, memory, tool mediation, and control channels. They are full systems. And full systems break at boundaries.
This series is the result of a three-phase analysis of the leaked source code, followed by structured threat modelling and an attack tree. The aim is not to pretend we discovered a copy-paste remote code execution bug hiding in plain sight. The aim is more useful than that. It is to identify the real security shape of a modern agent runtime and to show why the model itself is often not the most interesting thing in the room.
The “Harness” mental model
Before diving into the code, we need to fix the industry’s mental model. We tend to think of AI security as a battle over the “brain” (the Model). In reality, the Model is just an engine; the Harness is the entire vehicle: the steering, the brakes, the ignition, and the locks.
If the harness is the moat, the harness is also the attack surface.
Why this matters beyond one leak
It would be easy to treat this as an amusing vendor embarrassment: “ha, they leaked some TypeScript.” That would miss the point badly.
The architectural pattern exposed here is not peculiar to Anthropic. It is converging across the industry. Whether the badge on the front says OpenAI, Anthropic, Microsoft, Google, or some “agent platform” startup with a logo that looks like it was generated by the very thing it sells, the shape is becoming familiar:
- A user authenticates.
- The system mints or exchanges credentials for a runtime session.
- That session gains access to some transport channel.
- The transport carries both data and, often, control.
- The runtime mediates tools, permissions, state, and context.
- The resulting actions can reach other systems, sometimes many of them.
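The shape above reduces, in code terms, to a chain of conversions. A deliberately simplified sketch (every name here is hypothetical and illustrative, not taken from the leaked source):

```typescript
// Each function converts one form of authority into another. Every arrow
// in the chain is a trust boundary; every function here is somewhere a
// bug can live.
type UserIdentity = { userId: string };
type SessionToken = { userId: string; scopes: string[]; expiresAt: number };
type Channel = {
  send(msg: { kind: "data" | "control"; payload: string }): void;
};

// Steps 1-2: a user authenticates, and credentials are exchanged for a
// runtime session carrying concrete capabilities.
function mintSession(user: UserIdentity): SessionToken {
  return {
    userId: user.userId,
    scopes: ["tools:read", "tools:exec"],
    expiresAt: Date.now() + 3600_000, // one hour
  };
}

// Steps 3-4: the session gains a transport channel that carries both
// data and control frames. Control frames can reconfigure the agent,
// which is why the channel itself is part of the attack surface.
function openChannel(session: SessionToken): Channel {
  return {
    send(msg) {
      console.log(`[${session.userId}] ${msg.kind}: ${msg.payload}`);
    },
  };
}

// Steps 5-6: the runtime mediates tool calls against the session's
// scopes before actions reach other systems.
function invokeTool(session: SessionToken, _tool: string): boolean {
  return session.scopes.includes("tools:exec") && Date.now() < session.expiresAt;
}
```

Nothing in this sketch is exotic; the risk lives in how carefully each conversion is checked, bound, and revoked.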
That means the dominant security problems increasingly look like:
- identity translation flaws
- capability leakage
- state desynchronisation
- confused deputy behaviour
- transport binding mistakes
- policy inconsistency across modes or recovery paths
In other words, old problems. Just wearing a tasteful new AI-themed blazer.
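The confused deputy case deserves a concrete shape, because it is the one agent runtimes reinvent most often. A minimal sketch of the bug class (hypothetical names throughout): a tool mediator that authorises against its own ambient credential instead of the requesting session’s.

```typescript
// Confused deputy sketch: the runtime holds a powerful ambient
// credential, and a buggy mediator consults it for authorisation
// instead of consulting the caller's session.
type Scope = "files:read" | "files:write";

// The deputy's own authority, granted to the runtime at startup.
const RUNTIME_SCOPES: Scope[] = ["files:read", "files:write"];

interface Session {
  scopes: Scope[];
}

// BUGGY: checks the runtime's scopes, so every session can write files.
function canWriteBuggy(_session: Session): boolean {
  return RUNTIME_SCOPES.includes("files:write");
}

// FIXED: authority is checked against the requesting session.
function canWriteFixed(session: Session): boolean {
  return session.scopes.includes("files:write");
}

const readOnlySession: Session = { scopes: ["files:read"] };
// canWriteBuggy(readOnlySession) → true  (privilege leaks through the deputy)
// canWriteFixed(readOnlySession) → false
```

The diff between the two functions is one identifier, which is exactly why this class of bug survives code review in systems with many mediation layers.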
Methodology
I approached the leaked source in three phases.
Phase 1: Recon. Establish whether the leak included obvious embedded secrets or immediately exploitable artefacts, and identify the broad shape of the codebase.
Phase 2: Boundary mapping. Identify the places where trust changes form: identity becoming capability, capability becoming session authority, session state becoming transport control, and so on.
Phase 3: Abuse modelling. Turn those boundaries into plausible failure modes and rank the most interesting hypotheses.
Only after those three phases did I convert the result into an attack tree. That sequencing is intentional. It is the same basic reason threat modelling should start with architecture, not with a list of fashionable attack names. If you do not understand where authority enters, changes, or propagates, you are not modelling the system. You are just naming fears.
Phase 1: Recon — what was actually in the code?
The first question was the boring one, because boring questions are often the most useful:
Did this leak contain obvious, directly reusable secrets?
The quick answer: no. No dramatic “production root keys.” No glorious hardcoded signing key. No `sk-`-style API key pasted into a helper. That is good, and it would be irresponsible to imply otherwise.
But that does not mean the leak was low value. It simply means the value was architectural rather than immediately credential-centric.
Even a quick pass surfaced a highly structured runtime split across clearly recognisable concerns:
- query and orchestration logic
- tool abstractions
- session creation and bridge-related code
- JWT and token lifecycle code
- trusted device logic
- session ingress and transport handling
- API wrappers for session creation and refresh
File names alone were suggestive. They pointed toward a system that clearly did more than just send prompts to a model and display text. There was a runtime here. There was session state. There were modes. There were policy knobs. There was a bridge.
What the absence of secrets still tells you
A common mistake in incident discussions is to assume that if no credentials were embedded, the leak was mostly embarrassing rather than strategically important. That is backwards.
Embedded credentials can absolutely be severe. But architecture leaks create a different kind of leverage: they reduce uncertainty.
Uncertainty is expensive for defenders and attackers alike. A leaked runtime reveals:
- the token types that exist and where they appear
- how they are refreshed
- how state is represented
- where transport boundaries are
- which bits of policy are local versus remote
- where “trusted device” controls slot in
- how the system handles reconnect, replay, or recovery
That is not trivia. That is a map.
The first meaningful conclusion
This was not primarily a secret leak. It was a trust model leak.
The most valuable thing exposed was the conversion machinery: how identity becomes capability, how capability becomes action, and what extra conditions or policy layers are applied in between.
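One concrete example of such an in-between condition is device binding, the “Context” question from the TL;DR. A hedged sketch of what binding a session token to a specific device can look like (hypothetical names and scheme; the leaked code’s actual mechanism may differ):

```typescript
import { createHash } from "node:crypto";

// Sketch: a session token carries a hash of the device fingerprint it
// was minted for. Verification fails when the token is presented from
// a different device, so theft of the bare token is not enough.
type BoundToken = { userId: string; deviceHash: string };

function fingerprintHash(fingerprint: string): string {
  return createHash("sha256").update(fingerprint).digest("hex");
}

function mintBoundToken(userId: string, deviceFingerprint: string): BoundToken {
  return { userId, deviceHash: fingerprintHash(deviceFingerprint) };
}

function verifyBinding(token: BoundToken, presentedFingerprint: string): boolean {
  // Without this check, a stolen token works from any machine.
  return token.deviceHash === fingerprintHash(presentedFingerprint);
}
```

Whether a runtime performs a check like this, and on which paths (initial connect only, or also reconnect and recovery), is precisely the kind of policy detail an architecture leak exposes.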
Continue reading: Part 2 — Mapping the Trust Boundaries and the Attack Tree
As ever, thanks for reading and feel free to leave comments down below!