Pavilion AI in GTM School Session 5 · Bonus Field Kit

Token Management for Claude Code

A working playbook for treating context tokens like a cap table.

Scott Wueschinski · scott@gtmify.io · May 2026

Most teams install MCP servers like they're Chrome extensions: add one, add another, never measure the cost of either. By the time you have five connected, your context window is being eaten alive before the first prompt. This kit is the playbook I use to keep context cheap, capability high, and the stack reproducible across machines.

1. The thesis

Context tokens are a fixed cost you pay before any work happens. Most AI-GTM practitioners are stacking MCP servers without measuring the boot tax. Every MCP server loads its entire tool surface (names, descriptions, and input schemas) at session start whether you call it or not. Across a typical revenue stack, that can add up to tens of thousands of context tokens burned at every launch, on every machine, for every team member, all before the first prompt.

CLIs invert the model: lazy load, near-zero boot cost, a couple hundred tokens per call, and the heavy response stays out of context because the binary processes the bulk and returns a summary.
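A rough way to put a number on the boot tax is to estimate how many tokens a set of tool definitions occupies. The sketch below assumes the common rule of thumb of about four characters per token, which is an approximation and not Claude's actual tokenizer. The tool definition is hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def boot_tax(tools: list[dict]) -> int:
    """Estimate tokens consumed by tool definitions loaded at session start."""
    return sum(estimate_tokens(json.dumps(tool, sort_keys=True)) for tool in tools)


# Hypothetical tool definition, in the general shape MCP servers advertise.
EXAMPLE_TOOLS = [
    {
        "name": "search_contacts",
        "description": "Search CRM contacts by name, email, or company.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

if __name__ == "__main__":
    print(f"Estimated boot tax: ~{boot_tax(EXAMPLE_TOOLS)} tokens")
```

Run it against the real tool lists of your connected servers to see which ones dominate the budget. Treat the result as a relative ranking, not an exact count.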

The discipline. Treat your token budget like a cap table. Audit it quarterly. Every MCP must justify why it isn't a CLI yet. If it can't, it gets migrated or removed.

What changes when you adopt this framing:

Sessions start faster and stay coherent longer.
Hard tasks get more reliable because models aren't competing with a wall of tool descriptions.
Onboarding new teammates becomes cheaper — they inherit a curated toolbox, not a shrugged-shoulders pile of MCPs.
You can actually add capability without trading it against existing context.

2. The decision tree

Apply in order. Stop at the first one that fits.

Order	Condition	Action
1	Does an official CLI exist?	Wrap it with a Claude Code skill. Examples: Vercel CLI, Stripe CLI, Wrangler, Supabase, gh, Mercury. The vendor maintains the binary; you maintain the skill that tells Claude when to use it.
2	Public REST API but no official CLI?	Build a thin Go wrapper (or Python, or Bun — whatever your shop ships). Most vendor APIs are 200 lines of HTTP + a JSON marshaller away from being a perfectly usable CLI.
3	Vendor only ships an MCP — no public REST?	Build a JSON-RPC client that hits the same MCP HTTP endpoint. You get the capability without paying the boot tax. This is the "MCP-endpoint-passthrough" pattern.
4	None of the above?	Install the MCP, but write down why in the commit message. Audit quarterly. Most vendors ship REST within 6 months once they see usage.
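The four rows above can be read as a first-match rule chain. A minimal sketch, assuming a hypothetical `Vendor` record whose three flags you fill in by hand after checking the vendor's docs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vendor:
    """Hypothetical record of what a vendor offers; fill it in after probing their docs."""
    name: str
    has_official_cli: bool = False
    has_public_rest: bool = False
    has_mcp_http_endpoint: bool = False


def choose_integration(vendor: Vendor) -> str:
    """Walk the decision tree in order and stop at the first condition that fits."""
    if vendor.has_official_cli:
        return "wrap the official CLI with a skill"
    if vendor.has_public_rest:
        return "build a thin REST wrapper CLI"
    if vendor.has_mcp_http_endpoint:
        return "build a JSON-RPC passthrough client"
    return "install the MCP, record why, audit quarterly"
```

Order matters: a vendor with both a CLI and a REST API still lands on the first row, because the vendor-maintained binary is the cheapest thing to own.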

Probe before you build. The biggest single hour I've saved was checking a vendor's REST docs after assuming MCP was the only path. The REST endpoint was there, undocumented in their marketing. Always check before falling back to passthrough.

3. Reference implementations to clone

These five auth patterns cover ~95% of real-world vendor APIs. Once you have one of each working, every new CLI is a copy-paste-and-rename job.

Pattern	What it solves	Where you'll meet it
REST + Bearer	Standard `Authorization: Bearer <key>`. Easiest path.	Slack, most modern SaaS APIs, JWT auth in general
REST + custom header	Vendor invents their own header name (`api_key:`, `X-API-Key`, `X-Auth-Token`).	Many enrichment vendors, smaller SaaS
REST + non-standard auth scheme	`Authorization: api-key <key>` (literal word "api-key", not "Bearer").	Vendors that misread the OAuth spec
OAuth installed-app flow	Browser consent once, refresh-token-managed forever. Includes rotation handling.	Google Workspace, Microsoft Graph, Intuit, anything enterprise
JSON-RPC passthrough over MCP HTTP	Vendor only exposes an MCP. You hit the same endpoint manually, skip the protocol overhead.	MCP-only vendors during the transition years
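The first three rows differ only in which header carries the key, so one helper can cover all of them. The sketch below uses only the standard library. The pattern names, the default header name, and the example URL are illustrative assumptions, not any vendor's real API.

```python
import urllib.request


def auth_headers(pattern: str, key: str, header_name: str = "X-API-Key") -> dict[str, str]:
    """Build the auth header for one of the three static-key patterns."""
    if pattern == "bearer":
        return {"Authorization": f"Bearer {key}"}
    if pattern == "custom_header":
        # The vendor picks the header name; pass whatever their docs specify.
        return {header_name: key}
    if pattern == "api_key_scheme":
        # Literal scheme word "api-key" in place of "Bearer".
        return {"Authorization": f"api-key {key}"}
    raise ValueError(f"unknown auth pattern: {pattern}")


def build_request(url: str, pattern: str, key: str,
                  header_name: str = "X-API-Key") -> urllib.request.Request:
    """Prepare (but do not send) a GET request with the right auth header."""
    headers = {"Accept": "application/json"}
    headers.update(auth_headers(pattern, key, header_name))
    return urllib.request.Request(url, headers=headers, method="GET")
```

For example, `build_request("https://api.example.com/v1/items", "bearer", key)` yields a request ready for `urllib.request.urlopen`. The OAuth row ends up on the same `bearer` path once an access token has been obtained; the consent and refresh flow is out of scope here.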

One worked example. Today's migration: a vendor in the prospecting space shipped a brand-new REST API alongside its existing MCP. Same API key now works as Authorization: Bearer <key> against the REST surface. Result: dropped a session-stateful three-step MCP handshake (initialize → notifications/initialized → tools/call) down to a single HTTP request per operation, and gained three new endpoints the MCP never exposed. The migration was one afternoon. The first version of the MCP wrapper had been ~750 lines. The REST version is ~560 lines and faster.
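For comparison, here is roughly what the three-step handshake looks like as JSON-RPC messages. This is a sketch of the message bodies only, with no transport; the protocol version string, client name, and tool name are assumptions to replace with whatever the server actually supports.

```python
from itertools import count

PROTOCOL_VERSION = "2025-03-26"  # assumption: use the version your server negotiates


def handshake_messages(tool: str, arguments: dict, client_name: str = "example-cli") -> list[dict]:
    """Return the three JSON-RPC messages: initialize, initialized, tools/call."""
    ids = count(1)
    initialize = {
        "jsonrpc": "2.0",
        "id": next(ids),
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }
    # Notifications carry no "id" because no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {
        "jsonrpc": "2.0",
        "id": next(ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return [initialize, initialized, call]
```

A real passthrough client also has to carry any session identifier the server returns from `initialize` on the later requests. That is the session state the REST migration removed.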

4. Shell shortcuts that compound

The trick isn't any one alias. It's that they all share a single convention so muscle memory transfers.

The launcher family

A consistent cc-<context> prefix that cds into the right project root, loads project-scoped credentials, and execs claude in one step.

cc                  # launch in current dir
cc-research         # cd to sandbox dir, then claude
cc-personal         # cd to personal projects, then claude
cc-<client>         # one per client engagement

Each launcher's cd target drives session-aware memory injection (see section 6) — entering a client's directory automatically surfaces only that client's memories at session start. No prompting required.

The lifecycle pair

Command	What it does
`cc-sync`	Commit and push your local config repo. Three-step rebase fallback on push reject. Net effect: every config change is immediately available on every machine.
`cc-pull`	Twelve-step idempotent provision script. Pulls config, heals symlinks, decrypts secrets, syncs runtimes (npm/mise/brew/uv), installs MCPs, merges `.zshrc`, builds CLIs, optionally pulls every engagement repo. Re-runs are safe.
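The kit does not spell out the three fallback steps in `cc-sync`, so the sketch below assumes the usual sequence: fetch, rebase onto the upstream branch, push again. It is written in Python with an injectable runner so the logic can be exercised without touching a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run a git command in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def sync(message: str, run: Runner = run_git) -> bool:
    """Commit everything and push; on a rejected push, fetch, rebase, and retry once."""
    run(["add", "--all"])
    run(["commit", "-m", message])  # non-zero when nothing changed; still attempt the push
    if run(["push"]) == 0:
        return True
    # Assumed three-step fallback: fetch, rebase onto upstream, push again.
    for step in (["fetch"], ["rebase", "@{u}"], ["push"]):
        if run(step) != 0:
            return False  # leave conflicts for a human to resolve
    return True
```

Returning `False` instead of forcing the push is deliberate: a rebase conflict in a config repo is rare, and resolving it by hand is cheaper than overwriting another machine's changes.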

The credential loader

A single shell function that runs at every shell start and at every launcher invocation. Pulls a curated set of credentials from your password manager into env via op read (1Password CLI). The list is in a synced template file. Persistent edits flow through the sync pipeline; machine-local overrides live in a gitignored file.
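The loader logic can be sketched as follows. The template format (`NAME=op://vault/item/field`, one per line) is a hypothetical stand-in for whatever your synced template file uses; `op read` is the 1Password CLI command that resolves a secret reference.

```python
import subprocess
from typing import Callable, Optional


def op_read(reference: str) -> str:
    """Resolve one secret reference through the 1Password CLI."""
    result = subprocess.run(
        ["op", "read", reference], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_template(text: str) -> dict[str, str]:
    """Parse NAME=op://... lines, skipping blanks and # comments (assumed format)."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, ref = line.partition("=")
        refs[name.strip()] = ref.strip()
    return refs


def load_credentials(template_text: str,
                     overrides: Optional[dict[str, str]] = None,
                     read: Callable[[str], str] = op_read) -> dict[str, str]:
    """Resolve every templated secret; machine-local overrides win and skip the lookup."""
    overrides = overrides or {}
    env = {
        name: read(ref)
        for name, ref in parse_template(template_text).items()
        if name not in overrides
    }
    env.update(overrides)
    return env
```

A launcher would merge the returned mapping into the environment before it execs `claude`, so the keys exist only for that session's process tree.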

Why this matters. Most teams pass API keys around in Slack DMs and pinned notes. The launcher pattern makes credential loading invisible: you type cc-<client>, you have the keys, you start working. Onboarding a new teammate becomes brew install 1password-cli && ./bootstrap.sh.

5. Two-machine sync without drift

Config-as-code, applied to the AI workspace.

Layer	Mechanism
Source of truth	A git repo containing every file Claude Code reads from `~/.claude`.
The symlink trick	`~/.claude` is a symlink into the synced repo. Everything Claude Code writes is automatically inside version control. Zero file copying.
Shared secrets	sops-encrypted with each machine's age public key. The encrypted file is committed; decryption happens during `cc-pull`.
Machine-local secrets	A gitignored `.env.local` for keys that should never leave one host.
Cache / runtime state	Vector indexes, plugin caches, telemetry — all gitignored. They rebuild locally on first use.
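The "heals symlinks" step of a provision script can be as small as the sketch below. The function name and return values are illustrative; the one design choice worth copying is that it refuses to replace a real directory, so an un-migrated `~/.claude` is never silently deleted.

```python
import os
from pathlib import Path


def heal_symlink(link: Path, target: Path) -> str:
    """Ensure `link` is a symlink to `target`; return what was done."""
    if link.is_symlink():
        if os.readlink(link) == str(target):
            return "ok"
        link.unlink()  # stale or broken link: repoint it
        link.symlink_to(target)
        return "relinked"
    if link.exists():
        # A real file or directory holds data; make a human move it aside.
        raise FileExistsError(f"{link} is not a symlink; move it aside first")
    link.symlink_to(target)
    return "created"
```

Because every branch ends in the same state, re-running it is safe, which is what makes the whole provision script idempotent.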

Edge case caught the hard way. OAuth client-secret JSON files that you download to ~/Downloads go stale silently when the cloud provider rotates the secret. The password-manager copy is always canonical. Never trust a two-week-old download.

6. Memory + hook architecture

Two halves: auto-capture (so nothing important is lost) and hand-curated memory (so the load-bearing rules persist across sessions).

Auto-capture (running in the background)

Hook event	What it captures	Where it lands
`PostToolUse` (MCP calls)	Every tool invocation: input, output, latency	Postgres table for cross-session recall
`PostToolUseFailure`	Failed tool calls — the most useful debug surface	Same table
`SessionEnd`	Full session transcript as JSONL	Cloud storage bucket
`SessionStart`	Injects a project-relevant memory slice as `additionalContext`	Read at every launch
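A `SessionStart` hook along these lines could produce the memory slice. The sketch assumes the hook receives a JSON event containing `cwd` on stdin and replies with a `hookSpecificOutput` object on stdout, which is how I understand Claude Code's hook contract; verify the field names against the current hooks documentation. The one-subdirectory-per-project memory layout is hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical layout: ~/.claude/memory/<project-dir-name>/*.md
MEMORY_ROOT = Path.home() / ".claude" / "memory"


def memory_slice(cwd: str, memory_root: Path = MEMORY_ROOT) -> str:
    """Concatenate the memory files for the project named after the working directory."""
    if not cwd:
        return ""
    project_dir = memory_root / Path(cwd).name
    if not project_dir.is_dir():
        return ""
    parts = [p.read_text(encoding="utf-8") for p in sorted(project_dir.glob("*.md"))]
    return "\n\n".join(parts)


def hook_output(event: dict, memory_root: Path = MEMORY_ROOT) -> dict:
    """Build the JSON reply that injects the slice as additional context."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": memory_slice(event.get("cwd", ""), memory_root),
        }
    }


def main() -> None:
    """Hook entry point: read the event from stdin, write the reply to stdout."""
    json.dump(hook_output(json.load(sys.stdin)), sys.stdout)
```

A hook script would end with a call to `main()`. Keying the slice on the directory name is what lets a launcher's `cd` target decide which memories load.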

Hand-curated memory (the canonical store)

A directory of typed markdown files: user profile, feedback (corrections + confirmed preferences), project state, references. Each file has YAML frontmatter and a structured body. Across a year of daily use, you accumulate a few hundred load-bearing memories. They are auto-loaded at session start through an index file.

The key insight: auto-captured raw transcripts are noise; hand-curated memories are signal. The work is choosing what to remember, not how to capture everything.

Local semantic recall

To search across hundreds of memory files without guessing filenames, run a small local embeddings index (model2vec or similar — pure NumPy inference, no PyTorch, around 30 MB on disk). 256-dim vectors. Sub-second cold search over 250+ files. Indexes incrementally by file mtime. No API keys, no network calls beyond a one-time model download. The memory dir stays local, the search stays local, the recall stays free.
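The indexing and search logic is small enough to sketch without any dependencies. The version below uses plain Python lists instead of NumPy and takes the embedding function as a parameter, so it illustrates the mtime-based incremental refresh and cosine ranking, not the actual model; in a real setup you would pass in your embedding model's encode function.

```python
import math
from pathlib import Path
from typing import Callable

Vector = list[float]
Embedder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryIndex:
    """In-memory index of markdown files, re-embedded only when their mtime changes."""

    def __init__(self, embed: Embedder):
        self.embed = embed
        self.entries: dict[Path, tuple[float, Vector]] = {}  # path -> (mtime, vector)

    def refresh(self, memory_dir: Path) -> int:
        """Embed new or modified files, drop deleted ones; return how many were embedded."""
        current = set(memory_dir.glob("*.md"))
        for gone in set(self.entries) - current:
            del self.entries[gone]
        embedded = 0
        for path in current:
            mtime = path.stat().st_mtime
            cached = self.entries.get(path)
            if cached is None or cached[0] != mtime:
                self.entries[path] = (mtime, self.embed(path.read_text(encoding="utf-8")))
                embedded += 1
        return embedded

    def search(self, query: str, top_k: int = 5) -> list[tuple[float, Path]]:
        """Return the top_k (score, path) pairs, best match first."""
        q = self.embed(query)
        scored = [(cosine(q, vec), path) for path, (_, vec) in self.entries.items()]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

A production version would also persist `entries` to disk so a cold start only re-embeds files that changed since the last run.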

The 30-second demo. Type mem-search "the CPA in NJ". Top result is the right file at 0.34 cosine. No keyword overlap with the filename. This is the difference between "I remember writing something about this" and actually finding it.

7. The production safety rail

Speed and autonomy are great until you accidentally DROP TABLE users on a Friday. The single most important rule in my entire setup:

Claude Code never writes to production. Read-only access only. Any production write must be triggered by a human, by name, on a date you can quote.

Mechanically, that's enforced two ways:

Behavioral rule at the top of CLAUDE.md — first thing the model reads, takes precedence over every "auto mode" or "approved earlier" reasoning loop.
Permission deny list in settings.json — belt and suspenders. If the model ignores the rule, the runtime blocks the call.
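The second layer can be maintained by a small script that merges deny rules into settings.json without clobbering what is already there. The rules listed are hypothetical examples, and the `permissions.deny` structure and rule syntax reflect my understanding of Claude Code's settings format; check both against the current documentation before relying on them.

```python
import json

# Hypothetical rules: replace with the commands and tools that can write to production.
PRODUCTION_DENY = [
    "Bash(terraform apply:*)",
    "Bash(kubectl delete:*)",
    "mcp__prod_db__execute_sql",
]


def with_deny_rules(settings: dict, rules: list[str]) -> dict:
    """Return a copy of settings with rules merged into permissions.deny, no duplicates."""
    merged = json.loads(json.dumps(settings))  # deep copy via JSON round-trip
    deny = merged.setdefault("permissions", {}).setdefault("deny", [])
    for rule in rules:
        if rule not in deny:
            deny.append(rule)
    return merged
```

Calling it twice yields the same result as calling it once, so it fits inside an idempotent provision script. Deny rules only block the named calls, so read-only access still has to come from read-only credentials.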

This pattern is general. Pick the two or three actions in your stack where mistakes are unrecoverable. Block them at both layers. Then let the agent move fast on everything else.

8. Five takeaways you can apply Monday

Measure the boot tax. Count the tokens your current MCP set burns before the first prompt. You can't optimize what you don't measure.
Adopt the decision tree. Next time you reach for an MCP, walk down the four-step list first. Build the CLI when step 2 or 3 fits.
Pick a launcher prefix and commit to it. cc-<context> is mine. Yours can be anything. The discipline is that every AI session starts through a launcher, never raw claude in some random directory.
Symlink your config into a synced repo. One commit gives you cross-machine parity. One cc-pull brings any new laptop up to full speed.
Block production writes at two layers. Behavioral rule plus permission deny list. Costs nothing, prevents the worst single outcome.

9. Lines you can steal

"Most AI-GTM practitioners are paying rent on context they're not using."

"Every MCP is a tax. Every CLI is a transaction."

"Memory is engineering, not stenography."

"The right place to compress isn't in the model. It's in the protocol."

"cc-sync is the most important command in my shell. Everything else compiles down to that."