A working playbook for treating context tokens like cap-table.
Most teams ship MCP servers like they're Chrome extensions — install one, install another, never measure the cost of either. By the time you have five connected, your context window is being eaten alive before the first prompt. This kit is the playbook I use to keep context cheap, capability high, and the stack reproducible across machines.
Context tokens are a fixed cost you pay before any work happens. Most AI-GTM practitioners are stacking MCP servers without measuring the boot tax. Every MCP loads its entire tool surface at session start whether you call it or not. Across a typical revenue stack, that's tens of thousands of context tokens burned at every launch, on every machine, for every team member — before the first prompt.
CLIs invert the model: lazy load, near-zero boot cost, a couple hundred tokens per call, and the heavy response stays out of context because the binary processes the bulk and returns a summary.
What changes when you adopt this framing:
Apply in order. Stop at the first one that fits.
| Order | Condition | Action |
|---|---|---|
| 1 | Does an official CLI exist? | Wrap it with a Claude Code skill. Examples: Vercel CLI, Stripe CLI, Wrangler, Supabase, gh, Mercury. The vendor maintains the binary; you maintain the skill that tells Claude when to use it. |
| 2 | Public REST API but no official CLI? | Build a thin Go wrapper (or Python, or Bun — whatever your shop ships). Most vendor APIs are 200 lines of HTTP + a JSON marshaller away from being a perfectly usable CLI. |
| 3 | Vendor only ships an MCP — no public REST? | Build a JSON-RPC client that hits the same MCP HTTP endpoint. You get the capability without paying the boot tax. This is the "MCP-endpoint-passthrough" pattern. |
| 4 | None of the above? | Install the MCP, but write down why in the commit message. Audit quarterly. Most vendors ship REST within 6 months once they see usage. |
The five auth patterns that cover ~95% of real-world vendor APIs. Once you have one of each working, every new CLI is a copy-paste-and-rename job.
| Pattern | What it solves | Where you'll meet it |
|---|---|---|
| REST + Bearer | Standard Authorization: Bearer <key>. Easiest path. |
Slack, most modern SaaS APIs, JWT auth in general |
| REST + custom header | Vendor invents their own header name (api_key:, X-API-Key, X-Auth-Token). |
Many enrichment vendors, smaller SaaS |
| REST + non-standard auth scheme | Authorization: api-key <key> (literal word "api-key", not "Bearer"). |
Vendors that misread the OAuth spec |
| OAuth installed-app flow | Browser consent once, refresh-token-managed forever. Includes rotation handling. | Google Workspace, Microsoft Graph, Intuit, anything enterprise |
| JSON-RPC passthrough over MCP HTTP | Vendor only exposes an MCP. You hit the same endpoint manually, skip the protocol overhead. | MCP-only vendors during the transition years |
One worked example. Today's migration: a vendor in the prospecting space shipped a brand-new REST API alongside its existing MCP. Same API key now works as Authorization: Bearer <key> against the REST surface. Result: dropped a session-stateful three-step MCP handshake (initialize → notifications/initialized → tools/call) down to a single HTTP request per operation, and gained three new endpoints the MCP never exposed. The migration was one afternoon. The first version of the MCP wrapper had been ~750 lines. The REST version is ~560 lines and faster.
The trick isn't any one alias. It's that they all share a single convention so muscle memory transfers.
A consistent cc-<context> prefix that cds into the right project root, loads project-scoped credentials, and execs claude in one step.
cc # launch in current dir
cc-research # cd to sandbox dir, then claude
cc-personal # cd to personal projects, then claude
cc-<client> # one per client engagement
Each launcher's cd target drives session-aware memory injection (see section 6) — entering a client's directory automatically surfaces only that client's memories at session start. No prompting required.
| Command | What it does |
|---|---|
cc-sync |
Commit and push your local config repo. Three-step rebase fallback on push reject. Net effect: every config change is immediately available on every machine. |
cc-pull |
Twelve-step idempotent provision script. Pulls config, heals symlinks, decrypts secrets, syncs runtimes (npm/mise/brew/uv), installs MCPs, merges .zshrc, builds CLIs, optionally pulls every engagement repo. Re-runs are safe. |
A single shell function that runs at every shell start and at every launcher invocation. Pulls a curated set of credentials from your password manager into env via op read (1Password CLI). The list is in a synced template file. Persistent edits flow through the sync pipeline; machine-local overrides live in a gitignored file.
cc-<client>, you have the keys, you start working. Onboarding a new teammate becomes brew install op && ./bootstrap.sh.
Config-as-code, applied to the AI workspace.
| Layer | Mechanism |
|---|---|
| Source of truth | A git repo containing every file Claude Code reads from ~/.claude. |
| The symlink trick | ~/.claude is a symlink into the synced repo. Everything Claude Code writes is automatically inside version control. Zero file copying. |
| Shared secrets | sops-encrypted with each machine's age public key. The encrypted file is committed; decryption happens during cc-pull. |
| Machine-local secrets | A gitignored .env.local for keys that should never leave one host. |
| Cache / runtime state | Vector indexes, plugin caches, telemetry — all gitignored. They rebuild locally on first use. |
~/Downloads go stale silently when the cloud provider rotates the secret. The password-manager copy is always canonical. Never trust a two-week-old download.
Two halves: auto-capture (so nothing important is lost) and hand-curated memory (so the load-bearing rules persist across sessions).
| Hook event | What it captures | Where it lands |
|---|---|---|
PostToolUse (MCP calls) | Every tool invocation: input, output, latency | Postgres table for cross-session recall |
PostToolUseFailure | Failed tool calls — the most useful debug surface | Same table |
SessionEnd | Full session transcript as JSONL | Cloud storage bucket |
SessionStart | Injects a project-relevant memory slice as additionalContext | Read at every launch |
A directory of typed markdown files — user profile, feedback (corrections + confirmed preferences), project state, references. Each file has YAML frontmatter and structured body. Across a year of daily use, you accumulate a few hundred load-bearing memories. They are auto-loaded at session start through an index file.
The key insight: auto-captured raw transcripts are noise; hand-curated memories are signal. The work is choosing what to remember, not how to capture everything.
To search across hundreds of memory files without guessing filenames, run a small local embeddings index (model2vec or similar — pure NumPy inference, no PyTorch, around 30 MB on disk). 256-dim vectors. Sub-second cold search over 250+ files. Indexes incrementally by file mtime. No API keys, no network calls beyond a one-time model download. The memory dir stays local, the search stays local, the recall stays free.
mem-search "the CPA in NJ". Top result is the right file at 0.34 cosine. No keyword overlap with the filename. This is the difference between "I remember writing something about this" and actually finding it.
Speed and autonomy are great until you accidentally DROP TABLE users on a Friday. The single most important rule in my entire setup:
Claude Code never writes to production. Read-only access only. Any production write must be triggered by a human, by name, on a date you can quote.
Mechanically, that's enforced two ways:
CLAUDE.md — first thing the model reads, takes precedence over every "auto mode" or "approved earlier" reasoning loop.settings.json — belt and suspenders. If the model ignores the rule, the runtime blocks the call.This pattern is general. Pick the two or three actions in your stack where mistakes are unrecoverable. Block them at both layers. Then let the agent move fast on everything else.
cc-<context> is mine. Yours can be anything. The discipline is that every AI session starts through a launcher, never raw claude in some random directory.cc-pull brings any new laptop up to full speed."Most AI-GTM practitioners are paying rent on context they're not using."
"Every MCP is a tax. Every CLI is a transaction."
"Memory is engineering, not stenography."
"The right place to compress isn't in the model. It's in the protocol."
"cc-sync is the most important command in my shell. Everything else compiles down to that."