Caveman mode for Claude Code — verified cost savings 2026

Q: Does caveman work outside Claude Code?

Yes. The skill auto-activates on Claude Code, Codex, and Gemini; on Cursor, Windsurf, Cline, and Copilot it installs as always-on rule files via the --with-init flag; and on 30+ additional agents it's triggered per-session by typing /caveman or "talk like caveman". The token-reduction mechanic (output style only, no model swap) is identical across hosts.

Q: Does it hurt code quality or reasoning?

The skill only changes the OUTPUT style. Thinking/reasoning tokens — the chain-of-thought Claude uses internally before emitting an answer — are untouched. Independent research published in March 2026 (arXiv 2604.00025) found that brevity constraints actually improved accuracy by up to 26 percentage points on certain benchmarks across 31 models tested — overelaboration was a source of error, not a feature.

Q: What about input tokens? I thought those were most of my bill.

You're right — output is rarely more than 20% of total Claude Code spend, so the 65% headline is misleading on its own. The companion command /caveman-compress <file> rewrites memory files (CLAUDE.md, agent instructions, persona docs) into caveman-speak, cutting their input-token weight by 36–60% on the benchmarked sample. Caveman handles output; caveman-compress handles the input side. Stack them.

Q: How do I turn caveman off mid-session?

Say "normal mode". The skill drops the style instruction and Claude resumes verbose responses. There is no need to restart the session or remove the skill.

If you run Claude Code on Sonnet 4.6 for thirty hours a week, your monthly output bill is roughly $30. A community skill called caveman claims to cut that by two-thirds. The 65% headline is real — but only for output tokens, and output is rarely more than 20% of what you actually pay for.

What caveman actually is

caveman is a Claude Code skill (plus rule-file installer for 30+ other agents) that enforces ultra-terse output. It drops articles (“the”, “a”), drops fillers (“just”, “really”, “basically”), drops pleasantries (“Sure! Happy to help”), drops hedging, and lets fragments stand in for full sentences. Code blocks, file paths, function names, and URLs stay byte-for-byte exact.

The repo’s tagline puts it plainly: “why use many token when few token do trick.”

A few facts worth pinning before we walk the math, all verified against github.com/JuliusBrussee/caveman on 2026-06-01:

Version: v1.8.2, released May 12, 2026
Author: Julius Brussee
License: MIT
Stars / forks: 67.4k / 3.8k at time of writing
Install size: one shell command, takes about 30 seconds, Node 18+ required

The benchmark — output tokens, real Claude API counts

The README publishes a 10-task benchmark with actual Claude API token counts before and after caveman is enabled. The full table:

Task	Normal	Caveman	Saved
Explain React re-render bug	1,180	159	87%
Fix auth middleware token expiry	704	121	83%
PostgreSQL connection pool tuning	2,347	380	84%
Git rebase vs merge explanation	702	292	58%
Convert callback to async/await	387	301	22%
Microservices vs monolith	446	310	30%
PR security review	678	398	41%
Docker multi-stage build	1,042	290	72%
PostgreSQL race condition	1,200	232	81%
React error boundary	3,454	456	87%
Average	1,214	294	65%

Verified against github.com/JuliusBrussee/caveman v1.8.2, README “Benchmark” section, 2026-06-01.

A few honest reads of this table that the repo itself is careful about:

The range matters more than the average. Three tasks hit 87% reduction; one hit 22%. Explainer questions (“rebase vs merge”, “microservices vs monolith”) compress less than bug fixes — caveman shines when the answer is a small diff, not a paragraph of context.
These are output tokens only. The “Normal” column counts what Claude emits, not what you sent it or what its tools fed back.
Average task is 1,214 → 294 = 920 output tokens saved per task. Hold onto that number — it’s what drives the dollar math below.

The 65% is output tokens only — and that matters

So if you read “caveman saves 65%” and think you’ve cut your bill by two-thirds, you’ve inherited the wrong mental model. The skill is genuine, the numbers are honest, but the bottom-line impact depends on your output-to-input ratio.

What it actually saves in dollars

Take the average 920-output-token saving per task and apply it to current Anthropic rates for the two models Claude Code users actually run:

Model	Input /M	Output /M
Claude Opus 4.8 Anthropic	$5	$25
Claude Sonnet 4.6 Anthropic	$3	$15
Claude Haiku 4.5 Anthropic	$1	$5

For someone running 1,000 Claude Code tasks a month (a moderate week-day cadence — roughly one task every 12 minutes of an 8-hour workday):

On Claude Sonnet 4.6 ($15/M output):

Normal: 1,000 × 1,214 = 1,214,000 output tokens × $15/M = $18.21 / month on output
Caveman: 1,000 × 294 = 294,000 output tokens × $15/M = $4.41 / month on output
Output savings: $13.80 / month

On Claude Opus 4.8 ($25/M output):

Normal: 1,000 × 1,214 × $25/M = $30.35 / month on output
Caveman: 1,000 × 294 × $25/M = $7.35 / month on output
Output savings: $23.00 / month

These numbers are honest as long as you also remember the next sentence: that’s the savings on output, not on your bill. If output represents 10% of your total Claude Code spend (typical for agent workflows with heavy tool use and long context), then $23/month off the output side translates to roughly 6–7% off the total bill — call it $30/month off a $450/month total on Opus 4.8.

That’s still real money. But it’s not 65% of your bill.

Installation — 30 seconds

# macOS / Linux / WSL / Git Bash
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows PowerShell 5.1+
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

Requires Node 18 or newer. The installer drops skill files into each agent’s skill directory. It is safe to re-run — re-running upgrades existing installations rather than corrupting them. Manual install instructions and per-agent guidance are in the repo’s INSTALL.md.

On Claude Code the skill auto-activates every session and shows a small status-line badge. On other agents you trigger it per-session by typing /caveman or saying “talk like caveman.” Say “normal mode” to turn it off without restarting.

The four modes (and when to use each)

The skill ships with four compression levels. Each sticks until session end (or until you say “normal mode”):

Mode	What it drops	When to use
`/caveman lite`	Filler words only (“just”, “really”, “basically”, “happy to”)	Mixed sessions where you still want readable prose. The conservative default for documentation pairs and design reviews.
`/caveman full`	Filler + articles + pleasantries. Fragments allowed.	The everyday Claude Code mode. Best fit for bug fixes, refactors, code review, anything where the answer is mostly a diff.
`/caveman ultra`	Maximum terseness — telegraphic style, no conjunctions, abbreviations encouraged.	Long agent loops where output savings compound across hundreds of turns. The 87%-reduction tasks in the benchmark used this mode.
`/caveman wenyan`	Classical-Chinese-style compression even past `ultra`. Sometimes barely English.	Edge cases. Token-budget-constrained long sessions. Not recommended for code review or anything a colleague will read.

Bonus commands — the input-side story

Caveman ships four additional slash commands, and one of them is where the real total-bill savings come from:

/caveman-commit — generates conventional commit messages with ≤50-character subjects. Useful in its own right, not relevant to cost math.
/caveman-review — one-line PR comments. Same — useful, marginal cost impact.
/caveman-stats — displays session token usage, lifetime savings, and a running USD estimate against current Anthropic rates. Honest accountability for the “is this actually working?” question.
/caveman-compress <file> — this is the one that matters for total cost. It rewrites memory files (CLAUDE.md, agent instructions, persona docs, system prompts you maintain) into caveman-speak. The repo benchmarks this at 36–60% input-token reduction per session, depending on how verbose the source file was.

Combine /caveman (cuts output) with /caveman-compress (cuts the input-side memory and system load), and the math changes. Now you’re attacking both halves of the bill at once:

Output reduction: 65% (caveman mode)
Memory/system-prompt reduction: 36–60% (caveman-compress)
Realistic combined total-cost reduction: 25–40%, depending on your workflow’s input-to-output ratio.

That’s the number worth quoting, not the 65% headline.

Before / after — verbatim from the repo

Two examples the README publishes to show what the style actually looks like in practice.

React re-render bug. Normal mode:

“The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”

Caveman mode:

“New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”

69 tokens → 19 tokens. The technical content is identical. The diagnosis is the same. The fix is the same. The lecture is gone.

Auth middleware bug. Normal mode:

“Sure! I’d be happy to help you with that. The issue you’re experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look and suggest a fix.”

Caveman mode:

“Bug in auth middleware. Token expiry check use < not <=. Fix:”

The caveman version skips the pleasantry, skips the restatement, and ships the fix immediately. Both verbatim from the repo’s README, 2026-06-01.

Stacking with prompt caching

Caveman attacks output. Prompt caching — which we walked through in detail in the cornerstone post on cost math — attacks the input rewrite cost on repeated prefixes. They’re orthogonal optimizations and they stack cleanly.

A worked combined example, again on Sonnet 4.6 at $3/M input and $15/M output, with a 5K-token system prompt that’s reused 10 times per session:

Baseline (neither): 10 × 5,000 input × $3/M + 10 × 1,214 output × $15/M = $0.150 + $0.182 = $0.332 per session
Caching only: First write at 1.25× ($0.019), nine cached reads at 0.1× ($0.014), output unchanged ($0.182) = $0.215 per session (35% savings)
Caveman only: Input unchanged ($0.150), output reduced ($0.044) = $0.194 per session (42% savings)
Both stacked: Input as cached above ($0.033), output as caveman ($0.044) = $0.077 per session (77% savings vs baseline)

Stack them and you’re working both sides of the bill at once.

When NOT to use it

There’s also a stylistic cost worth naming: code review comments in /caveman ultra mode read like a tired engineer at 2am. That’s fine for solo work. It’s awkward in a shared PR thread where a colleague is reading the comment cold. The skill’s /caveman-review command splits the difference — it stays terse but rounds the corners enough that nobody asks if you’re upset.

Does brevity hurt accuracy?

It’s a fair question — forcing a model into a less articulate persona could degrade its reasoning. The empirical answer, from independent research, says no.

A March 2026 paper, Brevity Constraints Reverse Performance Hierarchies in Language Models (arXiv 2604.00025), systematically evaluated 31 models from 0.5B to 405B parameters across 1,485 problems. The headline finding: brevity constraints improved accuracy by up to 26 percentage points on mathematical-reasoning and scientific-knowledge benchmarks. The proposed mechanism: spontaneous, scale-dependent verbosity introduces errors through over-elaboration. Constraining models to brief responses removes the rope.

That doesn’t prove caveman makes Claude smarter — caveman is a style instruction, not a reasoning constraint. But it does undercut the intuition that terseness must come at a cost. On the right tasks, terseness is the win.

Practical guidance

Five concrete recommendations after walking the math:

Install caveman if you spend more than $20/month on Claude Code output. Below that, the dollar savings don’t justify the style adjustment.
Run /caveman-stats after your first week. The skill tracks lifetime savings in USD against current Anthropic rates — if the displayed savings are under $10/month, you probably don’t have enough output volume to bother.
Always stack with /caveman-compress. Output-only savings cap at 5–15% of total spend. Compressing your CLAUDE.md and agent personas is where the other half of the win lives.
Default to /caveman full for everyday work, switch to /caveman lite for documentation tasks, and reach for /caveman ultra only inside long agent loops. The wenyan mode is fun but rarely worth the readability cost.
Toggle "normal mode" for learning sessions. When you’re asking Claude to teach you something new rather than fix something broken, the verbose explanation is doing useful work. Don’t squeeze it.

All numbers in this post verified against github.com/JuliusBrussee/caveman v1.8.2 on 2026-06-01, with vendor rates checked the same day against Anthropic’s prompt-caching docs. We re-verify and update prices on the model pages above when vendors change them — see our methodology for the verification cadence.

Frequently asked questions

Does caveman work outside Claude Code?

Yes. The skill auto-activates on Claude Code, Codex, and Gemini; on Cursor, Windsurf, Cline, and Copilot it installs as always-on rule files via the --with-init flag; and on 30+ additional agents it's triggered per-session by typing /caveman or "talk like caveman". The token-reduction mechanic (output style only, no model swap) is identical across hosts.

Does it hurt code quality or reasoning?

The skill only changes the OUTPUT style. Thinking/reasoning tokens — the chain-of-thought Claude uses internally before emitting an answer — are untouched. Independent research published in March 2026 (arXiv 2604.00025) found that brevity constraints actually improved accuracy by up to 26 percentage points on certain benchmarks across 31 models tested — overelaboration was a source of error, not a feature.

What about input tokens? I thought those were most of my bill.

You're right — output is rarely more than 20% of total Claude Code spend, so the 65% headline is misleading on its own. The companion command /caveman-compress <file> rewrites memory files (CLAUDE.md, agent instructions, persona docs) into caveman-speak, cutting their input-token weight by 36–60% on the benchmarked sample. Caveman handles output; caveman-compress handles the input side. Stack them.

How do I turn caveman off mid-session?

Say "normal mode". The skill drops the style instruction and Claude resumes verbose responses. There is no need to restart the session or remove the skill.

About the author

Yaroslav Vikhariev

Founder

Founder, AI//COST.

LinkedIn Author profile