Last updated 2026-07-11 / 9 items tagged cost-optimization

9 posts tagged "cost-optimization".

Average read time: 13 min

Free AI compute in 2026: every free tier, trial, startup grant, and student deal — verified

Every legit way to get LLM compute for $0 in 2026: free API tiers, trial credits, student offers, and startup grants up to $350k — verified, with the catch on each.

By Yaroslav V. · Jun 26, 2026 · 17 min read

Tools Howto 02

Compaction: when to summarize an agent's history — and when it backfires

Anthropic's compaction API summarizes an agent's history when it hits a token threshold. How it works, the billing pass you don't see, and when it backfires.

Yaroslav V. · Jul 11, 2026 · 14 min read

Tools Howto 03

Context editing: cut agent token bills ~84% by clearing stale tool results

Anthropic's context editing clears stale tool results from an agent's window, cutting token use up to 84%. How it works, the config, and the prompt-cache catch.

context-engineering context-editing agents

Yaroslav V. · Jul 11, 2026 · 13 min read

Roundup 04

Best LLM APIs with a free tier (2026)

Which LLM APIs are actually free to call in 2026 — not free chat apps or expiring trial credits. Verified free tiers, real rate limits, and the data catch.

free-tier llm-api free-llm

Yaroslav V. · Jun 3, 2026 · 12 min read

Analysis 05

Hidden LLM API costs nobody quotes (2026)

The $/M input/output sticker hides cache-write premiums, a tokenizer tax, context surcharges, invisible reasoning tokens, and more. Ten verified costs, with proofs.

api-pricing cost-optimization prompt-caching

Yaroslav V. · Jun 3, 2026 · 14 min read

Analysis 06

The reasoning dial nobody touches: why your thinking-model bill is so high

In 2026 the biggest lever on your LLM bill isn't which model. It's how hard the model thinks: reasoning tokens bill as output, and most models think by default.

reasoning-models cost-optimization api-pricing

Yaroslav V. · Jun 3, 2026 · 13 min read

Roundup 07

The cheapest LLM API by use case (2026)

There is no single cheapest LLM API. The winner depends on your workload shape. Verified $/M prices for classification, chat, coding, RAG, batch, and reasoning.

cheapest-llm api-pricing cost-optimization

Yaroslav V. · Jun 2, 2026 · 13 min read

Tools Howto 08

Caveman mode for Claude Code: the math behind the 65% headline

Caveman mode strips filler from Claude's output. The 65% headline is real for output tokens. Here's what it actually saves on your monthly bill.

caveman claude-code cost-optimization

Yaroslav V. · Jun 1, 2026 · 9 min read

Analysis 09

How prompt caching actually changes the LLM cost math

Anthropic charges 10% of the input price for cache reads, with a 1.25x write fee. OpenAI auto-caches above 1024 tokens. The math changes which LLM is cheapest: here is when.

prompt-caching llm-pricing cost-optimization

Yaroslav V. · Jun 1, 2026 · 11 min read