Analysis.

3 posts in this category.

Avg read time: 13 min


How prompt caching actually changes the LLM cost math

Anthropic bills cache reads at 10% of the base input price and cache writes at 1.25x. OpenAI caches automatically once a prompt reaches 1024 tokens. That math changes which LLM is cheapest: here is when.

By Yaroslav V. · Jun 1, 2026 · 11 min read
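
As a rough illustration of the cache arithmetic in the teaser above, the sketch below compares the input cost of resending one prompt prefix with and without caching, using the 1.25x write and 10% read multipliers quoted there. The function names, the $3 per million tokens base price, and the assumption that every call after the first hits the cache are hypothetical and chosen only for illustration.

```python
def uncached_input_cost(prompt_tokens, calls, base_price_per_mtok):
    """Dollars spent on input when the full prompt is billed at base price every call."""
    return prompt_tokens * calls * base_price_per_mtok / 1_000_000


def cached_input_cost(prompt_tokens, calls, base_price_per_mtok,
                      read_mult=0.10, write_mult=1.25):
    """Dollars spent on input when the first call writes the cache and the rest read it.

    Assumes every call after the first is a cache hit (no expiry in between).
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = base_price_per_mtok / 1_000_000
    write = prompt_tokens * per_token * write_mult               # first call pays the write fee
    reads = prompt_tokens * per_token * read_mult * (calls - 1)  # later calls pay the read rate
    return write + reads


if __name__ == "__main__":
    BASE = 3.0        # hypothetical base input price, dollars per million tokens
    PROMPT = 10_000   # hypothetical shared prefix length, tokens
    for n in (1, 2, 10):
        plain = uncached_input_cost(PROMPT, n, BASE)
        cached = cached_input_cost(PROMPT, n, BASE)
        print(f"{n:>2} calls: uncached ${plain:.4f}, cached ${cached:.4f}")
```

Under these assumptions a single call costs more with caching, because of the write fee, and caching comes out ahead from the second call onward. Real bills also depend on cache expiry and on how much of each prompt is actually shared.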


Analysis 02

Hidden LLM API costs nobody quotes (2026)

The $/M input/output sticker hides cache-write premiums, a tokenizer tax, context surcharges, invisible reasoning tokens, and more. Ten verified costs, with proofs.

Yaroslav V. · Jun 3, 2026 · 14 min read

Analysis 03

The reasoning dial nobody touches: why your thinking-model bill is so high

In 2026 the biggest lever on your LLM bill isn't which model. It's how hard the model thinks: reasoning tokens bill as output, and most models think by default.

Tags: reasoning-models, cost-optimization, api-pricing

Yaroslav V. · Jun 3, 2026 · 13 min read