Best LLM APIs with a free tier (2026)

By Yaroslav Vikhariev, Founder

Most “free LLM API” lists are wrong before the first row. They count free chat apps that have no API, trial credits that expire in a month, and open-weight models you have to pay a GPU to run. A free tier means one specific thing: you can call the API at $0, indefinitely, within rate limits. Here is every API that genuinely qualifies in 2026 — with the real limits, the credit-card rule, and the data catch — and an honest list of what doesn’t.

What counts as a “free tier” — and what doesn’t

Before the list, the definition, because it’s where every other roundup goes soft.

A free tier is an ongoing offer to call the API at $0, capped by rate limits, with no expiry. You can build a side project on it and it keeps working next month. Three things routinely get mislabeled as free tiers and aren’t:

  • Free chat apps. ChatGPT’s free plan, Claude.ai’s free plan, and the Gemini app are free for a human in a browser. They expose no programmatic API. The vendors that give away the chat app (OpenAI, Anthropic) charge for the API.
  • Trial credits. Together AI’s $25–$100 new-account credit, Google Cloud’s $300 credit, AWS and Azure starter credits — these are generous, but they’re a one-time balance that runs out. A trial is not a tier.
  • Open weights. Llama, DeepSeek, Qwen, Gemma, and GLM publish weights you can download for free. Running them costs a GPU. “Free to self-host” is not “free to call.”

Everything below passes that test (with one labeled exception). All limits were verified against each provider’s own documentation on 2026-06-03.

The genuinely-free API tiers (2026)

Provider | Best free model(s) | Free limit | Card required? | Best for
Google AI Studio | Gemini 3 Flash, Gemini 2.5 Flash-Lite | 1,500 / 1,000 req/day · 1M ctx | No | Best models, biggest context
Cerebras | Llama 3.x, Qwen (fast) | 1,000,000 tokens/day · 8K ctx | No | Most tokens, fastest output
Groq | Llama 3.1 8B, Llama 4 | up to 14,400 req/day | No (email) | High request volume, speed
Mistral La Plateforme | Full lineup incl. Large, Codestral | ~1B tokens/month | No (phone) | Trying a full vendor lineup
OpenRouter | ~25 :free models (DeepSeek, Llama…) | 50–1,000 req/day | No | One key, many models
Z.ai (Zhipu) | GLM-4.7-Flash, GLM-4.5-Flash | ~1,000 req/day · concurrency 1 | No | Free even on the price sheet
Cloudflare Workers AI | Llama, open models | 10,000 neurons/day | No (CF acct) | Edge apps already on Cloudflare
GitHub Models | GPT, Llama, Phi | 50–150 req/day | No (GH acct) | Trying GPT/Llama without vendor billing

Notes that matter as much as the numbers:

Google AI Studio is the strongest free tier for capability. You get the current Gemini Flash models — Gemini 2.5 Flash-Lite at 15 requests/minute and 1,000/day, Gemini 3 Flash at 10 requests/minute and 1,500/day — with the full 1M-token context window and no credit card. Gemini 2.5 Pro is also on the free tier but throttled hard (5 requests/minute, 50/day). The catch is the data clause, covered below.
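To stay under both of Google's caps from your own code, a client-side limiter has to track two windows at once. The Python below is a minimal sketch, not a Google SDK feature: the class name `DualWindowLimiter` is hypothetical, and it assumes rolling 60-second and 24-hour windows. If the provider resets its daily counter on a fixed clock instead, a rolling window errs on the side of refusing early, so it never lets this process exceed the cap. It only sees requests made by this one process.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Client-side guard for a per-minute AND a per-day request cap.

    Uses rolling 60 s and 24 h windows; only counts this process's requests.
    """

    def __init__(self, per_minute: int, per_day: int, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock          # injectable so the window logic can be tested
        self.minute_log = deque()   # timestamps of requests in the last 60 s
        self.day_log = deque()      # timestamps of requests in the last 24 h

    def _prune(self, now: float) -> None:
        # Drop timestamps that have slid out of each rolling window.
        while self.minute_log and now - self.minute_log[0] >= 60:
            self.minute_log.popleft()
        while self.day_log and now - self.day_log[0] >= 86_400:
            self.day_log.popleft()

    def try_acquire(self) -> bool:
        """Return True (and record the request) only if both windows have room."""
        now = self.clock()
        self._prune(now)
        if len(self.minute_log) >= self.per_minute or len(self.day_log) >= self.per_day:
            return False
        self.minute_log.append(now)
        self.day_log.append(now)
        return True


# Gemini 3 Flash free-tier caps quoted above: 10 requests/minute, 1,500/day.
fake_now = [0.0]
limiter = DualWindowLimiter(per_minute=10, per_day=1_500, clock=lambda: fake_now[0])
allowed = sum(limiter.try_acquire() for _ in range(12))
print(allowed)  # 10: calls 11 and 12 are refused by the per-minute cap
```

Refusing locally avoids sending requests you already know the provider will reject.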

Cerebras wins on raw throughput: a flat 1,000,000 tokens/day, free, no card, and it doesn’t expire. The constraint is an 8,192-token context cap across free-tier models — fine for chat and short tasks, not for long-document work. Its draw is speed; Cerebras runs open models at output rates other providers can’t match.
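A quick pre-flight check helps with the 8K cap. The sketch below is hypothetical helper code, not a Cerebras API. It estimates tokens at roughly four characters each (a common rule of thumb, not a real tokenizer), and it assumes, as is typical, that the context cap covers prompt plus completion and that both count toward the 1,000,000-token daily budget.

```python
CONTEXT_LIMIT = 8_192            # Cerebras free-tier context cap quoted above
DAILY_TOKEN_BUDGET = 1_000_000   # free tokens per day quoted above


def estimate_tokens(text: str) -> int:
    """Crude estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, -(-len(text) // 4))  # ceiling division


def fits_context(prompt: str, max_output_tokens: int, limit: int = CONTEXT_LIMIT) -> bool:
    """Reserve room for the reply: the window must hold prompt plus completion."""
    return estimate_tokens(prompt) + max_output_tokens <= limit


class TokenBudget:
    """Tracks tokens spent today against the daily allowance."""

    def __init__(self, daily_limit: int = DAILY_TOKEN_BUDGET):
        self.daily_limit = daily_limit
        self.used = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.used += prompt_tokens + completion_tokens

    def remaining(self) -> int:
        return max(0, self.daily_limit - self.used)


doc = "x" * 40_000              # ~10,000 estimated tokens
print(fits_context(doc, 512))   # False: too long for an 8K window
budget = TokenBudget()
budget.charge(prompt_tokens=6_000, completion_tokens=1_000)
print(budget.remaining())       # 993000
```

At a full 8,192 tokens per request, the daily budget covers about 122 maximal requests (1,000,000 ÷ 8,192), so the context cap, not the token budget, is usually the first wall for long inputs.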

Groq is the other speed play, with the highest request ceiling here — up to 14,400 requests/day on Llama 3.1 8B Instant (most other models sit at 30 requests/minute, 1,000/day). Email signup, no card. Rate limits are per-organization, not per-key, so you can’t multiply them with extra keys.
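Whether the per-minute or the per-day number binds first is simple arithmetic: divide the daily cap by the per-minute cap to get how long a loop running flat out lasts. The sketch below uses only figures quoted in this post; the function names are illustrative.

```python
def minutes_until_daily_cap(per_minute: int, per_day: int) -> float:
    """Minutes of sustained max-rate traffic before the daily cap runs out."""
    return per_day / per_minute


def sustainable_rpm(per_day: int) -> float:
    """Requests/minute you can keep up around the clock without hitting the daily cap."""
    return per_day / (24 * 60)


# (label, requests/minute, requests/day), as quoted in this post
caps = [
    ("Groq, most models", 30, 1_000),
    ("Gemini 3 Flash", 10, 1_500),
    ("Gemini 2.5 Flash-Lite", 15, 1_000),
    ("Gemini 2.5 Pro", 5, 50),
]
for label, rpm, rpd in caps:
    print(f"{label}: daily cap hit after {minutes_until_daily_cap(rpm, rpd):.1f} min at full speed")

print(sustainable_rpm(14_400))  # Groq's Llama 3.1 8B Instant daily cap, spread evenly
```

Groq's 30-per-minute models exhaust their 1,000 daily requests after about 33 minutes of sustained traffic, and Gemini 2.5 Pro after 10, so for batch jobs the daily cap is the real ceiling. Spread evenly, 14,400/day works out to 10 requests/minute around the clock. Because Groq enforces limits per organization, these numbers apply to all of an organization's keys combined.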

Mistral’s La Plateforme free (“Experiment”) tier is the most generous on breadth: roughly 1 billion tokens/month, rate-limited, across the entire lineup including Mistral Large and Codestral. It needs a verified phone number rather than a card. As Mistral itself notes, neither OpenAI nor Anthropic offers anything comparable — you can build and test against a full frontier-ish lineup before spending a cent. See the Mistral provider hub for where the paid tiers start if you outgrow it.
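Because Mistral's allowance is monthly rather than daily, pacing matters more than any single day's spend. A minimal sketch, assuming the roughly 1B-token figure and a reset on the 1st of each calendar month (both assumptions; check your console for the actual quota and cycle). `daily_pace` is a hypothetical helper.

```python
import calendar
from datetime import date

MONTHLY_TOKENS = 1_000_000_000  # "roughly 1 billion tokens/month" (approximate)


def daily_pace(used_so_far: int, today: date, monthly_budget: int = MONTHLY_TOKENS) -> int:
    """Tokens/day you can spend for the rest of the month without overrunning.

    Assumes the allowance resets on the 1st of each calendar month.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_left = days_in_month - today.day + 1  # count today as a spendable day
    remaining = max(0, monthly_budget - used_so_far)
    return remaining // days_left


print(daily_pace(0, date(2026, 6, 1)))              # 33333333
print(daily_pace(400_000_000, date(2026, 6, 16)))   # 40000000
```

Spread evenly over a 30-day month, 1B tokens is about 33 million tokens/day; underspend early and the sustainable pace rises for the rest of the month.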

OpenRouter isn’t a model vendor — it’s a router — but its :free model variants are the easiest way to touch ~25 models (DeepSeek, Llama, and others) through a single API key. The limit is 20 requests/minute and 50 requests/day, which rises to 1,000/day once you’ve made a one-time purchase of at least 10 credits. Two honest caveats: failed requests still count against your quota, and popular free models get throttled by the upstream provider at peak (you’ll see 429 errors even with credits).
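Because failed requests count against OpenRouter's free quota, blind retries can burn a 50-request day quickly. The sketch below charges every attempt, including the ones that come back as 429, to a local budget and backs off exponentially with jitter. `RateLimited` and `QuotaExhausted` are hypothetical exception types (map whatever your HTTP client raises on a 429 onto `RateLimited`), and `send` is any zero-argument callable that makes the request.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the API."""


class QuotaExhausted(Exception):
    """Raised locally when the tracked daily allowance is used up."""


def call_with_budget(send, daily_budget, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry send() on 429s with exponential backoff plus jitter.

    Every attempt, failed or not, is charged to daily_budget["remaining"],
    mirroring the rule that failed requests count against the free quota.
    """
    for attempt in range(max_attempts):
        if daily_budget["remaining"] <= 0:
            raise QuotaExhausted("daily free-request allowance used up")
        daily_budget["remaining"] -= 1  # charge first: a 429 still costs a request
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Wait 1 s, 2 s, 4 s, ... plus up to 1 s of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))


# Usage with a fake endpoint that returns two 429s, then succeeds.
responses = iter([RateLimited(), RateLimited(), "ok"])


def flaky_send():
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r


budget = {"remaining": 50}  # the 50/day free limit quoted above
result = call_with_budget(flaky_send, budget, sleep=lambda s: None)
print(result, budget["remaining"])  # ok 47: both 429s were charged too
```

Capping attempts matters more here than on a paid tier: at four attempts per prompt, a 50/day allowance can be gone after about a dozen prompts.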

Free even on the price sheet

Most “free tier” models are paid models you’re allowed to call for free up to a limit. A few models are genuinely $0 on the list price itself — you’re not borrowing against a quota, the per-token rate is zero:

Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Context
GLM-4.7-Flash | Zhipu (Z.ai / GLM) | $0 | $0 | 200K
GLM-4.5-Flash | Zhipu (Z.ai / GLM) | $0 | $0 | 128K
Hunyuan Lite | Tencent (Hunyuan) | $0 | $0 | not listed
Models priced at $0/M on the list itself (verified in our data layer). Access still runs through the vendor's API and its own free-tier limits.

GLM-4.7-Flash and GLM-4.5-Flash on Z.ai are the standouts: real, capable lightweight models at $0/M, free to all registered users with no subscription. The constraint is a concurrency limit of 1 (one in-flight request at a time) and a daily request cap — fine for a prototype, a bottleneck at scale. Hunyuan Lite from Tencent is similarly $0, though access runs through Tencent Cloud, which adds a data-residency consideration for non-CN teams.
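A concurrency limit of 1 means your client should queue requests rather than fire them in parallel. In Python's asyncio, a semaphore of size 1 does exactly that. The sketch below uses a hypothetical `fake_glm_call` in place of a real Z.ai request and records the peak number of in-flight calls to show the gate holds.

```python
import asyncio


async def fake_glm_call(prompt: str, state: dict) -> str:
    """Hypothetical stand-in for a GLM-4.7-Flash request; tracks in-flight calls."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["in_flight"] -= 1
    return f"reply to {prompt!r}"


async def run_serialized(prompts: list[str]) -> tuple[list[str], int]:
    gate = asyncio.Semaphore(1)  # concurrency limit of 1: one request in flight
    state = {"in_flight": 0, "peak": 0}

    async def guarded(p: str) -> str:
        async with gate:
            return await fake_glm_call(p, state)

    # gather() schedules every request at once; the semaphore makes them wait their turn.
    replies = await asyncio.gather(*(guarded(p) for p in prompts))
    return list(replies), state["peak"]


replies, peak = asyncio.run(run_serialized(["a", "b", "c"]))
print(peak)  # 1
```

The trade-off is throughput: with one request in flight, requests per minute are bounded by 60 divided by average latency in seconds (about 30/minute for a hypothetical 2-second reply).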

The asterisk: OpenAI’s “free” daily tokens

OpenAI belongs in its own category because its free offer is real but conditional. If your organization has a positive paid balance and opts into sharing your API traffic for model training, you get free daily tokens: up to 1M/day (250K on usage tiers 1–2) shared across the large models (GPT-5, GPT-4.1, GPT-4o, o3 and friends), and up to 10M/day (2.5M on tiers 1–2) across the mini and nano models. Tokens reset at 00:00 UTC. It’s unavailable to Enterprise accounts and anyone with Zero Data Retention enabled.
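The accounting is easy to get wrong because the allowance depends on both model group and usage tier. A minimal sketch that encodes the figures above and computes time to the 00:00 UTC reset. The dictionary keys and function names are illustrative, not OpenAI API fields, and the code does not check the preconditions (a positive paid balance and opting into traffic sharing), which are account settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Free daily tokens quoted above, keyed by (model group, on usage tier 1-2?).
# Keys and group names are illustrative labels, not OpenAI API fields.
ALLOWANCE = {
    ("large", True): 250_000,
    ("large", False): 1_000_000,
    ("mini_nano", True): 2_500_000,
    ("mini_nano", False): 10_000_000,
}


def daily_allowance(group: str, usage_tier: int) -> int:
    return ALLOWANCE[(group, usage_tier <= 2)]


def remaining_free(group: str, usage_tier: int, used: int) -> int:
    """Free tokens left today in this group, given tokens already used."""
    return max(0, daily_allowance(group, usage_tier) - used)


def seconds_until_reset(now: Optional[datetime] = None) -> int:
    """Seconds until the next 00:00 UTC, when the free allowance resets.

    Pass a timezone-aware datetime; naive values are treated as local time.
    """
    now_utc = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    tomorrow = now_utc + timedelta(days=1)
    next_midnight = tomorrow.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now_utc).total_seconds())


print(remaining_free("large", 3, used=400_000))  # 600000
print(seconds_until_reset(datetime(2026, 6, 3, 22, 30, tzinfo=timezone.utc)))  # 5400
```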

The catch every free tier shares: your data

Free inference has to be paid for somehow. When it isn’t dollars, it’s usually your data.

The other two catches are quieter but real. Rate limits are the wall you actually hit — a 1,000-request/day cap sounds generous until a single user session burns 200 calls. And there’s no SLA: free traffic is best-effort and the first to be throttled when a provider is busy, so a free tier is the wrong foundation for anything someone is paying you for.
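The "sounds generous" arithmetic is worth doing before you pick a tier. A small sketch; `retry_overhead_pct` is a hypothetical allowance for retried calls, which some providers (OpenRouter, per its notes above) also count against the quota.

```python
def sessions_per_day(daily_cap: int, calls_per_session: int, retry_overhead_pct: int = 0) -> int:
    """Whole user sessions that fit under a daily request cap.

    retry_overhead_pct is a hypothetical allowance for retried calls; integer
    math avoids float rounding surprises.
    """
    effective_calls = calls_per_session + calls_per_session * retry_overhead_pct // 100
    return daily_cap // effective_calls


print(sessions_per_day(1_000, 200))                         # 5
print(sessions_per_day(1_000, 200, retry_overhead_pct=10))  # 4
print(sessions_per_day(14_400, 200))                        # 72
```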

What is NOT a free tier (so you don’t get fooled)

To close the loop on the definition, the specific things that show up on other lists and shouldn’t:

  • ChatGPT / Claude.ai / Gemini app free plans — free chat for humans, no API. Anthropic and OpenAI monetize the API; the app being free tells you nothing about API cost.
  • Together AI’s $25–$100 new-account credit — a generous trial balance across 80+ open models, but it expires. Excellent for a first evaluation; not an ongoing free tier.
  • Google Cloud $300 / AWS / Azure starter credits — same story: one-time balances, not tiers.
  • Open weights (Llama, DeepSeek, Qwen, Gemma, GLM) — free to download, not free to run. Even at low volume, self-hosting means paying for a GPU, while a hosted free tier costs nothing.
  • Anthropic — no free API tier exists. The only free Claude is the chat app.

How to pick

A quick decision path, assuming you’ve accepted that free = prototype-grade and your-data-is-the-price:

  1. Need the most capable models? Google AI Studio (current Gemini Flash, 1M context, 1,500/day).
  2. Need the most tokens or the fastest output? Cerebras (1M tokens/day) or Groq (up to 14,400 requests/day) — both email-only.
  3. Want to test against a whole vendor lineup? Mistral La Plateforme (~1B tokens/month, full lineup).
  4. Want one key for many models? OpenRouter :free.
  5. Want a model that’s free on the price sheet, not just rate-limited? GLM-4.7-Flash on Z.ai.
  6. Already on Cloudflare, or want GPT/Llama without vendor billing? Workers AI (10K neurons/day) or GitHub Models.
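The path is ordered: the first matching need wins. Encoded as data, it looks like the sketch below; the need labels are hypothetical names for the six questions, and `None` means none of them applies.

```python
from typing import Optional

# Ordered (need, recommendation) pairs mirroring steps 1-6; first match wins.
# The need labels are hypothetical names for the questions in the list.
DECISION_PATH = [
    ("most_capable", "Google AI Studio"),
    ("most_tokens_or_speed", "Cerebras or Groq"),
    ("full_vendor_lineup", "Mistral La Plateforme"),
    ("one_key_many_models", "OpenRouter :free"),
    ("zero_list_price", "GLM-4.7-Flash on Z.ai"),
    ("cloudflare_or_no_vendor_billing", "Cloudflare Workers AI or GitHub Models"),
]


def pick_free_tier(needs: set[str]) -> Optional[str]:
    """Walk the path in order and return the first recommendation that matches."""
    for need, recommendation in DECISION_PATH:
        if need in needs:
            return recommendation
    return None


print(pick_free_tier({"zero_list_price", "most_capable"}))  # Google AI Studio
```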

And the moment a free tier’s rate limits or data policy stop fitting — when you’ve validated the idea and you’re about to put real traffic or real customer data through it — graduate to a paid tier. The paid floor is lower than most people think; the cheapest paid API by use case breaks down where to land, and prompt caching cuts the paid bill further once you’re there.


All free-tier limits in this post were verified on 2026-06-03 against each provider’s own documentation — Google AI, Cerebras, Groq, Mistral, OpenRouter, OpenAI, Cloudflare, and GitHub Models. Free-tier limits and model availability change often — see our methodology for how we re-verify. When in doubt, the provider’s live docs win.

Frequently asked questions

Which LLM API has the best free tier in 2026?

For capability, Google's AI Studio free tier is the strongest — it gives you Gemini 3 Flash at 1,500 requests/day and Gemini 2.5 Flash-Lite at 1,000/day with a 1M-token context window, no credit card. The trade-off is that Google may use free-tier prompts to improve its models. For raw token volume, Cerebras gives 1,000,000 tokens/day free (with an 8K-token context cap). Pick by what you're optimizing: model quality vs throughput.

Do Anthropic (Claude) or OpenAI have a free API tier?

Anthropic has no free API tier. New accounts may get a small one-time trial credit, but there is no $0 ongoing tier. OpenAI offers free daily tokens (up to 1M/day on large models, 10M/day on mini/nano) but only if your organization has a positive paid balance and opts into sharing your API traffic for training. That's a data-for-tokens deal layered on a paid account, not a free tier in the usual sense. If you want to call GPT models for free without that arrangement, the closest path is GitHub Models (rate-limited free access to GPT and Llama), not OpenAI's own API. There is no equivalent free route to Claude: the only free Claude is the chat app.

Is a free chat app the same as a free API?

No, and conflating them is the most common mistake. ChatGPT's free plan, Claude.ai's free plan, and the Gemini app let a human chat for free in a browser. A free API lets your code make programmatic calls for free. The companies that give away the chat app (OpenAI, Anthropic) charge for the API; the companies with the best free APIs (Google AI Studio, Groq, Cerebras, Mistral) are a different list. This post is only about the API.

Are free tiers safe for production data?

Usually not. Most free tiers reserve the right to train on your inputs — Google's free tier states this explicitly, and OpenRouter's free models depend on whatever the upstream provider allows. Free tiers also have no uptime SLA and tight rate limits that throttle real traffic. Treat them as prototyping and personal-project infrastructure. The moment you're routing customer data or need reliability, move to a paid tier — see the cheapest paid options by use case for where to land.

Do free open-weight models like Llama or DeepSeek count?

Only if you count your own server bill as $0, which it isn't. Llama, DeepSeek, Qwen, Gemma, and GLM publish open weights you can download free, but running them means renting or owning a GPU, which costs real money even at low volume, while a hosted free tier costs nothing. The 'free' in open weights is freedom to self-host, not free inference. If you want $0 inference, use a hosted free tier (above); if you want control and privacy, self-host and pay the compute.

What's the catch with free tiers?

Three catches, in order of how often they bite: (1) rate limits — daily request or token caps that make them unusable past prototype scale; (2) your data — many free tiers train on your inputs; (3) no SLA — free traffic is best-effort and gets throttled (429 errors) at peak. None of these make free tiers useless — they make them perfect for learning, prototyping, and low-traffic side projects, and wrong for production.
