LLM API pricing guide

Understand how input tokens, output tokens, cache pricing, batch discounts, context windows, and route differences affect cost before choosing an LLM API.

7 min · 2026-05-13
Conservative reading frame

LLM API pricing looks simple until input, output, cache, batch, context, and provider route details start moving in different directions. Use this guide as a conservative decision framework: compare models with the same workload assumptions, treat unknown prices as unavailable, and verify production choices against the provider's official source before committing budget.

  • Compare with the same token assumptions.
  • Do not treat unknown prices as zero.
  • Verify the official source before production.
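A minimal Python sketch of the "unknown is not zero" rule, assuming a hypothetical price table in USD per 1M tokens where `None` marks a price that is not published. The function returns `None` instead of a number whenever a required price is missing, so an unknown output price cannot quietly make a model look cheap. Model names and prices are placeholders.

```python
from typing import Optional

# Hypothetical USD prices per 1M tokens; None means "not published / unknown".
PRICES = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 0.30, "output": None},  # output price unknown
}


def scenario_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Return total USD cost, or None if the model or any required price is unknown."""
    price = PRICES.get(model)
    if price is None:
        return None
    if price["input"] is None or price["output"] is None:
        # Unknown is unavailable, never zero: refuse to produce a number.
        return None
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


for name in PRICES:
    cost = scenario_cost(name, 2_000_000, 500_000)
    print(name, "unavailable" if cost is None else f"${cost:.2f}")
```

Returning `None` forces every caller to handle the missing case explicitly, which keeps an incomplete price record out of side-by-side totals.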

Start with workload shape

A chatbot, a coding agent, a summarizer, and a RAG pipeline each produce different input/output token ratios. Compare models using your own monthly request count and expected answer length before optimizing for the lowest headline price.

  • Estimate monthly requests, average input tokens, and average output tokens separately.
  • Keep a small, medium, and high usage scenario so one optimistic estimate does not drive the decision.
  • Use representative prompts instead of generic token guesses when the workload is already known.
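The estimate described in these bullets can be sketched in Python as three scenarios run through the same formula, with input and output tokens kept separate. The `Scenario` fields, request counts, token averages, and per-1M-token prices are hypothetical placeholders; substitute measured values and the provider's published prices.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    monthly_requests: int
    avg_input_tokens: int
    avg_output_tokens: int


# Hypothetical USD prices per 1M tokens; replace with the provider's published numbers.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


def monthly_cost(s: Scenario, input_price: float, output_price: float) -> float:
    """Estimate monthly USD cost, keeping input and output tokens separate."""
    input_tokens = s.monthly_requests * s.avg_input_tokens
    output_tokens = s.monthly_requests * s.avg_output_tokens
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Small, medium, and high usage so one optimistic estimate does not drive the decision.
scenarios = [
    Scenario("small", 10_000, 1_500, 300),
    Scenario("medium", 100_000, 1_500, 300),
    Scenario("high", 1_000_000, 2_000, 600),
]

for s in scenarios:
    print(f"{s.name:>6}: ${monthly_cost(s, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M):,.2f}")
```

Running every candidate model through the same `scenarios` list keeps the comparison on identical token assumptions.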

Separate input and output cost

Many models charge more per output token than per input token, so long-answer workflows can become expensive even when the input price looks low.

  • Summarization and coding often create longer outputs than routing or classification.
  • If answers are capped in the product UI, model the cap instead of the ideal answer length.
  • Compare total scenario cost, not only the cheaper side of the token table.
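To see how output pricing and a product-side cap shift the total, here is a sketch assuming hypothetical prices with output at 4x the input rate, applied to two illustrative workloads. Following the cap bullet, `budget_output_tokens` budgets for the cap whenever one exists instead of the ideal answer length. All names and numbers are placeholders.

```python
from typing import Optional, Tuple

# Hypothetical USD prices per 1M tokens, with output priced above input.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.00


def split_cost(input_tokens: int, output_tokens: int) -> Tuple[float, float]:
    """Return (input_cost, output_cost) in USD for one request."""
    return (input_tokens * INPUT_PRICE_PER_M / 1_000_000,
            output_tokens * OUTPUT_PRICE_PER_M / 1_000_000)


def budget_output_tokens(ideal_tokens: int, ui_cap: Optional[int]) -> int:
    """Conservative assumption: if the product caps answers, budget for the cap."""
    return ideal_tokens if ui_cap is None else ui_cap


# name: (input tokens, ideal output tokens, UI cap or None); illustrative values only
workloads = {
    "classification": (800, 5, None),
    "summarization": (3_000, 600, 1_200),
}

for name, (inp, ideal, cap) in workloads.items():
    in_cost, out_cost = split_cost(inp, budget_output_tokens(ideal, cap))
    total = in_cost + out_cost
    print(f"{name}: total ${total:.6f}, output share {out_cost / total:.0%}")
```

With these placeholder numbers, output is a small slice of the classification cost but the majority of the capped summarization cost, even though the summarizer still emits fewer output tokens than it reads.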

Check cache, batch, and context

Prompt caching and batch APIs can reduce cost, but they depend on provider support and workload timing. Context windows also matter because requests above the model limit may fail or require chunking.

  • Cache assumptions should be conservative unless repeated prefixes are measured.
  • Batch discounts are useful for offline jobs, but they may not fit real-time product flows.
  • Long context can simplify architecture, while shorter context plus retrieval can sometimes be cheaper.
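A sketch of the three adjustments, with every rate a hypothetical placeholder: a discounted cached-input price applied only to a measured cache-hit ratio (default 0), a flat batch discount, and a context-window check that estimates how many requests a long input would need. It assumes the two discounts stack and that input plus reserved output must fit in one window; both vary by provider, so confirm them in the provider's documentation.

```python
# All rates below are hypothetical placeholders, not any provider's published terms.
INPUT_PRICE_PER_M = 1.00          # USD per 1M uncached input tokens
CACHED_INPUT_PRICE_PER_M = 0.10   # assumed discounted rate for cache hits
OUTPUT_PRICE_PER_M = 4.00         # USD per 1M output tokens
BATCH_DISCOUNT = 0.50             # assumed flat discount for offline batch jobs
CONTEXT_WINDOW = 128_000          # assumed model limit in tokens


def request_cost(input_tokens: int, output_tokens: int,
                 measured_cache_hit_ratio: float = 0.0,
                 use_batch: bool = False) -> float:
    """USD cost of one request.

    The cache-hit ratio defaults to 0.0 and should only be raised after
    repeated prompt prefixes have actually been measured.
    """
    if not 0.0 <= measured_cache_hit_ratio <= 1.0:
        raise ValueError("cache hit ratio must be between 0 and 1")
    cached = input_tokens * measured_cache_hit_ratio
    uncached = input_tokens - cached
    cost = (uncached * INPUT_PRICE_PER_M
            + cached * CACHED_INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    # Assumption: the batch discount applies on top of any cache discount.
    return cost * (1 - BATCH_DISCOUNT) if use_batch else cost


def chunks_needed(input_tokens: int, reserved_output: int,
                  window: int = CONTEXT_WINDOW) -> int:
    """Rough number of requests needed if input plus reserved output must fit one window."""
    input_budget = window - reserved_output
    if input_budget <= 0:
        raise ValueError("reserved output leaves no room for input")
    return -(-input_tokens // input_budget)  # ceiling division


base = request_cost(20_000, 1_000)
cached = request_cost(20_000, 1_000, measured_cache_hit_ratio=0.6)
batched = request_cost(20_000, 1_000, use_batch=True)
print(f"base ${base:.4f}, measured cache ${cached:.4f}, batch ${batched:.4f}")
print("requests for a 300k-token document:", chunks_needed(300_000, 4_000))
```

Keeping `measured_cache_hit_ratio` at zero until real traffic shows repeated prefixes matches the conservative default above. The chunk count ignores overlap and any prompt text repeated in every chunk, so treat it as a lower bound.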

Keep routes separate

Direct provider APIs and aggregator routes can expose the same underlying model with different price records, limits, availability, or terms. Compare the route you will actually use.

  • Do not merge direct API and third-party route prices into one number.
  • Check whether the route supports the exact capability your workload needs.
  • Review source confidence and last-updated metadata before using an estimate in planning.
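One way to keep routes separate is to key each price record by `(model, route)` and refuse to merge or overwrite records, then gate planning use on capability, source confidence, and freshness. The `RoutePrice` fields, the route names, the `"official"` confidence label, and the 30-day staleness threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RoutePrice:
    model: str
    route: str                    # "direct" or a hypothetical aggregator name
    input_per_m: Optional[float]  # USD per 1M tokens; None means unknown
    output_per_m: Optional[float]
    supports_batch: bool
    source_confidence: str        # hypothetical label such as "official" or "scraped"
    last_updated: date


# Keyed by (model, route) so direct and third-party prices are never merged.
ROUTES: Dict[Tuple[str, str], RoutePrice] = {}


def add_route(rec: RoutePrice) -> None:
    key = (rec.model, rec.route)
    if key in ROUTES:
        raise ValueError(f"duplicate price record for {key}")
    ROUTES[key] = rec


def usable_for_planning(rec: RoutePrice, needs_batch: bool, as_of: date,
                        max_age_days: int = 30) -> Tuple[bool, str]:
    """Return (ok, reason) for using this route's record in a budget estimate."""
    if rec.input_per_m is None or rec.output_per_m is None:
        return False, "price unknown"
    if needs_batch and not rec.supports_batch:
        return False, "route lacks required batch support"
    if rec.source_confidence != "official":
        return False, "verify against the provider source first"
    if (as_of - rec.last_updated).days > max_age_days:
        return False, "price record is stale"
    return True, "ok"


add_route(RoutePrice("model-x", "direct", 1.00, 4.00, True, "official", date(2026, 5, 1)))
add_route(RoutePrice("model-x", "aggregator-y", 1.10, 4.20, False, "scraped", date(2026, 3, 1)))

today = date(2026, 5, 13)
for key, rec in ROUTES.items():
    print(key, usable_for_planning(rec, needs_batch=True, as_of=today))
```

Because the two records for `model-x` stay separate, the aggregator route is rejected for a batch workload on capability grounds before its price is ever compared with the direct route.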