Large Language Model (LLM) Cost Calculator
Free Large Language Model (LLM) API cost calculator powered by OpenRouter. Compare GPT-5.4, Claude 4.6, Gemini 3.1, and DeepSeek V3.2, including cache-hit discounts and context-length tiered pricing.
About Large Language Model (LLM) Cost Calculator
How to Use the Large Language Model (LLM) API Cost Calculator
Pricing data is synchronized from OpenRouter. Compare API costs for the latest flagship models like GPT-5.4, Claude 4.6, Gemini 3.1, and DeepSeek V3.2 to precisely forecast your monthly expenses.
Core Calculation Rules
- Prompt Caching Discounts (Cache Hit Ratio): Modern models support context caching. Use the slider (0-100%) to simulate cache efficiency. The calculator blends the heavily discounted "Cache Read" price for the cached share of input tokens with the base input price for the remainder.
- Tiered Pricing: Some providers (e.g., Google, Xiaomi) dynamically scale pricing based on context length. If your combined Input + Output tokens exceed a specific threshold (e.g., 128K), the calculator automatically switches to and highlights the applicable higher-tier pricing.
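The two rules above can be sketched in a few lines. This is a minimal illustration, not the calculator's actual implementation; the tier thresholds and all prices below are hypothetical placeholders, not real OpenRouter rates.

```python
def effective_input_price(base_price, cache_read_price, cache_hit_ratio):
    """Blend the discounted cache-read price with the base input price.

    Prices are USD per 1M tokens; cache_hit_ratio is 0.0-1.0.
    """
    return cache_hit_ratio * cache_read_price + (1 - cache_hit_ratio) * base_price


def request_cost(input_tokens, output_tokens, tiers, cache_read_price, cache_hit_ratio):
    """Select a pricing tier by total context length, then price one request.

    `tiers` is a list of (threshold_tokens, input_price, output_price),
    sorted ascending; the last tier uses None for "no upper bound".
    """
    total = input_tokens + output_tokens
    for threshold, in_price, out_price in tiers:
        if threshold is None or total <= threshold:
            break  # first tier whose threshold covers this request
    blended_in = effective_input_price(in_price, cache_read_price, cache_hit_ratio)
    return (input_tokens * blended_in + output_tokens * out_price) / 1_000_000


# Hypothetical two-tier model: one price up to 128K context, a higher one above.
tiers = [(128_000, 1.25, 5.00), (None, 2.50, 10.00)]
cost = request_cost(200_000, 4_000, tiers, cache_read_price=0.125, cache_hit_ratio=0.8)
# 204K total tokens exceeds 128K, so the higher tier applies; an 80% cache
# hit ratio still cuts the blended input price sharply.
```

Note that the cache discount applies only to input tokens; output tokens are always billed at the full tier rate.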
Understanding Base Token Costs
- Input Tokens: All data sent to the model (system prompts, context, RAG docs) costs money.
- Output Tokens: Text generated by the model. Output is typically several times more expensive per token than input.
- 1M Token Rule: All prices are calculated per 1 million (1M) tokens. For reference, 1M tokens is roughly 750,000 English words.
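Putting the per-1M rule to work, a monthly forecast is just per-request cost scaled by volume. A minimal sketch; the request volume, token counts, and prices below are illustrative assumptions:

```python
def monthly_cost(requests_per_day, in_tokens, out_tokens,
                 in_price, out_price, days=30):
    """Forecast monthly spend in USD; prices are USD per 1M tokens."""
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests_per_day * days


# Example: 10,000 requests/day, 1,500 input + 400 output tokens each,
# at $1.25 per 1M input and $5.00 per 1M output tokens (hypothetical rates).
spend = monthly_cost(10_000, 1_500, 400, 1.25, 5.00)
# → $1,162.50 per month
```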
Tips for Reducing Costs
- Dynamic Model Routing: Use smaller, cheaper models (like Gemini Flash or Claude Haiku) for simple classification, and route only complex reasoning tasks to flagships.
- Leverage Prompt Caching: Reuse identical prompt prefixes (system prompts, long document contexts) across requests so they qualify for 50-90% cache-read discounts; most providers only match caches on exact prefixes.
- Structured Context: Strip unnecessary filler words, HTML tags, and formatting from your prompts to minimize the billed token count.
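The routing tip above can be sketched as a simple dispatch table. The model names and task taxonomy here are hypothetical placeholders, not a recommendation of specific models:

```python
# Tasks cheap enough for a small model; everything else goes to a flagship.
CHEAP_TASKS = {"classify", "extract", "summarize-short"}


def pick_model(task_kind):
    """Route simple tasks to a cheap model, complex ones to a flagship.

    In practice the routing signal might be a classifier score or prompt
    length rather than a hand-labeled task kind.
    """
    if task_kind in CHEAP_TASKS:
        return "small-fast-model"   # placeholder name
    return "flagship-model"         # placeholder name
```

Even routing only the bulk classification traffic to a cheaper model can cut spend substantially, since flagship output tokens dominate most bills.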