Every LLM API charges by the token. Every context window is sized in tokens. Every "why is my bill so high" debugging session ends in tokens. Yet most people use LLMs daily without ever seeing one. Here's what they actually are.
A token is roughly a word piece
When an LLM reads text, the first thing it does is split the text into tokens using a tokenizer. The tokenizer is a fixed lookup table — typically 50,000 to 200,000 entries — built once when the model is trained, and used identically forever.
For English, a token is usually:
- A common word ("the", "computer", "running") — one token each
- A common subword piece ("-ing", "un-", "ization") — one token
- A single character or punctuation mark, used as a fallback for rare or out-of-vocabulary text (an unusual word may cost several tokens)
A rough rule: 1 token ≈ 0.75 English words. So 1,000 tokens is about 750 words, or a page and a half of typical English prose.
For Chinese, Japanese, and Korean (CJK) it's different. Most modern tokenizers (Claude's, GPT's, Gemini's) treat each Chinese character as 1 to 2 tokens. So 1,000 Chinese characters ≈ 1,500-2,000 tokens. Chinese is more expensive per character than English.
For code, tokens run shorter: indentation, brackets, and operators often tokenize separately. A typical line of Python is 5-15 tokens.
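To ground these ratios, here's a minimal sketch using OpenAI's open-source tiktoken library (pip install tiktoken). The counts are specific to this encoding; other vendors' tokenizers will differ somewhat, and the sample strings are arbitrary:

```python
# A sketch with tiktoken; counts depend on the encoding chosen.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era encoding
print(enc.n_vocab)  # vocabulary size, roughly 100K entries

samples = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "chinese": "今天天气很好，我们去公园散步吧。",
    "python":  "def add(a, b):\n    return a + b",
}
for name, text in samples.items():
    print(f"{name}: {len(text)} chars -> {len(enc.encode(text))} tokens")
```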
You can see the actual tokenization
If you want to ground this in reality, use a tokenizer playground:
- OpenAI: tiktokenizer.vercel.app or platform.openai.com/tokenizer
- Anthropic: count_tokens API
- Hugging Face: any tokenizer can be loaded with `AutoTokenizer.from_pretrained(...)`
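For the programmatic route, loading a tokenizer takes two lines (a sketch; "gpt2" is just an example checkpoint, any model on the Hub works the same way):

```python
# A sketch; "gpt2" is an example checkpoint.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("Tokenization is not the same as splitting on spaces."))
```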
Paste (or print) your text and see exactly which chunks the model sees. Things that surprise people:
- "strawberry" is one token in GPT's vocabulary. That's why GPT historically miscounted the
rs. - Numbers split unpredictably — "1234" might be one token, but "12345" might split into "123" + "45".
- A trailing space matters. "hello" and " hello" (leading space) are different tokens.
- Whitespace before code structure (`{`, `}`, etc.) often gets its own token, which is why code costs more than the character count would suggest.
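Each of these is easy to verify yourself. A sketch, again with tiktoken (the exact splits depend on the encoding you pick):

```python
# Checking the surprises above; outputs vary by encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for text in ["strawberry", "hello", " hello", "1234", "12345", "    {"]:
    ids = enc.encode(text)
    print(f"{text!r} -> {len(ids)} token(s): {[enc.decode([i]) for i in ids]}")
```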
Why it matters for cost
API pricing is per million tokens, separately for input and output:
- Input tokens are everything you send: system prompt + conversation history + your message. Cheaper.
- Output tokens are what the model generates back. More expensive (usually 3-5×).
In 2026, typical pricing per million tokens:
- Claude Sonnet: ~$3 input, ~$15 output
- GPT-5: comparable, varies by tier
- Gemini 2.5 Pro: cheaper, ~$1-3 input
- DeepSeek V3: very cheap, often <$1 input
- Open-weight Llama 70B self-hosted: pay per GPU-hour, not per token
This means prompt length is your bill. A 50K-token system prompt on every request can torpedo your unit economics.
Mitigations: prompt caching (5-10× cheaper for repeated prefixes), shorter system prompts, RAG instead of dumping the whole knowledge base, and using cheaper models for easy queries ("LLM routing").
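A back-of-envelope helper makes the stakes concrete. A sketch: the $3/$15 defaults are the Sonnet-class rates above, and the token counts are assumptions, not measurements:

```python
# A sketch: dollars per request at per-million-token prices.
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float = 3.0, out_price: float = 15.0) -> float:
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 50K-token system prompt plus a short question and reply:
c = request_cost(input_tokens=50_000 + 500, output_tokens=800)
print(f"${c:.4f}/request, ${c * 100_000:,.0f} per 100K requests")  # ~$16,350
```

The same arithmetic tells you when a mitigation pays for itself: cut the repeated 50K prefix by 5-10× with prompt caching and recompute.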
Why it matters for context windows
If the model has a 200K context window, the limit is in tokens, not pages. So:
- A 100-page English document (~50K words ≈ 65K tokens) easily fits.
- A 100-page Chinese document (~50K characters ≈ 75K-100K tokens) might just fit.
- A 100-page Python codebase varies wildly depending on density.
When designing an app, plan a token budget. Don't think "the document is 5MB" — count the tokens.
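Concretely, that can be a pre-flight check like this sketch, where count_tokens stands in for whatever tokenizer or counting endpoint you use, and the 4K output reserve is an assumption:

```python
# A sketch of a token-budget check; count_tokens is a stand-in.
from typing import Callable

CONTEXT_WINDOW = 200_000      # the model's limit, in tokens
RESERVED_FOR_OUTPUT = 4_000   # assumed headroom for the reply

def fits(system: str, history: list[str], message: str,
         count_tokens: Callable[[str], int]) -> bool:
    used = sum(count_tokens(t) for t in (system, *history, message))
    return used + RESERVED_FOR_OUTPUT <= CONTEXT_WINDOW
```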
Why it matters for output speed
Models generate text one token at a time. Speed is measured in tokens per second (tps).
- A frontier API like Claude Sonnet does ~50-80 tps for output
- Smaller models (Haiku, Flash) hit 100-300 tps
- Self-hosted with vLLM or TensorRT-LLM you can push 500+ tps for 7B models
If the user wants a 1,000-token response and your model does 50 tps, that's 20 seconds wall-clock time. For chat UIs, this is why streaming matters — show tokens as they arrive, not after the whole response is done.
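The arithmetic generalizes into a one-liner worth keeping around (a sketch; the 0.5 s time-to-first-token is an assumed prefill delay, not a measurement):

```python
# A sketch: wall-clock estimate for a streamed response.
def response_seconds(output_tokens: int, tps: float,
                     time_to_first_token: float = 0.5) -> float:
    return time_to_first_token + output_tokens / tps

print(response_seconds(1_000, tps=50))   # ~20.5 s, frontier model
print(response_seconds(1_000, tps=200))  # ~5.5 s, small/fast model
```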
A quick estimation cheat sheet
Keep these in your head:
- 1 page of English ≈ 500 words ≈ 650 tokens
- 1 page of Chinese ≈ 500 chars ≈ 750-1000 tokens
- 1 typical user chat message ≈ 50-200 tokens
- 1 system prompt for a serious app ≈ 500-3000 tokens
- 1 RAG retrieval (5 chunks of ~300 tokens) ≈ 1500 tokens
- 1 long PDF report ≈ 10K-50K tokens
Multiply by your monthly request volume to get a rough cost forecast before you ship anything serious.
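As a worked example (a sketch: the 500-token reply and the 1M monthly volume are assumptions, the rest comes from the cheat sheet and the Sonnet-class prices above):

```python
# A sketch of a monthly cost forecast from the cheat-sheet numbers.
SYSTEM_PROMPT = 1_500   # tokens, mid-range "serious app" prompt
RAG_CONTEXT   = 1_500   # 5 chunks of ~300 tokens
USER_MESSAGE  = 125     # typical chat message
OUTPUT        = 500     # assumed average reply length

requests = 1_000_000    # assumed monthly volume
inp = SYSTEM_PROMPT + RAG_CONTEXT + USER_MESSAGE
cost = requests * (inp / 1e6 * 3.0 + OUTPUT / 1e6 * 15.0)  # $3 in / $15 out
print(f"~${cost:,.0f}/month")  # about $16,875
```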
When NOT to obsess over token counts
If you're using a chat product (ChatGPT, Claude.ai) on a flat monthly plan, you don't pay per token — you pay $20/month and the product handles fair-use limits. Counting tokens is mostly an API/builder concern.
For casual chat use, the time you'd spend optimizing prompt length is worth more than the token savings.
Further reading
- What is a Large Language Model (LLM)
- What is a context window
- Tokens vs words: how LLM pricing actually works
- Why input tokens cost less than output tokens
- LLM routing: route easy queries to cheap models