Yi-Lightning
Cheap, fast Chinese-first chat model from 01.AI, tuned for high-throughput production use.
Specs
- Context window: 16,000 tokens
- Max output: 4,096 tokens
- Modalities: text
- Tool use: ✓
- Vision: —
- Streaming: ✓
- License: proprietary
- Released: 2024-10-17
Pricing
- Input / 1M tokens: $0.14
- Output / 1M tokens: $0.14
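With a flat $0.14 per million tokens in both directions, per-call cost is simple arithmetic. A minimal sketch using the rates from the table above (the function name and example call sizes are illustrative, not from this page):

```python
# Flat Yi-Lightning rates from the pricing table: $0.14 per 1M tokens each way.
INPUT_PER_M = 0.14
OUTPUT_PER_M = 0.14

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single call at the listed rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A typical short chat call: 1,000 input + 200 output tokens
# -> (1000 + 200) * 0.14 / 1e6 = $0.000168 per call
```

At these rates a million such calls would run about $168, which is the kind of high-volume budget the model is positioned for.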
Overview
Yi-Lightning is 01.AI's flagship low-latency model, priced at $0.14 per million tokens for both input and output. It supports tool calling and streaming but is text-only with a relatively small 16K context window. The model performs well on Chinese benchmarks (it briefly ranked in LMSYS Arena's top tier in late 2024) and is aimed at production chat, classification, and routing workloads where cost-per-call matters more than long-context reasoning.
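01.AI serves Yi-Lightning through an OpenAI-compatible chat endpoint, so a request body can be sketched as below. The model id string `"yi-lightning"` and the request schema are assumptions based on that compatibility, not details from this page; the 4,096 cap comes from the specs table.

```python
# Sketch of an OpenAI-style chat payload for Yi-Lightning. The model id
# "yi-lightning" and field names are assumptions (OpenAI-compatible API),
# not taken from this page.
MAX_OUTPUT = 4096  # hard output cap from the specs table

def build_request(messages: list[dict], max_tokens: int = MAX_OUTPUT,
                  stream: bool = True) -> dict:
    """Assemble a chat-completions request body, clamping max_tokens to the cap."""
    return {
        "model": "yi-lightning",
        "messages": messages,
        "max_tokens": min(max_tokens, MAX_OUTPUT),
        "stream": stream,  # streaming is supported per the specs table
    }

req = build_request([{"role": "user", "content": "你好"}], max_tokens=8000)
# max_tokens is clamped to 4096 regardless of what the caller asks for
```

Clamping client-side avoids a round-trip rejection when a caller requests more output than the model can produce.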
Editor's verdict
Pick Yi-Lightning if you need a cheap, fast Chinese-capable model for high-volume tasks and Qwen or DeepSeek don't fit your stack. The 16K context is the real limitation — it rules out long-document RAG and longer agent traces, where Qwen-Plus or DeepSeek-V3 give you far more headroom at similar prices. No vision and no open weights either, so it's a narrow but legitimate choice for short-form Chinese inference at scale.
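The 16K ceiling noted above is easy to check before sending a long document. A rough pre-flight sketch; the 4-characters-per-token heuristic is an assumption (real counts require the provider's tokenizer), while the 16,000 / 4,096 limits come from the specs table:

```python
CONTEXT_WINDOW = 16_000  # from the specs table
MAX_OUTPUT = 4_096       # from the specs table

def rough_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic for English text; an assumption,
    # not the model's actual tokenizer.
    return max(1, len(text) // 4)

def output_budget(prompt: str) -> int:
    """Tokens left for the reply after the prompt, capped at the output limit."""
    remaining = CONTEXT_WINDOW - rough_tokens(prompt)
    return max(0, min(remaining, MAX_OUTPUT))

output_budget("hi" * 2000)   # ~1,000-token prompt -> full 4,096 budget remains
output_budget("x" * 60_000)  # ~15,000-token prompt -> only ~1,000 tokens left
```

A check like this is where the headroom difference against longer-context models shows up in practice: prompts that exhaust the budget here would need truncation or a different model.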
Reviews
No reviews yet.
Last updated: 2026-04-29