DICTIONARY
Chinese AI dictionary
Plain-language Chinese explanations of AI technical terms such as transformer, RAG, agent, fine-tuning, context window, and prompt — covering architecture, techniques, metrics, companies, people, model families, and tasks.
BLEU
Metric
An automatic metric for machine translation quality, comparing n-gram overlap between the model output and one or more reference translations.
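A minimal sentence-level sketch of how BLEU combines clipped n-gram precisions with a brevity penalty (this is an illustrative re-implementation, not the exact smoothed variant used by standard toolkits):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of modified (clipped)
    n-gram precisions, multiplied by a brevity penalty.
    candidate/reference are lists of tokens."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference,
        # so repeating a reference word does not inflate precision.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0  # real toolkits smooth this case instead
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
score = bleu(cand, ref, max_n=2)  # between 0 and 1; higher is better
```

In practice BLEU is computed over a whole corpus with multiple references and smoothing; libraries such as sacreBLEU handle those details.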
C-Eval
Metric
A Chinese-language counterpart to MMLU — about 14,000 multiple-choice questions across 52 subjects in Chinese, covering everything from middle school to professional certification level.
CMMLU
Metric
Another Chinese MMLU-style benchmark covering 67 subjects with about 12,000 multiple-choice questions, with stronger coverage of China-specific knowledge than C-Eval.
HumanEval
Metric
OpenAI's coding benchmark of 164 hand-written Python problems; models are scored by whether their generated code passes the accompanying unit tests, reported as pass@k.
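The pass@k number is computed with the unbiased estimator from the HumanEval paper: draw n samples per problem, count the c that pass, and estimate the chance that at least one of k random samples passes. A short sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were generated and c of them passed the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples of which 3 pass, pass@1 reduces to c/n = 0.3.
p1 = pass_at_k(10, 3, 1)
```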
MMLU (Massive Multitask Language Understanding)
Metric
A widely-cited benchmark of 57 multiple-choice subjects (high-school to professional level) used to measure an LLM's broad knowledge — accuracy in % is the headline number.
Perplexity
Metric
A metric measuring how surprised a language model is by the actual next tokens — lower is better. Formally, it is the exponentiated average negative log-likelihood per token.
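The definition translates directly into code. A minimal sketch, given the model's log-probability for each actual next token:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood).
    token_logprobs: the model's natural-log probability assigned
    to each actual next token in the sequence."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every actual token
# has perplexity 4 — "as surprised as" a uniform 4-way choice.
ppl = perplexity([math.log(0.25)] * 10)
```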
ROUGE
Metric
A family of metrics for summarization quality based on n-gram overlap between generated summary and human reference — ROUGE-1, ROUGE-2, and ROUGE-L are the common variants.
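A minimal sketch of the ROUGE-N idea — recall-oriented n-gram overlap against the reference (real implementations also report precision and F1, and ROUGE-L uses the longest common subsequence instead of fixed n-grams):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: fraction of the reference's n-grams that also
    appear in the candidate (counts clipped).
    candidate/reference are lists of tokens."""
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    overlap = sum(min(c, cand[g]) for g, c in ref.items())
    return overlap / max(sum(ref.values()), 1)

# 3 of the reference's 4 unigrams appear in the candidate -> 0.75
r1 = rouge_n_recall("the cat sat".split(), "the cat sat down".split(), n=1)
```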
SuperCLUE
Metric
A comprehensive Chinese LLM benchmark suite covering reasoning, knowledge, language, code, and safety — published as a regularly-updated leaderboard.