Chinese AI dictionary
Plain-language Chinese explanations of transformer, RAG, agent, fine-tuning, context window, prompt, and other AI technical terms — covering architectures, techniques, metrics, companies, people, model families, and tasks.
Llama (family)
Model family
Meta's open-weight LLM family — Llama 1, 2, 3, 4 — the foundational open-source model line that the modern self-hostable AI ecosystem is built on.
LoRA (Low-Rank Adaptation)
Technique
A fine-tuning technique that adapts large models by training small low-rank matrices instead of updating all the original weights.
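A minimal PyTorch sketch of the idea (shapes, rank, and scaling are illustrative assumptions, not any particular library's API): the frozen weight is adapted as W + (alpha/r)·BA, and only the small A and B matrices are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # frozen path + trainable low-rank update: base(x) + (alpha/r) * x A^T B^T
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```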
LSTM (Long Short-Term Memory)
Architecture
A type of recurrent neural network designed to remember information over long sequences, widely used before Transformers took over.
Machine translation
Task
Automatically converting text from one language to another — historically dominated by phrase-based and neural systems, now overwhelmingly handled by LLMs.
Meta AI / FAIR
Company
Meta's AI research division, home of chief scientist Yann LeCun — the group behind PyTorch and the Llama open-weight model family.
Mistral (family)
Model family
Mistral AI's model family — Mistral 7B, Mixtral 8x7B/8x22B (sparse MoE), Mistral Large, and Codestral — Europe's flagship LLM line, mixing open and commercial releases.
Mistral AI
Company
A Paris-based AI lab known for efficient open-weight European models — Mistral 7B, Mixtral 8x7B, and the commercial Mistral Large.
Mixture of Experts (MoE)
Architecture
A neural network architecture that splits the model into many specialized "expert" sub-networks and routes each input to only a few of them, giving huge parameter counts at a fraction of the compute cost.
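A minimal top-2 routing sketch in PyTorch (the sizes and the GELU feed-forward experts are illustrative assumptions, not any particular model's layout): a router scores the experts per token, and only the top-k experts actually run.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                             # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)
        top_w, top_i = gate.topk(self.k, dim=-1)      # each token keeps k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):     # only chosen experts compute
            for slot in range(self.k):
                hit = top_i[:, slot] == e
                if hit.any():
                    out[hit] += top_w[hit, slot, None] * expert(x[hit])
        return out
```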
MMLU (Massive Multitask Language Understanding)
Metric
A widely-cited benchmark of 57 multiple-choice subjects (high-school to professional level) used to measure an LLM's broad knowledge — accuracy in % is the headline number.
Moonshot AI
Company
A Beijing AI startup founded by Yang Zhilin in 2023, creator of the Kimi chat assistant — known for very long context (200k Chinese characters) and strong consumer adoption.
Multi-head attention
Architecture
A Transformer mechanism that runs several attention operations in parallel, letting the model focus on different relationships in the input at the same time.
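A short sketch using PyTorch's built-in nn.MultiheadAttention (the sizes are illustrative): eight heads attend to the same sequence in parallel, and their outputs are concatenated and projected back.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(1, 10, 512)        # (batch, sequence, embedding)
out, weights = attn(x, x, x)       # self-attention: Q = K = V = x
print(out.shape)                   # torch.Size([1, 10, 512])
```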
Multi-modal
Misc
An AI system that can process or generate multiple types of input/output — text plus images, audio, video — instead of just one modality.
Named entity recognition (NER)
Task
Identifying and classifying named entities — people, organizations, locations, dates, products — in unstructured text.
OpenAI
Company
The AI lab behind ChatGPT, GPT-4, and the o-series reasoning models — founded in 2015, now the most prominent commercial AI company.
Perplexity
Metric
A metric measuring how surprised a language model is by the actual next token — lower is better. Formally, it is the exponentiated average negative log-likelihood of the test tokens.
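Written out, with p the model's predicted probability of each token given its context:

```latex
% Perplexity over a held-out sequence w_1, ..., w_N:
\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_{<i}) \right)
```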
Prompt engineering
Technique
The craft of writing prompts that consistently get useful, accurate output from an LLM — covering structure, examples, role framing, and constraints.
Prompt injection
Technique
An attack where untrusted input (a document, web page, email) contains hidden instructions that override or hijack the LLM's intended behavior.
QLoRA
Technique
A fine-tuning technique that combines 4-bit quantization with LoRA, letting you fine-tune large models on a single consumer GPU.
Quantization
Technique
Compressing a model by storing its weights in lower precision (8-bit, 4-bit, even 2-bit) instead of 16- or 32-bit floats, dramatically cutting memory and speeding up inference.
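A minimal NumPy sketch of symmetric 8-bit round-trip quantization (one scale for the whole tensor; real schemes use per-channel or per-group scales and fused kernels):

```python
import numpy as np

w = np.random.randn(4, 4).astype(np.float32)    # original fp32 weights
scale = np.abs(w).max() / 127.0                 # one scale per tensor
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale  # dequantized approximation
print(np.abs(w - w_restored).max())             # small rounding error
```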
Question answering (QA)
Task
Producing direct answers to user questions — either from the model's parametric knowledge (closed-book) or by retrieving from documents (open-book / RAG).
Qwen (family) / 通義千問
Model family
Alibaba's Qwen open-source LLM family — Qwen 1, 1.5, 2, 2.5, 3 — the most-downloaded Chinese open-weight model line on Hugging Face, with rapid release cadence.
ReAct (Reason + Act)
Technique
An agent loop where the model alternates between writing a reasoning step ("Thought") and choosing a tool to call ("Action"), using the result before reasoning again.
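A minimal sketch of the loop, where `llm` and `tools` are hypothetical stand-ins for a real model call and tool registry, and the "Action: name[argument]" line format is an assumption:

```python
def parse_action(step):
    # expects a line like "Action: search[query text]" (assumed format)
    line = next(l for l in step.splitlines() if l.startswith("Action:"))
    name, arg = line[len("Action:"):].strip().split("[", 1)
    return name.strip(), arg.rstrip("]")

def react_loop(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                  # model writes Thought / Action
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            observation = tools[name](arg)      # run the chosen tool
            transcript += f"\nObservation: {observation}\n"
    return transcript                           # ran out of steps
```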
Recurrent Neural Network (RNN)
Architecture
A neural network architecture that processes sequences one step at a time, passing a hidden state forward to remember earlier inputs.
Retrieval-Augmented Generation (RAG)
Technique
A technique that lets an LLM look up relevant documents at query time and use them to ground its answer, reducing hallucinations.
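A minimal sketch, assuming hypothetical `embed` and `llm` callables and brute-force cosine retrieval (production systems delegate this lookup to a vector database):

```python
import numpy as np

def answer(question, docs, embed, llm, k=3):
    doc_vecs = np.array([embed(d) for d in docs])
    q = embed(question)
    scores = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(docs[i] for i in np.argsort(scores)[-k:])
    return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
```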
RLHF (Reinforcement Learning from Human Feedback)
Technique
A training technique that uses human preference judgments to teach language models which responses are helpful, honest, and safe.
ROUGE
Metric
A family of metrics for summarization quality based on n-gram overlap between generated summary and human reference — ROUGE-1, ROUGE-2, and ROUGE-L are the common variants.
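A minimal sketch of ROUGE-1 recall (real implementations add precision/F1, stemming, and the ROUGE-2/ROUGE-L variants):

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())        # shared unigrams
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```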
Sam Altman
Person
CEO of OpenAI and former president of Y Combinator — the most public face of the AI industry, driving OpenAI's commercial strategy and public messaging.
Scaling laws
Misc
Empirical observations that LLM performance improves predictably as you increase model size, training data, and compute — fitted as power-law curves.
Self-Attention
Architecture
A mechanism that lets each token in a sequence look at every other token and decide which ones matter most — the core operation inside Transformers.
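The core operation is scaled dot-product attention over learned query, key, and value projections of the tokens:

```latex
% Q, K, V are projections of the input; d_k is the key dimension:
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```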
Sentiment analysis
Task
Classifying text by emotional tone — positive, negative, neutral, or finer-grained emotion labels — used heavily in customer reviews, social media monitoring, and market research.
Speculative decoding
Technique
An inference speed-up where a small "draft" model proposes several tokens and a large model verifies them in parallel — typically making generation 2-3× faster with no quality loss, since the large model still checks every token.
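A greedy toy sketch, assuming hypothetical `draft` and `target` next-token functions; real implementations verify all drafted positions in a single parallel forward pass and accept or reject them probabilistically so the output distribution matches the target model exactly.

```python
def speculative_step(ctx, draft, target, n_draft=4):
    proposed, t = [], list(ctx)
    for _ in range(n_draft):                 # cheap model proposes a run
        tok = draft(t)
        proposed.append(tok)
        t.append(tok)
    accepted = []
    for tok in proposed:                     # big model checks each position
        check = target(ctx + accepted)
        if check != tok:                     # first disagreement: take the
            accepted.append(check)           # target's token and stop
            break
        accepted.append(tok)
    return accepted
```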
Speech-to-text (STT/ASR)
Task
Converting spoken audio into text — also called Automatic Speech Recognition (ASR). OpenAI's Whisper is the most widely used open model.
Stable Diffusion (family)
Model family
Stability AI's open-weight image generation diffusion model family — SD 1.5, SDXL, SD3, SD 3.5 — the foundation of the open-source AI art ecosystem.
State-Space Model (Mamba)
Architecture
A sequence model architecture that processes tokens through a compressed hidden state, offering linear-time scaling as an alternative to Transformer attention.
Summarization
Task
The task of compressing a long input — article, transcript, document — into a shorter version that preserves the key information.
SuperCLUE
Metric
A comprehensive Chinese LLM benchmark suite covering reasoning, knowledge, language, code, and safety — published as a regularly-updated leaderboard.
Supervised fine-tuning (SFT)
Technique
Fine-tuning a base LLM on a dataset of (input, ideal-output) pairs so it learns to produce that style of response — the first step of post-training.
System prompt
Misc
A special instruction at the start of an LLM conversation that sets the model's role, tone, behavior rules, and constraints for the rest of the session.
Temperature
Misc
A sampling parameter that controls randomness in LLM output — 0 is deterministic and "safe", higher values make output more diverse but more error-prone.
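A minimal NumPy sketch of how temperature reshapes the next-token distribution before sampling:

```python
import numpy as np

def sample(logits, temperature=1.0):
    if temperature == 0:                      # greedy: always the top token
        return int(np.argmax(logits))
    z = np.array(logits) / temperature        # higher T flattens, lower sharpens
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
```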
Text generation
Task
The core LLM task of producing free-form text in response to a prompt — covers chat, writing, completion, and any output that is itself natural language.
Text-to-speech (TTS)
Task
Converting written text into spoken audio — modern neural TTS systems (ElevenLabs, OpenAI TTS, Google) produce near-human-quality voices that can clone, emote, and speak many languages.
Tokenization
Technique
The process of splitting raw text into tokens — the units (sub-words, words, or characters) that an LLM actually processes.
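For example, with OpenAI's tiktoken library (one tokenizer among many; the exact splits differ across models):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization splits text")   # list of integer token ids
print([enc.decode([i]) for i in ids])          # sub-word pieces the model sees
```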
Tool use / Function calling
Technique
An LLM capability where the model decides to call external functions (search, code, APIs) and uses the results to produce its final answer.
Top-k sampling
Misc
A sampling method that restricts each next-token choice to the k highest-probability tokens — simpler but less adaptive than top-p.
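A minimal NumPy sketch (k=50 is a common but arbitrary default): mask every logit outside the k best, then sample.

```python
import numpy as np

def top_k_sample(logits, k=50):
    logits = np.array(logits, dtype=float)
    cutoff = np.sort(logits)[-min(k, logits.size)]  # k-th highest logit
    logits[logits < cutoff] = -np.inf               # mask the rest
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
```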
Top-p (nucleus) sampling
Misc
A sampling method that picks each next token from the smallest set whose cumulative probability is ≥ p — adaptive to how confident the model is.
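A minimal NumPy sketch: keep the smallest high-probability set whose cumulative mass reaches p, renormalize, and sample.

```python
import numpy as np

def top_p_sample(logits, p=0.9):
    z = np.array(logits, dtype=float)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # most to least probable
    cum = np.cumsum(probs[order])
    keep = order[:int(np.searchsorted(cum, p)) + 1] # smallest set with mass >= p
    q = np.zeros_like(probs)
    q[keep] = probs[keep]
    q /= q.sum()
    return int(np.random.choice(len(q), p=q))
```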
Transformer
Architecture
A neural network architecture introduced by Google in 2017 that uses self-attention to process sequences in parallel — the foundation of modern LLMs like GPT and Claude.
Variational Autoencoder (VAE)
Architecture
A generative neural network that learns to compress data into a probabilistic latent space, then sample from it to generate new examples.
Vector database
Technique
A database optimized for storing high-dimensional vectors (embeddings) and finding the nearest neighbors of a query vector efficiently.
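A brute-force NumPy sketch of the core operation (real vector databases use approximate-nearest-neighbor indexes such as HNSW so they never scan every vector):

```python
import numpy as np

store = np.random.randn(10_000, 384).astype(np.float32)  # stored embeddings
store /= np.linalg.norm(store, axis=1, keepdims=True)

query = np.random.randn(384).astype(np.float32)
query /= np.linalg.norm(query)

scores = store @ query                    # cosine similarity (unit vectors)
top5 = np.argsort(scores)[-5:][::-1]      # indices of the 5 nearest vectors
print(top5, scores[top5])
```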