Chinese AI dictionary
Plain-language Chinese explanations of transformer, RAG, agent, fine-tuning, context window, prompt, and other AI technical terms — covering architectures, techniques, metrics, companies, people, model families, and tasks.
Llama (family)
Model family
Meta's open-weight LLM family — Llama 1, 2, 3, 4 — the foundational open-source model line that the modern self-hostable AI ecosystem is built on.
LoRA (Low-Rank Adaptation)
Technique
A fine-tuning technique that adapts large models by training small low-rank matrices instead of updating all the original weights.
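A minimal PyTorch sketch of the idea (shapes, rank, and scaling are illustrative assumptions, not any particular library's API): the frozen weight is adapted as W + (alpha/r)·BA, and only the small A and B matrices are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # frozen path + trainable low-rank update: base(x) + (alpha/r) * x A^T B^T
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```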
LSTM (Long Short-Term Memory)
Architecture
A type of recurrent neural network designed to remember information over long sequences, widely used before Transformers took over.
Machine translation
Task
Automatically converting text from one language to another — historically dominated by phrase-based and neural systems, now overwhelmingly handled by LLMs.
Meta AI / FAIR
Company
Meta's AI research division, home of chief scientist Yann LeCun — the group behind PyTorch and the Llama open-weight model family.
Mistral (family)
Model family
Mistral AI's model family — Mistral 7B, Mixtral 8x7B/8x22B (sparse MoE), Mistral Large, and Codestral — Europe's flagship LLM line, mixing open and commercial releases.
Mistral AI
Company
A Paris-based AI lab known for efficient open-weight European models — Mistral 7B, Mixtral 8x7B, and the commercial Mistral Large.
Mixture of Experts (MoE)
Architecture
A neural network architecture that splits the model into many specialized "expert" sub-networks and routes each input to only a few of them, giving huge parameter counts at a fraction of the compute cost.
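A minimal top-2 routing sketch in PyTorch (the sizes and the GELU feed-forward experts are illustrative assumptions, not any particular model's layout): a router scores the experts per token, and only the top-k experts actually run.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                             # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)
        top_w, top_i = gate.topk(self.k, dim=-1)      # each token keeps k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):     # only chosen experts compute
            for slot in range(self.k):
                hit = top_i[:, slot] == e
                if hit.any():
                    out[hit] += top_w[hit, slot, None] * expert(x[hit])
        return out
```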
MMLU (Massive Multitask Language Understanding)
Metric
A widely-cited benchmark of 57 multiple-choice subjects (high-school to professional level) used to measure an LLM's broad knowledge — accuracy in % is the headline number.
Moonshot AI
Company
A Beijing AI startup founded by Yang Zhilin in 2023, creator of the Kimi chat assistant — known for very long context (200k Chinese characters) and strong consumer adoption.
Multi-head attention
Architecture
A Transformer mechanism that runs several attention operations in parallel, letting the model focus on different relationships in the input at the same time.
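A short sketch using PyTorch's built-in nn.MultiheadAttention (the sizes are illustrative): eight heads attend to the same sequence in parallel, and their outputs are concatenated and projected back.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(1, 10, 512)        # (batch, sequence, embedding)
out, weights = attn(x, x, x)       # self-attention: Q = K = V = x
print(out.shape)                   # torch.Size([1, 10, 512])
```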
Multi-modal
Misc
An AI system that can process or generate multiple types of input/output — text plus images, audio, video — instead of just one modality.
Named entity recognition (NER)
Task
Identifying and classifying named entities — people, organizations, locations, dates, products — in unstructured text.
OpenAI
Company
The AI lab behind ChatGPT, GPT-4, and the o-series reasoning models — founded in 2015, now the most prominent commercial AI company.
Perplexity
Metric
A metric measuring how surprised a language model is by the actual next token — lower is better. Formally, it is the exponentiated average negative log-likelihood of the test tokens.
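Written out, with p the model's predicted probability of each token given its context:

```latex
% Perplexity over a held-out sequence w_1, ..., w_N:
\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_{<i}) \right)
```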
Prompt engineering
Technique
The craft of writing prompts that consistently get useful, accurate output from an LLM — covering structure, examples, role framing, and constraints.
Prompt injection
Technique
An attack where untrusted input (a document, web page, email) contains hidden instructions that override or hijack the LLM's intended behavior.
QLoRA
Technique
A fine-tuning technique that combines 4-bit quantization with LoRA, letting you fine-tune large models on a single consumer GPU.
Quantization
Technique
Compressing a model by storing its weights in lower precision (8-bit, 4-bit, even 2-bit) instead of 16- or 32-bit floats, dramatically cutting memory and speeding up inference.
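A minimal NumPy sketch of symmetric 8-bit round-trip quantization (one scale for the whole tensor; real schemes use per-channel or per-group scales and fused kernels):

```python
import numpy as np

w = np.random.randn(4, 4).astype(np.float32)    # original fp32 weights
scale = np.abs(w).max() / 127.0                 # one scale per tensor
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale  # dequantized approximation
print(np.abs(w - w_restored).max())             # small rounding error
```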
Question answering (QA)
Task
Producing direct answers to user questions — either from the model's parametric knowledge (closed-book) or by retrieving from documents (open-book / RAG).
Qwen (family) / 通義千問
Model family
Alibaba's Qwen open-source LLM family — Qwen 1, 1.5, 2, 2.5, 3 — the most-downloaded Chinese open-weight model line on Hugging Face, with rapid release cadence.
ReAct (Reason + Act)
Technique
An agent loop where the model alternates between writing a reasoning step ("Thought") and choosing a tool to call ("Action"), using the result before reasoning again.
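A minimal sketch of the loop, where `llm` and `tools` are hypothetical stand-ins for a real model call and tool registry, and the "Action: name[argument]" line format is an assumption:

```python
def parse_action(step):
    # expects a line like "Action: search[query text]" (assumed format)
    line = next(l for l in step.splitlines() if l.startswith("Action:"))
    name, arg = line[len("Action:"):].strip().split("[", 1)
    return name.strip(), arg.rstrip("]")

def react_loop(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                  # model writes Thought / Action
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            observation = tools[name](arg)      # run the chosen tool
            transcript += f"\nObservation: {observation}\n"
    return transcript                           # ran out of steps
```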
Recurrent Neural Network (RNN)
Architecture
A neural network architecture that processes sequences one step at a time, passing a hidden state forward to remember earlier inputs.
Retrieval-Augmented Generation (RAG)
Technique
A technique that lets an LLM look up relevant documents at query time and use them to ground its answer, reducing hallucinations.
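A minimal sketch, assuming hypothetical `embed` and `llm` callables and brute-force cosine retrieval (production systems delegate this lookup to a vector database):

```python
import numpy as np

def answer(question, docs, embed, llm, k=3):
    doc_vecs = np.array([embed(d) for d in docs])
    q = embed(question)
    scores = doc_vecs @ q / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(docs[i] for i in np.argsort(scores)[-k:])
    return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
```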
RLHF (Reinforcement Learning from Human Feedback)
Technique
A training technique that uses human preference judgments to teach language models which responses are helpful, honest, and safe.
ROUGE
Metric
A family of metrics for summarization quality based on n-gram overlap between generated summary and human reference — ROUGE-1, ROUGE-2, and ROUGE-L are the common variants.
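A minimal sketch of ROUGE-1 recall (real implementations add precision/F1, stemming, and the ROUGE-2/ROUGE-L variants):

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())        # shared unigrams
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 0.5
```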
Sam Altman
Person
CEO of OpenAI and former president of Y Combinator — the most public face of the AI industry, driving OpenAI's commercial strategy and public messaging.
Scaling laws
Misc
Empirical observations that LLM performance improves predictably as you increase model size, training data, and compute — fitted as power-law curves.
Self-Attention
Architecture
A mechanism that lets each token in a sequence look at every other token and decide which ones matter most — the core operation inside Transformers.
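The core operation is scaled dot-product attention over learned query, key, and value projections of the tokens:

```latex
% Q, K, V are projections of the input; d_k is the key dimension:
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```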
Sentiment analysis
Task
Classifying text by emotional tone — positive, negative, neutral, or finer-grained emotion labels — used heavily in customer reviews, social media monitoring, and market research.
Speculative decoding
Technique
An inference speed-up where a small "draft" model proposes several tokens and a large model verifies them in parallel — typically making generation 2-3× faster with no quality loss, since the large model still checks every token.
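A greedy toy sketch, assuming hypothetical `draft` and `target` next-token functions; real implementations verify all drafted positions in a single parallel forward pass and accept or reject them probabilistically so the output distribution matches the target model exactly.

```python
def speculative_step(ctx, draft, target, n_draft=4):
    proposed, t = [], list(ctx)
    for _ in range(n_draft):                 # cheap model proposes a run
        tok = draft(t)
        proposed.append(tok)
        t.append(tok)
    accepted = []
    for tok in proposed:                     # big model checks each position
        check = target(ctx + accepted)
        if check != tok:                     # first disagreement: take the
            accepted.append(check)           # target's token and stop
            break
        accepted.append(tok)
    return accepted
```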
Speech-to-text (STT/ASR)
Task
Converting spoken audio into text — also called Automatic Speech Recognition (ASR). OpenAI's Whisper is the most widely used open model.
Stable Diffusion (family)
Model family
Stability AI's open-weight image generation diffusion model family — SD 1.5, SDXL, SD3, SD 3.5 — the foundation of the open-source AI art ecosystem.
State-Space Model (Mamba)
Architecture
A sequence model architecture that processes tokens through a compressed hidden state, offering linear-time scaling as an alternative to Transformer attention.
Summarization
Task
The task of compressing a long input — article, transcript, document — into a shorter version that preserves the key information.
SuperCLUE
Metric
A comprehensive Chinese LLM benchmark suite covering reasoning, knowledge, language, code, and safety — published as a regularly-updated leaderboard.
Supervised fine-tuning (SFT)
Technique
Fine-tuning a base LLM on a dataset of (input, ideal-output) pairs so it learns to produce that style of response — the first step of post-training.
System prompt
Misc
A special instruction at the start of an LLM conversation that sets the model's role, tone, behavior rules, and constraints for the rest of the session.
Temperature
Misc
A sampling parameter that controls randomness in LLM output — 0 is deterministic and "safe", higher values make output more diverse but more error-prone.
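A minimal NumPy sketch of how temperature reshapes the next-token distribution before sampling:

```python
import numpy as np

def sample(logits, temperature=1.0):
    if temperature == 0:                      # greedy: always the top token
        return int(np.argmax(logits))
    z = np.array(logits) / temperature        # higher T flattens, lower sharpens
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
```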
Text generation
Task
The core LLM task of producing free-form text in response to a prompt — covers chat, writing, completion, and any output that is itself natural language.
Text-to-speech (TTS)
Task
Converting written text into spoken audio — modern neural TTS systems (ElevenLabs, OpenAI TTS, Google) produce near-human-quality voices that can clone, emote, and speak many languages.
Tokenization
Technique
The process of splitting raw text into tokens — the units (sub-words, words, or characters) that an LLM actually processes.
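For example, with OpenAI's tiktoken library (one tokenizer among many; the exact splits differ across models):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Tokenization splits text")   # list of integer token ids
print([enc.decode([i]) for i in ids])          # sub-word pieces the model sees
```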
Tool use / Function calling
Technique
An LLM capability where the model decides to call external functions (search, code, APIs) and uses the results to produce its final answer.
Top-k sampling
Misc
A sampling method that restricts each next-token choice to the k highest-probability tokens — simpler but less adaptive than top-p.
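A minimal NumPy sketch (k=50 is a common but arbitrary default): mask every logit outside the k best, then sample.

```python
import numpy as np

def top_k_sample(logits, k=50):
    logits = np.array(logits, dtype=float)
    cutoff = np.sort(logits)[-min(k, logits.size)]  # k-th highest logit
    logits[logits < cutoff] = -np.inf               # mask the rest
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
```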
Top-p (nucleus) sampling
Misc
A sampling method that picks each next token from the smallest set whose cumulative probability is ≥ p — adaptive to how confident the model is.
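A minimal NumPy sketch: keep the smallest high-probability set whose cumulative mass reaches p, renormalize, and sample.

```python
import numpy as np

def top_p_sample(logits, p=0.9):
    z = np.array(logits, dtype=float)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                 # most to least probable
    cum = np.cumsum(probs[order])
    keep = order[:int(np.searchsorted(cum, p)) + 1] # smallest set with mass >= p
    q = np.zeros_like(probs)
    q[keep] = probs[keep]
    q /= q.sum()
    return int(np.random.choice(len(q), p=q))
```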
Transformer
Architecture
A neural network architecture introduced by Google in 2017 that uses self-attention to process sequences in parallel — the foundation of modern LLMs like GPT and Claude.
Variational Autoencoder (VAE)
Architecture
A generative neural network that learns to compress data into a probabilistic latent space, then sample from it to generate new examples.
Vector database
Technique
A database optimized for storing high-dimensional vectors (embeddings) and finding the nearest neighbors of a query vector efficiently.
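A brute-force NumPy sketch of the core operation (real vector databases use approximate-nearest-neighbor indexes such as HNSW so they never scan every vector):

```python
import numpy as np

store = np.random.randn(10_000, 384).astype(np.float32)  # stored embeddings
store /= np.linalg.norm(store, axis=1, keepdims=True)

query = np.random.randn(384).astype(np.float32)
query /= np.linalg.norm(query)

scores = store @ query                    # cosine similarity (unit vectors)
top5 = np.argsort(scores)[-5:][::-1]      # indices of the 5 nearest vectors
print(top5, scores[top5])
```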