TODAY
Zhipu opens up: GLM-5's 'cognitive decay' was a KV-cache race, not the model
Zhipu's GLM-5, called billions of times daily by its Coding Agent, was producing garbled output, repetition loops, and obscure characters — users called it 'the model getting dumber.' A postmortem points to two systems-level bugs: a KV-cache race under their prefill/decode-disaggregated (PD) architecture, and missing load-ordering in HiCache. Patches cut the failure rate from roughly 10 per 10,000 requests to under 3 per 10,000. The interesting signal isn't the bug itself; it's Zhipu publishing the postmortem — at this scale, the bottleneck stops being scaling laws and becomes systems engineering.
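To make the bug class concrete: in a PD-disaggregated stack, a decode worker can still be reading a KV-cache block that a prefill worker has already recycled for another request, so the decoder attends over the wrong keys/values and emits garbage. The sketch below is a deliberately single-threaded toy illustration of that use-after-recycle pattern and a reference-count guard against it; all class and variable names are hypothetical, and this is not Zhipu's actual code (the real race is concurrent and inside the serving engine).

```python
class KVCachePool:
    """Toy KV-cache block pool; 'blocks' maps a block id to cached keys/values."""

    def __init__(self):
        self.blocks = {}
        self.refcount = {}

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.refcount[block_id] = 0

    def acquire(self, block_id):
        # A decode worker pins the block while attending over it.
        self.refcount[block_id] += 1
        return self.blocks[block_id]

    def release(self, block_id):
        self.refcount[block_id] -= 1

    def recycle_unsafe(self, block_id, new_data):
        # Buggy path: overwrites the block with no regard for active readers.
        self.blocks[block_id] = new_data

    def recycle_safe(self, block_id, new_data):
        # Guarded path: refuse to recycle while any reader still holds the block.
        if self.refcount[block_id] > 0:
            return False
        self.blocks[block_id] = new_data
        return True


pool = KVCachePool()

# Request A's decode worker pins block 7 mid-generation...
pool.write(7, "request-A keys/values")
pool.acquire(7)

# ...but a prefill worker recycles the same block for request B (the race).
pool.recycle_unsafe(7, "request-B keys/values")
corrupted = pool.blocks[7]  # decoder now reads B's cache: garbled output

# With the guard, recycling is rejected until the reader releases the block.
pool.write(7, "request-A keys/values")
pool.acquire(7)
rejected = pool.recycle_safe(7, "request-B keys/values")  # False while pinned
pool.release(7)
accepted = pool.recycle_safe(7, "request-B keys/values")  # True after release
```

Production engines guard cache blocks with reference counts or versioned handles for exactly this reason; the postmortem's second bug (missing load-ordering in HiCache) is the same family of hazard, where a reader observes a block before its contents are fully published.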
Published: 2026-05-04
Sources
- 量子位 (QbitAI) — "Zhipu reveals the secret behind the 'dumbing down'" (in Chinese)
Tags
zhipu, infrastructure, coding-agent, kv-cache