DeepSeek V4 dropped last week, but the technical conversation isn't about what it shipped — it's about what it didn't.
Mid-2025, DeepSeek and Peking University jointly open-sourced Engram: a knowledge-lookup module bolted onto the transformer that pulls static facts (historical dates, API signatures, theorems) out of the model weights and into a writable memory pool. "Don't compute what you can look up." At each layer of the forward pass, the model dynamically decides whether to route through the retrieval path.

The numbers were sharp. Engram-27B beat its same-size baseline by +3.4 on MMLU, +4.0 on CMMLU, +5.0 on BBH, +3.0 on HumanEval, and +2.4 on MATH, and Multi-Query NIAH jumped from 84.2% to 97.0%, meaning targeted retrieval over long context is approaching the benchmark's ceiling.
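The description maps onto a simple pattern: at each layer, compare the hidden state against an external key-value pool, fetch the best-matching entries, and let a learned gate decide how much of the fetched value to mix in versus recompute. Below is a minimal sketch of that pattern in PyTorch; the class name, the dense-softmax lookup, the sigmoid gate, and the `write_entry` helper are all my assumptions, not Engram's published interface.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EngramGate(nn.Module):
    """Hypothetical sketch of a per-layer lookup gate (not DeepSeek's code).

    The key/value buffers stand in for the writable memory pool that holds
    static facts; the gate decides, per token, how much of the fetched
    value to mix into the hidden state instead of recomputing it.
    """

    def __init__(self, d_model: int, pool_size: int):
        super().__init__()
        # Buffers, not trained weights: the pool stays writable at serving time.
        self.register_buffer("memory_keys", torch.randn(pool_size, d_model))
        self.register_buffer("memory_values", torch.randn(pool_size, d_model))
        self.gate = nn.Linear(d_model, 1)  # learned "look up vs. compute" decision

    @torch.no_grad()
    def write_entry(self, idx: int, key: torch.Tensor, value: torch.Tensor):
        # A fact can be corrected in place, with no gradient step or retraining.
        self.memory_keys[idx] = key
        self.memory_values[idx] = value

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        scores = hidden @ self.memory_keys.T                      # (B, S, pool)
        fetched = F.softmax(scores, dim=-1) @ self.memory_values  # (B, S, d_model)
        g = torch.sigmoid(self.gate(hidden))                      # (B, S, 1) per-token gate
        return hidden + g * fetched                               # residual mix-in
```

The residual form leaves the transformer's compute path intact when the gate sits near zero, and the buffer-based pool is what makes the memory "writable": entries can be edited at serving time without touching the model weights, which is the property the Engram release emphasized.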
V4's technical report mentioned none of this. The community first read it as a quiet shelving, then learned it was a planned trade-off: V4 prioritized general reasoning while Engram was still being scaled from single-shard deployments to multi-host CXL memory pools.
Three follow-up papers fill the gap: CXL memory pooling sustains 512 GB/s of bandwidth with under 5% end-to-end throughput loss; the conflict-free hot-layer experiments resolved the lock contention that arises when concurrent queries hit the same hot entries; and a vision variant, Tiny Engram, extended the concept to image-patch retrieval. Taken together, they make V4.5 the natural place to fold Engram into the mainline, which is why the community reads the omission as a regret rather than a failure: this was timing, not design.
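The contention result is the most code-shaped of the three. When many decode threads read the same hot entries while occasional writers update the pool, a single mutex serializes everyone. One plausible reading of "conflict-free" is a reader-biased protocol such as a seqlock, where readers never block and simply retry if they catch a write in flight. This is a sketch of that idea, not the paper's actual design, and Python stands in for what would be C++ or CUDA in a real serving stack.

```python
import threading

class SeqlockEntry:
    """Reader-biased memory-pool entry (hypothetical 'conflict-free' scheme).

    Writers bump a version counter around each update (odd = write in
    progress); readers take no lock and retry if the version changed, or
    was odd, while they were reading.
    """

    def __init__(self, value):
        self._version = 0                        # even = stable, odd = mid-write
        self._value = value
        self._write_lock = threading.Lock()      # serializes writers only

    def write(self, value):
        with self._write_lock:
            self._version += 1                   # now odd: readers will retry
            self._value = value
            self._version += 1                   # even again: stable

    def read(self):
        while True:                              # lock-free read with retry
            v1 = self._version
            value = self._value
            v2 = self._version
            if v1 == v2 and v1 % 2 == 0:         # no writer ran during the read
                return value
```

Under a scheme like this, concurrent queries hammering the same hot entry never contend with one another; only writers serialize, which fits the failure mode the follow-up work describes: many readers, rare updates.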