Tencent Hunyuan released Hy-MT1.5-1.8B-1.25bit on 2026-04-29: a 1.8B-parameter translation model compressed from 3.3GB to 440MB through aggressive hybrid 1.25-bit/2-bit quantization, capable of real-time inference on mobile chips such as the Snapdragon 865.
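The headline numbers are internally consistent, which is worth a quick back-of-envelope check. A minimal sketch (the per-layer bit allocation is not public, so this only verifies the aggregate figures; the interpretation of "3.3GB" as GiB is an assumption):

```python
# Sanity-check the reported compression figures.
params = 1.8e9                      # 1.8B parameters

# FP16 baseline: 2 bytes per parameter.
fp16_bytes = params * 2
print(f"FP16 size: {fp16_bytes / 1024**3:.2f} GiB")   # ≈ 3.35 GiB, i.e. the "3.3GB" figure

# Quantized size as reported (440MB, read as decimal megabytes).
quant_bytes = 440e6
bits_per_param = quant_bytes * 8 / params
print(f"effective bits/param: {bits_per_param:.2f}")  # ≈ 1.96
```

An effective ~1.96 bits per parameter sits between the two quantization levels, consistent with a hybrid scheme where most weights are stored at 1.25 or 2 bits and the remainder (quantization scales, embeddings) stays at higher precision.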
Coverage: 33 languages and 5 dialects, spanning 1,056 translation directions. On FLORES-200 it outperforms Google Translate and matches what the team calls "235B-class large model" quality; the model has also taken 30 first-place finishes in international machine translation competitions.
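The direction count matches the number of ordered language pairs among the 33 base languages, which explains where 1,056 comes from (how the 5 dialects map onto these directions is not stated, so treating them as folded into their parent languages is an assumption):

```python
# 1,056 = ordered pairs among 33 languages, excluding self-translation.
n_langs = 33
directions = n_langs * (n_langs - 1)
print(directions)  # 1056
```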
On-device implications: fully offline local inference means translation content never leaves the device and no personal data is uploaded. For enterprise customers handling legal, financial, or medical document translation, this removes the compliance friction that has long kept cloud translation APIs out of regulated industries.
Tencent's internal deployment: Already powering Yuanbao, Tencent Meeting, WeChat Work, and QQ Browser across background translation, email, and customer service workflows. Weights and code are available on Hugging Face and ModelScope under an open-source license, making it directly embeddable into third-party applications.
For Chinese-speaking developers, this means the translation layer of multilingual apps can finally move off Google Translate / DeepL cloud APIs into self-controlled on-device components. For privacy-sensitive markets (European enterprise, healthcare), it's the first competitive offline option from a mainland-China lab.