Ultra-Fast Language Generation
via Discrete Diffusion Divergence Instruct
¹Purdue University  ²University of Texas at Austin  ³University of Texas at El Paso  ⁴National University of Singapore
⁵hi-Lab, Xiaohongshu Inc.  ⁶ML Research, Morgan Stanley
We unlock high-quality language generation in the blink of an eye with DiDi-Instruct.
🚀 Feel the Ultra-Fast Generation Speed:
DiDi-Instruct (64×) vs. MDMs (2×) vs. ARMs
Contributions
DiDi-Instruct distills a few-step generator from a masked discrete diffusion language model, achieving up to 64× speed-ups while matching or exceeding the quality of its teacher and GPT-2 baselines.

Abstract
Fast, high-quality language generation is the holy grail pursued in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained masked discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model matches or exceeds the performance of its dLLM teacher and a GPT-2 baseline while enabling up to 64× acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and a reward-guided ancestral sampler, which significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexities ranging from 62.2 (8 NFEs) to 18.4 (128 NFEs) and reduces additional training wall-clock time by more than 20× compared to competing dLLM distillation methods. We validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method that enables language generation in the blink of an eye.
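To give a rough sense of the objective behind this framework (a sketch only; the paper's exact discrete-state formulation and notation may differ), a Diff-Instruct-style integral KL divergence aggregates the KL between the student's and the teacher's diffused marginals over all noise levels. The symbols \( w(t) \), \( p_{\theta,t} \), and \( q_t \) below are illustrative placeholders rather than the paper's notation:
\[
  \mathcal{D}_{\mathrm{IKL}}\!\left(p_\theta \,\|\, q\right)
  \;=\; \int_{0}^{T} w(t)\, D_{\mathrm{KL}}\!\left(p_{\theta,t} \,\|\, q_{t}\right)\,\mathrm{d}t ,
\]
where, in this sketch, \( p_{\theta,t} \) is the marginal of the few-step student's samples after masking to noise level \( t \), \( q_t \) is the teacher dLLM's marginal at the same level, and \( w(t) \) is a time-dependent weighting; distillation trains the student to drive this divergence toward zero.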
BibTeX
If you find this useful, please cite:
@article{zheng2025ultra,
title={{Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct}},
author={Zheng, Haoyang and Liu, Xinyang and Kong, Cindy Xiangrui and Jiang, Nan and Hu, Zheyuan and Luo, Weijian and Deng, Wei and Lin, Guang},
journal={arXiv preprint arXiv:2509.25035},
year={2025}
}