Google's open-source diffusion language model generates 256 tokens in parallel and self-corrects, hitting 4x speed on one GPU ...
Latency, bandwidth, and fidelity all matter when you’re chasing milliseconds.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results