June 25, 2026 · AI

Speeding up LLM inference with MTP and diffusion

MTP and diffusion inference on Gemma 4 and Qwen 3.6, fp8 on one H100

#LLM #vLLM #MTP #Speculative Decoding