NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

📅 May 18, 2026  |  🌐 Source: https://www.marktechpost.com/2026/05/18/nvidia-introduces-a-4-bit-pretraining-methodology-using-nvfp4-validated-on-a-12b-hybrid-mamba-transformer-at-10t-token-horizon/  |  🏷️ Technology

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon — via https://www.marktechpost.com/2026/05/18/nvidia-introduces-a-4-bit-pretraining-methodology-using-nvfp4-validated-on-a-12b-hybrid-mamba-transformer-at-10t-token-horizon/

Pretraining frontier-scale LLMs in FP8 is now standard practice, but moving to 4-bit floating point has remained an open research problem because narrower formats compress dynamic range and amplify quantization error at long token horizons. A new research from NVIDIA describes a pretraining methodology built around NVFP4, a 4-bit microscaling format supported natively by Blackwell Tensor Cores, and validates it by pretraining a 12-billion-parameter hybrid Mamba-Transformer on 10 trillion tokens.

What You Need To Know

The research team state this is the longest publicly documented training run in 4-bit precision to date. The resulting model attains 62. 58% on MMLU-Pro 5-shot versus 62.

🔑 Key Highlights

  • A new research from NVIDIA describes a pretraining methodology built around NVFP4, a 4-bit microscaling format supported natively by Blackwell Tensor Cores, and validates it by pretraining a 12-billion-parameter hybrid Mamba-Transformer on 10 trillion tokens.
  • The research team state this is the longest publicly documented training run in 4-bit precision to date.
  • The resulting model attains 62.
  • 58% on MMLU-Pro 5-shot versus 62.

More Details

62% for the FP8 baseline, and is supported in NVIDIA’s Transformer Engine. What NVFP4 Actually is To understand why NVFP4 is important, it helps to revisit how microscaling formats work. In a microscaling (MX) format, a contiguous block of low-precision elements shares a single scale factor, which is used to map the block back into a wider numerical range during the matrix multiply. MXFP4 uses 32-element blocks where each element is stored as E2M1 — 1 sign bit, 2 exponent bits, 1 mantissa bit — encoding only the values ±0, ±0. 5, ±1, ±1.

This is a developing story. Follow the source for live updates and more in-depth coverage as details continue to emerge.


⚡ This article was auto-curated from https://www.marktechpost.com/2026/05/18/nvidia-introduces-a-4-bit-pretraining-methodology-using-nvfp4-validated-on-a-12b-hybrid-mamba-transformer-at-10t-token-horizon/. All rights and credits belong to the original publisher. This blog aggregates tech news for informational purposes only.

Comments