Vultr Achieves NVIDIA Exemplar Cloud for Surpassing AI Training Performance Targets

07 April, 2026


At Vultr, affordability does not come at the cost of performance – a critical aspect of modern AI workloads as deployments become increasingly complex. In that light, we’re pleased to announce that we've achieved NVIDIA Exemplar Cloud validation by surpassing key AI training performance standards on NVIDIA HGX™ B200 systems.

The NVIDIA Exemplar Cloud initiative entails a series of AI training workload tests, run on a cluster of 512 NVIDIA Blackwell GPUs, across 11 models, including:

  • Nemotron-H, Nemotron-4 15B, Nemotron-4 340B
  • Grok-1 314B
  • Llama 3.1 8B, Llama 3.1 70B, Llama 3.1 405B
  • Qwen3 30B, Qwen3 235B
  • DeepSeek-v3 671B, DeepSeek-v3-TorchTitan 671B

Our latest testing across various large language models (LLMs) demonstrates a consistent reduction in step time as precision moves from BF16 to FP8 and NVFP4. On average, lower-precision formats delivered significant throughput improvements across the full model lineup. Some highlights from our testing include:

High-parameter models (300B+)

Nemotron-4 340B: Switching from BF16 to FP8 cut average step time by 34.8%, from 2381.47 ms to 1552.70 ms.

DeepSeek-V3 671B: Demonstrated a modest but stable gain of roughly 7% in average step time when moving to FP8 (15.92 s).

Llama 3.1 405B: The introduction of NVFP4 delivered a major efficiency leap, outperforming FP8 by 28.4% in average step time (4.32 s vs. 6.03 s).

Mid-to-large models (70B - 235B)

Llama 3.1 70B: Consistent with the 405B variant, NVFP4 offered a significant advantage, reducing average step time by 35% compared to FP8.

Efficient/small parameter models

Grok-1 314B: Transitioning to FP8 yielded a 31% speedup, cutting average step time from 8.33 s to 5.75 s.

Nemotron-4 15B: Maintained the fastest overall footprint, with sub-second step times in both formats and FP8 providing a 12% performance boost.
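The reported percentages follow directly from the raw step times above. A quick sketch recomputing them from the published numbers:

```python
# Recompute the reported step-time reductions from the raw figures above.
def reduction_pct(before, after):
    """Percentage reduction in average step time."""
    return (before - after) / before * 100

# (before, after) pairs as reported; Nemotron-4 340B in ms, others in s.
print(f"Nemotron-4 340B BF16->FP8: {reduction_pct(2381.47, 1552.70):.1f}%")  # 34.8%
print(f"Llama 3.1 405B FP8->NVFP4: {reduction_pct(6.03, 4.32):.1f}%")        # 28.4%
print(f"Grok-1 BF16->FP8:          {reduction_pct(8.33, 5.75):.1f}%")        # 31.0%
```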

BF16, FP8, and NVFP4 refer to different numerical precision formats used in AI training and inference. Moving from BF16 to lower-precision formats like FP8 and NVFP4 reduces memory usage and increases throughput, enabling faster performance, though it requires advanced scaling techniques to maintain model accuracy. Our benchmarking highlights how Vultr’s architecture consistently delivers these efficiency gains across models.
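The memory side of that trade-off is simple arithmetic: BF16 stores each value in 16 bits, FP8 in 8 bits, and NVFP4 in 4 bits. A back-of-envelope sketch of weight memory alone (ignoring optimizer state, activations, and per-block scaling factors, which add real overhead in practice):

```python
# Approximate weight-memory footprint per precision format.
# Bytes per value: BF16 = 2, FP8 = 1, NVFP4 = 0.5 (4-bit).
# Excludes optimizer state, activations, and scaling-factor overhead.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}

def weight_gb(params_billion, fmt):
    """Weight memory in GB for a model with the given parameter count."""
    return params_billion * 1e9 * BYTES_PER_PARAM[fmt] / 1e9

for fmt in BYTES_PER_PARAM:
    print(f"Llama 3.1 70B weights in {fmt}: {weight_gb(70, fmt):.0f} GB")
```

For a 70B-parameter model, that is roughly 140 GB of weights in BF16 versus 70 GB in FP8 and 35 GB in NVFP4, which is where the reduced memory-bandwidth pressure comes from.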

Performance numbers are a starting point. What matters operationally is workload TCO: cost per token trained, GPU-hours per production run, power per useful output.

The precision improvements above translate directly. A 35% step-time reduction on Llama 3.1 70B with NVFP4 is a 35% reduction in GPU-hours per training run. Fewer resources, faster time to production. Power follows the same curve: lower-precision formats reduce memory bandwidth pressure, increasing effective utilization of the Blackwell GPU's compute capacity and driving more tokens processed per watt. At the scale of real AI deployments with continuous fine-tuning cycles, parallel training runs, and multi-tenant workloads, these gains determine whether an infrastructure budget closes.
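The GPU-hour arithmetic is worth making explicit. A minimal sketch, using the 512-GPU cluster size from the benchmark and a hypothetical baseline run length (the 100-hour figure is an illustrative assumption, not a Vultr benchmark result):

```python
# Illustrative GPU-hour savings from a 35% step-time reduction.
GPUS = 512                  # cluster size from the benchmark
BASELINE_HOURS = 100.0      # hypothetical FP8 wall-clock hours per run (assumed)
STEP_TIME_REDUCTION = 0.35  # Llama 3.1 70B, FP8 -> NVFP4

baseline_gpu_hours = GPUS * BASELINE_HOURS
nvfp4_gpu_hours = baseline_gpu_hours * (1 - STEP_TIME_REDUCTION)
saved_gpu_hours = baseline_gpu_hours - nvfp4_gpu_hours

print(f"Baseline: {baseline_gpu_hours:,.0f} GPU-hours")
print(f"NVFP4:    {nvfp4_gpu_hours:,.0f} GPU-hours")
print(f"Saved:    {saved_gpu_hours:,.0f} GPU-hours per run")
```

Multiply the saved GPU-hours by any hourly rate and the per-run cost delta follows; repeated fine-tuning cycles compound it.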

Vultr's platform is built to carry those gains into production. Full support for the NVIDIA software stack, including NeMo and the precision formats driving these results, means the performance benchmarked here is the performance deployed. No overhead tax between benchmark and production.

The NVIDIA Exemplar Cloud initiative was created to improve performance per TCO for all cloud providers with hardware and software recipes, references, tools, and capabilities that solve infrastructure challenges across key metrics, including workload performance, security, and reliability. The initiative leverages NVIDIA Performance Benchmarking recipes to establish standardized benchmarks for AI workload performance across the cloud provider ecosystem.

Our NVIDIA Exemplar Cloud achievement provides additional validation of our leadership in cloud-native infrastructure designed to power the next generation of AI applications.

Get started

Want to experience the exceptional performance of NVIDIA GPUs on Vultr? Explore our cloud GPU lineup or contact us.
