
Performance
We ran performance tests on the release-day firmware and an updated version of Ollama to see how Ollama performs on the DGX Spark.
The tests were run using the latest NVIDIA DGX Spark firmware (580.95.05) and Ollama v0.12.6.
Each test was run with the following settings:
- 10 runs per model
- Temperature set to 0
- Output limited to 500 tokens
- Prompt: “write an in-depth summary of this story: $(head -n200 pg98.txt)”, where pg98.txt is the book “A Tale of Two Cities” (see the test script for the book text)
- Caching disabled, so repeated runs do not get faster
The test script and its readme are made available and can be customized for your own testing.
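The published test script is the reference for the numbers below. As a rough illustration of what a single run measures, here is a minimal POSIX shell sketch against Ollama's `/api/generate` endpoint. It assumes a local Ollama server at `http://localhost:11434` with `curl` and `jq` installed; the helper names (`tps`, `mean`, `json_field`, `bench_once`) are hypothetical, and the cache-disabling setup described above is not reproduced.

```shell
#!/bin/sh
# Minimal benchmark sketch (NOT the published test script).
# Assumptions: a local Ollama server, plus curl and jq on PATH.
# The post's cache-disabling setup is not reproduced here.
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# Tokens per second from a token count and a duration in nanoseconds.
# Prints "n/a" when the duration is missing or zero.
tps() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns + 0 > 0) printf "%.2f\n", n / (ns / 1e9); else print "n/a"
  }'
}

# Arithmetic mean of the numbers on stdin, one per line.
mean() {
  awk '{ s += $1; c++ } END { if (c) printf "%.2f\n", s / c }'
}

# Print field $2 of the JSON document $1.
json_field() {
  printf '%s' "$1" | jq -r ".$2"
}

# One non-streaming generation with temperature 0 and at most 500 output
# tokens; prints "<prefill tokens/s> <decode tokens/s>".
bench_once() {
  local model="$1" prompt="$2"
  local resp pc pd ec ed
  resp=$(jq -cn --arg m "$model" --arg p "$prompt" \
      '{model: $m, prompt: $p, stream: false,
        options: {temperature: 0, num_predict: 500}}' |
    curl -sf "$OLLAMA_URL/api/generate" --data-binary @-) || return 1
  pc=$(json_field "$resp" prompt_eval_count)
  pd=$(json_field "$resp" prompt_eval_duration)
  ec=$(json_field "$resp" eval_count)
  ed=$(json_field "$resp" eval_duration)
  echo "$(tps "$pc" "$pd") $(tps "$ec" "$ed")"
}

# Example: 10 runs, then the mean decode rate (second column).
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#     bench_once gpt-oss "write an in-depth summary of this story: $(head -n200 pg98.txt)"
#   done | cut -d' ' -f2 | mean
```

The prefill rate comes from the prompt-evaluation counters and the decode rate from the generation counters that Ollama reports in each response, with durations in nanoseconds.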
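| Device | Model | Parameters | Quantization | Prefill (tokens/s) | Decode (tokens/s) |
|---|---|---|---|---|---|
| NVIDIA DGX Spark | gpt-oss | 20B | MXFP4 | 3.224k | 58.27 |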
| NVIDIA DGX Spark | gpt-oss | 120B | MXFP4 | 1.169k | 41.14 |
| NVIDIA DGX Spark | gemma3 | 12B | q4_K_M | 1.894k | 24.25 |
| NVIDIA DGX Spark | gemma3 | 12B | q8_0 | 1.406k | 15.46 |
| NVIDIA DGX Spark | gemma3 | 27B | q4_K_M | 834.1 | 10.83 |
| NVIDIA DGX Spark | gemma3 | 27B | q8_0 | 585.4 | 7.210 |
| NVIDIA DGX Spark | llama3.1 | 8B | q4_K_M | 7.614k | 38.02 |
| NVIDIA DGX Spark | llama3.1 | 8B | q8_0 | 6.110k | 25.23 |
| NVIDIA DGX Spark | llama3.1 | 70B | q4_K_M | 1.911k | 4.423 |
| NVIDIA DGX Spark | deepseek-r1 | 14B | q4_K_M | 5.919k | 19.99 |
| NVIDIA DGX Spark | deepseek-r1 | 14B | q8_0 | 4.667k | 13.32 |
| NVIDIA DGX Spark | qwen3 | 32B | q4_K_M | 705.0 | 9.411 |
| NVIDIA DGX Spark | qwen3 | 32B | q8_0 | 487.2 | 6.240 |
*OpenAI’s gpt-oss models are tested using models officially provided by OpenAI, distributed via Ollama. Some GGUFs distributed online labeled as MXFP4 are further quantized to q8_0 in the attention layers. The same layers are BF16 on Ollama as intended by OpenAI.
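As a rough reading of the decode column: if the decode rate stays roughly constant, producing the 500-token output cap takes about 500 divided by that rate, ignoring prefill and model load time. A small sketch of that arithmetic (the function name `gen_seconds` is hypothetical):

```shell
#!/bin/sh
# Approximate seconds to generate N output tokens at a decode rate R
# (tokens/s). Assumes a constant rate; ignores prefill and load time.
gen_seconds() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r }'
}

# The 500-token cap at three decode rates taken from the table.
for rate in 58.27 41.14 4.423; do
  echo "$rate tokens/s -> $(gen_seconds 500 "$rate") s"
done
```

This prints about 8.6 s for gpt-oss 20B, 12.2 s for gpt-oss 120B, and 113.0 s for llama3.1 70B at q4_K_M.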
NVIDIA Firmware update
If your DGX Spark is running a firmware version older than 580.95.05, we recommend updating it through the DGX Dashboard.
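To check from a terminal whether a system is below that version, here is a minimal sketch. It assumes that the version referred to here (580.95.05) is the one `nvidia-smi` reports as the driver version, and that `sort -V` (GNU coreutils, standard on Ubuntu) is available; `version_lt` and `check_version` are hypothetical helper names.

```shell
#!/bin/sh
# Succeeds when version $1 sorts strictly before version $2.
# sort -V compares each dot-separated field numerically,
# so 580.126.09 is correctly treated as newer than 580.95.05.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Assumption: the relevant version is nvidia-smi's driver_version.
check_version() {
  required=580.95.05
  current=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
  if [ -z "$current" ]; then
    echo "could not read the version from nvidia-smi" >&2
    return 1
  elif version_lt "$current" "$required"; then
    echo "$current is older than $required: update via the DGX Dashboard"
  else
    echo "$current is $required or newer"
  fi
}
```

After sourcing the snippet, run `check_version`. A plain string comparison would get multi-digit fields wrong, which is why the sketch relies on `sort -V`.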
If you prefer to upgrade from the CLI, you need to upgrade both the Ubuntu distribution and the firmware. Use the following commands:
```shell
sudo apt update
sudo apt dist-upgrade
sudo fwupdmgr refresh
sudo fwupdmgr upgrade
sudo reboot
```

Get started with Ollama
Install Ollama:
```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Then run a model:
```shell
ollama run gpt-oss
```

Coding with Codex & Ollama
OpenAI’s Codex and Ollama work seamlessly together.
Install OpenAI’s Codex:
```shell
npm install -g @openai/codex
```

Once Codex is installed, run:
```shell
codex --oss --model gpt-oss
```

The DGX Spark also supports the larger gpt-oss-120b model, which fits entirely into the 120GB of VRAM provided by the GB10 Grace Blackwell Superchip:
```shell
codex --oss --model gpt-oss:120b
```
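`ollama run` downloads a model automatically the first time it is used. Because gpt-oss:120b is a large download, it can be convenient to fetch it with `ollama pull` before launching Codex. The sketch below does this only when `ollama list` does not already show the tag. It assumes the default `ollama list` layout (a header row, then one model per line with `name:tag` in the first column); `has_model` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if model tag $1 appears in `ollama list` output read from stdin.
# A bare name such as "gpt-oss" is treated as "gpt-oss:latest".
has_model() {
  case $1 in
    *:*) want=$1 ;;
    *)   want=$1:latest ;;
  esac
  # Skip the header row; compare the first column with the wanted tag.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage: pull the 120B model only if it is missing, then start Codex.
#   ollama list | has_model gpt-oss:120b || ollama pull gpt-oss:120b
#   codex --oss --model gpt-oss:120b
```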
