About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: Deviate from the anointed versions of YOLO, and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.
Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if RoCm is… well… not all there yet).
Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.
It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.
I'm running VLLM on it now and it was as simple as:
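Roughly the following; treat it as a sketch, since the exact image tag is whatever the NGC recipe linked below currently lists:

    # Sketch of the NGC recipe: the image path follows the catalog entry linked below,
    # and <tag> stands in for whatever release tag the recipe specifies.
    sudo docker run --gpus all -it --rm \
      -p 8000:8000 \
      nvcr.io/nvidia/vllm:<tag>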
(That recipe from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... ) And then in the Docker container:
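A minimal version of that step, assuming the container ships vLLM's standard CLI, is just:

    # Start the OpenAI-compatible server on port 8000; with no model argument,
    # recent vLLM builds fall back to a small default model.
    vllm serve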
The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
I'm curious, does its architecture support all CUDA features out of the box, or is it limited compared to the 5090/6000 Blackwell?
Despite the large memory capacity, its memory bandwidth is very low, so I'd guess decode speed will be very slow. Of course, this design is well suited to the inference needs of MoE models.
Is 128 GB of unified memory enough? I've found that the smaller models are great as toys but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
There are several 70B+ models that are genuinely useful these days.
I'm looking forward to GLM 4.6 Air - I expect that one should be pretty excellent, based on experiments with a quantized version of its predecessor on my Mac. https://simonwillison.net/2025/Jul/29/space-invaders/
The question is: how does prompt processing time on this compare to the M3 Ultra? That one sucks at RAG, even though it can technically handle huge models and long contexts...
I wonder how this compares financially with renting something on the cloud.
This seems to be missing the obligatory pelican on a bicycle.
Here's one I made with it - I didn't include it in the blog post because I had so many experiments running that I lost track of which model I'd used to create it! https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...
That seat post looks fairly unpleasant.
The whole thing feels like a paper launch being propped up by people chasing blog traffic while missing the point.
I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training.
When the networking is 25 GB/s and the memory bandwidth is 210 GB/s, you know something is seriously wrong.
I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.
> even in a Docker container
I should be allowed to do stupid things when I want. Give me an override!
A couple of people have since tipped me off that this works around that:
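The tip, as I understand it, is an environment variable that makes Claude Code believe it's already sandboxed:

    # Reportedly convinces Claude Code the session is sandboxed, so it stops
    # refusing to run with --dangerously-skip-permissions as root.
    IS_SANDBOX=1 claude --dangerously-skip-permissions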
You can run that as root and Claude won't complain.
The reported 119 GB vs. the 128 GB in the spec is just a units mismatch: 128 GB (counting in 10^9 bytes) equals about 119 GiB (counting in 2^30 bytes).
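A quick one-liner to check that arithmetic, assuming python3 is on the box:

    python3 -c 'print(128e9 / 2**30)'   # ~119.2, which the system reports as "119Gi"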
That can't be right because RAM has always been reported in binary units. Only storage and networking use lame decimal units.
Looks like Claude reported it based on this:
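Most likely output along the lines of free -h, which reports totals in binary (Gi) units:

    free -h   # assumed command, not confirmed; the Mem total column is where the 119Gi figure shows up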
That 119Gi is indeed gibibytes, and 119 GiB in GB is 128 GB.
Ugh, that one gets me every time!
As is usual for NVidia: great hardware, and an effing nightmare figuring out how to set up the pile of crap they call software.
If you think their software is bad, try using any other vendor's; it makes Nvidia look amazing. Apple is the only one that comes close.
Although a bit off the GPU topic, I think Apple's Rosetta is the smoothest binary transition I've ever used.
Try using Intel or AMD stuff instead.
And yet CUDA has looked way better than ATi/AMD offerings in the same area, despite ATi/AMD technically being first to deliver GPGPU. (The major difference is that CUDA arrived a year later but supported everything from the G80 up and evolved nicely, while AMD managed to ship multiple platforms with patchy support and total rewrites in between.)
What was the AMD GPGPU called?
Except the performance people are seeing is way below expectations. It seems to be slower than an M4. Which kind of defeats the purpose. It was advertised as 1 Petaflop on your desk.
But maybe this will change? Software issues somehow?
It also runs CUDA, which is useful.
It fits bigger models, and you can stack them.
Plus, apparently some of the early benchmarks were made with Ollama and should be disregarded.
More discussion: https://news.ycombinator.com/item?id=45575127