NVIDIA and Siemens Healthineers Are Rewriting How Ultrasound Works

Ultrasound has always been a bit of a compromise. It’s safe, real-time, portable, and cheap — all good things. But the images you see on the screen? They’re the result of a reconstruction pipeline that makes some pretty big assumptions. Like that sound travels at the same speed through every part of the body. Which it doesn’t. Not even close.

So what happens when you skip the pipeline and let an AI learn directly from the raw sensor data? That’s the question NVIDIA and Siemens Healthineers set out to answer. The result is NV-Raw2Insights-US, a model that works with raw ultrasound channel data — the actual echoes coming back from the body — instead of the processed image.

What’s Actually Different

Most ultrasound AI today works on finished images. That’s fine for some tasks, but you’ve already thrown away a lot of information by the time you see that image. The raw channel data contains a much richer picture of how sound actually moved through tissue. NV-Raw2Insights-US starts there.

The first application is speed-of-sound estimation. Every patient’s tissue composition is different — fat, muscle, bone, fluid all affect how fast sound travels. Traditional beamforming assumes a fixed value, which introduces focusing errors. This model generates a personalized sound-speed map for each patient and uses it to correct the image in real time. What used to require heavy computation is now a single AI pass.
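To make the focusing error concrete, here is a minimal sketch of how far receive delays drift when the beamformer assumes one sound speed and the tissue has another. The array geometry (64 elements, 0.3 mm pitch), the 30 mm focus, the 5 MHz frequency, and the function names are my own illustrative assumptions, not anything from the NVIDIA or Siemens Healthineers work. The two speeds are textbook ballpark values: 1540 m/s is the usual fixed assumption for soft tissue, and fat is closer to 1450 m/s.

```python
import math

def focus_delays(element_x_m, focus_depth_m, speed_m_s):
    """Receive delays in seconds, relative to the array center, for an on-axis focus."""
    center_time = focus_depth_m / speed_m_s
    return [math.hypot(x, focus_depth_m) / speed_m_s - center_time
            for x in element_x_m]

def delay_errors(element_x_m, focus_depth_m, assumed_speed, true_speed):
    """Per-element difference between the delays applied and the delays needed."""
    applied = focus_delays(element_x_m, focus_depth_m, assumed_speed)
    needed = focus_delays(element_x_m, focus_depth_m, true_speed)
    return [a - n for a, n in zip(applied, needed)]

# Hypothetical linear array: 64 elements, 0.3 mm pitch, centered on x = 0.
pitch_m = 0.3e-3
num_elements = 64
xs = [(i - (num_elements - 1) / 2) * pitch_m for i in range(num_elements)]

errors = delay_errors(xs, 30e-3, assumed_speed=1540.0, true_speed=1450.0)
worst = max(abs(e) for e in errors)
period = 1 / 5e6  # one cycle at an assumed 5 MHz center frequency

print(f"worst-case delay error: {worst * 1e9:.1f} ns "
      f"({worst / period:.2f} of a period)")
```

Under these assumptions the outer elements end up off by a sizable fraction of a wavelength, which is enough for the echoes to stop adding up cleanly. A per-patient sound-speed map is a way to shrink that error where it actually occurs instead of using one number everywhere.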

This is the kind of thing that sounds subtle but matters a lot in practice. Better focusing means clearer boundaries between tissues, less speckle noise, and more confidence in what you’re looking at. For something like breast ultrasound or liver imaging, where the sound passes through layers of fat on the way in, that’s more than a nice-to-have.

The Hard Part: Getting the Data

Here’s the thing about raw ultrasound channel data: it’s massive. Clinical scanners don’t typically expose it because the data rate is far higher than their usual export interfaces are built to carry. So NVIDIA built something called the Holoscan Sensor Bridge (HSB), an open-source FPGA IP that pipes data from the scanner’s DisplayPort output straight to the GPU via RDMA over Ethernet. They’re using an Altera Agilex-7 FPGA dev kit paired with an ACUSON Sequoia scanner.
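A quick back-of-the-envelope calculation shows why the data volume is a problem. The numbers below (128 channels, 40 MHz sampling, 16-bit samples) are assumptions I picked as plausible for illustration. They are not the ACUSON Sequoia's specifications, and a real scanner only records during receive windows, so treat this as an upper bound for that hypothetical configuration.

```python
def raw_channel_rate_bytes_per_s(channels, sample_rate_hz, bytes_per_sample):
    """Sustained data rate if every channel were sampled continuously."""
    return channels * sample_rate_hz * bytes_per_sample

# Hypothetical acquisition parameters, chosen only for illustration.
rate = raw_channel_rate_bytes_per_s(
    channels=128,
    sample_rate_hz=40_000_000,
    bytes_per_sample=2,
)

print(f"{rate / 1e9:.2f} GB/s, {rate * 8 / 1e9:.1f} Gbit/s")
# prints "10.24 GB/s, 81.9 Gbit/s"
```

Even with generous rounding, that is well beyond what a USB port or a gigabit network link can carry, which is why tapping a display-class output makes sense.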

It’s a clever workaround. Instead of redesigning the scanner, they tap into an existing high-bandwidth output and packetize the data. That gets sent to an NVIDIA IGX system for inference on a Blackwell-class GPU. The sound-speed estimate gets streamed back to the scanner to adjust focus on the live feed.
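The packetizing step can be sketched in a few lines. To be clear, this is not the HSB wire format: the header layout (frame id, packet index, packet count), the 1024-byte payload size, and the function names are all hypothetical, and the real bridge does this in FPGA logic with RDMA rather than in Python. The sketch only shows the general idea of splitting a frame into numbered packets that the receiver can put back together.

```python
import struct

# Hypothetical header: frame id (uint32), packet index (uint16), packet count (uint16).
HEADER = struct.Struct("!IHH")

def packetize(frame_id, payload, max_payload=1024):
    """Split one frame of raw bytes into header-prefixed packets."""
    chunks = [payload[i:i + max_payload]
              for i in range(0, len(payload), max_payload)] or [b""]
    return [HEADER.pack(frame_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]

def reassemble(packets):
    """Rebuild a frame from its packets in any order; return None if any are missing."""
    parts = {}
    frame_id = None
    expected = None
    for packet in packets:
        pkt_frame, index, count = HEADER.unpack_from(packet)
        if frame_id is None:
            frame_id, expected = pkt_frame, count
        elif (pkt_frame, count) != (frame_id, expected):
            raise ValueError("packets belong to different frames")
        parts[index] = packet[HEADER.size:]
    if expected is None or len(parts) != expected:
        return None
    return b"".join(parts[i] for i in range(expected))

frame = bytes(range(256)) * 20           # 5120 bytes of stand-in channel data
packets = packetize(7, frame)
restored = reassemble(list(reversed(packets)))  # arrival order should not matter
print(len(packets), restored == frame)
```

The sequence numbers are what let the receiver detect a dropped packet and skip a frame instead of feeding corrupted data to the model.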

I like this approach because it doesn’t require new hardware inside the scanner. The FPGA bridge sits outside it, so on the scanner side the integration is software-only and uses a DisplayPort output that is already there. That means this could work with a lot of current systems. That’s pragmatic engineering.

What This Unlocks

NVIDIA is calling this class of models Raw2Insights. The idea is to move from “process ultrasound images” to “understand ultrasound physics per patient.” Once you have raw channel data in GPU memory, you’re not limited to speed-of-sound correction. Modular expansion is built in — new AI models can be dropped in without changing the pipeline.
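Here is one way to picture that modularity, as a minimal sketch. The `RawPipeline` class, its methods, and the two toy "models" are hypothetical names I made up. This is not the Holoscan SDK API, and the stand-in functions are simple arithmetic rather than neural networks. The point is only that every model sees the same raw frame, so adding one does not disturb the others.

```python
from typing import Callable, Dict, List

RawFrame = List[List[float]]          # channels x samples, standing in for GPU-resident data
Model = Callable[[RawFrame], object]  # any callable that maps a raw frame to a result

class RawPipeline:
    """Runs every registered model on the same raw frame."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        if name in self._models:
            raise ValueError(f"model {name!r} is already registered")
        self._models[name] = model

    def process(self, frame: RawFrame) -> Dict[str, object]:
        return {name: model(frame) for name, model in self._models.items()}

def channel_energy(frame: RawFrame) -> List[float]:
    """Toy stand-in model: summed squared amplitude per channel."""
    return [sum(s * s for s in channel) for channel in frame]

def peak_sample_index(frame: RawFrame) -> List[int]:
    """Toy stand-in model: index of the strongest sample per channel."""
    return [max(range(len(ch)), key=lambda i: abs(ch[i])) for ch in frame]

pipeline = RawPipeline()
pipeline.register("energy", channel_energy)

frame = [[0.0, 1.0, -2.0], [3.0, 0.0, 0.0]]
first = pipeline.process(frame)

# Dropping in a second model leaves the acquisition side untouched.
pipeline.register("peak", peak_sample_index)
second = pipeline.process(frame)
print(first, second)
```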

This is software-defined ultrasound, which means continuous improvement through updates rather than hardware refreshes. That’s a big deal for a field where scanner lifecycles are long and upgrades are expensive.

The Catch

This is still investigational. The paper and model weights are out there (GitHub, Hugging Face, dataset included), but this isn’t FDA-cleared or CE-marked yet. Clinical deployment will take time. Also, not every ultrasound setup has a DisplayPort output that plays nice with this pipeline, though the HSB is open source and designed to be adaptable.

The reliance on NVIDIA’s own hardware stack — IGX, Holoscan, Blackwell — means this is tied to their ecosystem. If you’re not already invested in NVIDIA’s medical computing platform, the barrier to entry is higher than just downloading the model.

My Take

This is one of those projects where the engineering is genuinely interesting, not just the AI. The data-over-DisplayPort trick, the FPGA pipeline, the real-time feedback loop: that’s hard stuff. The fact that they open-sourced the sensor bridge and the model weights tells me they want this to actually get used, not just sit in a paper.

The speed-of-sound correction is a solid first application. I’d like to see what else they can pull out of raw channel data. Tissue characterization, motion tracking, maybe even pathology detection directly from the raw signals. That’s where this gets really interesting.

For now, it’s a proof of concept that works. And it’s a better direction than trying to fix ultrasound by slapping AI on top of already-compressed images.