DeepInfra Joins Hugging Face Inference Providers – What That Actually Means

Hugging Face just added DeepInfra to its Inference Providers lineup, and honestly, this is one of those moves that makes the platform noticeably more useful for anyone who actually runs models regularly.

DeepInfra isn’t some fly-by-night operation. They’ve been quietly running one of the more cost-effective serverless inference platforms out there, with over 100 models in their catalog. Their per-token pricing has always been competitive, and they cover everything from LLMs to text-to-image, text-to-video, and embeddings. The catch was always that you had to go to their own dashboard, set up an account, and manage yet another API key. Now it’s baked directly into the Hugging Face Hub.

What’s Actually Available Right Now

For this initial integration, DeepInfra is supporting conversational and text-generation tasks. That means you can hit up models like DeepSeek V4, Kimi-K2.6, GLM-5.1, and others straight from the model pages on Hugging Face. They’re promising support for other tasks like image generation, video, and embeddings soon, but for now it’s text-focused. That’s fine – LLMs are where most of the action is anyway.

How It Works

There are two ways to use DeepInfra through Hugging Face, and the difference matters depending on how you want to handle billing.

Option 1: Your own DeepInfra key. You go into your Hugging Face account settings, paste in your DeepInfra API key, and requests go directly to DeepInfra’s servers. You get billed on your DeepInfra account. Nothing changes except you don’t have to leave the Hugging Face interface.

Option 2: Routed through Hugging Face. You don’t need a DeepInfra key at all. You authenticate with your Hugging Face token, the request gets routed through HF’s infrastructure, and the charge lands on your Hugging Face account. Hugging Face bills you exactly what DeepInfra would charge, with no markup. Hugging Face says it might add revenue-sharing agreements with providers later, but for now it’s a straight pass-through. PRO users get $2 worth of inference credits every month that work across providers, which is a nice bonus if you’re already paying for the plan.

The setup is straightforward. In your account settings, you can order providers by preference, and the model pages will show available third-party providers sorted by your choices. It’s clean, it’s not cluttered, and it doesn’t force you into any particular workflow.

SDK and Agent Harness Support

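This isn’t just a UI thing. DeepInfra is available through the Hugging Face SDKs: huggingface_hub >= 1.11.2 for Python and @huggingface/inference for JavaScript. The router also exposes an OpenAI-compatible endpoint, so the standard OpenAI Python client works too, and the code is minimal: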

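import os
from openai import OpenAI

# Point the standard OpenAI client at Hugging Face's router and
# authenticate with a Hugging Face token (read from the environment).
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The ":deepinfra" suffix selects DeepInfra as the provider for this model.
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro:deepinfra",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that returns the nth Fibonacci number using memoization.",
        }
    ],
)

# Print only the reply text rather than the whole message object.
print(completion.choices[0].message.content)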

Same pattern for JavaScript. The model name format is org/model-name:provider, which is clean enough. You don’t need to rewrite your existing OpenAI-compatible client code: point the base URL at the router, pass your Hugging Face token as the API key, and append the provider suffix to the model string.
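If you build these model strings in more than one place, a tiny helper keeps the org/model-name:provider format consistent. This is only a sketch: with_provider, split_model, and RoutedModel are names I made up for illustration, not part of any Hugging Face SDK.

```python
from typing import NamedTuple, Optional


class RoutedModel(NamedTuple):
    repo_id: str             # e.g. "deepseek-ai/DeepSeek-V4-Pro"
    provider: Optional[str]  # e.g. "deepinfra"; None when no suffix is present


def with_provider(repo_id: str, provider: str) -> str:
    """Build the "org/model-name:provider" string passed as `model`."""
    if ":" in repo_id:
        raise ValueError(f"{repo_id!r} already carries a provider suffix")
    return f"{repo_id}:{provider}"


def split_model(model: str) -> RoutedModel:
    """Split "org/model-name:provider" back into its two parts."""
    repo_id, sep, provider = model.partition(":")
    return RoutedModel(repo_id, provider if sep else None)


print(with_provider("deepseek-ai/DeepSeek-V4-Pro", "deepinfra"))
# prints: deepseek-ai/DeepSeek-V4-Pro:deepinfra
```

The helper does nothing the f-string in the snippet above couldn’t do by hand; it just gives you one place to catch a doubled suffix before the request goes out.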

What’s more interesting is that Hugging Face Inference Providers are already integrated into agent harnesses like Pi, OpenCode, Hermes Agents, and OpenClaw. That means you can plug DeepInfra-hosted models into your agent workflows without writing glue code. If you’re building agents that need to switch between providers based on cost or availability, this removes a lot of friction.
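To make the provider-switching idea concrete, here is a rough sketch of availability-based fallback. It assumes you wrap the actual API call in a send function of your own; complete_with_fallback and fake_send are hypothetical names, and "provider-a" is a placeholder rather than a real provider ID. The stand-in send lets the example run offline.

```python
from typing import Callable, List, Sequence, Tuple


def complete_with_fallback(
    send: Callable[[str], str],
    repo_id: str,
    providers: Sequence[str],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider, reply) from the first success.

    `send` takes a full "org/model-name:provider" string and returns reply text.
    """
    errors: List[str] = []
    for provider in providers:
        try:
            return provider, send(f"{repo_id}:{provider}")
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in for a real API call so the sketch runs without network access.
def fake_send(model: str) -> str:
    if model.endswith(":provider-a"):  # placeholder provider that always fails
        raise ConnectionError("temporarily unavailable")
    return f"reply from {model}"


provider, reply = complete_with_fallback(
    fake_send, "deepseek-ai/DeepSeek-V4-Pro", ["provider-a", "deepinfra"]
)
print(provider, "->", reply)
```

In real use, send would wrap client.chat.completions.create from the earlier snippet. Ordering the providers list by price instead of preference would give you cost-based routing with the same loop.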

My Take

DeepInfra being on Hugging Face is good news, but it’s not a game-changer by itself. What makes it useful is the routing infrastructure Hugging Face has built. Being able to switch between providers from the same SDK, with the same authentication pattern, and compare costs without managing five different dashboards – that’s the real value.

The $2 monthly credit for PRO users is a nice touch, but it’s not going to cover serious usage. It’s more of a “try before you buy” allowance. The free tier for signed-in users is even more limited, so don’t expect to run production workloads on it.
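To put a rough number on that, here is a back-of-the-envelope calculation. The per-million-token prices are placeholders I picked for illustration, not DeepInfra’s actual rates, and the request size is an assumption too; plug in the real figures from the model page.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
) -> Decimal:
    """Dollar cost of one request, given per-million-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / MILLION


def requests_covered(credit: Decimal, per_request: Decimal) -> int:
    """How many whole requests a credit balance pays for."""
    return int(credit // per_request)


# Hypothetical prices in USD per million tokens, NOT real DeepInfra rates.
INPUT_PRICE = Decimal("0.50")
OUTPUT_PRICE = Decimal("1.50")

# Assumed request shape: 1,000 prompt tokens in, 500 completion tokens out.
per_request = request_cost(1_000, 500, INPUT_PRICE, OUTPUT_PRICE)
covered = requests_covered(Decimal("2.00"), per_request)
print(f"${per_request} per request, {covered} requests on a $2 credit")
```

With those made-up numbers, a request costs $0.00125 and the $2 credit covers 1,600 of them. That’s plenty for kicking the tires, but an agent loop making a few hundred calls a day burns through it in under a week.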

One thing I’d like to see is better documentation on which models are available through DeepInfra specifically. The full list is supposedly linked, but I’ve seen these provider integrations get stale fast when new models drop. Hugging Face and DeepInfra both need to keep the catalog synced or the whole thing loses credibility.

Also, the billing setup is worth paying attention to. If you use the routed mode, you’re trusting Hugging Face to pass through costs accurately. They say there’s no markup, and I believe them for now, but keep an eye on it. The planned revenue-sharing agreements could change the economics later.

What’s Next

DeepInfra is promising broader task support soon – text-to-image, video, embeddings. That’ll make this integration more compelling, especially if they can match the pricing they’re known for. For now, if you’re already using Hugging Face and want to test DeepInfra’s models without leaving the ecosystem, this is a solid addition. If you’re a DeepInfra user who’s been managing things manually, this saves you some clicks.

Give it a spin, set your provider preferences, and see how the routing feels. It’s not revolutionary, but it’s the kind of infrastructure polish that makes the platform better over time.