Cerebras Inference on Hugging Face
Learn how to use Cerebras Inference on Hugging Face.
This guide walks you step by step through using the Hugging Face InferenceClient to run inference on Cerebras hardware. We currently offer Llama 3.3 models and Llama 4 Scout through Hugging Face.
Currently, we support the Chat Completion endpoint via the Hugging Face Python client. To get started, follow the steps below.
Install the Hugging Face Hub client
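The client ships in the huggingface_hub package; a typical install looks like this (pip shown here, though any Python package manager works):

```shell
pip install --upgrade huggingface_hub
```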
Create a new Hugging Face API key
Next, you’ll need to create a new Hugging Face API key. You’ll use this key to authenticate with Hugging Face and access the Cerebras provider.
- Go to hf.co/settings/tokens
- Click “New token”
- Give it a name and copy your API key
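Once you have the key, a common pattern (optional, not part of the steps above) is to read it from an environment variable instead of hard-coding it; the `HF_TOKEN` name below is a convention, not a requirement:

```python
import os

# Assumes the key was exported in your shell beforehand, e.g.:
#   export HF_TOKEN="hf_your_api_key_here"
# Falls back to the placeholder so the snippet runs either way.
api_key = os.environ.get("HF_TOKEN", "hf_your_api_key_here")
```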
Make an API call
Here’s an example using InferenceClient to query Llama 3.3 70B on Cerebras. Be sure to replace "hf_your_api_key_here" with your actual API key.
Differences Between Cerebras Cloud and Hugging Face
Cerebras Cloud is primarily intended for free tier users and high-throughput startups that need a dedicated plan to handle their inference. Hugging Face acts as our "pay-as-you-go" provider. We currently offer Llama 3.3 models and Llama 4 Scout through Hugging Face.
DeepSeek R1 Distill Llama 70B can only be accessed on Cerebras Cloud with a paid plan.
FAQ
What context length can I run?
Although the Hugging Face model card may mention support for up to 10 million tokens, Cerebras currently supports a maximum context length of 16K tokens.
What additional latency can I expect when using Cerebras through Hugging Face?
Since inference is routed through Hugging Face’s proxy, users may experience slightly higher latency compared to calling Cerebras Cloud directly.
Why do I see “Wrong API Format” when running the Hugging Face test code?
The official Hugging Face inference example uses a multimodal input call, which is not currently supported by Cerebras. To avoid this error, use the code provided in Step 3 of the tutorial above.
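To illustrate the difference, here is a sketch of the two message shapes; the multimodal form below mirrors the OpenAI-style content-parts format used in Hugging Face's default example:

```python
# Supported: "content" is a plain text string.
text_only = [
    {"role": "user", "content": "Describe Cerebras hardware in one sentence."},
]

# Not currently supported through Cerebras: "content" as a list of
# typed parts (e.g. image_url blocks), as in Hugging Face's default example.
multimodal = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image in one sentence."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
    ],
}]

print(type(text_only[0]["content"]).__name__)   # str
print(type(multimodal[0]["content"]).__name__)  # list
```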