Cerebras Inference on HuggingFace
Learn how to use Cerebras Inference on HuggingFace.
This guide will walk you step-by-step through using the Hugging Face InferenceClient to run inference on Cerebras hardware. Hugging Face acts as our "pay-as-you-go" provider. We currently offer Llama 3.3 models and Llama 4 Scout through Hugging Face.
Currently, we support the Chat Completion endpoint via the Hugging Face Python client. To get started, follow the steps below.
Install the Hugging Face Hub client
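The client ships as the `huggingface_hub` package; inference-provider support (including Cerebras) requires a recent release, so upgrading is the safest sketch of this step:

```shell
# Install (or upgrade) the Hugging Face Hub Python client
pip install --upgrade huggingface_hub
```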
Create a new Hugging Face API key
Next, you’ll need to create a new Hugging Face API key. You’ll use this key to authenticate with Hugging Face and access the Cerebras provider.
- Go to hf.co/settings/tokens
- Click “New token”
- Give it a name and copy your API key
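Rather than pasting the key into source code, you can export it as the `HF_TOKEN` environment variable, which the Hugging Face client reads by default (the key value below is a placeholder):

```shell
# Placeholder value — substitute the token you copied from hf.co/settings/tokens
export HF_TOKEN="hf_your_api_key_here"
```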
Make an API call
Here’s an example using InferenceClient to query Llama 3.3-70B on Cerebras.
Be sure to replace "hf_your_api_key_here" with your actual API key.
Differences Between Cerebras Cloud and Hugging Face
Cerebras Cloud is primarily intended for free-tier users and for high-throughput startups that need a dedicated plan to handle their inference, while Hugging Face acts as our "pay-as-you-go" provider for the Llama 3.3 models and Llama 4 Scout.
DeepSeek R1 Distill Llama 70B can only be accessed on Cerebras Cloud with a paid plan.