1. Install the Hugging Face Hub client
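The client library lives in the huggingface_hub package on PyPI; assuming a standard pip setup, the install is:

```shell
# Install (or upgrade) the Hugging Face Hub client library
pip install --upgrade huggingface_hub
```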
2. Create a new Hugging Face API key
Next, you’ll need to create a new Hugging Face API key. You’ll use this key to authenticate with Hugging Face and access the Cerebras provider.
- Go to hf.co/settings/tokens
- Click “New token”
- Give it a name and copy your API key
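Rather than hardcoding the key in your scripts, one common pattern is to keep it in an environment variable; recent versions of huggingface_hub pick up HF_TOKEN automatically. A sketch (the placeholder value is not a real key):

```shell
# Store the key in an environment variable instead of hardcoding it.
# Recent huggingface_hub versions read HF_TOKEN automatically.
export HF_TOKEN="hf_your_api_key_here"
```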
3. Make an API call
This step’s example uses InferenceClient to query Llama 3.3 70B on Cerebras. Be sure to replace "hf_your_api_key_here" with your actual API key.

Differences Between Cerebras Cloud and Hugging Face
Cerebras Cloud is primarily intended for free-tier users and for high-throughput startups that need a dedicated plan to handle their inference, while Hugging Face acts as our “pay-as-you-go” provider. We currently offer the Llama 3.3 models and Llama 4 Scout through Hugging Face.

FAQ
What additional latency can I expect when using Cerebras through Hugging Face?
Since inference is routed through Hugging Face’s proxy, users may experience slightly higher latency compared to calling Cerebras Cloud directly.
Why do I see “Wrong API Format” when running the Hugging Face test code?
The official Hugging Face inference example uses a multimodal input call, which is not currently supported by Cerebras. To avoid this error, use the code provided in Step 3 of the tutorial above.