What is Hugging Face Inference Providers?
Hugging Face Inference Providers is a unified API that gives you access to multiple AI inference providers, including Cerebras, through a single interface. This means you can use familiar Hugging Face tools and SDKs to access Cerebras’s world-class inference speed without changing your existing code structure. Key features include:
- Unified API - Use the same code structure across multiple providers
- Simple Integration - Works with OpenAI SDK, Hugging Face Hub client, and standard HTTP requests
- Model Discovery - Browse all available Cerebras models through Hugging Face’s model hub
- Flexible Authentication - Use your Hugging Face token to access Cerebras inference
Prerequisites
Before you begin, ensure you have:
- Hugging Face Account - Create a free account at huggingface.co
- Hugging Face API Token - Generate a token at hf.co/settings/tokens
- Python 3.7 or higher - Required for running the Python examples
While you can use Hugging Face tokens for authentication, you can also use your Cerebras API key directly. Get one here.
Getting Started
1
Install the required dependencies
You can use either the Hugging Face Hub client or the OpenAI SDK to access Cerebras through Hugging Face. Install your preferred client:
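For example, with pip (install only the client you plan to use):

```bash
# Hugging Face Hub client
pip install huggingface_hub

# OpenAI SDK
pip install openai
```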
2
Set up your API token
Create a .env file in your project directory to store your Hugging Face token securely. Alternatively, you can set it as an environment variable. Your Hugging Face token authenticates you with the Inference Providers API, which then routes your requests to Cerebras’s infrastructure.
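A minimal sketch of both approaches; the examples on this page assume the token lives in an environment variable named HF_TOKEN:

```bash
# Option 1: store the token in a .env file
echo 'HF_TOKEN=hf_xxxxxxxxxxxxxxxx' > .env

# Option 2: export it for the current shell session
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```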
3
Make your first inference request
Now you’re ready to make your first request to Cerebras through Hugging Face. Here’s how to use the chat completion endpoint with different clients:
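The sketches below assume HF_TOKEN is set and use meta-llama/Llama-3.3-70B-Instruct purely as an example model name; any Cerebras-backed model from the hub works the same way:

```python
import os
from huggingface_hub import InferenceClient

# Hugging Face Hub client: select Cerebras with the provider argument
client = InferenceClient(
    provider="cerebras",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model
    messages=[{"role": "user", "content": "Why is fast inference important?"}],
)
print(completion.choices[0].message.content)
```

```python
import os
from openai import OpenAI

# OpenAI SDK: point the client at the Hugging Face router
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct:cerebras",  # :cerebras selects the provider
    messages=[{"role": "user", "content": "Why is fast inference important?"}],
)
print(completion.choices[0].message.content)
```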
When using the OpenAI SDK, append :cerebras to the model name to specify Cerebras as the provider. With the Hugging Face Hub client, set provider="cerebras" instead.
4
Try streaming responses
For real-time applications, you can stream responses token-by-token as they’re generated. This is especially useful for chat interfaces and interactive applications where you want to display results as they arrive:
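A sketch using the Hub client from the previous step; pass stream=True and print each delta as it arrives:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="cerebras", api_key=os.environ["HF_TOKEN"])

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model
    messages=[{"role": "user", "content": "Write a haiku about speed."}],
    stream=True,
)

for chunk in stream:
    # each chunk carries an incremental piece of the response
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```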
Available Cerebras Models
View all available models on the Hugging Face model hub.
Advanced Usage
Using Custom Parameters
You can customize your requests with additional parameters supported by the Cerebras API to control response generation:
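For instance, reusing the Hub client from the steps above (the parameter values are purely illustrative):

```python
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model
    messages=[{"role": "user", "content": "Summarize the benefits of wafer-scale chips."}],
    max_tokens=512,    # cap the length of the generated response
    temperature=0.7,   # higher values produce more varied output
    top_p=0.95,        # nucleus sampling cutoff
)
print(completion.choices[0].message.content)
```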
Error Handling
Implement proper error handling to manage API errors gracefully in production applications:
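A sketch using the OpenAI SDK's exception classes; the same pattern applies to the Hub client with its own exceptions:

```python
import os
from openai import OpenAI, APIError, APIConnectionError, RateLimitError

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

try:
    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct:cerebras",  # example model
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
except RateLimitError:
    print("Rate limit hit; back off and retry later.")
except APIConnectionError as err:
    print(f"Could not reach the router: {err}")
except APIError as err:
    # catch-all for other API-side failures (bad model name, auth, etc.)
    print(f"API error: {err}")
```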
Using Environment Variables
For better security and configuration management, load environment variables using python-dotenv:
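For example (assuming the .env file from the setup step):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory

token = os.environ.get("HF_TOKEN")
if token is None:
    raise RuntimeError("HF_TOKEN is not set; check your .env file")
```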
Next Steps
- Explore the Hugging Face Inference Providers documentation for more advanced features
- Browse community Cerebras models on Hugging Face
- Learn about Chat Completion parameters in our API reference
- Try different Cerebras models to find the best fit for your use case
FAQ
What additional latency can I expect when using Cerebras through Hugging Face?
Since inference is routed through Hugging Face’s proxy, you may experience slightly higher latency than calling Cerebras Cloud directly. The overhead is typically minimal (10-50 ms), but for applications requiring the absolute lowest latency, consider using the Cerebras API directly. However, Hugging Face Inference Providers offers benefits like:
- Unified API across multiple providers
- Simplified authentication with Hugging Face tokens
- Integration with Hugging Face’s ecosystem and tools
- Easy provider switching without code changes
Why do I see 'Wrong API Format' when running the Hugging Face test code?
The official Hugging Face inference example uses a multimodal input call, which is not currently supported by Cerebras. Cerebras currently supports:
- Text-based chat completions
- Standard message formats with role and content
- Streaming responses
- Common parameters (temperature, top_p, max_tokens, etc.)
How do I specify which Cerebras model to use?
When using the Hugging Face Hub client, use the full model name from Hugging Face. When using the OpenAI SDK through the Hugging Face router, append :cerebras to the model name to specify the provider. You can find all available models at huggingface.co/models?inference_provider=cerebras.
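Side by side, with meta-llama/Llama-3.3-70B-Instruct as an example model:

```python
import os
from huggingface_hub import InferenceClient
from openai import OpenAI

messages = [{"role": "user", "content": "Hello!"}]

# Hub client: plain Hugging Face model name, provider set explicitly
hub_client = InferenceClient(provider="cerebras", api_key=os.environ["HF_TOKEN"])
hub_client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
)

# OpenAI SDK: provider appended to the model name
openai_client = OpenAI(base_url="https://router.huggingface.co/v1", api_key=os.environ["HF_TOKEN"])
openai_client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct:cerebras",
    messages=messages,
)
```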
Why am I getting authentication errors?
Make sure you’re using a valid Hugging Face token with the correct permissions. You can generate a new token at hf.co/settings/tokens. The token should have at least read access. If you’re using environment variables, ensure they’re properly loaded (see the check after this list). Common issues:
- Token not set in environment variables
- Token has expired or been revoked
- Token doesn’t have necessary permissions
- Typo in token value
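A quick sanity check that the token actually made it into the environment (Hugging Face tokens start with hf_):

```python
import os

token = os.environ.get("HF_TOKEN")
print("token loaded:", token is not None and token.startswith("hf_"))
```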
Can I use Hugging Face Inference Providers for production applications?
Yes! Hugging Face Inference Providers is production-ready and used by many applications. However, consider these factors:
- Latency: The routing layer adds minimal overhead, but direct API calls to Cerebras will be slightly faster
- Rate Limits: Check Hugging Face’s rate limits for your account tier
- Monitoring: Implement proper logging and error handling for production use
- Reliability: Both Hugging Face and Cerebras maintain high uptime SLAs
- Costs: Review pricing for both Hugging Face and Cerebras services
What's the difference between using the Hugging Face Hub client and OpenAI SDK?
Both clients work well with Cerebras through Hugging Face, but there are some differences:
Hugging Face Hub Client:
- Native integration with the Hugging Face ecosystem
- Set the provider explicitly with provider="cerebras"
- Use standard Hugging Face model names
- Better integration with Hugging Face datasets and tools
OpenAI SDK:
- Familiar interface if you’re already using OpenAI
- Append :cerebras to model names
- Easy migration from OpenAI to Cerebras
- Compatible with OpenAI-style tooling

