Documentation Index
Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
Use this file to discover all available pages before exploring further.
What is Hugging Face Inference Providers?
Hugging Face Inference Providers is a unified API that gives you access to multiple AI inference providers, including Cerebras, through a single interface. This means you can use familiar Hugging Face tools and SDKs to access Cerebras’s world-class inference speed without changing your existing code structure. Key features include:
- Unified API - Use the same code structure across multiple providers
- Simple Integration - Works with OpenAI SDK, Hugging Face Hub client, and standard HTTP requests
- Model Discovery - Browse all available Cerebras models through Hugging Face’s model hub
- Flexible Authentication - Use your Hugging Face token to access Cerebras inference
Prerequisites
Before you begin, ensure you have:
- Hugging Face Account - Create a free account at huggingface.co
- Hugging Face API Token - Generate a token at hf.co/settings/tokens
- Python 3.11 or higher - Required for running the Python examples
Getting Started
Install the required dependencies
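A typical install, assuming you want both the Hugging Face Hub client and the OpenAI SDK, plus python-dotenv for the token-handling step below:

```bash
pip install --upgrade huggingface_hub openai python-dotenv
```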
Set up your API token
Create a .env file in your project directory to store your Hugging Face token securely:
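For example (the variable name HF_TOKEN is an assumption; the examples below read it from the environment):

```
HF_TOKEN=your_hugging_face_token_here
```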
Make your first inference request
Append :cerebras to the model name to specify Cerebras as the provider. With the Hugging Face Hub client, set provider="cerebras" instead.
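A minimal sketch using the OpenAI SDK, assuming Hugging Face’s standard router endpoint and the meta-llama/Llama-3.1-8B-Instruct model as an example:

```python
import os

from openai import OpenAI

# Point the OpenAI client at Hugging Face's Inference Providers router
# and authenticate with your Hugging Face token.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

# The :cerebras suffix routes the request to Cerebras.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct:cerebras",
    messages=[{"role": "user", "content": "Why is inference speed important?"}],
)
print(completion.choices[0].message.content)
```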
Available Cerebras Models
You can use any of the following Cerebras models through Hugging Face Inference Providers:
- llama3.1-8b: Fastest option for simple tasks and high-throughput scenarios
- zai-org/GLM-4.7: Largest model for the most demanding tasks
- zai-glm-4.7: Advanced 357B parameter model with strong reasoning capabilities
Advanced Usage
Using Custom Parameters
You can customize your requests with additional parameters supported by the Cerebras API to control response generation:
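A sketch reusing the client from the first request; the parameter names follow the OpenAI-compatible chat completions API:

```python
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct:cerebras",
    messages=[{"role": "user", "content": "Summarize why fast inference matters."}],
    temperature=0.7,  # sampling randomness: lower is more deterministic
    top_p=0.9,        # nucleus sampling cutoff
    max_tokens=256,   # upper bound on generated tokens
)
print(completion.choices[0].message.content)
```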
Error Handling
Implement proper error handling to manage API errors gracefully in production applications:
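One possible sketch using the OpenAI SDK’s exception hierarchy (your retry policy will differ):

```python
from openai import APIError, APIStatusError, RateLimitError

try:
    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct:cerebras",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
except RateLimitError:
    # 429s: back off and retry rather than failing outright.
    print("Rate limited; retry with exponential backoff.")
except APIStatusError as err:
    # Other non-2xx responses, e.g. authentication or bad-request errors.
    print(f"API returned status {err.status_code}")
except APIError as err:
    # Catch-all for remaining client/server errors.
    print(f"Request failed: {err}")
```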
Using Environment Variables
For better security and configuration management, load environment variables using python-dotenv:
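For instance, assuming the HF_TOKEN variable from the .env file created earlier:

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads variables from .env into the process environment

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)
```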
Next Steps
- Explore the Hugging Face Inference Providers documentation for more advanced features
- Browse Cerebras models on Hugging Face
- Learn about Chat Completion parameters in our API reference
- Try different Cerebras models to find the best fit for your use case
- Check out Hugging Face’s guide on building AI apps
- Explore structured outputs with LLMs for JSON generation
- Migrate to GLM-4.7: Ready to upgrade? Follow our migration guide to start using our latest model
FAQ
What additional latency can I expect when using Cerebras through Hugging Face?
The routing layer adds minimal latency overhead compared with calling the Cerebras API directly. In exchange, you get:
- Unified API across multiple providers
- Simplified authentication with Hugging Face tokens
- Integration with Hugging Face’s ecosystem and tools
- Easy provider switching without code changes
Why do I see 'Wrong API Format' when running the Hugging Face test code?
This error typically appears when a request uses a format outside the OpenAI-compatible chat completions API. Supported features include:
- Text-based chat completions
- Standard message formats with role and content
- Streaming responses
- Common parameters (temperature, top_p, max_tokens, etc.)
How do I specify which Cerebras model to use?
provider="cerebras", use the model name without the provider suffix::cerebras to specify the provider:Why am I getting authentication errors?
Why am I getting authentication errors?
Authentication errors are usually caused by one of the following:
- Token not set in environment variables
- Token has expired or been revoked
- Token doesn’t have necessary permissions
- Typo in token value
Can I use Hugging Face Inference Providers for production applications?
Yes. Keep the following in mind for production use:
- Latency: The routing layer adds minimal overhead, but direct API calls to Cerebras will be slightly faster
- Rate Limits: Check Hugging Face’s rate limits for your account tier
- Monitoring: Implement proper logging and error handling for production use
- Reliability: Both Hugging Face and Cerebras maintain high uptime SLAs
- Costs: Review pricing for both Hugging Face and Cerebras services
What's the difference between using the Hugging Face Hub client and OpenAI SDK?
The Hugging Face Hub client gives you:
- Native integration with the Hugging Face ecosystem
- Explicit provider selection via provider="cerebras"
- Standard Hugging Face model names
- Better integration with Hugging Face datasets and tools
The OpenAI SDK gives you:
- A familiar interface if you’re already using OpenAI
- Provider selection by appending :cerebras to model names
- Easy migration from OpenAI to Cerebras
- Compatibility with OpenAI-style tooling

