What is TrueFoundry AI Gateway?
TrueFoundry AI Gateway is a unified API gateway that provides observability, cost tracking, rate limiting, and access control for AI model inference. By routing your Cerebras requests through TrueFoundry, you gain comprehensive visibility into your AI operations while maintaining centralized control over access and spending. Key benefits include:
- Comprehensive Observability - Track all API calls, latencies, and errors in one place with detailed request logging
- Cost Management - Monitor and control spending across models and teams with real-time cost tracking
- Access Control - Manage API keys and permissions centrally with role-based access
- Rate Limiting - Protect your applications from unexpected usage spikes with configurable limits
- Analytics Dashboard - Visualize usage patterns and performance metrics across your organization
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- TrueFoundry Account - Visit TrueFoundry and create an account or log in
- Python 3.12 or higher - For running the code examples
Configure TrueFoundry AI Gateway
Navigate to Cerebras in AI Gateway
From the TrueFoundry dashboard, navigate to the Cerebras models section:
- Go to AI Gateway > Models > Cerebras
This opens the Cerebras configuration panel where you’ll add your account and models.

Add your Cerebras account
Click Add Cerebras Account to configure your Cerebras API credentials:

- Click the Add Cerebras Account button
- Enter your Account Name (e.g., “Production” or “Development”)
- Enter your Cerebras API Key from the prerequisites step
- Optionally add Collaborators who should have access
- Click Save
You can configure multiple Cerebras accounts with different access controls. This is useful for separating production and development environments or managing different teams. See Access Control for more details.
Add Cerebras models
Click + Add Model to add Cerebras models to your gateway. Unlike other providers, you need to get the Model ID directly from the Cerebras documentation. To add a model:
- Click + Add Model
- Enter the Model ID exactly as shown in the Cerebras Models documentation
- Configure any model-specific settings like rate limits or access controls
- Click Save to activate the model
Get your TrueFoundry API credentials
After configuring your models, TrueFoundry will provide you with gateway credentials. These credentials authenticate your application to the TrueFoundry gateway, which then routes requests to Cerebras. Find your credentials in the AI Gateway settings:
- Navigate to AI Gateway > API Credentials
- Copy your Gateway Base URL (e.g., https://gateway.truefoundry.ai)
- Copy your Gateway API Key (a JWT token)
Install required dependencies
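A typical installation command, assuming pip (the package names match those described just below):

```shell
pip install openai python-dotenv
```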
Install the OpenAI Python SDK, which is compatible with Cerebras through TrueFoundry’s OpenAI-compatible API. The python-dotenv package helps manage environment variables securely.
Configure environment variables
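A sketch of the .env file described next; the variable names are illustrative placeholders rather than names the gateway requires:

```shell
# .env — never commit this file to version control
TRUEFOUNDRY_BASE_URL=https://gateway.truefoundry.ai
TRUEFOUNDRY_API_KEY=your-truefoundry-jwt-token
```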
Create a .env file in your project directory to store your TrueFoundry credentials securely. Replace the placeholder values with your actual credentials from Step 4.
Initialize the client
Set up the OpenAI client to route requests through TrueFoundry’s gateway. The base_url parameter sends every request to the gateway, which intercepts it, captures observability data, collects metrics and logs, and then forwards it to Cerebras.
Make your first request
Now you can make requests to Cerebras models through TrueFoundry. The gateway will automatically track this request in your analytics dashboard.
Use the format cbrs/MODEL_NAME when specifying models through TrueFoundry (e.g., cbrs/gpt-oss-120b). This prefix tells the gateway to route the request to your configured Cerebras account.
Advanced Features
Streaming Responses
TrueFoundry supports streaming responses from Cerebras models, allowing you to process tokens as they’re generated. This is ideal for building responsive chat interfaces or processing long-form content.
Custom Metadata and Tagging
Add custom metadata to your requests for better tracking and analytics. TrueFoundry captures these headers and makes them available in your analytics dashboard for filtering and analysis.
Rate Limiting and Budget Controls
TrueFoundry allows you to set rate limits and budget controls directly in the dashboard to prevent unexpected costs and manage usage across your organization. To configure rate limits:
- Navigate to AI Gateway > Rate Limiting
- Configure limits per user, API key, or model
- Set daily or monthly budget caps to prevent unexpected costs
- Configure alerts to notify you when limits are approached
Virtual Models
Create virtual models that combine multiple Cerebras models with custom routing logic, fallback strategies, and load balancing. This allows you to optimize for cost, performance, or availability. Learn more in the TrueFoundry Virtual Models documentation.
Monitoring and Analytics
View Request Logs
Access detailed logs for all requests through the TrueFoundry dashboard. Each log entry includes the full request and response payload, latency metrics, token usage, and any custom metadata you’ve added. To view logs:
- Go to AI Gateway > Observability > Request Logging
- Filter by model, user, time range, or custom metadata
- View request and response payloads, latencies, and errors
- Export logs for further analysis or compliance requirements
Cost Tracking
Monitor your Cerebras spending in real-time with TrueFoundry’s cost tracking dashboard. View costs broken down by model, user, team, or any custom dimension you’ve configured. To access cost tracking:
- Navigate to AI Gateway > Cost Tracking
- View costs broken down by model, user, or time period
- Set up alerts for budget thresholds
- Export cost reports for billing or analysis
Analytics Dashboard
Visualize usage patterns and performance metrics across your organization with TrueFoundry’s analytics dashboard. Track key metrics like request volume, latency percentiles, error rates, and token usage. Key metrics available:
- Request Volume - Total requests over time, broken down by model
- Latency - P50, P95, and P99 latency percentiles
- Error Rates - Track errors by type and model
- Token Usage - Monitor input and output tokens across models
- Cost Trends - Visualize spending patterns over time
Export Metrics
TrueFoundry supports exporting metrics to external monitoring tools for integration with your existing observability stack. Supported export formats:
- OpenTelemetry - Export traces and metrics to your observability platform
- Prometheus - Scrape metrics for custom dashboards
- Grafana - Visualize performance and cost data
Troubleshooting
Authentication Errors
If you receive authentication errors when making requests, check your TrueFoundry API key:
- Verify the key is correct in your .env file
- Ensure there are no extra spaces or newlines
- Confirm the key hasn’t been revoked in the TrueFoundry dashboard
Then verify your Cerebras account configuration:
- Go to AI Gateway > Models > Cerebras
- Ensure your Cerebras account is properly configured
- Check that your Cerebras API key is valid and active
- Verify your TrueFoundry account has access to the Cerebras models you’re trying to use
- Ensure you’re using the correct account if you have multiple configured
Model Not Found Errors
If you see “model not found” errors, verify your model configuration:
- Check that you’ve added the specific Cerebras model in the TrueFoundry dashboard
- Go to AI Gateway > Models > Cerebras and confirm the model is listed
- Use the format cbrs/MODEL_NAME (e.g., cbrs/gpt-oss-120b)
- Ensure the Model ID matches exactly what’s in the Cerebras documentation
Example model names:
- cbrs/gpt-oss-120b
- cbrs/qwen-3-32b
- cbrs/llama3.1-8b
Rate Limit Errors
If you’re hitting rate limits, check your rate limit configuration:
- Navigate to AI Gateway > Rate Limiting
- Review your current limits and usage
- Adjust limits for your use case or upgrade your TrueFoundry plan
- Add exponential backoff in your application code
- Use the Retry-After header to determine when to retry
- Consider implementing request queuing for high-volume applications
- Batch requests where possible
- Use caching for repeated queries
- Consider using smaller models for simpler tasks
High Latency Issues
If you’re experiencing higher latency than expected, check the gateway region:
- Ensure the TrueFoundry gateway region is close to your application
- Contact TrueFoundry support to discuss multi-region deployment options
- Go to AI Gateway > Observability > Request Logging
- Identify bottlenecks in request processing
- Check for network issues or timeouts
- Use streaming for long responses to reduce perceived latency
- Consider using TrueFoundry’s caching features for repeated queries
- Reduce max_tokens if you’re generating unnecessarily long responses
- Set up alerts for latency thresholds
- Track P95 and P99 latencies in the analytics dashboard
- Compare latency across different models to find the best fit
Missing Metrics or Logs
If you’re not seeing expected metrics or logs in the dashboard, verify the integration header:
- Ensure you’re including the X-Cerebras-3rd-Party-Integration: Foundry header
- Check that the header is properly formatted with no typos
- Review your TrueFoundry plan’s data retention policy
- Older logs may have been archived or deleted
- Ensure custom headers are properly formatted (e.g., X-TrueFoundry-User-ID)
- Check that metadata is being sent with each request
- If issues persist, contact TrueFoundry support with example request IDs
Next Steps
Explore Advanced Features
Discover TrueFoundry’s full capabilities including virtual models, guardrails, and advanced routing
Set Up Rate Limiting
Configure rate limits and budget controls to manage costs and prevent unexpected usage
Configure Custom Metadata
Add custom metadata to requests for better tracking and analytics
Try Different Models
Explore Cerebras models to find the best fit for your use case
Set Up Alerting
Configure alerts for cost thresholds, error rates, and performance issues
Enable Caching
Reduce costs and latency by caching repeated queries

