What is Helicone?
Helicone is an open-source observability platform for LLM applications that provides logging, monitoring, and analytics for your AI API calls. With Helicone, you can track usage, debug issues, analyze costs, and optimize performance across all your Cerebras Inference requests. Learn more at https://www.helicone.ai/

Key features include:
- Request Logging - Automatically log all API requests and responses
- Cost Tracking - Monitor spending across models and users
- Performance Analytics - Analyze latency, token usage, and throughput
- Custom Properties - Tag requests for filtering and analysis
- User Tracking - Monitor usage by user or session
- Caching - Reduce costs with semantic caching
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Helicone Account - Visit Helicone and create a free account.
- Helicone API Key - After signing up, generate an API key from your Helicone dashboard.
- Python 3.11 or higher (for Python examples)
Configure Helicone
Install required dependencies
Install the OpenAI Python SDK, which is compatible with Cerebras Inference. The python-dotenv package helps manage your API keys securely through environment variables.
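The install step might look like the following, assuming a standard pip setup:

```shell
pip install openai python-dotenv
```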
Configure environment variables
Create a .env file in your project directory with your API keys. The Helicone API key enables authentication and links your requests to your Helicone account for monitoring and analytics. Keep these keys secure and never commit them to version control.
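A .env file for this setup might look like the sketch below; the variable names are illustrative, so match them to whatever names your code reads:

```
CEREBRAS_API_KEY=your-cerebras-api-key
HELICONE_API_KEY=your-helicone-api-key
```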
Initialize the client with Helicone
Set up the OpenAI client to route requests through Helicone's proxy. This configuration automatically logs all your Cerebras API calls to Helicone without requiring any code changes to your existing application logic:
The base_url points to Helicone's Cerebras proxy endpoint (https://cerebras.helicone.ai/v1), which forwards requests to Cerebras while capturing metrics. The Helicone-Auth header authenticates your requests with Helicone.
Make your first request
Now you can make API calls as usual. Helicone will automatically log the request, response, latency, and token usage. After running a request, visit your Helicone dashboard to see it logged with full details, including the prompt, response, latency, and token counts.
View your logs in Helicone
Navigate to your Helicone dashboard to view detailed logs of your requests. You’ll see:
- Complete request and response data
- Token usage and cost breakdowns
- Latency metrics and performance trends
- Custom properties and user tracking data
- Error logs and debugging information
Advanced Features
Custom Properties
Add custom metadata to your requests for better filtering and analysis in the Helicone dashboard. Custom properties help you segment your data by environment, feature, user cohort, or any other dimension relevant to your application. With custom properties you can:
- Filter requests by environment (development, staging, production)
- Track usage per user or session
- Analyze performance across different features or segments
- Create custom dashboards and reports
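Custom properties travel as Helicone-Property-* request headers; a sketch, with illustrative property names and values:

```python
# Custom properties ride along as Helicone-Property-* request headers.
# The property names (Environment, Feature) and values are illustrative.
property_headers = {
    "Helicone-Property-Environment": "production",
    "Helicone-Property-Feature": "chat-support",
}

# With the OpenAI Python SDK, attach them per request (requires a configured client):
# client.chat.completions.create(
#     model="llama3.1-8b",
#     messages=[{"role": "user", "content": "Hello"}],
#     extra_headers=property_headers,
# )
```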
User Tracking
Track requests by user to monitor individual usage patterns, costs, and behavior. User tracking helps you understand how different users interact with your AI application and identify power users or potential issues. The Helicone-User-Id header associates requests with specific users, enabling per-user analytics and cost tracking in your dashboard.
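A sketch of attaching the Helicone-User-Id header (the id value is illustrative):

```python
# The Helicone-User-Id header attributes a request to a specific user,
# enabling per-user analytics and cost tracking. The id value is illustrative.
user_headers = {"Helicone-User-Id": "user-1234"}

# Attach per request with the OpenAI Python SDK (requires a configured client):
# client.chat.completions.create(..., extra_headers=user_headers)
```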
Caching
Enable semantic caching to reduce costs and latency for similar requests. Helicone's semantic cache can match semantically similar queries even if they're not identical, significantly reducing API costs for common questions. Configure cache settings and time-to-live (TTL) in your Helicone dashboard.
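Caching is toggled per request with the Helicone-Cache-Enabled header, which must be the string "true" rather than a boolean:

```python
# Helicone-Cache-Enabled must be the string "true", not a boolean.
cache_headers = {"Helicone-Cache-Enabled": "true"}

# Attach per request with the OpenAI Python SDK (requires a configured client):
# client.chat.completions.create(..., extra_headers=cache_headers)
```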
Streaming Responses
Helicone fully supports streaming responses from Cerebras, logging complete metrics once the stream completes. Streaming is ideal for real-time applications where you want to display responses as they're generated.
Request Tagging and Feedback
Tag requests and add feedback scores to track quality and performance over time. This is particularly useful for evaluating model outputs and identifying areas for improvement.
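One common pattern is to capture the helicone-id response header from a proxied request and submit a rating for it afterward. The endpoint URL and payload keys below are assumptions sketched for illustration only; verify the exact shape against Helicone's feedback documentation before use:

```python
import json
import os
import urllib.request


def send_feedback(helicone_id: str, positive: bool) -> None:
    """Illustrative sketch: submit a feedback rating for a logged request.

    The endpoint URL and payload keys are assumptions; check Helicone's
    feedback documentation for the authoritative request shape.
    """
    payload = json.dumps({"helicone-id": helicone_id, "rating": positive}).encode()
    req = urllib.request.Request(
        "https://api.helicone.ai/v1/feedback",  # assumed endpoint
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY', '')}",
        },
        method="POST",
    )
    urllib.request.urlopen(req)  # fire-and-forget for brevity
```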
Monitoring Your Usage
After making requests, you can leverage Helicone's comprehensive dashboard to gain insights into your AI application:
- View Request Logs - See all requests with timestamps, models, prompts, and responses in the Helicone dashboard
- Analyze Costs - Track spending across different models, users, and time periods with detailed cost breakdowns
- Monitor Performance - Visualize latency trends, identify slow requests, and optimize your application’s responsiveness
- Filter by Properties - Use custom properties to segment your analytics by environment, feature, user cohort, or any custom dimension
- Set Up Alerts - Configure notifications for usage thresholds, error rates, or cost limits to stay informed
- Export Data - Download logs and analytics for further analysis or compliance requirements
Frequently Asked Questions
How does Helicone affect request latency?
Helicone adds minimal latency (typically 10-50ms) to your requests. The proxy architecture is optimized for performance, and the observability benefits far outweigh the small latency overhead. For latency-critical applications, you can use Helicone’s async logging mode.
Can I use Helicone with multiple model providers?
Yes! Helicone supports multiple LLM providers including OpenAI, Anthropic, Azure, and now Cerebras. You can monitor all your AI API calls in a single dashboard, making it easy to compare performance and costs across providers.
Is my data secure with Helicone?
Helicone takes security seriously. All data is encrypted in transit and at rest. You can also self-host Helicone for complete control over your data. Review Helicone’s security documentation for details.
What happens if Helicone is down?
Helicone is designed with high availability, but if the proxy is unavailable, your requests will fail. For production applications, consider implementing fallback logic or using Helicone’s async logging mode, which doesn’t block your requests.
How much does Helicone cost?
Helicone offers a generous free tier for development and small-scale production use. For higher volumes, check the Helicone pricing page for current plans and pricing.
Can I filter out sensitive data from logs?
Yes! Helicone provides data redaction features to filter sensitive information from your logs. You can configure redaction rules in your dashboard to automatically remove PII, API keys, or other sensitive data before it’s stored.
Next Steps
Now that you have Helicone set up with Cerebras, explore these resources to get the most out of your integration:
- Explore the Helicone documentation for advanced features and best practices
- Try different Cerebras models to compare performance and find the best fit for your use case
- Set up custom properties for better analytics and segmentation
- Enable caching to reduce costs and improve response times
- Configure alerts for proactive usage monitoring and cost management
- Review Cerebras documentation for model-specific guidance
Troubleshooting
Requests Not Appearing in Dashboard
Issue: API calls succeed but don't show up in the Helicone dashboard.
Solution:
- Verify your Helicone-Auth header includes the correct API key with the Bearer prefix
- Check that you're using the correct base URL: https://cerebras.helicone.ai/v1
- Ensure your Helicone API key is active in your account settings
- Wait a few seconds - there may be a slight delay in log processing (typically under 10 seconds)
- Check your browser’s network tab to confirm requests are reaching Helicone’s proxy
Authentication Errors
Issue: Receiving 401 or 403 errors when making requests.
Solution:
- Confirm your Cerebras API key is valid and active in your Cerebras dashboard
- Verify the Helicone-Auth header format: Bearer YOUR_API_KEY (note the space after "Bearer")
- Check that both API keys are correctly loaded from environment variables using os.getenv() or process.env
- Regenerate your Helicone API key if needed from the dashboard
- Ensure there are no extra spaces or newline characters in your API keys
Missing Custom Properties
Issue: Custom properties not appearing in the Helicone dashboard.
Solution:
- Ensure property headers use the Helicone-Property- prefix (e.g., Helicone-Property-Environment)
- Property names are case-sensitive - use consistent casing throughout your application
- Use the extra_headers parameter in Python or the headers option in JavaScript when making requests
- Check the custom properties documentation for supported formats and naming conventions
- Verify properties appear in the request logs before filtering by them in the dashboard
Cache Not Working
Issue: Requests not being cached as expected.
Solution:
- Verify the Helicone-Cache-Enabled header is set to "true" (as a string, not a boolean)
- Caching works best with deterministic queries - ensure your prompts are consistent
- Check your cache settings and TTL configuration in the Helicone dashboard
- Review the caching documentation for configuration options and best practices
- Note that streaming requests and requests with high temperature values may not be cached effectively
High Latency Issues
Issue: Requests are slower than expected when using Helicone.
Solution:
- Helicone typically adds 10-50ms of latency - if you're seeing more, check your network connection
- Consider using Helicone’s async logging mode for latency-critical applications
- Monitor the Helicone status page for any service disruptions
- Compare latency with and without Helicone to isolate the issue
- Contact Helicone support if latency remains consistently high
Need More Help?
If you continue to experience issues:
- Check the Helicone documentation for detailed guides
- Join the Helicone Discord community for community support
- Contact Helicone support for technical assistance
- Review Cerebras documentation for model-specific guidance

