What is Helicone?

Helicone is an open-source observability platform for LLM applications that provides logging, monitoring, and analytics for your AI API calls. With Helicone, you can track usage, debug issues, analyze costs, and optimize performance across all your Cerebras Inference requests. Learn more at https://www.helicone.ai/.

Key features include:
  • Request Logging - Automatically log all API requests and responses
  • Cost Tracking - Monitor spending across models and users
  • Performance Analytics - Analyze latency, token usage, and throughput
  • Custom Properties - Tag requests for filtering and analysis
  • User Tracking - Monitor usage by user or session
  • Caching - Reduce costs with semantic caching

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here.
  • Helicone Account - Visit Helicone and create a free account.
  • Helicone API Key - After signing up, generate an API key from your Helicone dashboard.
  • Python 3.11 or higher (for Python examples)

Configure Helicone

Step 1: Install required dependencies

Install the OpenAI Python SDK, which is compatible with Cerebras Inference:
pip install openai python-dotenv
The python-dotenv package helps manage your API keys securely through environment variables.
Step 2: Configure environment variables

Create a .env file in your project directory with your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
HELICONE_API_KEY=your-helicone-api-key-here
The Helicone API key enables authentication and links your requests to your Helicone account for monitoring and analytics. Keep these keys secure and never commit them to version control.
Step 3: Initialize the client with Helicone

Set up the OpenAI client to route requests through Helicone’s proxy. This configuration automatically logs all your Cerebras API calls to Helicone without requiring any code changes to your existing application logic:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)
The base_url points to Helicone’s Cerebras proxy endpoint (https://cerebras.helicone.ai/v1), which forwards requests to Cerebras while capturing metrics. The Helicone-Auth header authenticates your requests with Helicone.
Step 4: Make your first request

Now you can make API calls as usual. Helicone will automatically log the request, response, latency, and token usage:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what observability means in AI applications."}
    ],
    max_tokens=150,
    temperature=0.7
)

print(response.choices[0].message.content)
After running this code, visit your Helicone dashboard to see the logged request with full details including prompt, response, latency, and token counts.
Step 5: View your logs in Helicone

Navigate to your Helicone dashboard to view detailed logs of your requests. You’ll see:
  • Complete request and response data
  • Token usage and cost breakdowns
  • Latency metrics and performance trends
  • Custom properties and user tracking data
  • Error logs and debugging information
The dashboard provides powerful filtering and search capabilities to help you analyze your AI application’s behavior and optimize performance.

Advanced Features

Custom Properties

Add custom metadata to your requests for better filtering and analysis in the Helicone dashboard. Custom properties help you segment your data by environment, feature, user cohort, or any other dimension relevant to your application:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    extra_headers={
        "Helicone-Property-Environment": "production",
        "Helicone-Property-User-Id": "user-123",
        "Helicone-Property-Session-Id": "session-456",
        "Helicone-Property-Feature": "chat-assistant"
    }
)
These custom properties allow you to:
  • Filter requests by environment (development, staging, production)
  • Track usage per user or session
  • Analyze performance across different features or segments
  • Create custom dashboards and reports
Learn more about custom properties in the Helicone documentation.

User Tracking

Track requests by user to monitor individual usage patterns, costs, and behavior. User tracking helps you understand how different users interact with your AI application and identify power users or potential issues:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "user", "content": "Summarize this article..."}
    ],
    extra_headers={
        "Helicone-User-Id": "[email protected]"
    }
)
The Helicone-User-Id header associates requests with specific users, enabling per-user analytics and cost tracking in your dashboard.

Caching

Enable semantic caching to reduce costs and latency for similar requests. Helicone’s intelligent caching can match semantically similar queries even if they’re not identical, significantly reducing API costs for common questions:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={
        "Helicone-Cache-Enabled": "true"
    }
)
Configure cache settings and time-to-live (TTL) in your Helicone dashboard.
Explore advanced caching options in the Helicone caching documentation.
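
Cache behavior can also be tuned per request with headers. The sketch below reuses this guide's client setup; treat the Cache-Control header and its max-age semantics as assumptions to verify against the Helicone caching documentation:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={
        "Helicone-Cache-Enabled": "true",
        # Assumed per-request TTL: cache entries expire after one hour.
        # Confirm this header and its semantics in the caching docs.
        "Cache-Control": "max-age=3600"
    }
)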

Streaming Responses

Helicone fully supports streaming responses from Cerebras. Streaming is ideal for real-time applications where you want to display responses as they’re generated:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Write a short story about AI."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming requests are logged with complete metrics including total tokens, latency, and cost once the stream completes.

Request Tagging and Feedback

Tag requests and add feedback scores to track quality and performance over time. This is particularly useful for evaluating model outputs and identifying areas for improvement:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    extra_headers={
        "Helicone-Request-Id": "req-123",
        "Helicone-Property-Prompt-Version": "v2.1"
    }
)

# Later, add feedback via Helicone API
# This helps track which responses were helpful
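
To submit that feedback, Helicone exposes a REST endpoint keyed by the request ID. The sketch below is hedged: the exact route (/v1/request/{id}/feedback) and the rating payload field are assumptions based on Helicone's feedback docs, so confirm both there before relying on this:
import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Hedged sketch: endpoint path and payload shape are assumptions;
# verify them against Helicone's feedback documentation.
helicone_request_id = "req-123"  # the Helicone-Request-Id set on the request

resp = requests.post(
    f"https://api.helicone.ai/v1/request/{helicone_request_id}/feedback",
    headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={"rating": True},  # True for helpful, False for unhelpful
)
resp.raise_for_status()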

Monitoring Your Usage

After making requests, you can leverage Helicone’s comprehensive dashboard to gain insights into your AI application:
  1. View Request Logs - See all requests with timestamps, models, prompts, and responses in the Helicone dashboard
  2. Analyze Costs - Track spending across different models, users, and time periods with detailed cost breakdowns
  3. Monitor Performance - Visualize latency trends, identify slow requests, and optimize your application’s responsiveness
  4. Filter by Properties - Use custom properties to segment your analytics by environment, feature, user cohort, or any custom dimension
  5. Set Up Alerts - Configure notifications for usage thresholds, error rates, or cost limits to stay informed
  6. Export Data - Download logs and analytics for further analysis or compliance requirements
The dashboard provides real-time insights and historical trends to help you understand your AI application’s behavior and make data-driven optimization decisions.

Frequently Asked Questions

How much latency does Helicone add?
Helicone adds minimal latency (typically 10-50ms) to your requests. The proxy architecture is optimized for performance, and the observability benefits far outweigh the small latency overhead. For latency-critical applications, you can use Helicone’s async logging mode.

Can I use Helicone with other LLM providers?
Yes! Helicone supports multiple LLM providers including OpenAI, Anthropic, Azure, and now Cerebras. You can monitor all your AI API calls in a single dashboard, making it easy to compare performance and costs across providers.

Is my data secure?
Helicone takes security seriously. All data is encrypted in transit and at rest. You can also self-host Helicone for complete control over your data. Review Helicone’s security documentation for details.

What happens if Helicone is unavailable?
Helicone is designed for high availability, but if the proxy is unavailable, your requests will fail. For production applications, consider implementing fallback logic (see the sketch below) or using Helicone’s async logging mode, which doesn’t block your requests.
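
As a minimal sketch of such a fallback, assuming the direct Cerebras endpoint is https://api.cerebras.ai/v1 (verify the current base URL in the Cerebras docs), you can retry a failed proxied call against Cerebras directly, trading away logging for that one request:
import os
from openai import OpenAI, APIConnectionError, APIStatusError
from dotenv import load_dotenv

load_dotenv()

# Primary client: routes through Helicone's proxy for observability.
proxied = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"},
)

# Fallback client: talks to Cerebras directly (no Helicone logging).
# The base URL is an assumption; confirm it in the Cerebras docs.
direct = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
)

def chat(messages, model="gpt-oss-120b"):
    try:
        return proxied.chat.completions.create(model=model, messages=messages)
    except (APIConnectionError, APIStatusError):
        # Proxy unreachable or erroring: keep the app working without logs.
        return direct.chat.completions.create(model=model, messages=messages)
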
How much does Helicone cost?
Helicone offers a generous free tier for development and small-scale production use. For higher volumes, check the Helicone pricing page for current plans.

Can I redact sensitive data from my logs?
Yes! Helicone provides data redaction features to filter sensitive information from your logs. You can configure redaction rules in your dashboard to automatically remove PII, API keys, or other sensitive data before it’s stored.

Next Steps

Now that you have Helicone set up with Cerebras, explore these resources to get the most out of your integration:
  • Explore the Helicone documentation for advanced features and best practices
  • Try different Cerebras models to compare performance and find the best fit for your use case
  • Set up custom properties for better analytics and segmentation
  • Enable caching to reduce costs and improve response times
  • Configure alerts for proactive usage monitoring and cost management
  • Review Cerebras documentation for model-specific guidance

Troubleshooting

Requests Not Appearing in Dashboard

Issue: API calls succeed but don’t show up in the Helicone dashboard.
Solution:
  • Verify your Helicone-Auth header includes the correct API key with the Bearer prefix
  • Check that you’re using the correct base URL: https://cerebras.helicone.ai/v1
  • Ensure your Helicone API key is active in your account settings
  • Wait a few seconds - there may be a slight delay in log processing (typically under 10 seconds)
  • Check your browser’s network tab to confirm requests are reaching Helicone’s proxy

Authentication Errors

Issue: Receiving 401 or 403 errors when making requests.
Solution:
  • Confirm your Cerebras API key is valid and active in your Cerebras dashboard
  • Verify the Helicone-Auth header format: Bearer YOUR_API_KEY (note the space after “Bearer”)
  • Check that both API keys are correctly loaded from environment variables using os.getenv() or process.env (see the sanity-check sketch after this list)
  • Regenerate your Helicone API key if needed from the dashboard
  • Ensure there are no extra spaces or newline characters in your API keys
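A quick way to catch the last two issues before they reach the API is a small sanity check when loading your keys. This is an illustrative sketch, not part of either SDK:
import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if a key is missing or contains stray whitespace that
# would corrupt the Authorization / Helicone-Auth headers.
for name in ("CEREBRAS_API_KEY", "HELICONE_API_KEY"):
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    if value != value.strip():
        raise RuntimeError(f"{name} has leading or trailing whitespace")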

Missing Custom Properties

Issue: Custom properties not appearing in Helicone dashboard.
Solution:
  • Ensure property headers use the Helicone-Property- prefix (e.g., Helicone-Property-Environment)
  • Property names are case-sensitive - use consistent casing throughout your application
  • Use extra_headers parameter in Python or the headers option in JavaScript when making requests
  • Check the custom properties documentation for supported formats and naming conventions
  • Verify properties appear in the request logs before filtering by them in the dashboard

Cache Not Working

Issue: Requests not being cached as expected.
Solution:
  • Verify the Helicone-Cache-Enabled header is set to "true" (as a string, not a boolean)
  • Caching works best with deterministic queries - ensure your prompts are consistent
  • Check your cache settings and TTL configuration in the Helicone dashboard
  • Review the caching documentation for configuration options and best practices
  • Note that streaming requests and requests with high temperature values may not be cached effectively

High Latency Issues

Issue: Requests are slower than expected when using Helicone.
Solution:
  • Helicone typically adds 10-50ms of latency - if you’re seeing more, check your network connection
  • Consider using Helicone’s async logging mode for latency-critical applications
  • Monitor the Helicone status page for any service disruptions
  • Compare latency with and without Helicone to isolate the issue (see the timing sketch after this list)
  • Contact Helicone support if latency remains consistently high
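One rough way to run that comparison is to time the same request against both endpoints. The direct Cerebras base URL used here (https://api.cerebras.ai/v1) is an assumption to verify in the Cerebras docs, and a single sample is noisy, so run each several times and compare medians:
import os
import time
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

def time_request(base_url, headers=None):
    # Build a client for the given endpoint and time one small request.
    client = OpenAI(
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url=base_url,
        default_headers=headers or {},
    )
    start = time.perf_counter()
    client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )
    return time.perf_counter() - start

direct = time_request("https://api.cerebras.ai/v1")  # assumed direct endpoint
proxied = time_request(
    "https://cerebras.helicone.ai/v1",
    {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"},
)
print(f"direct: {direct*1000:.0f} ms, via Helicone: {proxied*1000:.0f} ms")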

Need More Help?

If you continue to experience issues, reach out to Helicone support, consult the Helicone documentation, or review the Cerebras documentation for model-specific guidance.