What is Helicone?

Helicone is an open-source observability platform for LLM applications that provides logging, monitoring, and analytics for your AI API calls. With Helicone, you can track usage, debug issues, analyze costs, and optimize performance across all your Cerebras Inference requests. Learn more at https://www.helicone.ai/.

Key features include:
  • Request Logging - Automatically log all API requests and responses
  • Cost Tracking - Monitor spending across models and users
  • Performance Analytics - Analyze latency, token usage, and throughput
  • Custom Properties - Tag requests for filtering and analysis
  • User Tracking - Monitor usage by user or session
  • Caching - Reduce costs with semantic caching

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here.
  • Helicone Account - Visit Helicone and create a free account.
  • Helicone API Key - After signing up, generate an API key from your Helicone dashboard.
  • Python 3.11 or higher (for Python examples)

Configure Helicone

Step 1: Install required dependencies

Install the OpenAI Python SDK, which is compatible with Cerebras Inference:
pip install openai python-dotenv
The python-dotenv package helps manage your API keys securely through environment variables.
Step 2: Configure environment variables

Create a .env file in your project directory with your API keys:
CEREBRAS_API_KEY=your-cerebras-api-key-here
HELICONE_API_KEY=your-helicone-api-key-here
The Helicone API key enables authentication and links your requests to your Helicone account for monitoring and analytics. Keep these keys secure and never commit them to version control.
Step 3: Initialize the client with Helicone

Set up the OpenAI client to route requests through Helicone’s proxy. This configuration automatically logs all your Cerebras API calls to Helicone without requiring any code changes to your existing application logic:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)
The base_url points to Helicone’s Cerebras proxy endpoint (https://cerebras.helicone.ai/v1), which forwards requests to Cerebras while capturing metrics. The Helicone-Auth header authenticates your requests with Helicone.
Step 4: Make your first request

Now you can make API calls as usual. Helicone will automatically log the request, response, latency, and token usage:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what observability means in AI applications."}
    ],
    max_tokens=150,
    temperature=0.7
)

print(response.choices[0].message.content)
After running this code, visit your Helicone dashboard to see the logged request with full details including prompt, response, latency, and token counts.
Step 5: View your logs in Helicone

Navigate to your Helicone dashboard to view detailed logs of your requests. You’ll see:
  • Complete request and response data
  • Token usage and cost breakdowns
  • Latency metrics and performance trends
  • Custom properties and user tracking data
  • Error logs and debugging information
The dashboard provides powerful filtering and search capabilities to help you analyze your AI application’s behavior and optimize performance.

Advanced Features

Custom Properties

Add custom metadata to your requests for better filtering and analysis in the Helicone dashboard. Custom properties help you segment your data by environment, feature, user cohort, or any other dimension relevant to your application:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    extra_headers={
        "Helicone-Property-Environment": "production",
        "Helicone-Property-User-Id": "user-123",
        "Helicone-Property-Session-Id": "session-456",
        "Helicone-Property-Feature": "chat-assistant"
    }
)
These custom properties allow you to:
  • Filter requests by environment (development, staging, production)
  • Track usage per user or session
  • Analyze performance across different features or segments
  • Create custom dashboards and reports
Learn more about custom properties in the Helicone documentation.

User Tracking

Track requests by user to monitor individual usage patterns, costs, and behavior. User tracking helps you understand how different users interact with your AI application and identify power users or potential issues:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "user", "content": "Summarize this article..."}
    ],
    extra_headers={
        "Helicone-User-Id": "[email protected]"
    }
)
The Helicone-User-Id header associates requests with specific users, enabling per-user analytics and cost tracking in your dashboard.

Caching

Enable semantic caching to reduce costs and latency for similar requests. Helicone’s intelligent caching can match semantically similar queries even if they’re not identical, significantly reducing API costs for common questions:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={
        "Helicone-Cache-Enabled": "true"
    }
)
Configure cache settings and time-to-live (TTL) in your Helicone dashboard.
Explore advanced caching options in the Helicone caching documentation.
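
Cache behavior can also be tuned per request with headers. The sketch below reuses this guide's client setup; treat the Cache-Control header and its max-age semantics as assumptions to verify against the Helicone caching documentation:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_headers={
        "Helicone-Cache-Enabled": "true",
        # Assumed per-request TTL: cache entries expire after one hour.
        # Confirm this header and its semantics in the caching docs.
        "Cache-Control": "max-age=3600"
    }
)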

Streaming Responses

Helicone fully supports streaming responses from Cerebras. Streaming is ideal for real-time applications where you want to display responses as they’re generated:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Write a short story about AI."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Streaming requests are logged with complete metrics including total tokens, latency, and cost once the stream completes.

Request Tagging and Feedback

Tag requests and add feedback scores to track quality and performance over time. This is particularly useful for evaluating model outputs and identifying areas for improvement:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "X-Cerebras-3rd-Party-Integration": "Helicone"
    }
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    extra_headers={
        "Helicone-Request-Id": "req-123",
        "Helicone-Property-Prompt-Version": "v2.1"
    }
)

# Later, add feedback via Helicone API
# This helps track which responses were helpful
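
To submit that feedback, Helicone exposes a REST endpoint keyed by the request ID. The sketch below is hedged: the exact route (/v1/request/{id}/feedback) and the rating payload field are assumptions based on Helicone's feedback docs, so confirm both there before relying on this:
import os
import requests
from dotenv import load_dotenv

load_dotenv()

# Hedged sketch: endpoint path and payload shape are assumptions;
# verify them against Helicone's feedback documentation.
helicone_request_id = "req-123"  # the Helicone-Request-Id set on the request

resp = requests.post(
    f"https://api.helicone.ai/v1/request/{helicone_request_id}/feedback",
    headers={
        "Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}",
        "Content-Type": "application/json",
    },
    json={"rating": True},  # True for helpful, False for unhelpful
)
resp.raise_for_status()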

Monitoring Your Usage

After making requests, you can leverage Helicone’s comprehensive dashboard to gain insights into your AI application:
  1. View Request Logs - See all requests with timestamps, models, prompts, and responses in the Helicone dashboard
  2. Analyze Costs - Track spending across different models, users, and time periods with detailed cost breakdowns
  3. Monitor Performance - Visualize latency trends, identify slow requests, and optimize your application’s responsiveness
  4. Filter by Properties - Use custom properties to segment your analytics by environment, feature, user cohort, or any custom dimension
  5. Set Up Alerts - Configure notifications for usage thresholds, error rates, or cost limits to stay informed
  6. Export Data - Download logs and analytics for further analysis or compliance requirements
The dashboard provides real-time insights and historical trends to help you understand your AI application’s behavior and make data-driven optimization decisions.

Frequently Asked Questions

How much latency does Helicone add?
Helicone adds minimal latency (typically 10-50ms) to your requests. The proxy architecture is optimized for performance, and the observability benefits far outweigh the small latency overhead. For latency-critical applications, you can use Helicone’s async logging mode.

Can I use Helicone with other LLM providers?
Yes! Helicone supports multiple LLM providers including OpenAI, Anthropic, Azure, and now Cerebras. You can monitor all your AI API calls in a single dashboard, making it easy to compare performance and costs across providers.

Is my data secure?
Helicone takes security seriously. All data is encrypted in transit and at rest. You can also self-host Helicone for complete control over your data. Review Helicone’s security documentation for details.

What happens if Helicone is unavailable?
Helicone is designed for high availability, but if the proxy is unavailable, your requests will fail. For production applications, consider implementing fallback logic (see the sketch below) or using Helicone’s async logging mode, which doesn’t block your requests.
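
As a minimal sketch of such a fallback, assuming the direct Cerebras endpoint is https://api.cerebras.ai/v1 (verify the current base URL in the Cerebras docs), you can retry a failed proxied call against Cerebras directly, trading away logging for that one request:
import os
from openai import OpenAI, APIConnectionError, APIStatusError
from dotenv import load_dotenv

load_dotenv()

# Primary client: routes through Helicone's proxy for observability.
proxied = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://cerebras.helicone.ai/v1",
    default_headers={"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"},
)

# Fallback client: talks to Cerebras directly (no Helicone logging).
# The base URL is an assumption; confirm it in the Cerebras docs.
direct = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
)

def chat(messages, model="gpt-oss-120b"):
    try:
        return proxied.chat.completions.create(model=model, messages=messages)
    except (APIConnectionError, APIStatusError):
        # Proxy unreachable or erroring: keep the app working without logs.
        return direct.chat.completions.create(model=model, messages=messages)
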
How much does Helicone cost?
Helicone offers a generous free tier for development and small-scale production use. For higher volumes, check the Helicone pricing page for current plans.

Can I redact sensitive data from my logs?
Yes! Helicone provides data redaction features to filter sensitive information from your logs. You can configure redaction rules in your dashboard to automatically remove PII, API keys, or other sensitive data before it’s stored.

Next Steps

Now that you have Helicone set up with Cerebras, explore these resources to get the most out of your integration:
  • Explore the Helicone documentation for advanced features and best practices
  • Try different Cerebras models to compare performance and find the best fit for your use case
  • Set up custom properties for better analytics and segmentation
  • Enable caching to reduce costs and improve response times
  • Configure alerts for proactive usage monitoring and cost management
  • Review Cerebras documentation for model-specific guidance

Troubleshooting

Requests Not Appearing in Dashboard

Issue: API calls succeed but don’t show up in the Helicone dashboard.
Solution:
  • Verify your Helicone-Auth header includes the correct API key with the Bearer prefix
  • Check that you’re using the correct base URL: https://cerebras.helicone.ai/v1
  • Ensure your Helicone API key is active in your account settings
  • Wait a few seconds - there may be a slight delay in log processing (typically under 10 seconds)
  • Check your browser’s network tab to confirm requests are reaching Helicone’s proxy

Authentication Errors

Issue: Receiving 401 or 403 errors when making requests.
Solution:
  • Confirm your Cerebras API key is valid and active in your Cerebras dashboard
  • Verify the Helicone-Auth header format: Bearer YOUR_API_KEY (note the space after “Bearer”)
  • Check that both API keys are correctly loaded from environment variables using os.getenv() or process.env (see the sanity-check sketch after this list)
  • Regenerate your Helicone API key if needed from the dashboard
  • Ensure there are no extra spaces or newline characters in your API keys
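A quick way to catch the last two issues before they reach the API is a small sanity check when loading your keys. This is an illustrative sketch, not part of either SDK:
import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if a key is missing or contains stray whitespace that
# would corrupt the Authorization / Helicone-Auth headers.
for name in ("CEREBRAS_API_KEY", "HELICONE_API_KEY"):
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    if value != value.strip():
        raise RuntimeError(f"{name} has leading or trailing whitespace")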

Missing Custom Properties

Issue: Custom properties not appearing in Helicone dashboard.
Solution:
  • Ensure property headers use the Helicone-Property- prefix (e.g., Helicone-Property-Environment)
  • Property names are case-sensitive - use consistent casing throughout your application
  • Use extra_headers parameter in Python or the headers option in JavaScript when making requests
  • Check the custom properties documentation for supported formats and naming conventions
  • Verify properties appear in the request logs before filtering by them in the dashboard

Cache Not Working

Issue: Requests not being cached as expected.
Solution:
  • Verify the Helicone-Cache-Enabled header is set to "true" (as a string, not a boolean)
  • Caching works best with deterministic queries - ensure your prompts are consistent
  • Check your cache settings and TTL configuration in the Helicone dashboard
  • Review the caching documentation for configuration options and best practices
  • Note that streaming requests and requests with high temperature values may not be cached effectively

High Latency Issues

Issue: Requests are slower than expected when using Helicone.
Solution:
  • Helicone typically adds 10-50ms of latency - if you’re seeing more, check your network connection
  • Consider using Helicone’s async logging mode for latency-critical applications
  • Monitor the Helicone status page for any service disruptions
  • Compare latency with and without Helicone to isolate the issue (see the timing sketch after this list)
  • Contact Helicone support if latency remains consistently high
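One rough way to run that comparison is to time the same request against both endpoints. The direct Cerebras base URL used here (https://api.cerebras.ai/v1) is an assumption to verify in the Cerebras docs, and a single sample is noisy, so run each several times and compare medians:
import os
import time
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

def time_request(base_url, headers=None):
    # Build a client for the given endpoint and time one small request.
    client = OpenAI(
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url=base_url,
        default_headers=headers or {},
    )
    start = time.perf_counter()
    client.chat.completions.create(
        model="llama3.1-8b",
        messages=[{"role": "user", "content": "ping"}],
        max_tokens=1,
    )
    return time.perf_counter() - start

direct = time_request("https://api.cerebras.ai/v1")  # assumed direct endpoint
proxied = time_request(
    "https://cerebras.helicone.ai/v1",
    {"Helicone-Auth": f"Bearer {os.getenv('HELICONE_API_KEY')}"},
)
print(f"direct: {direct*1000:.0f} ms, via Helicone: {proxied*1000:.0f} ms")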

Need More Help?

If you continue to experience issues, reach out to Helicone support, consult the Helicone documentation, or review the Cerebras documentation for model-specific guidance.