
# Get Started with Cloudflare AI Gateway

> Learn how to route Cerebras Inference requests through Cloudflare AI Gateway for enhanced observability, caching, and rate limiting.

Cloudflare AI Gateway acts as a proxy between your application and Cerebras Inference, adding request logging, caching, rate limiting, and analytics. This integration lets you monitor and optimize your AI workloads at the cost of one extra network hop on top of Cerebras's low-latency inference.

## Prerequisites

Before you begin, ensure you have:

* **Cerebras API Key** - Get a free API key [here](https://cloud.cerebras.ai/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc)
* **Cloudflare Account** - Visit [Cloudflare](https://dash.cloudflare.com/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) and create an account or log in
* **An AI Gateway** - Created in your Cloudflare dashboard (the first step below walks through this)
* **Python 3.11 or higher** (for Python examples)

## Configure Cloudflare AI Gateway

<Steps>
  <Step title="Create an AI Gateway">
    First, you'll need to create an AI Gateway in your Cloudflare dashboard to enable request routing and monitoring.

    1. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc)
    2. Navigate to **AI** > **AI Gateway**
    3. Click **Create Gateway**
    4. Give your gateway a name (e.g., "cbrs" or "cerebras-gateway") - this will be your **Gateway ID**
    5. Click **Create** to complete the setup

    <Note>
      **Important:** After creating your gateway, note these two values:

      * **Gateway ID**: This is the name you just chose for your gateway (e.g., "cbrs")
      * **Account ID**: This is visible in your browser's URL bar (e.g., `xxxxxx`)

      Your gateway URL will look like: `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras`

      For example: `https://gateway.ai.cloudflare.com/v1/xxxxxx/cbrs/cerebras`
    </Note>

    <Warning>
      **Configure Cerebras Provider:** In your AI Gateway settings, you may need to add Cerebras as a custom provider. If you see an error about configuring the gateway, go to your gateway settings and add Cerebras with the base URL: `https://api.cerebras.ai/v1`
    </Warning>
  </Step>

  <Step title="Install required dependencies">
    Install the necessary Python packages:

    ```bash theme={null}
    pip install requests python-dotenv
    ```
  </Step>

  <Step title="Configure environment variables">
    Create a `.env` file in your project directory to securely store your credentials:

    ```bash theme={null}
    CEREBRAS_API_KEY=your-cerebras-api-key-here
    CLOUDFLARE_ACCOUNT_ID=your-account-id-here  # From your browser URL
    CLOUDFLARE_GATEWAY_ID=your-gateway-name  # The name you gave your gateway (e.g., "cbrs")
    ```

    Replace the placeholder values with your actual credentials:

    * `CEREBRAS_API_KEY`: Your Cerebras API key from the dashboard
    * `CLOUDFLARE_ACCOUNT_ID`: The account ID from your Cloudflare dashboard URL (e.g., `xxxxxx`)
    * `CLOUDFLARE_GATEWAY_ID`: The gateway name you chose when creating the gateway
  </Step>

  <Step title="Initialize the client with AI Gateway">
    Set up your client to route requests through Cloudflare AI Gateway by using the gateway URL in place of the Cerebras API URL. Every request sent to this URL is logged and counted in your gateway analytics; caching and rate limiting apply once you enable them in your gateway settings.

    ```python theme={null}
    import os
    import requests
    from dotenv import load_dotenv

    load_dotenv()

    # Set up the gateway URL
    account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
    gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
    api_key = os.getenv("CEREBRAS_API_KEY")

    url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    ```
  </Step>

  <Step title="Make your first request">
    Now you can make requests exactly as you would with the standard Cerebras API. All requests will automatically flow through AI Gateway, enabling logging and analytics.

    ```python theme={null}
    import os
    import requests
    from dotenv import load_dotenv

    load_dotenv()

    account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
    gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
    api_key = os.getenv("CEREBRAS_API_KEY")

    url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    data = {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain how AI Gateway improves observability."}
        ],
        "max_tokens": 500,
        "temperature": 0.7
    }

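    response = requests.post(url, headers=headers, json=data, timeout=60)
    response.raise_for_status()  # Surface HTTP errors (e.g. 401, 429) before parsing the body
    print(response.json()["choices"][0]["message"]["content"])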
    ```
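    The example indexes straight into `choices`, so an unexpected body from the gateway or the Cerebras API surfaces as a bare `KeyError`. A minimal sketch of a more defensive accessor; `extract_reply` is a hypothetical helper name, and it assumes only the standard chat completions response shape used above:

    ```python
    from typing import Any


    def extract_reply(payload: dict[str, Any]) -> str:
        """Return the assistant message from a chat completions response body.

        Raises ValueError with the raw body when there are no choices, for example
        when the gateway or the Cerebras API returned an error object instead.
        """
        choices = payload.get("choices") or []
        if not choices:
            raise ValueError(f"Unexpected response body: {payload}")
        return choices[0]["message"]["content"]
    ```

    For example, replace the final `print` with `print(extract_reply(response.json()))`.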
  </Step>

  <Step title="Enable streaming responses">
    Cloudflare AI Gateway fully supports streaming responses from Cerebras, allowing you to display results in real-time as they're generated.

    ```python theme={null}
    import json
    import os
    import requests
    from dotenv import load_dotenv

    load_dotenv()

    account_id = os.getenv("CLOUDFLARE_ACCOUNT_ID")
    gateway_id = os.getenv("CLOUDFLARE_GATEWAY_ID")
    api_key = os.getenv("CEREBRAS_API_KEY")

    url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/cerebras/chat/completions"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    data = {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "user", "content": "Write a short story about AI."}
        ],
        "stream": True,
        "max_tokens": 1000
    }

    # Stream responses for real-time output
    response = requests.post(url, headers=headers, json=data, stream=True, timeout=60)
    response.raise_for_status()

    for line in response.iter_lines():
        # Server-sent events arrive as b"data: {...}" lines; skip blank keep-alive lines
        if not line.startswith(b"data:"):
            continue
        payload = line[len(b"data:"):].strip().decode("utf-8")
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Some chunks (e.g. a trailing usage chunk) may carry no choices
        choices = chunk.get("choices") or []
        delta = (choices[0].get("delta") or {}) if choices else {}
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
    print()
    ```
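    The parsing logic can also be pulled into a generator so it can be tested without a network connection. A minimal sketch, assuming the standard chat completions server-sent events format (`data: {...}` lines ending with `data: [DONE]`); `iter_content_deltas` is a hypothetical name:

    ```python
    import json
    from collections.abc import Iterable, Iterator


    def iter_content_deltas(lines: Iterable[bytes]) -> Iterator[str]:
        """Yield text deltas from a chat completions SSE stream.

        `lines` is any iterable of raw byte lines, such as response.iter_lines().
        """
        for raw in lines:
            line = raw.decode("utf-8").strip()
            # Skip blank keep-alive lines and SSE comments (lines starting with ":")
            if not line.startswith("data:"):
                continue
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            chunk = json.loads(data)
            # A chunk may carry no choices (e.g. a final usage chunk)
            for choice in chunk.get("choices") or []:
                content = (choice.get("delta") or {}).get("content")
                if content:
                    yield content
    ```

    With the streaming request above: `for text in iter_content_deltas(response.iter_lines()): print(text, end="", flush=True)`.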
  </Step>
</Steps>

## Key Features

<AccordionGroup>
  <Accordion title="Request Logging and Analytics">
    All requests routed through AI Gateway are automatically logged in your Cloudflare dashboard. You can view:

    * Request volume and patterns over time
    * Token usage and cost tracking
    * Response times and latency metrics
    * Error rates and types for debugging
    * Model usage distribution

    Access your analytics by navigating to **AI** > **AI Gateway** in your Cloudflare dashboard and selecting your gateway.
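    To make logs easier to filter, Cloudflare lets you attach custom metadata to each request through the `cf-aig-metadata` header, a JSON object of flat key/value pairs. A minimal sketch, assuming that header as described in Cloudflare's AI Gateway documentation (which also caps the number of entries per request; check the current limit there); `with_log_metadata` is a hypothetical helper:

    ```python
    import json


    def with_log_metadata(headers: dict[str, str], metadata: dict[str, str | int | bool]) -> dict[str, str]:
        """Return a copy of `headers` that tags the request with custom metadata for AI Gateway logs.

        Assumes Cloudflare's `cf-aig-metadata` request header, which takes a flat JSON object.
        """
        return {**headers, "cf-aig-metadata": json.dumps(metadata)}
    ```

    For example: `requests.post(url, headers=with_log_metadata(headers, {"user_id": "alice"}), json=data)`.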
  </Accordion>

  <Accordion title="Caching">
    Enable caching to reduce costs and response times for repeated queries. Cloudflare returns cached responses without sending a request to the Cerebras API:

    1. Go to your AI Gateway in the Cloudflare dashboard
    2. Navigate to **Settings** > **Caching**
    3. Enable caching and configure TTL (time-to-live)
    4. Identical requests are then answered from the cache until the TTL expires

    Learn more about [caching strategies](https://developers.cloudflare.com/ai-gateway/configuration/caching/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) in the Cloudflare documentation.
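    Beyond the dashboard defaults, Cloudflare documents per-request cache headers: `cf-aig-skip-cache` to bypass the cache, `cf-aig-cache-ttl` (in seconds) to override the TTL, and a `cf-aig-cache-status` response header reporting `HIT` or `MISS`. A minimal sketch, assuming those header names as given in the caching docs linked above; `cache_headers` and `was_cache_hit` are hypothetical helpers:

    ```python
    from collections.abc import Mapping


    def cache_headers(headers: dict[str, str], ttl_seconds: int | None = None, skip_cache: bool = False) -> dict[str, str]:
        """Return a copy of `headers` with per-request AI Gateway cache overrides."""
        result = dict(headers)
        if skip_cache:
            # Always forward this request to Cerebras, bypassing the gateway cache
            result["cf-aig-skip-cache"] = "true"
        elif ttl_seconds is not None:
            # Override the gateway's default cache TTL for this request only
            result["cf-aig-cache-ttl"] = str(ttl_seconds)
        return result


    def was_cache_hit(response_headers: Mapping[str, str]) -> bool:
        """True if the gateway reports that the response came from its cache."""
        return response_headers.get("cf-aig-cache-status", "").upper() == "HIT"
    ```

    For example, send with `headers=cache_headers(headers, ttl_seconds=3600)` and check `was_cache_hit(response.headers)` afterwards.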
  </Accordion>

  <Accordion title="Rate Limiting">
    Protect your application and control costs with flexible rate limiting:

    1. In your AI Gateway settings, go to **Rate Limiting**
    2. Set limits per user, IP address, or API key
    3. Configure time windows (per minute, hour, or day)
    4. Requests exceeding limits will receive a 429 status code

    Explore [rate limiting configuration](https://developers.cloudflare.com/ai-gateway/configuration/rate-limiting/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) for advanced options.
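    Clients should expect the 429 responses described above. A minimal sketch of retrying with exponential backoff; `call_with_backoff` is a hypothetical helper, and honoring a numeric `Retry-After` header is an assumption about what the response may include, with exponential delays used otherwise:

    ```python
    import random
    import time
    from collections.abc import Callable
    from typing import Any


    def call_with_backoff(
        send: Callable[[], Any],
        max_retries: int = 3,
        base_delay: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> Any:
        """Call `send()` and retry with exponential backoff while it returns HTTP 429.

        `send` returns an object with `status_code` and `headers` (e.g. a requests.Response).
        """
        response = send()
        for attempt in range(max_retries):
            if response.status_code != 429:
                break
            # Use a numeric Retry-After header if present; otherwise wait 1s, 2s, 4s, ...
            # plus a little jitter so many clients don't retry in lockstep
            retry_after = response.headers.get("Retry-After", "")
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2**attempt + random.uniform(0, 0.5)
            sleep(delay)
            response = send()
        return response
    ```

    Usage: `response = call_with_backoff(lambda: requests.post(url, headers=headers, json=data, timeout=60))`. Injecting `send` and `sleep` keeps the retry policy independent of the HTTP library and easy to test.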
  </Accordion>
</AccordionGroup>

## Next Steps

* Explore the [Cloudflare AI Gateway documentation](https://developers.cloudflare.com/ai-gateway/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) for advanced features
* Learn about [caching strategies](https://developers.cloudflare.com/ai-gateway/configuration/caching/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) to optimize performance and reduce costs
* Set up [custom analytics](https://developers.cloudflare.com/ai-gateway/observability/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) dashboards to monitor your AI workloads
* Try different [Cerebras models](/models) to find the best fit for your use case
* Implement [rate limiting](https://developers.cloudflare.com/ai-gateway/configuration/rate-limiting/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc) to control costs and protect your application
* Review the [Cerebras API reference](/api-reference/chat-completions) for all available parameters
* Want to use the latest model? Check out the [GLM 4.7 migration guide](https://inference-docs.cerebras.ai/resources/glm-47-migration?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc)

## FAQ

<Accordion title="Why am I getting a 401 Unauthorized error?">
  This usually means your Cerebras API key is invalid or missing. Double-check that:

  1. Your `CEREBRAS_API_KEY` environment variable is set correctly
  2. The API key is active and hasn't been revoked
  3. You're using the correct Authorization header format: `Bearer YOUR_API_KEY`

  You can verify your API key by making a direct request to Cerebras (without AI Gateway) to isolate the issue.
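  A common cause is a key that is unset or that picked up stray whitespace or quotes on its way into the environment. A minimal sketch that normalizes the key before building the header; `bearer_header` is a hypothetical helper:

  ```python
  def bearer_header(api_key: str | None) -> dict[str, str]:
      """Build the Authorization header, rejecting missing keys and stripping stray whitespace/quotes."""
      key = (api_key or "").strip().strip('"').strip("'")
      if not key:
          raise ValueError("CEREBRAS_API_KEY is not set")
      if any(ch.isspace() for ch in key):
          raise ValueError("API key contains whitespace; check your .env file")
      return {"Authorization": f"Bearer {key}"}
  ```

  For example: `headers = {"Content-Type": "application/json", **bearer_header(os.getenv("CEREBRAS_API_KEY"))}`.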
</Accordion>

<Accordion title="How do I view my request logs?">
  To view request logs and analytics:

  1. Log in to your [Cloudflare dashboard](https://dash.cloudflare.com/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc)
  2. Navigate to **AI** > **AI Gateway**
  3. Select your gateway
  4. Open the **Analytics** tab for aggregate metrics, or the **Logs** tab for individual requests

  Logs include request/response bodies, timestamps, token usage, latency metrics, and error details. You can filter by date range, model, and status code.
</Accordion>

<Accordion title="Does AI Gateway add latency to my requests?">
  Cloudflare AI Gateway adds minimal latency (typically 10-50ms) as requests are routed through Cloudflare's global network. However, this is often offset by:

  * **Caching**: Repeated queries are served instantly from cache with near-zero latency
  * **Edge network**: Cloudflare's global edge network may provide faster routing than direct connections
  * **Optimization insights**: The observability features help you identify and fix performance bottlenecks

  For latency-critical applications, you can measure the impact by comparing direct Cerebras requests with gateway-routed requests using the same prompts.
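  A minimal sketch of such a comparison, timing repeated calls with `time.perf_counter` and taking the median to dampen one-off outliers such as connection setup; `median_latency_ms` is a hypothetical helper:

  ```python
  import statistics
  import time
  from collections.abc import Callable


  def median_latency_ms(send: Callable[[], object], runs: int = 5) -> float:
      """Call `send()` `runs` times and return the median wall-clock latency in milliseconds."""
      samples = []
      for _ in range(runs):
          start = time.perf_counter()
          send()
          samples.append((time.perf_counter() - start) * 1000)
      return statistics.median(samples)
  ```

  Run it once with a `send` that posts to `https://api.cerebras.ai/v1/chat/completions` and once with a `send` that posts to your gateway URL, using the same payload. If gateway caching is enabled, bypass it for this test so cache hits don't skew the gateway numbers.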
</Accordion>

<Accordion title="Can I use AI Gateway with streaming responses?">
  Yes. AI Gateway passes streaming responses from Cerebras through as they are generated. Set `"stream": True` in the request body (and pass `stream=True` to `requests.post` so the response is read incrementally), and chunks are streamed through the gateway in real time. Streaming requests are still logged and counted in your analytics, so you keep full visibility into streaming workloads.
</Accordion>

<Accordion title="What happens if AI Gateway is down?">
  Cloudflare AI Gateway is built on Cloudflare's highly reliable global network with 99.99%+ uptime. In the rare event of an outage:

  1. You can temporarily call Cerebras directly by changing the request URL from your gateway URL to `https://api.cerebras.ai/v1/chat/completions` (base URL `https://api.cerebras.ai/v1`)
  2. Cloudflare provides real-time status updates at [cloudflarestatus.com](https://www.cloudflarestatus.com/?utm_source=3pi_cloudflare-ai-gateway\&utm_campaign=partner_doc)
  3. The rest of your application code stays the same; only the URL changes. Direct requests bypass gateway logging, caching, and rate limiting

  Consider implementing automatic fallback logic in production applications to switch between gateway and direct endpoints based on availability.
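  A minimal sketch of such fallback logic, assuming you pass the gateway URL first and the direct Cerebras URL (`https://api.cerebras.ai/v1/chat/completions`) second; `post_with_fallback` is a hypothetical helper. It falls back only on connection errors and 5xx responses, so a 429 from your gateway's rate limits is returned rather than bypassed:

  ```python
  from collections.abc import Callable, Sequence
  from typing import Any


  def post_with_fallback(send: Callable[[str], Any], urls: Sequence[str]) -> Any:
      """Send the request to each URL in order until one answers without a 5xx error.

      `send(url)` performs the request and returns an object with `status_code`.
      requests' ConnectionError and Timeout subclass OSError, so catching OSError covers them.
      """
      last_exc: OSError | None = None
      last_response: Any = None
      for url in urls:
          try:
              response = send(url)
          except OSError as exc:
              last_exc = exc
              continue
          if response.status_code < 500:
              # Success or a client error (including 429): don't retry elsewhere
              return response
          last_response = response
      if last_response is not None:
          # No endpoint succeeded; return the last server error for the caller to inspect
          return last_response
      raise ConnectionError("All endpoints failed to respond") from last_exc
  ```

  Usage: `post_with_fallback(lambda u: requests.post(u, headers=headers, json=data, timeout=30), [gateway_url, direct_url])`.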
</Accordion>

<Accordion title="Which Cerebras models are available through AI Gateway?">
  The following Cerebras models are available through AI Gateway (see [Cerebras models](/models) for the current list):

  * `gpt-oss-120b` - OpenAI's open-weight gpt-oss model, used in the examples on this page
  * `zai-glm-4.7` - Advanced 357B-parameter model with strong reasoning capabilities

  Specify the model name in the `model` field of your request. If you use AI Gateway's OpenAI-compatible (`/compat`) endpoint instead of the Cerebras-specific gateway URL, prefix the model name with `cerebras/` (e.g., `cerebras/gpt-oss-120b`).
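  A minimal sketch of building a request for that endpoint, assuming Cloudflare's unified OpenAI-compatible path `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions`; verify the path against Cloudflare's documentation. `compat_request` is a hypothetical helper:

  ```python
  def compat_request(account_id: str, gateway_id: str, model: str, messages: list[dict[str, str]]) -> tuple[str, dict]:
      """Build the URL and JSON body for AI Gateway's OpenAI-compatible endpoint."""
      url = f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions"
      # Add the provider prefix unless the caller already included it
      qualified = model if model.startswith("cerebras/") else f"cerebras/{model}"
      return url, {"model": qualified, "messages": messages}
  ```

  For example: `url, body = compat_request(account_id, gateway_id, "gpt-oss-120b", messages)`, then `requests.post(url, headers=headers, json=body)`.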
</Accordion>
