Operant AI is a real-time security platform for AI applications, agents, and MCP. It discovers, detects, and defends the full spectrum of AI workloads — from LLMs and APIs to orchestration layers, MCP servers, and autonomous agents. Operant AI Gateway acts as a proxy between your application and Cerebras Inference, applying inline security controls like OWASP-LLM threat detection, PII redaction, and rate limiting with no changes to your application logic.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get an API key at cloud.cerebras.ai
  • Operant AI Gatekeeper and Gateway - Sign up at operant.ai and ensure both the Gatekeeper and Gateway containers are installed and running in your environment

Configure Operant AI Gateway

1. Configure environment variables

Create a .env file in your project with your credentials and gateway settings:
CEREBRAS_API_KEY=your-cerebras-api-key
OPERANT_GATEWAY_BASE_URL=https://<your-gateway-host>/ai-gateway/v1
CEREBRAS_APP_NAME=my-cerebras-app
Your gateway base URL will look like one of these:
  • https://operant-gateway.mydomain.com/ai-gateway/v1
  • http://operant-gateway.operant-namespace.svc:9000/ai-gateway/v1 (if using the Kubernetes service directly)
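A malformed base URL is an easy mistake to make at this step, so a small check before creating the client can help. The following is a minimal sketch, not part of Operant's tooling: `check_gateway_url` is a hypothetical helper, and it assumes, based only on the two example URLs above, that the path always ends in `/ai-gateway/v1`.

```python
from urllib.parse import urlparse

# Assumption: every gateway base URL ends with this path, as in the examples above.
EXPECTED_PATH = "/ai-gateway/v1"


def check_gateway_url(url: str) -> str:
    """Return the URL without a trailing slash, or raise ValueError if it looks malformed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URL, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("gateway URL has no host")
    if parsed.path.rstrip("/") != EXPECTED_PATH:
        raise ValueError(f"expected path {EXPECTED_PATH!r}, got {parsed.path!r}")
    return url.rstrip("/")
```

Calling `check_gateway_url(os.getenv("OPERANT_GATEWAY_BASE_URL", ""))` right after `load_dotenv()` would surface a missing scheme or a wrong path before any request is sent.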
2. Install required dependencies

Install the OpenAI Python SDK and python-dotenv:
pip install openai python-dotenv
Cerebras exposes an OpenAI-compatible API, so the standard OpenAI client works without modification.
3. Initialize the client

Configure the OpenAI client to route requests through the Operant Gateway. Two headers are required on every request:
  • x-gateway-source — identifies your application
  • x-gateway-target — tells the gateway where to forward traffic
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url=os.getenv("OPERANT_GATEWAY_BASE_URL"),
    default_headers={
        "x-gateway-source": os.getenv("CEREBRAS_APP_NAME"),
        "x-gateway-target": "https://api.cerebras.ai/v1",
    },
)
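Because `os.getenv` returns `None` for an unset variable, a missing entry in `.env` would otherwise only show up as a failed request. The sketch below collects the same client arguments as above and fails fast when a variable is absent. `load_gateway_settings` is a hypothetical helper name, not part of the OpenAI SDK or Operant.

```python
import os

# Target URL taken from the client setup above.
CEREBRAS_TARGET = "https://api.cerebras.ai/v1"

REQUIRED_VARS = ("CEREBRAS_API_KEY", "OPERANT_GATEWAY_BASE_URL", "CEREBRAS_APP_NAME")


def load_gateway_settings(env=os.environ) -> dict:
    """Build keyword arguments for the OpenAI client, raising if a variable is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["CEREBRAS_API_KEY"],
        "base_url": env["OPERANT_GATEWAY_BASE_URL"],
        "default_headers": {
            "x-gateway-source": env["CEREBRAS_APP_NAME"],  # identifies your application
            "x-gateway-target": CEREBRAS_TARGET,  # where the gateway forwards traffic
        },
    }
```

After `load_dotenv()`, the client could then be created with `OpenAI(**load_gateway_settings())`.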
4. Make your first request

Make a chat completion request. The call is identical to a standard Cerebras request — the gateway handles routing and security enforcement transparently.
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url=os.getenv("OPERANT_GATEWAY_BASE_URL"),
    default_headers={
        "x-gateway-source": os.getenv("CEREBRAS_APP_NAME"),
        "x-gateway-target": "https://api.cerebras.ai/v1",
    },
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Nevada?"},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
5. Enable streaming

Operant Gateway fully supports streaming responses. Set stream=True to receive tokens in real time:
def stream_chat(client):
    chunks = []

    stream = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain transformers in 3 sentences."},
        ],
        max_tokens=256,
        stream=True,
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks.append(chunk.choices[0].delta.content)

    return "".join(chunks)
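`stream_chat` collects every chunk before returning, so the caller only sees the text once the stream has finished. If you want to display tokens as they arrive, a generator variant is one option. This is a minimal sketch: `iter_stream_text` and `print_stream` are hypothetical helper names, and the chunk handling mirrors the loop above.

```python
from typing import Iterable, Iterator


def iter_stream_text(stream: Iterable) -> Iterator[str]:
    """Yield each non-empty text delta from a chat completion stream as it arrives."""
    for chunk in stream:
        # Some chunks carry no choices or no text (for example a role-only delta).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def print_stream(stream: Iterable) -> str:
    """Print deltas immediately and return the full text once the stream ends."""
    pieces = []
    for piece in iter_stream_text(stream):
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)
```

Pass in the object returned by `client.chat.completions.create(..., stream=True)`, for example `print_stream(stream)`.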

Supported Models

The Operant Gateway supports gpt-oss-120b and zai-glm-4.7.
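To catch a mistyped model name before a request leaves your application, you could guard calls with a small check. This is a sketch only: `require_supported_model` is a hypothetical helper, and the hard-coded set reflects the two models named above, which may change over time.

```python
# Models listed in this guide; update this set if the supported list changes.
SUPPORTED_MODELS = frozenset({"gpt-oss-120b", "zai-glm-4.7"})


def require_supported_model(model: str) -> str:
    """Return the model name unchanged, or raise ValueError if it is not listed."""
    if model not in SUPPORTED_MODELS:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"model {model!r} is not in the supported list: {supported}")
    return model
```

For example, `model=require_supported_model("gpt-oss-120b")` can be passed directly to `client.chat.completions.create`.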

What You Get from Operant

Once traffic flows through the Operant Gateway, the following capabilities are available:
  • Gatekeeper detects and blocks prompt injection attempts, including jailbreaks and system override tricks, before they reach the model.
  • Sensitive data (including PII, PCI, PHI, and API keys) is automatically detected and redacted across more than 100 data types before requests are forwarded to Cerebras.
  • Gatekeeper monitors responses and stops attempts to extract sensitive data through the model.
  • You can configure per-team, per-agent, or per-address rate limits to control abuse and manage costs. Requests that exceed a limit return a 429 status code.
  • Operant protects against data poisoning attacks targeting your AI application’s training or inference pipeline.
  • Operant provides a visual security graph that surfaces all AI traffic once inference calls are flowing through the gateway. It highlights PII detections, prompt injections, and secrets found in prompts and responses.
  • Operant provides a configurable set of guardrails covering governance policies, prompt injection blocking, and inline blocking and redaction of sensitive data. These can be applied per application through the Operant dashboard.
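Since exceeded rate limits return a 429 status code, client code may want to back off and retry rather than fail outright. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, the attempt count and delays are illustrative, and it relies on the raised exception exposing a `status_code` attribute, as the OpenAI SDK's status errors do. The OpenAI client also has its own `max_retries` setting, so check how the two interact before combining them.

```python
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff and jitter while it raises a 429 error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            rate_limited = getattr(exc, "status_code", None) == 429
            # Re-raise anything that is not a rate limit, and the final failed attempt.
            if not rate_limited or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A request could then be wrapped as `call_with_backoff(lambda: client.chat.completions.create(...))`.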