Get Started with Operant

Operant AI is a real-time security platform for AI applications, agents, and MCP. It discovers, detects, and defends the full spectrum of AI workloads — from LLMs and APIs to orchestration layers, MCP servers, and autonomous agents. Operant AI Gateway acts as a proxy between your application and Cerebras Inference, applying inline security controls like OWASP-LLM threat detection, PII redaction, and rate limiting with no changes to your application logic.

Prerequisites

Before you begin, ensure you have:

Cerebras API Key - Get an API key at cloud.cerebras.ai
Operant AI Gatekeeper and Gateway - Sign up at operant.ai and ensure both the Gatekeeper and Gateway containers are installed and running in your environment

Configure Operant AI Gateway

Configure environment variables

Create a .env file in your project with your credentials and gateway settings:

CEREBRAS_API_KEY=your-cerebras-api-key
OPERANT_GATEWAY_BASE_URL=https://<your-gateway-host>/ai-gateway/v1
CEREBRAS_APP_NAME=my-cerebras-app

Your gateway base URL will look like one of these:

https://operant-gateway.mydomain.com/ai-gateway/v1
http://operant-gateway.operant-namespace.svc:9000/ai-gateway/v1 (if using the Kubernetes service directly)
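A malformed base URL is an easy mistake to make at this step, so a quick shape check before creating the client can help. The sketch below is illustrative only: check_gateway_url is a hypothetical helper, not part of Operant or the OpenAI SDK, and it assumes that every valid gateway URL ends in /ai-gateway/v1, as both examples above do.

```python
from urllib.parse import urlparse


def check_gateway_url(url: str) -> str:
    """Return the URL without a trailing slash, or raise ValueError if it looks wrong.

    Hypothetical helper: the /ai-gateway/v1 suffix rule is inferred from the
    example URLs in this guide, not from an Operant specification.
    """
    if "<" in url or ">" in url:
        raise ValueError("gateway URL still contains a <placeholder>")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URL, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("gateway URL has no host")
    if not parsed.path.rstrip("/").endswith("/ai-gateway/v1"):
        raise ValueError("gateway URL should end with /ai-gateway/v1")
    return url.rstrip("/")
```

Calling check_gateway_url(os.getenv("OPERANT_GATEWAY_BASE_URL", "")) at startup would surface a leftover placeholder or a missing path suffix before any request is sent.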

Install required dependencies

Install the OpenAI Python SDK and python-dotenv:

pip install openai python-dotenv

Cerebras exposes an OpenAI-compatible API, so the standard OpenAI client works without modification.

Initialize the client

Configure the OpenAI client to route requests through the Operant Gateway. Two headers are required on every request:

x-gateway-source — identifies your application
x-gateway-target — tells the gateway where to forward traffic

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url=os.getenv("OPERANT_GATEWAY_BASE_URL"),
    default_headers={
        "x-gateway-source": os.getenv("CEREBRAS_APP_NAME"),
        "x-gateway-target": "https://api.cerebras.ai/v1",
    },
)
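Because os.getenv returns None for an unset variable, a missing entry in .env would only show up later as a failed request. One possible way to fail fast is sketched below. load_settings and REQUIRED_VARS are hypothetical names introduced for illustration, and the variable names are the ones used in the .env file above.

```python
import os

# Variable names taken from the .env example in this guide.
REQUIRED_VARS = ("CEREBRAS_API_KEY", "OPERANT_GATEWAY_BASE_URL", "CEREBRAS_APP_NAME")


def load_settings(env=os.environ) -> dict:
    """Return the required settings, or raise RuntimeError naming any that are missing.

    Hypothetical helper: `env` is any mapping, which keeps the function easy to test.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

After load_dotenv() has run, the returned values could be passed to OpenAI(...) in place of the individual os.getenv calls.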

Make your first request

Make a chat completion request. The call is identical to a standard Cerebras request — the gateway handles routing and security enforcement transparently.

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url=os.getenv("OPERANT_GATEWAY_BASE_URL"),
    default_headers={
        "x-gateway-source": os.getenv("CEREBRAS_APP_NAME"),
        "x-gateway-target": "https://api.cerebras.ai/v1",
    },
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Nevada?"},
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)

Enable streaming

Operant Gateway fully supports streaming responses. Set stream=True to receive tokens in real time:

def stream_chat(client):
    chunks = []

    stream = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain transformers in 3 sentences."},
        ],
        max_tokens=256,
        stream=True,
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks.append(chunk.choices[0].delta.content)

    return "".join(chunks)
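The stream_chat function above collects the whole reply before returning it. To display tokens as they arrive, the same loop can print each piece immediately. The sketch below is a minimal illustration: print_stream and fake_chunk are hypothetical helpers, and the fake chunks only imitate the choices[0].delta.content shape used above so the example runs without a gateway.

```python
from types import SimpleNamespace


def print_stream(stream) -> str:
    """Print each content piece as soon as it arrives and return the full text."""
    parts = []
    for chunk in stream:
        # Skip chunks with no choices or no text, as stream_chat does.
        if chunk.choices and chunk.choices[0].delta.content:
            piece = chunk.choices[0].delta.content
            print(piece, end="", flush=True)
            parts.append(piece)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (for this demo only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
result = print_stream(demo)
```

With a real client, the object returned by client.chat.completions.create(..., stream=True) would be passed to print_stream in place of demo.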

Supported Models

The Operant Gateway supports gpt-oss-120b and zai-glm-4.7.

What You Get from Operant

Once traffic flows through the Operant Gateway, the following capabilities are available:

Prompt Injection Blocking

Gatekeeper detects and blocks prompt injection attempts, including jailbreaks and system-override tricks, before they reach the model.

PII and Secrets Redaction

Sensitive data, including PII, PCI, PHI, and API keys, is automatically detected and redacted across more than 100 data types before requests are forwarded to Cerebras.

Sensitive Data Extraction Prevention

Gatekeeper monitors responses and stops attempts to extract sensitive data through the model.

Rate Limiting

Configure per-team, per-agent, or per-address rate limits to control abuse and manage costs. Exceeded limits return a 429 status code.
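Client code may want to back off and retry when a limit is exceeded. The sketch below shows one common approach, exponential backoff with jitter. call_with_backoff is a hypothetical helper and its defaults are arbitrary illustrative values, not Operant recommendations. It takes the rate-limit test as a parameter so it does not depend on any particular SDK.

```python
import random
import time


def call_with_backoff(fn, is_rate_limited, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while is_rate_limited(exc) is true.

    Any other exception, or a rate-limit error on the final attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # The wait doubles on each attempt, plus jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the OpenAI Python SDK, a 429 response is raised as openai.RateLimitError, so the predicate could be lambda exc: isinstance(exc, openai.RateLimitError). The SDK also retries some failures on its own (see the client's max_retries option), so an outer wrapper like this is only needed when longer waits are wanted.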

LLM Poisoning Prevention

Operant protects against data poisoning attacks that target your AI application’s training or inference pipeline.

Security Graph

Operant provides a visual security graph that surfaces all AI traffic once inference calls are flowing through the gateway. It highlights PII detections, prompt injections, and secrets found in prompts and responses.

Guardrails

Operant provides a configurable set of guardrails covering governance policies, prompt injection blocking, and inline blocking and redaction of sensitive data. These can be applied per application through the Operant dashboard.
