# Cerebras Inference

## Docs

- [Authentication](https://inference-docs.cerebras.ai/api-reference/authentication.md)
- [Cancel batch](https://inference-docs.cerebras.ai/api-reference/batch/cancel-batch.md)
- [Create batch](https://inference-docs.cerebras.ai/api-reference/batch/create-batch.md)
- [List batch](https://inference-docs.cerebras.ai/api-reference/batch/list-batch.md)
- [Retrieve batch](https://inference-docs.cerebras.ai/api-reference/batch/retrieve-batch.md)
- [Chat Completions](https://inference-docs.cerebras.ai/api-reference/chat-completions.md)
- [Completions](https://inference-docs.cerebras.ai/api-reference/completions.md)
- [Delete model version](https://inference-docs.cerebras.ai/api-reference/customer_management_api/delete-model-version.md)
- [Deploy model to endpoint](https://inference-docs.cerebras.ai/api-reference/customer_management_api/deploy-model-to-endpoint.md)
- [List endpoints](https://inference-docs.cerebras.ai/api-reference/customer_management_api/list-endpoints.md)
- [List model architectures](https://inference-docs.cerebras.ai/api-reference/customer_management_api/list-model-architectures.md)
- [List model versions](https://inference-docs.cerebras.ai/api-reference/customer_management_api/list-model-versions.md)
- [Retrieve endpoint status](https://inference-docs.cerebras.ai/api-reference/customer_management_api/retrieve-endpoint-status.md)
- [Retrieve model version status](https://inference-docs.cerebras.ai/api-reference/customer_management_api/retrieve-model-version-status.md)
- [Update model version aliases](https://inference-docs.cerebras.ai/api-reference/customer_management_api/update-model-version-aliases.md)
- [Upload model version](https://inference-docs.cerebras.ai/api-reference/customer_management_api/upload-model-version.md)
- [Delete file](https://inference-docs.cerebras.ai/api-reference/file/delete-file.md)
- [List files](https://inference-docs.cerebras.ai/api-reference/file/list-files.md)
- [Retrieve file](https://inference-docs.cerebras.ai/api-reference/file/retrieve-file.md)
- [Retrieve file content](https://inference-docs.cerebras.ai/api-reference/file/retrieve-file-content.md)
- [Upload file](https://inference-docs.cerebras.ai/api-reference/file/upload-file.md)
- [Retrieve metrics](https://inference-docs.cerebras.ai/api-reference/metrics/retrieve-metrics.md): Retrieve operational metrics for your organization's inference endpoints in Prometheus format.
- [List models](https://inference-docs.cerebras.ai/api-reference/models/list-models.md)
- [Public models](https://inference-docs.cerebras.ai/api-reference/models/public-models.md)
- [Retrieve model](https://inference-docs.cerebras.ai/api-reference/models/retrieve-model.md)
- [Versions](https://inference-docs.cerebras.ai/api-reference/versions.md): Understand how Cerebras uses versioning to manage breaking changes.
- [Batch](https://inference-docs.cerebras.ai/capabilities/batch.md): Run large-scale inference workloads asynchronously.
- [CePO: Cerebras Planning & Optimization](https://inference-docs.cerebras.ai/capabilities/cepo.md): Improving Llama's reasoning abilities with test-time compute
- [Metrics](https://inference-docs.cerebras.ai/capabilities/metrics.md): Monitor your dedicated inference endpoints with Prometheus-compatible metrics for requests, tokens, latency, and endpoint health.
- [Payload Optimization](https://inference-docs.cerebras.ai/capabilities/payload-optimization.md): Reduce latency by compressing request payloads with msgpack encoding and gzip.
- [Predicted Outputs](https://inference-docs.cerebras.ai/capabilities/predicted-outputs.md): Reduce latency by specifying parts of the response that are already known.
- [Prompt Caching](https://inference-docs.cerebras.ai/capabilities/prompt-caching.md): Store and reuse previously processed prompts to reduce latency and improve response times for similar or repeated queries.
- [Reasoning](https://inference-docs.cerebras.ai/capabilities/reasoning.md): Reasoning models generate intermediate thinking tokens before their final response, enabling better problem-solving and allowing you to inspect the model's thought process.
- [Service Tiers](https://inference-docs.cerebras.ai/capabilities/service-tiers.md): Control request prioritization with service tiers.
- [Streaming Responses](https://inference-docs.cerebras.ai/capabilities/streaming.md): Learn how to enable streaming responses in the Cerebras API.
- [Structured Outputs](https://inference-docs.cerebras.ai/capabilities/structured-outputs.md): Generate structured data with the Cerebras Inference API.
- [Tool Calling](https://inference-docs.cerebras.ai/capabilities/tool-use.md): Learn how to connect models to external tools with tool calling.
- [Projects](https://inference-docs.cerebras.ai/console/projects.md): Organize workloads, isolate environments, control costs, and manage access using Projects in the Cerebras Cloud console.
- [Inference Cookbook](https://inference-docs.cerebras.ai/cookbook.md)
- [Academic Research Agent](https://inference-docs.cerebras.ai/cookbook/agents/academic-research-agent.md): Generate arXiv search queries, analyze academic papers, download and process PDFs, and synthesize research insights with a conversational AI research assistant powered by PydanticAI + Cerebras + Unstructured.
- [Automate User Research with LangChain](https://inference-docs.cerebras.ai/cookbook/agents/automate-user-research.md): Learn how to build an AI-powered user research system that can automatically generate user personas, conduct interviews, and synthesize insights using LangGraph's multi-agent workflow in under 60 seconds.
- [Build Your Own Docs Checker with Cerebras & Browserbase](https://inference-docs.cerebras.ai/cookbook/agents/build-a-docs-checker.md): Build a docs checker that can crawl your documentation site and analyze each page for quality issues.
- [Build Your Own Perplexity with Exa](https://inference-docs.cerebras.ai/cookbook/agents/build-your-own-perplexity.md): Build a Perplexity-style deep research assistant that can automatically search the web, analyze multiple sources, and provide structured insights in under 60 seconds.
- [Build Your Own Content Fact Checker with gpt-oss-120B, Cerebras, and Parallel](https://inference-docs.cerebras.ai/cookbook/agents/docs-checker.md)
- [Implementing Gist Memory: Summarizing and Searching Long Documents with a ReadAgent](https://inference-docs.cerebras.ai/cookbook/agents/gist-memory.md): Build an AI agent that reads, summarizes, and answers questions about long documents using Gist Memory and the Cerebras Inference SDK.
- [Hyper-Personalized Web Pages](https://inference-docs.cerebras.ai/cookbook/agents/hyper-personalization.md): Build hyper-personalized web pages using Cerebras AI with Pydantic structured outputs and Jinja2 templating: pages that adapt to each visitor's preferred colors, tone, and products in real time.
- [Interviewer Voice Agent with LiveKit](https://inference-docs.cerebras.ai/cookbook/agents/livekit.md): Learn how to integrate LiveKit's voice capabilities with Cerebras's fast inference to build a real-time voice interview agent that analyzes your resume and job descriptions to conduct personalized mock interviews.
- [Realtime Voice Translation Agent](https://inference-docs.cerebras.ai/cookbook/agents/realtime-voice-translation.md): Translate spoken conversations to any language with sub-second latency by building a realtime voice translation agent powered by Cerebras and LiveKit.
- [Build a Real-Time AI Sales Agent with LiveKit](https://inference-docs.cerebras.ai/cookbook/agents/sales-agent-cerebras-livekit.md): Build a sophisticated real-time voice sales agent that can have natural conversations with potential customers. The resulting AI agent will be able to process audio input and generate spoken replies by drawing information directly from your company's sales materials.
- [Automating Search-Based Report Generation with a Multi-Agent AI Pipeline](https://inference-docs.cerebras.ai/cookbook/agents/search-agent.md): Build an AI agent pipeline that searches, summarizes, and synthesizes information from multiple sources to generate comprehensive reports.
- [Management API](https://inference-docs.cerebras.ai/dedicated/management-api.md): Programmatically upload, version, and deploy custom model weights on your dedicated endpoint.
- [Dedicated Endpoints](https://inference-docs.cerebras.ai/dedicated/overview.md): Deploy private, high-performance inference endpoints for enterprise workloads.
- [Integrations](https://inference-docs.cerebras.ai/integrations.md): We currently support a number of integrations that allow you to do more with the Cerebras Inference SDK.
- [Get Started with Cline and Cerebras](https://inference-docs.cerebras.ai/integrations/cline.md): Set up Cline, an open-source AI coding assistant, to work with Cerebras inference.
- [Cerebras Code MCP Server](https://inference-docs.cerebras.ai/integrations/code-mcp.md): Learn how to install, configure, and integrate the Cerebras Code MCP server with supported editors.
- [Get Started with KiloCode](https://inference-docs.cerebras.ai/integrations/kilocode.md): Learn how to integrate KiloCode, an AI-powered autonomous coding assistant for VS Code, with Cerebras's ultra-fast inference for planning, building, and fixing code.
- [Get Started with OpenCode](https://inference-docs.cerebras.ai/integrations/opencode.md): Configure OpenCode to use Cerebras Inference for AI coding assistance.
- [Get Started with Cerebras for VS Code](https://inference-docs.cerebras.ai/integrations/vscode.md): Configure Visual Studio Code to use Cerebras Inference for autonomous coding assistance.
- [Build with the Speed of Cerebras](https://inference-docs.cerebras.ai/introduction.md): Experience real-time AI responses for code generation, summarization, and autonomous tasks with the world’s fastest AI inference.
- [Llama 3.1 8B](https://inference-docs.cerebras.ai/models/llama-31-8b.md): This model excels in speed-critical scenarios like real-time chat, customer service, interactive gaming, and live content generation. Perfect for high-throughput tasks including batch processing, concurrent API requests, and data pipelines.
- [OpenAI GPT OSS](https://inference-docs.cerebras.ai/models/openai-oss.md): This model excels at efficient reasoning across science, math, and coding applications. It's ideal for real-time coding assistance, processing large documents for Q&A and summarization, agentic research workflows, and regulated on-premises workloads.
- [Supported Models](https://inference-docs.cerebras.ai/models/overview.md)
- [Qwen 3 235B Instruct](https://inference-docs.cerebras.ai/models/qwen-3-235b-2507.md): This non-thinking version offers powerful multilingual capabilities with significant improvements in instruction following, logical reasoning, mathematics, coding, and tool usage.
- [Z.ai GLM 4.7](https://inference-docs.cerebras.ai/models/zai-glm-47.md): This model delivers strong coding performance with advanced reasoning capabilities, superior tool use, and enhanced real-world performance in agentic coding applications.
- [Quickstart](https://inference-docs.cerebras.ai/quickstart.md): Get started with the Cerebras API.
- [Get Started with Cerebras Code](https://inference-docs.cerebras.ai/resources/cerebras-code.md): Choose from direct platform integrations or standalone tools to get fast, flexible AI-powered coding assistance that integrates seamlessly with your existing development tools.
- [Designing for Cerebras](https://inference-docs.cerebras.ai/resources/designing-for-cerebras.md): Architectural patterns that take advantage of ultra-fast inference.
- [Migrate to GLM 4.7](https://inference-docs.cerebras.ai/resources/glm-47-migration.md): Learn how to migrate to Z.ai GLM 4.7 on the Cerebras API, including reasoning controls, streaming, and updated limits.
- [OpenAI Compatibility](https://inference-docs.cerebras.ai/resources/openai.md): Use the OpenAI Client Libraries with Cerebras Inference
- [API Playground](https://inference-docs.cerebras.ai/resources/playground.md)
- [Change Log](https://inference-docs.cerebras.ai/support/change-log.md)
- [Deprecations](https://inference-docs.cerebras.ai/support/deprecation.md): A list of all deprecations, with the most recent announcements appearing first.
- [Error Codes](https://inference-docs.cerebras.ai/support/error.md)
- [Policies](https://inference-docs.cerebras.ai/support/policies.md)
- [Preview Releases](https://inference-docs.cerebras.ai/support/preview-releases.md): Understand the different release stages for features and their support policies.
- [Pricing](https://inference-docs.cerebras.ai/support/pricing.md)
- [Rate Limits](https://inference-docs.cerebras.ai/support/rate-limits.md): Learn how rate limits are applied and measured.
- [Service Status](https://inference-docs.cerebras.ai/support/status.md)

## OpenAPI Specs

- [openapi](https://inference-docs.cerebras.ai/api-reference/customer_management_api/openapi.yaml)

## Optional

- [Python SDK](https://github.com/Cerebras/cerebras-cloud-sdk-python)
- [Node.js SDK](https://github.com/Cerebras/cerebras-cloud-sdk-node)
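For readers landing here before the [Quickstart](https://inference-docs.cerebras.ai/quickstart.md), here is a minimal sketch of a chat completion call with the Python SDK linked above, covering both blocking and streaming responses. It assumes an API key in the `CEREBRAS_API_KEY` environment variable and uses `llama3.1-8b` as an illustrative model id; check the [Supported Models](https://inference-docs.cerebras.ai/models/overview.md) page for current ids.

```python
import os

from cerebras.cloud.sdk import Cerebras  # pip install cerebras_cloud_sdk

# The client reads the API key from the CEREBRAS_API_KEY environment variable.
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

# A minimal chat completion. "llama3.1-8b" is an example model id;
# see the Supported Models page for the current list.
chat_completion = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Why is fast inference important?"}],
)
print(chat_completion.choices[0].message.content)

# Streaming variant: tokens are printed as they arrive.
stream = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[{"role": "user", "content": "Why is fast inference important?"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```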
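Because the API is OpenAI-compatible (see the [OpenAI Compatibility](https://inference-docs.cerebras.ai/resources/openai.md) page), existing OpenAI client code can also be pointed at Cerebras by swapping the base URL. A sketch, assuming the `https://api.cerebras.ai/v1` endpoint documented on that page:

```python
import os

from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at the Cerebras endpoint.
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ.get("CEREBRAS_API_KEY"),
)

response = client.chat.completions.create(
    model="llama3.1-8b",  # example model id
    messages=[{"role": "user", "content": "Hello from the OpenAI client!"}],
)
print(response.choices[0].message.content)
```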