# Cerebras Inference

## Docs

- [Authentication](https://inference-docs.cerebras.ai/api-reference/authentication.md)
- [Cancel batch](https://inference-docs.cerebras.ai/api-reference/batch/cancel-batch.md)
- [Create batch](https://inference-docs.cerebras.ai/api-reference/batch/create-batch.md)
- [Retrieve batch](https://inference-docs.cerebras.ai/api-reference/batch/retrieve-batch.md)
- [Retrieve batch results](https://inference-docs.cerebras.ai/api-reference/batch/retrieve-batch-results.md)
- [Chat Completions](https://inference-docs.cerebras.ai/api-reference/chat-completions.md)
- [Completions](https://inference-docs.cerebras.ai/api-reference/completions.md)
- [Delete file](https://inference-docs.cerebras.ai/api-reference/file/delete-file.md)
- [List files](https://inference-docs.cerebras.ai/api-reference/file/list-files.md)
- [Retrieve file](https://inference-docs.cerebras.ai/api-reference/file/retrieve-file.md)
- [Retrieve file content](https://inference-docs.cerebras.ai/api-reference/file/retrieve-file-content.md)
- [Upload file](https://inference-docs.cerebras.ai/api-reference/file/upload-file.md)
- [List models](https://inference-docs.cerebras.ai/api-reference/models/list-models.md)
- [Retrieve model](https://inference-docs.cerebras.ai/api-reference/models/retrieve-model.md)
- [Batch API](https://inference-docs.cerebras.ai/capabilities/batch.md): Run large-scale inference workloads asynchronously at half the cost.
- [CePO: Cerebras Planning & Optimization](https://inference-docs.cerebras.ai/capabilities/cepo.md): Improving Llama's reasoning abilities with test-time compute.
- [Predicted Outputs](https://inference-docs.cerebras.ai/capabilities/predicted-outputs.md): Reduce latency by specifying parts of the response that are already known.
- [Prompt Caching](https://inference-docs.cerebras.ai/capabilities/prompt-caching.md): Store and reuse previously processed prompts to reduce latency and improve response times for similar or repeated queries.
- [Reasoning](https://inference-docs.cerebras.ai/capabilities/reasoning.md): Reasoning allows models to provide transparent insight into their thought process by generating reasoning tokens before producing their final response. These reasoning tokens show the step-by-step logic the model uses to arrive at its answer.
- [Streaming Responses](https://inference-docs.cerebras.ai/capabilities/streaming.md): Learn how to enable streaming responses in the Cerebras API.
- [Structured Outputs](https://inference-docs.cerebras.ai/capabilities/structured-outputs.md): Generate structured data with the Cerebras Inference API.
- [Tool Calling](https://inference-docs.cerebras.ai/capabilities/tool-use.md): Learn how to connect models to external tools with tool calling.
- [Inference Cookbook](https://inference-docs.cerebras.ai/cookbook.md)
- [Automate User Research with LangChain](https://inference-docs.cerebras.ai/cookbook/agents/automate-user-research.md): Learn how to build an AI-powered user research system that can automatically generate user personas, conduct interviews, and synthesize insights using LangGraph's multi-agent workflow in under 60 seconds.
- [Build Your Own Perplexity with Exa](https://inference-docs.cerebras.ai/cookbook/agents/build-your-own-perplexity.md): Learn how to build a Perplexity-style deep research assistant that can automatically search the web, analyze multiple sources, and provide structured insights in under 60 seconds.
- [Implementing Gist Memory: Summarizing and Searching Long Documents with a ReadAgent](https://inference-docs.cerebras.ai/cookbook/agents/gist-memory.md): Build an AI agent that reads, summarizes, and answers questions about long documents using Gist Memory and the Cerebras Inference SDK.
- [Interviewer Voice Agent with LiveKit](https://inference-docs.cerebras.ai/cookbook/agents/livekit.md): Learn how to build an interviewer voice agent.
- [Build a Real-Time AI Sales Agent with LiveKit](https://inference-docs.cerebras.ai/cookbook/agents/sales-agent-cerebras-livekit.md): Learn how to build a real-time voice sales agent that holds natural conversations with potential customers, processing audio input and generating spoken replies drawn directly from your company's sales materials.
- [Automating Search-Based Report Generation with a Multi-Agent AI Pipeline](https://inference-docs.cerebras.ai/cookbook/agents/search-agent.md): Build an AI agent pipeline that searches, summarizes, and synthesizes information from multiple sources to generate comprehensive reports.
- [Integrations](https://inference-docs.cerebras.ai/integrations.md): We currently support a number of integrations that let you do more with the Cerebras Inference SDK.
- [Get Started with Cline and Cerebras](https://inference-docs.cerebras.ai/integrations/cline.md): Set up Cline, an open-source AI coding assistant, to work with Cerebras Inference.
- [Cerebras Code MCP Server](https://inference-docs.cerebras.ai/integrations/code-mcp.md): Learn how to install, configure, and integrate the Cerebras Code MCP server with supported editors.
- [Get Started with KiloCode](https://inference-docs.cerebras.ai/integrations/kilocode.md): Learn how to integrate KiloCode, an AI-powered autonomous coding assistant for VS Code, with Cerebras's ultra-fast inference for planning, building, and fixing code.
- [Get Started with OpenCode](https://inference-docs.cerebras.ai/integrations/opencode.md): Configure OpenCode to use Cerebras Inference for AI coding assistance.
- [Get Started with RooCode and Cerebras](https://inference-docs.cerebras.ai/integrations/roocode.md): Configure RooCode to use Cerebras Inference for autonomous coding assistance.
- [Get Started with Cerebras for VS Code](https://inference-docs.cerebras.ai/integrations/vscode.md): Configure Visual Studio Code to use Cerebras Inference for autonomous coding assistance.
- [Build with the Speed of Cerebras](https://inference-docs.cerebras.ai/introduction.md): Experience real-time AI responses for code generation, summarization, and autonomous tasks with the world's fastest AI inference.
- [Llama 3.1 8B](https://inference-docs.cerebras.ai/models/llama-31-8b.md): This model excels in speed-critical scenarios like real-time chat, customer service, interactive gaming, and live content generation. Perfect for high-throughput tasks including batch processing, concurrent API requests, and data pipelines.
- [Llama 3.3 70B](https://inference-docs.cerebras.ai/models/llama-33-70b.md): This model delivers enhanced performance for chat, coding, instruction following, mathematics, and reasoning use cases.
- [OpenAI GPT OSS](https://inference-docs.cerebras.ai/models/openai-oss.md): This model excels at efficient reasoning across science, math, and coding applications. It's ideal for real-time coding assistance, processing large documents for Q&A and summarization, agentic research workflows, and regulated on-premises workloads.
- [Supported Models](https://inference-docs.cerebras.ai/models/overview.md)
- [Qwen 3 235B Instruct](https://inference-docs.cerebras.ai/models/qwen-3-235b-2507.md): This non-thinking version offers powerful multilingual capabilities with significant improvements in instruction following, logical reasoning, mathematics, coding, and tool usage.
- [Qwen 3 32B](https://inference-docs.cerebras.ai/models/qwen-3-32b.md): This is a hybrid reasoning model that can operate with or without thinking tokens. It's ideal for complex reasoning tasks, multi-step workflows, and applications requiring both speed and intelligence.
- [Z.ai GLM 4.6](https://inference-docs.cerebras.ai/models/zai-glm-46.md): This model delivers strong coding performance with advanced reasoning capabilities, superior tool use, and enhanced real-world performance in agentic coding applications.
- [QuickStart](https://inference-docs.cerebras.ai/quickstart.md): Get started with the Cerebras API.
- [Get Started with Cerebras Code](https://inference-docs.cerebras.ai/resources/cerebras-code.md): Choose from direct platform integrations or standalone tools to get fast, flexible AI-powered coding assistance that integrates seamlessly with your existing development tools.
- [Migrate to GLM 4.6](https://inference-docs.cerebras.ai/resources/glm-46-migration.md): Learn how to optimize GLM 4.6 performance when migrating from other models.
- [OpenAI Compatibility](https://inference-docs.cerebras.ai/resources/openai.md): Use the OpenAI client libraries with Cerebras Inference (a minimal usage sketch appears after this index).
- [API Playground](https://inference-docs.cerebras.ai/resources/playground.md)
- [Change Log](https://inference-docs.cerebras.ai/support/change-log.md)
- [Deprecations](https://inference-docs.cerebras.ai/support/deprecation.md): A list of all deprecations, with the most recent announcements appearing first.
- [Error Codes](https://inference-docs.cerebras.ai/support/error.md)
- [Policies](https://inference-docs.cerebras.ai/support/policies.md)
- [Pricing](https://inference-docs.cerebras.ai/support/pricing.md)
- [Rate Limits](https://inference-docs.cerebras.ai/support/rate-limits.md): Learn how rate limits are applied and measured.
- [Service Status](https://inference-docs.cerebras.ai/support/status.md)

## Optional

- [Python SDK](https://github.com/Cerebras/cerebras-cloud-sdk-python)
- [Node.js SDK](https://github.com/Cerebras/cerebras-cloud-sdk-node)
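Below is a minimal usage sketch tying together the OpenAI Compatibility and Streaming Responses pages linked above: it calls Cerebras Inference through the standard OpenAI Python client pointed at the documented Cerebras endpoint. The model id `llama-3.3-70b` is an assumption drawn from the Llama 3.3 70B entry in this index; use the List models endpoint to confirm the ids available to your account.

```python
# A minimal sketch: Cerebras Inference via the OpenAI-compatible endpoint,
# with streaming enabled. Assumes CEREBRAS_API_KEY is set in the environment.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # Cerebras OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

# "llama-3.3-70b" is an assumed model id taken from the model list above;
# query GET /v1/models (the List models endpoint) to verify current ids.
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "In one sentence, what is Cerebras Inference?"}],
    stream=True,  # receive tokens incrementally instead of one final payload
)

# Print each token delta as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta if chunk.choices else None
    if delta and delta.content:
        print(delta.content, end="", flush=True)
print()
```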