
Use this guide to find the right model for your use case on Cerebras. All models listed below are served on dedicated endpoints for enterprise workloads with reserved capacity. A subset is also accessible on public endpoints with no additional setup. For architectural guidance on getting the most out of Cerebras speed, see Designing for Cerebras.
| Category | Use Case | Large (>200B) | Medium (20B–200B) | Small (<20B) | Why Cerebras? |
| --- | --- | --- | --- | --- | --- |
| Code & Development | Code generation & reasoning | Kimi K2.6, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | — | More reasoning over code, requirements, and edge cases without breaking developer flows. |
| | Code completion & bug fixing | MiniMax M2.5, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | — | Generate, critique, and repair code in multiple passes at the speed you type. |
| | Terminal tasks | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 (Public) | GPT OSS 120B (Public) | — | Agents can reason between commands, inspect results, and continue acting while the experience remains interactive. |
| AI-Powered Apps | Agents with tool use | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | GPT OSS 120B (Public), Qwen3 32B | — | More tool calls, plan/act/observe loops, and recovery attempts per turn, keeping users engaged. |
| | General reasoning & planning | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 235B-A22B (Public) | — | More planning, comparison, and verification steps within the same practical response window. |
| | Summarization | MiniMax M2.5 | Qwen3 235B-A22B (Public), Gemma 4 31B, GPT OSS 120B (Public) | — | Longer context, deeper synthesis, and less aggressive compression without making users wait. |
| | Low-latency NLU & extraction | — | GPT OSS 120B (Public) | Llama 3.1 8B (Public) | Inline extraction with validation, correction, and structured outputs fast enough for production workflows. |
| Vision & Multimodal | Vision & document understanding | Kimi K2.6 | Gemma 4 31B | — | Richer reasoning across text, image, and other inputs while keeping multimodal workflows responsive. |
Looking for the full dedicated model catalog? See Dedicated Endpoints.
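For programmatic routing, the guide above can be transcribed into a small lookup table. The sketch below covers two rows as an illustration only; the strings are the display names from this page (not API model IDs), and the helper name is ad-hoc.

```python
# Use case -> recommended models, transcribed from the selection guide above.
# These are display names from the docs page, NOT API model identifiers.
RECOMMENDATIONS = {
    "summarization": {
        "large": ["MiniMax M2.5"],
        "medium": ["Qwen3 235B-A22B (Public)", "Gemma 4 31B", "GPT OSS 120B (Public)"],
        "small": [],
    },
    "low-latency NLU & extraction": {
        "large": [],
        "medium": ["GPT OSS 120B (Public)"],
        "small": ["Llama 3.1 8B (Public)"],
    },
}

def public_options(use_case: str) -> list[str]:
    """Return only the models marked as served on public endpoints."""
    buckets = RECOMMENDATIONS[use_case]
    return [m for tier in buckets.values() for m in tier if "(Public)" in m]
```

A dedicated-endpoint deployment would skip this filter, since every model in the guide is available there.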

Migrate from Closed Models

If you’re moving from Claude, GPT, or Gemini, the table below maps each closed model to open-source alternatives available on Cerebras.
| Provider | Closed Source | Use Case | Open Source Alternatives |
| --- | --- | --- | --- |
| Claude | Claude Opus 4.7 | Complex multi-step reasoning where end-to-end correctness is crucial | Kimi K2.6, GLM 5.1 |
| | Claude Sonnet 4.7 | Multi-file refactors, agentic coding loops, code review | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 |
| | Claude Haiku 4.5 | Customer support, classification, extraction, short-form generation, sub-agents in multi-agent systems | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
| OpenAI GPT | GPT 5.5 | Frontier reasoning, complex coding, long agentic chains | Kimi K2.6, GLM 5.1 |
| | GPT 5.4 Nano/Mini | Balanced reasoning and coding, sub-agents in multi-agent systems, structured tasks | MiniMax M2.5, GLM 4.7, Gemma 4 31B, GPT OSS 120B |
| Gemini | Gemini 3.1 Pro | Image understanding for coding, document analysis, and scientific tasks | Kimi K2.6, GLM 5.1 |
| | Gemini 3.1 Pro Flash & Flash Lite | Low-latency multimodal chat and tool calling for real-time UX | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
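In practice, migration is usually an endpoint-and-model-name swap, since the Cerebras API follows the OpenAI Chat Completions shape. The sketch below builds (but does not send) such a request with only the standard library; the model ID shown is a placeholder assumption, so check the catalog for the exact identifier before use.

```python
import json
import urllib.request

# OpenAI-compatible base URL for Cerebras inference.
CEREBRAS_BASE = "https://api.cerebras.ai/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the Cerebras endpoint.

    The request is constructed but not sent, so the payload can be inspected.
    """
    payload = {
        "model": model,  # placeholder ID; look up the real identifier in the catalog
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{CEREBRAS_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gpt-oss-120b", "Summarize this release note.", "sk-demo")
```

An existing OpenAI or Claude client typically only needs its base URL, API key, and model name changed; the message format stays the same.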
Explore the full model catalog at Dedicated Endpoints, or get started with the Quickstart.