Use this guide to find the right model for your use case on Cerebras. All models listed below are served on dedicated endpoints for enterprise workloads with reserved capacity; a subset is also accessible on public endpoints with no additional setup. For architectural guidance on getting the most out of Cerebras speed, see Designing for Cerebras.

Documentation Index

Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
Use this file to discover all available pages before exploring further.
| Category | Use Case | Large (>200B) | Medium (20B–200B) | Small (<20B) | Why Cerebras? |
|---|---|---|---|---|---|
| Code & Development | Code generation & reasoning | Kimi K2.6, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | | More reasoning over code, requirements, and edge cases without breaking developer flow. |
| Code & Development | Code completion & bug fixing | MiniMax M2.5, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | | Generate, critique, and repair code in multiple passes at the speed you type. |
| Code & Development | Terminal tasks | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 (Public) | GPT OSS 120B (Public) | | Agents can reason between commands, inspect results, and continue acting while the experience remains interactive. |
| AI-Powered Apps | Agents with tool use | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | GPT OSS 120B (Public), Qwen3 32B | | More tool calls, plan/act/observe loops, and recovery attempts per turn, keeping users engaged. |
| AI-Powered Apps | General reasoning & planning | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 235B-A22B (Public) | | More planning, comparison, and verification steps within the same practical response window. |
| AI-Powered Apps | Summarization | MiniMax M2.5, Qwen3 235B-A22B (Public) | Gemma 4 31B, GPT OSS 120B (Public) | | Longer context, deeper synthesis, and less aggressive compression without making users wait. |
| AI-Powered Apps | Low-latency NLU & extraction | | GPT OSS 120B (Public) | Llama 3.1 8B (Public) | Inline extraction with validation, correction, and structured outputs fast enough for production workflows. |
| Vision & Multimodal | Vision & document understanding | Kimi K2.6 | Gemma 4 31B | | Richer reasoning across text, image, and other inputs while keeping multimodal workflows responsive. |
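The models marked (Public) above can be called through Cerebras's OpenAI-compatible chat completions API. A minimal sketch of building the request payload — the `gpt-oss-120b` model ID and the endpoint URL in the comment are illustrative assumptions; check the documentation index above for the exact values:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical model ID for GPT OSS 120B -- verify against the docs index.
payload = build_chat_request("gpt-oss-120b", "Summarize this changelog in three bullets.")
print(json.dumps(payload, indent=2))

# To send it (assumed endpoint; requires an API key):
#   curl https://api.cerebras.ai/v1/chat/completions \
#     -H "Authorization: Bearer $CEREBRAS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
```

Because the surface is OpenAI-compatible, the same payload shape works whether you point an HTTP client or an OpenAI-style SDK at the endpoint.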
Migrate from Closed Models
If you’re moving from Claude, GPT, or Gemini, here are open-source alternatives available on Cerebras.

| Provider | Closed Source | Use Case | Open Source Alternatives |
|---|---|---|---|
| Claude | Claude Opus 4.7 | Complex multi-step reasoning where end-to-end correctness is crucial | Kimi K2.6, GLM 5.1 |
| Claude | Claude Sonnet 4.7 | Multi-file refactors, agentic coding loops, code review | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 |
| Claude | Claude Haiku 4.5 | Customer support, classification, extraction, short-form generation, sub-agents in multi-agent systems | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
| OpenAI GPT | GPT 5.5 | Frontier reasoning, complex coding, long agentic chains | Kimi K2.6, GLM 5.1 |
| OpenAI GPT | GPT 5.4 Nano/Mini | Balanced reasoning and coding, sub-agents in multi-agent systems, structured tasks | MiniMax M2.5, GLM 4.7, Gemma 4 31B, GPT OSS 120B |
| Gemini | Gemini 3.1 Pro | Image understanding for coding, document analysis, and scientific tasks | Kimi K2.6, GLM 5.1 |
| Gemini | Gemini 3.1 Pro Flash & Flash Lite | Low-latency multimodal chat and tool calling for real-time UX | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
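The mapping above can be wired directly into application code, so a migration falls back from a closed model name to its open-source alternative. A minimal sketch using the table's display names — these are not API model IDs, which you would look up in the documentation index:

```python
# Mapping from closed-source models to open-source alternatives on
# Cerebras, taken from the migration table above. The first entry in
# each list is treated as the default suggestion.
OPEN_ALTERNATIVES = {
    "Claude Opus 4.7": ["Kimi K2.6", "GLM 5.1"],
    "Claude Sonnet 4.7": ["Kimi K2.6", "GLM 5.1", "MiniMax M2.5", "GLM 4.7"],
    "Claude Haiku 4.5": ["Gemma 4 31B", "GPT OSS 120B", "MiniMax M2.5"],
    "GPT 5.5": ["Kimi K2.6", "GLM 5.1"],
    "GPT 5.4 Nano/Mini": ["MiniMax M2.5", "GLM 4.7", "Gemma 4 31B", "GPT OSS 120B"],
    "Gemini 3.1 Pro": ["Kimi K2.6", "GLM 5.1"],
    "Gemini 3.1 Pro Flash & Flash Lite": ["Gemma 4 31B", "GPT OSS 120B", "MiniMax M2.5"],
}

def suggest_alternative(closed_model: str) -> str:
    """Return the first-listed open-source alternative for a closed model."""
    try:
        return OPEN_ALTERNATIVES[closed_model][0]
    except KeyError:
        raise ValueError(f"No Cerebras alternative listed for {closed_model!r}")

print(suggest_alternative("GPT 5.5"))  # → Kimi K2.6
```

In practice, because Cerebras exposes an OpenAI-compatible API, migrating often amounts to swapping the base URL and substituting the model name via a table like this one.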

