Use this guide to find the right model for your use case on Cerebras. All models listed below are served on dedicated endpoints for enterprise workloads with reserved capacity; a subset is also accessible on public endpoints with no additional setup. For architectural guidance on getting the most out of Cerebras speed, see Designing for Cerebras.

Documentation Index

Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
Use this file to discover all available pages before exploring further.
| Category | Use Case | Large (>200B) | Medium (20B–200B) | Small (<20B) | Why Cerebras? |
|---|---|---|---|---|---|
| Code & Development | Code generation & reasoning | Kimi K2.6, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | | More reasoning over code, requirements, and edge cases without breaking developer flow. |
| Code & Development | Code completion & bug fixing | MiniMax M2.5, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 32B | | Generate, critique, and repair code in multiple passes at the speed you type. |
| Code & Development | Terminal tasks | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 (Public) | GPT OSS 120B (Public) | | Agents can reason between commands, inspect results, and continue acting while the experience remains interactive. |
| AI-Powered Apps | Agents with tool use | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | GPT OSS 120B (Public), Qwen3 32B | | More tool calls, plan/act/observe loops, and recovery attempts per turn, keeping users engaged. |
| AI-Powered Apps | General reasoning & planning | Kimi K2.6, MiniMax M2.5, GLM 5.1, GLM 4.7 (Public) | Gemma 4 31B, Qwen3 235B-A22B (Public) | | More planning, comparison, and verification steps within the same practical response window. |
| AI-Powered Apps | Summarization | MiniMax M2.5, Qwen3 235B-A22B (Public) | Gemma 4 31B, GPT OSS 120B (Public) | | Longer context, deeper synthesis, and less aggressive compression without making users wait. |
| AI-Powered Apps | Low-latency NLU & extraction | | GPT OSS 120B (Public) | Llama 3.1 8B (Public) | Inline extraction with validation, correction, and structured outputs fast enough for production workflows. |
| Vision & Multimodal | Vision & document understanding | Kimi K2.6 | Gemma 4 31B | | Richer reasoning across text, image, and other inputs while keeping multimodal workflows responsive. |
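The models marked (Public) above can be called through Cerebras's OpenAI-compatible chat completions API. A minimal sketch of building the request payload — the `gpt-oss-120b` model ID and the endpoint URL in the comment are illustrative assumptions; check the documentation index above for the exact values:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Hypothetical model ID for GPT OSS 120B -- verify against the docs index.
payload = build_chat_request("gpt-oss-120b", "Summarize this changelog in three bullets.")
print(json.dumps(payload, indent=2))

# To send it (assumed endpoint; requires an API key):
#   curl https://api.cerebras.ai/v1/chat/completions \
#     -H "Authorization: Bearer $CEREBRAS_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
```

Because the surface is OpenAI-compatible, the same payload shape works whether you point an HTTP client or an OpenAI-style SDK at the endpoint.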
Migrate from Closed Models
If you’re moving from Claude, GPT, or Gemini, here are open-source alternatives available on Cerebras.

| Provider | Closed Source | Use Case | Open Source Alternatives |
|---|---|---|---|
| Claude | Claude Opus 4.7 | Complex multi-step reasoning where end-to-end correctness is crucial | Kimi K2.6, GLM 5.1 |
| Claude | Claude Sonnet 4.7 | Multi-file refactors, agentic coding loops, code review | Kimi K2.6, GLM 5.1, MiniMax M2.5, GLM 4.7 |
| Claude | Claude Haiku 4.5 | Customer support, classification, extraction, short-form generation, sub-agents in multi-agent systems | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
| OpenAI GPT | GPT 5.5 | Frontier reasoning, complex coding, long agentic chains | Kimi K2.6, GLM 5.1 |
| OpenAI GPT | GPT 5.4 Nano/Mini | Balanced reasoning and coding, sub-agents in multi-agent systems, structured tasks | MiniMax M2.5, GLM 4.7, Gemma 4 31B, GPT OSS 120B |
| Gemini | Gemini 3.1 Pro | Image understanding for coding, document analysis, and scientific tasks | Kimi K2.6, GLM 5.1 |
| Gemini | Gemini 3.1 Pro Flash & Flash Lite | Low-latency multimodal chat and tool calling for real-time UX | Gemma 4 31B, GPT OSS 120B, MiniMax M2.5 |
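The mapping above can be wired directly into application code, so a migration falls back from a closed model name to its open-source alternative. A minimal sketch using the table's display names — these are not API model IDs, which you would look up in the documentation index:

```python
# Mapping from closed-source models to open-source alternatives on
# Cerebras, taken from the migration table above. The first entry in
# each list is treated as the default suggestion.
OPEN_ALTERNATIVES = {
    "Claude Opus 4.7": ["Kimi K2.6", "GLM 5.1"],
    "Claude Sonnet 4.7": ["Kimi K2.6", "GLM 5.1", "MiniMax M2.5", "GLM 4.7"],
    "Claude Haiku 4.5": ["Gemma 4 31B", "GPT OSS 120B", "MiniMax M2.5"],
    "GPT 5.5": ["Kimi K2.6", "GLM 5.1"],
    "GPT 5.4 Nano/Mini": ["MiniMax M2.5", "GLM 4.7", "Gemma 4 31B", "GPT OSS 120B"],
    "Gemini 3.1 Pro": ["Kimi K2.6", "GLM 5.1"],
    "Gemini 3.1 Pro Flash & Flash Lite": ["Gemma 4 31B", "GPT OSS 120B", "MiniMax M2.5"],
}

def suggest_alternative(closed_model: str) -> str:
    """Return the first-listed open-source alternative for a closed model."""
    try:
        return OPEN_ALTERNATIVES[closed_model][0]
    except KeyError:
        raise ValueError(f"No Cerebras alternative listed for {closed_model!r}")

print(suggest_alternative("GPT 5.5"))  # → Kimi K2.6
```

In practice, because Cerebras exposes an OpenAI-compatible API, migrating often amounts to swapping the base URL and substituting the model name via a table like this one.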

