Production Models

Production models are fully supported offerings intended for use in production environments.
Model Name    | Model ID                       | Parameters  | Speed (tokens/s)
Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | ~2600
Llama 3.1 8B  | llama3.1-8b                    | 8 billion   | ~2200
Llama 3.3 70B | llama-3.3-70b                  | 70 billion  | ~2100
Qwen 3 32B    | qwen-3-32b                     | 32 billion  | ~2600
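
The speed column can be turned into a rough estimate of how long a response takes to generate. The sketch below is illustrative only: the dictionary and the function name `estimate_generation_seconds` are hypothetical helpers, the speeds are the approximate figures from the table above, and the estimate ignores network latency, queueing, and prompt processing.

```python
# Approximate output speeds (tokens/s), copied from the production table.
# These are rounded figures, so treat any result as a ballpark estimate.
PRODUCTION_SPEEDS = {
    "llama-4-scout-17b-16e-instruct": 2600,
    "llama3.1-8b": 2200,
    "llama-3.3-70b": 2100,
    "qwen-3-32b": 2600,
}


def estimate_generation_seconds(model_id: str, output_tokens: int) -> float:
    """Estimate the time to generate `output_tokens` tokens.

    Hypothetical helper: divides the token count by the listed speed and
    ignores network latency, queueing, and prompt processing time.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    # Raises KeyError if the model ID is not in the production table.
    return output_tokens / PRODUCTION_SPEEDS[model_id]


if __name__ == "__main__":
    for model_id in PRODUCTION_SPEEDS:
        seconds = estimate_generation_seconds(model_id, 1000)
        print(f"{model_id}: ~{seconds:.2f} s per 1,000 output tokens")
```

For example, 1,000 output tokens at ~2600 tokens/s works out to roughly 0.38 seconds of generation time.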

Preview Models

Preview models are hosted on Cerebras with full accuracy and performance. They are intended for evaluation only and should not be used in production, as they may be discontinued on short notice.
Model Name                     | Model ID                           | Parameters  | Speed (tokens/s)
Llama 4 Maverick               | llama-4-maverick-17b-128e-instruct | 400 billion | ~1500
Qwen 3 235B Instruct           | qwen-3-235b-a22b-instruct-2507     | 235 billion | ~1400
Qwen 3 235B Thinking           | qwen-3-235b-a22b-thinking-2507     | 235 billion | ~1700
DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b      | 70 billion  | ~1700
* DeepSeek R1 Distill Llama 70B is available in private preview. Please contact us to request access.
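
Because preview models may be discontinued on short notice, an application might want to guard against selecting one in a production deployment. The following is a minimal sketch of such a check, not part of any Cerebras SDK: the function `check_model_for_environment` and the two sets are hypothetical, and the model IDs are simply those listed on this page.

```python
# Model IDs as listed on this page; update these sets when the lists change.
PRODUCTION_MODELS = frozenset({
    "llama-4-scout-17b-16e-instruct",
    "llama3.1-8b",
    "llama-3.3-70b",
    "qwen-3-32b",
})

PREVIEW_MODELS = frozenset({
    "llama-4-maverick-17b-128e-instruct",
    "qwen-3-235b-a22b-instruct-2507",
    "qwen-3-235b-a22b-thinking-2507",
    "deepseek-r1-distill-llama-70b",  # private preview; access must be requested
})


def check_model_for_environment(model_id: str, production: bool) -> str:
    """Return `model_id` if it is acceptable for the given environment.

    Hypothetical helper: rejects preview models when `production` is True
    and rejects any model ID that is not listed on this page.
    """
    if model_id in PRODUCTION_MODELS:
        return model_id
    if model_id in PREVIEW_MODELS:
        if production:
            raise ValueError(
                f"{model_id} is a preview model and should not be used in production"
            )
        return model_id
    raise KeyError(f"unknown model ID: {model_id}")


if __name__ == "__main__":
    # A production model passes the check in a production deployment.
    print(check_model_for_environment("llama-3.3-70b", production=True))
    # A preview model is accepted only for evaluation.
    print(check_model_for_environment("qwen-3-235b-a22b-thinking-2507", production=False))
```

Keeping the two lists in one place means a discontinued preview model only needs to be removed once, and any code path still referring to it fails early with a clear error.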