Production Models

Production models are fully supported offerings intended for use in production environments.
| Model Name | Model ID | Parameters | Speed (tokens/s) |
| --- | --- | --- | --- |
| Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | ~2600 |
| Llama 3.1 8B | llama3.1-8b | 8 billion | ~2200 |
| Llama 3.3 70B | llama-3.3-70b | 70 billion | ~2100 |
| OpenAI GPT OSS | gpt-oss-120b | 120 billion | ~2800 |
| Qwen 3 32B | qwen-3-32b | 32 billion | ~2600 |

Preview Models

Preview models are hosted on Cerebras with full accuracy and performance. They are intended for evaluation only and should not be used in production, as they may be discontinued on short notice.
| Model Name | Model ID | Parameters | Speed (tokens/s) |
| --- | --- | --- | --- |
| Llama 4 Maverick | llama-4-maverick-17b-128e-instruct | 400 billion | ~2400 |
| Qwen 3 235B Instruct | qwen-3-235b-a22b-instruct-2507 | 235 billion | ~1400 |
| Qwen 3 235B Thinking | qwen-3-235b-a22b-thinking-2507 | 235 billion | ~1700 |
| Qwen 3 480B Coder | qwen-3-coder-480b | 480 billion | ~2000 |
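
Because preview models should stay out of production, it can help to encode the production/preview split in application code. The following is a minimal sketch, not part of any Cerebras SDK: `ModelInfo`, `CATALOG`, `resolve_model`, and `estimated_generation_seconds` are hypothetical names introduced here for illustration, and the values are copied from the two tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """One row of the model tables (hypothetical helper type)."""
    name: str
    model_id: str
    params_billion: int
    approx_tokens_per_s: int  # approximate figure from the tables
    tier: str  # "production" or "preview"


# Values copied from the tables above; update them if the listings change.
_MODELS = [
    ModelInfo("Llama 4 Scout", "llama-4-scout-17b-16e-instruct", 109, 2600, "production"),
    ModelInfo("Llama 3.1 8B", "llama3.1-8b", 8, 2200, "production"),
    ModelInfo("Llama 3.3 70B", "llama-3.3-70b", 70, 2100, "production"),
    ModelInfo("OpenAI GPT OSS", "gpt-oss-120b", 120, 2800, "production"),
    ModelInfo("Qwen 3 32B", "qwen-3-32b", 32, 2600, "production"),
    ModelInfo("Llama 4 Maverick", "llama-4-maverick-17b-128e-instruct", 400, 2400, "preview"),
    ModelInfo("Qwen 3 235B Instruct", "qwen-3-235b-a22b-instruct-2507", 235, 1400, "preview"),
    ModelInfo("Qwen 3 235B Thinking", "qwen-3-235b-a22b-thinking-2507", 235, 1700, "preview"),
    ModelInfo("Qwen 3 480B Coder", "qwen-3-coder-480b", 480, 2000, "preview"),
]

CATALOG = {m.model_id: m for m in _MODELS}


def resolve_model(model_id: str, allow_preview: bool = False) -> ModelInfo:
    """Look up a model ID, rejecting preview models unless explicitly allowed."""
    try:
        info = CATALOG[model_id]
    except KeyError:
        raise ValueError(f"unknown model ID: {model_id!r}") from None
    if info.tier == "preview" and not allow_preview:
        raise ValueError(
            f"{model_id} is a preview model; pass allow_preview=True for evaluation use"
        )
    return info


def estimated_generation_seconds(model_id: str, output_tokens: int) -> float:
    """Rough estimate of generation time from the listed tokens/s figure.

    Ignores network latency, queueing, and prompt processing.
    """
    return output_tokens / CATALOG[model_id].approx_tokens_per_s
```

With this sketch, `resolve_model("llama3.1-8b")` returns the production entry, while `resolve_model("qwen-3-coder-480b")` raises a `ValueError` unless `allow_preview=True` is passed. Defaulting to rejection means a preview model ID cannot reach a production code path by accident.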