Production Models
Production models are fully supported offerings intended for use in production environments.

Model Name | Model ID | Parameters | Speed (tokens/s) |
---|---|---|---|
Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | ~2600 |
Llama 3.1 8B | llama3.1-8b | 8 billion | ~2200 |
Llama 3.3 70B | llama-3.3-70b | 70 billion | ~2100 |
OpenAI GPT OSS | gpt-oss-120b | 120 billion | ~3000 |
Qwen 3 32B | qwen-3-32b | 32 billion | ~2600 |
Preview Models
Preview models are hosted on Cerebras with full accuracy and performance. They are intended for evaluation only and should not be used in production, as they may be discontinued on short notice.

Model Name | Model ID | Parameters | Speed (tokens/s) |
---|---|---|---|
Llama 4 Maverick | llama-4-maverick-17b-128e-instruct | 400 billion | ~2400 |
Qwen 3 235B Instruct | qwen-3-235b-a22b-instruct-2507 | 235 billion | ~1400 |
Qwen 3 235B Thinking | qwen-3-235b-a22b-thinking-2507 | 235 billion | ~1700 |
Qwen 3 480B Coder | qwen-3-coder-480b | 480 billion | ~2000 |
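
The production/preview split above can be enforced in client code with a small lookup table. The following is only a sketch under stated assumptions: `MODELS`, `resolve_model`, `fastest`, and the `allow_preview` flag are hypothetical names for illustration and are not part of any Cerebras SDK. The model IDs and approximate speeds are copied from the tables above.

```python
# Hypothetical lookup table built from the model tables above.
# Not part of any official SDK; names here are illustrative assumptions.
MODELS = {
    # model ID: (tier, approximate speed in tokens/s)
    "llama-4-scout-17b-16e-instruct": ("production", 2600),
    "llama3.1-8b": ("production", 2200),
    "llama-3.3-70b": ("production", 2100),
    "gpt-oss-120b": ("production", 3000),
    "qwen-3-32b": ("production", 2600),
    "llama-4-maverick-17b-128e-instruct": ("preview", 2400),
    "qwen-3-235b-a22b-instruct-2507": ("preview", 1400),
    "qwen-3-235b-a22b-thinking-2507": ("preview", 1700),
    "qwen-3-coder-480b": ("preview", 2000),
}


def resolve_model(model_id: str, allow_preview: bool = False) -> str:
    """Return model_id if it is listed and permitted, else raise ValueError."""
    if model_id not in MODELS:
        raise ValueError(f"unknown model ID: {model_id}")
    tier, _speed = MODELS[model_id]
    if tier == "preview" and not allow_preview:
        # Preview models may be discontinued on short notice, so require opt-in.
        raise ValueError(
            f"{model_id} is a preview model; pass allow_preview=True to evaluate it"
        )
    return model_id


def fastest(tier: str) -> str:
    """Return the model ID with the highest listed speed in the given tier."""
    candidates = {mid: speed for mid, (t, speed) in MODELS.items() if t == tier}
    return max(candidates, key=candidates.get)
```

For example, `resolve_model("llama3.1-8b")` returns the ID unchanged, while `resolve_model("qwen-3-coder-480b")` raises unless `allow_preview=True` is passed. Making preview access opt-in keeps evaluation-only models out of production code paths by default.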