The Cerebras Inference API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.
The Cerebras Inference API currently provides access to the following models:
Model Name | Model ID | Parameters | Speed (tokens/s)
---|---|---|---
Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | ~2,600
Llama 3.1 8B | llama3.1-8b | 8 billion | ~2,200
Llama 3.3 70B | llama-3.3-70b | 70 billion | ~2,100
Qwen 3 32B* | qwen-3-32b | 32 billion | ~2,100
DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | ~1,700
*Reasoning models. To disable reasoning output for these models, include `/no_think` in the prompt. For example: "Write a Python script to calculate the area of a circle /no_think"
Play with our live chatbot demo.
For information on pricing and context length, visit our pricing page.
Experiment with our inference solution in the playground before making an API call.
Explore our API reference documentation.
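As a sketch of what an API call might look like: the example below assumes an OpenAI-compatible chat completions endpoint at `https://api.cerebras.ai/v1/chat/completions` with bearer-token authentication via a `CEREBRAS_API_KEY` environment variable. These endpoint and header conventions are assumptions based on common OpenAI-compatible APIs; consult the API reference documentation for the authoritative details.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against the API reference.
API_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """Send a single-turn chat request and return the assistant's reply."""
    payload = build_request(model, prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and "CEREBRAS_API_KEY" in os.environ:
    # Model IDs come from the table above, e.g. llama3.1-8b.
    print(chat("llama3.1-8b", "Write a python script to calculate the area of a circle /no_think"))
```

The `/no_think` suffix in the prompt follows the footnote above for disabling reasoning output on reasoning models.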