Get Started
Overview
The Cerebras Inference API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.
The Cerebras Inference API currently provides access to models from Meta’s Llama family, including Llama 4 Scout and Llama 3.3 70B, as well as DeepSeek R1 Distill Llama 70B (available upon request).
| Model Name | Model ID | Parameters | Knowledge Cutoff |
|---|---|---|---|
| Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | August 2024 |
| Llama 4 Maverick | Coming soon | 400 billion | August 2024 |
| Llama 3.1 8B | llama3.1-8b | 8 billion | March 2023 |
| Llama 3.3 70B | llama-3.3-70b | 70 billion | December 2023 |
| DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | December 2023 |
* DeepSeek R1 Distill Llama 70B is available in private preview. Please contact us to request access.
Our free tier supports a context length of 8,192 tokens. For all supported models, we also offer context lengths of up to 128K tokens upon request. To gain access, please contact us!
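A call to the API is a single HTTPS chat-completions request. Below is a minimal sketch of building one, assuming the OpenAI-compatible request shape, an `api.cerebras.ai` base URL, a `CEREBRAS_API_KEY` environment variable, and the llama-3.3-70b model ID from the table above; consult the API reference for the authoritative details.

```python
import os

# Assumed OpenAI-compatible chat-completions endpoint (verify in the API reference).
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build the JSON payload for a single-turn chat completion."""
    return {
        "model": model,  # e.g. "llama-3.3-70b" from the model table
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def auth_headers() -> dict:
    # Read the key from the environment rather than hard-coding it;
    # CEREBRAS_API_KEY is an assumed variable name for this sketch.
    return {
        "Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}",
        "Content-Type": "application/json",
    }

payload = build_chat_request("llama-3.3-70b", "Why does inference latency matter?")
# To send the request with the `requests` library:
#   requests.post(API_URL, headers=auth_headers(), json=payload)
```

Swapping in a different model is just a matter of changing the `model` field to another ID from the table.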
- Play with our live chatbot demo.
- Experiment with our inference solution in the playground before making an API call.
- Explore our API reference documentation.