Get Started
Overview
The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.
Currently, the Cerebras API provides access to Meta’s Llama 3.1 8B and Llama 3.3 70B models, as well as DeepSeek R1 Distill Llama 70B (available upon request). All models are instruction-tuned and can be used for conversational applications.
Model Name | Model ID | Parameters | Knowledge Cutoff | Context Length (tokens) |
---|---|---|---|---|
Llama 3.1 8B | llama3.1-8b | 8 billion | March 2023 | 8,192 |
Llama 3.3 70B | llama-3.3-70b | 70 billion | December 2023 | 8,192 |
DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | December 2023 | 8,192 |
*DeepSeek R1 Distill Llama 70B is available upon request. Please contact us to request access.
Due to high demand in our early launch phase, we are temporarily limiting the Llama 3.1 and 3.3 models to a context window of 8,192 tokens in our Free Tier. If your use case or application would benefit from longer context windows, please let us know!
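As a quick illustration of how the model IDs above are used, the sketch below sends a chat completion request with plain HTTP. It is a minimal example that assumes an OpenAI-compatible chat completions endpoint at https://api.cerebras.ai/v1/chat/completions and an API key stored in the CEREBRAS_API_KEY environment variable; see the API reference documentation for the authoritative request format.

```python
import os
import requests

# Minimal sketch: send a chat completion request to the Cerebras API.
# Assumes an OpenAI-compatible endpoint and an API key in CEREBRAS_API_KEY.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
API_KEY = os.environ["CEREBRAS_API_KEY"]

payload = {
    "model": "llama3.1-8b",  # any model ID from the table above
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why does low-latency inference matter?"},
    ],
    "max_tokens": 256,
}

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=30,
)
response.raise_for_status()

# Print the assistant's reply from the first choice.
print(response.json()["choices"][0]["message"]["content"])
```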
- Play with our live chatbot demo.
- Experiment with our inference solution in the playground before making an API call.
- Explore our API reference documentation.