Get Started
Overview
The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite you to explore the new possibilities that our high-speed inference solution unlocks.
The Cerebras API currently provides access to two models: Meta's Llama 3.1 8B and Llama 3.1 70B. Both are instruction-tuned and can be used for conversational applications.
Llama 3.1 8B
- Model ID: llama3.1-8b
- Parameters: 8 billion
- Knowledge cutoff: March 2023
- Context length: 8,192 tokens
- Training tokens: 15 trillion
Llama 3.1 70B
- Model ID: llama3.1-70b
- Parameters: 70 billion
- Knowledge cutoff: December 2023
- Context length: 8,192 tokens
- Training tokens: 15 trillion
Due to high demand in our early launch phase, we are temporarily limiting Llama 3.1 models to a context window of 8,192 tokens in our Free Tier. If your use case or application would benefit from longer context windows, please let us know!
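The sketch below shows how a chat request to one of these models might look, referencing the model by its Model ID. It assumes an OpenAI-compatible chat completions endpoint at https://api.cerebras.ai/v1/chat/completions and an API key stored in a CEREBRAS_API_KEY environment variable; both are assumptions, so check the API reference linked under Resources for the authoritative request format.

```python
import os
import requests

# Assumed OpenAI-compatible chat completions endpoint; confirm against the API reference.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
API_KEY = os.environ["CEREBRAS_API_KEY"]  # assumed environment variable name

payload = {
    "model": "llama3.1-8b",  # or "llama3.1-70b"
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "In one sentence, what is a wafer-scale engine?"},
    ],
    # Keep the prompt plus completion within the 8,192-token Free Tier context window.
    "max_tokens": 256,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()

# Assumes the familiar chat-completions response shape.
print(response.json()["choices"][0]["message"]["content"])
```

If the endpoint is OpenAI-compatible as assumed here, existing chat-completions client code can typically be pointed at the Cerebras API by swapping in the base URL and one of the Model IDs above.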
QuickStart Guide
Resources
- Play with our live chatbot demo.
- Experiment with our inference solution in the playground before making an API call.
- Explore our API reference documentation.