Overview
The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite you to explore the new possibilities that our high-speed inference solution unlocks.
The Cerebras API currently provides access to two models: Meta's Llama 3.1 8B and Llama 3.1 70B. Both are instruction-tuned and well suited to conversational applications.
Llama 3.1 8B
- Model ID: llama3.1-8b
- Parameters: 8 billion
- Knowledge cutoff: March 2023
- Context length: 8,192 tokens
- Training tokens: 15 trillion
Llama 3.1 70B
- Model ID: llama3.1-70b
- Parameters: 70 billion
- Knowledge cutoff: December 2023
- Context length: 8,192 tokens
- Training tokens: 15 trillion
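For reference, here is a minimal sketch of calling one of these models by its model ID. It assumes the Cerebras Cloud Python SDK (package name assumed: cerebras_cloud_sdk) with an OpenAI-style chat completions interface; the environment-variable name and prompt are illustrative, not taken from this page.

```python
import os

# Assumes the Cerebras Cloud Python SDK (package name assumed: cerebras_cloud_sdk).
from cerebras.cloud.sdk import Cerebras

# The API key is read from an environment variable (name assumed for illustration).
client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

# Request a chat completion from the 8B instruction-tuned model using its model ID.
chat_completion = client.chat.completions.create(
    model="llama3.1-8b",
    messages=[
        {"role": "user", "content": "Why does low-latency inference matter?"},
    ],
)

print(chat_completion.choices[0].message.content)
```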
QuickStart Guide
Get started by building your first application using our QuickStart guide.
- Play with our live chatbot demo.
- Experiment with our inference solution in the playground before making an API call.
- Explore our API reference documentation.
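As a next step beyond the playground, the sketch below shows what a streaming request might look like. It assumes the same Python SDK and an OpenAI-style streaming interface (stream=True, incremental delta chunks); those conventions are assumptions, not details taken from this page.

```python
import os

from cerebras.cloud.sdk import Cerebras  # package name assumed: cerebras_cloud_sdk

client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

# Stream the 70B model's reply as it is generated (OpenAI-style streaming, assumed supported).
stream = client.chat.completions.create(
    model="llama3.1-70b",
    messages=[{"role": "user", "content": "Summarize wafer-scale inference in two sentences."}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental piece of the assistant's message.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```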