Overview
The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.
Currently, the Cerebras API provides access to Meta's Llama 3.1 8B and Llama 3.3 70B models. Both models are instruction-tuned and can be used for conversational applications; a sample request is sketched after the model details below.
Llama 3.1 8B
- Model ID: llama3.1-8b
- Parameters: 8 billion
- Knowledge cutoff: March 2023
- Context Length: 8192
- Training Tokens: 15 trillion+
Llama 3.3 70B
- Model ID: llama-3.3-70b
- Parameters: 70 billion
- Knowledge cutoff: December 2023
- Context Length: 8192
- Training Tokens: 15 trillion+
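To illustrate how these model IDs are used, here is a minimal sketch of a chat completion request. It assumes an OpenAI-compatible chat completions endpoint at https://api.cerebras.ai/v1/chat/completions and an API key in the CEREBRAS_API_KEY environment variable; consult the API reference documentation for the authoritative endpoint, authentication, and request schema.

```python
# Minimal sketch of a chat completion request against the Cerebras API.
# Assumes an OpenAI-compatible endpoint and an API key in CEREBRAS_API_KEY;
# see the API reference for the authoritative request schema.
import os
import requests

API_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed endpoint

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama3.1-8b",  # or "llama-3.3-70b"
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain wafer-scale inference in one sentence."},
        ],
    },
    timeout=30,
)
response.raise_for_status()
# Print the assistant's reply from the first choice in the response.
print(response.json()["choices"][0]["message"]["content"])
```

Swapping the `model` field between the two IDs listed above is the only change needed to target the other model; keep in mind that both models share the same 8192-token context length.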
- Play with our live chatbot demo.
- Experiment with our inference solution in the playground before making an API call.
- Explore our API reference documentation.