The Cerebras API offers developers a low-latency solution for AI model inference, powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inference solution unlocks.

Currently, the Cerebras API provides access to two models: Meta’s Llama 3.1 8B and 70B. Both models are instruction-tuned and well suited to conversational applications.

Llama 3.1 8B

  • Model ID: llama3.1-8b
  • Parameters: 8 billion
  • Knowledge cutoff: March 2023
  • Context Length: 8192 tokens
  • Training Tokens: 15 trillion

Llama 3.1 70B

  • Model ID: llama3.1-70b
  • Parameters: 70 billion
  • Knowledge cutoff: December 2023
  • Context Length: 8192 tokens
  • Training Tokens: 15 trillion

Due to high demand during our early launch phase, we are temporarily limiting Llama 3.1 models to a context window of 8192 tokens on our Free Tier. If your use case or application would benefit from a longer context window, please let us know!
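
As a minimal sketch of how the model IDs above are used, the snippet below assembles a chat-completion request body selecting llama3.1-8b. The endpoint URL and field names here are assumptions modeled on common OpenAI-compatible chat APIs, not a definitive description of the Cerebras API; consult the API reference for the exact schema.

```python
import json

# Assumed endpoint, following the OpenAI-compatible convention (hypothetical).
API_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_chat_request(model_id: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble a chat-completion request body for the given model ID."""
    return {
        "model": model_id,  # e.g. "llama3.1-8b" or "llama3.1-70b"
        "messages": [
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("llama3.1-8b", "Summarize wafer-scale computing in one sentence.")
body = json.dumps(payload)  # JSON body to POST to API_URL with your API key
```

Because both models share the same request shape, switching between the 8B and 70B models only requires changing the `model` field.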
