Llama 3.1 8B

This model excels in speed-critical scenarios such as real-time chat, customer service, interactive gaming, and live content generation. It is also well suited to high-throughput tasks, including batch processing, concurrent API requests, and data pipelines.

Documentation Index
Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
Use this file to discover all available pages before exploring further.
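
One way to fetch the index and list its pages is sketched below in Python, using only the standard library. It assumes the file lists pages as Markdown links of the form [title](url), which this page does not confirm. The names parse_index and fetch_index are hypothetical helpers for illustration, not part of any Cerebras SDK.

```python
import re
import urllib.request

# URL taken from the documentation index note above.
INDEX_URL = "https://inference-docs.cerebras.ai/llms.txt"

# Assumption: the index lists pages as Markdown links, [title](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def parse_index(text):
    """Return (title, url) pairs for every Markdown link found in text."""
    return LINK_RE.findall(text)


def fetch_index(url=INDEX_URL, timeout=10):
    """Download the index file and return its body as a string."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling parse_index(fetch_index()) would then yield the page titles and URLs, provided the file follows the assumed link format. Parsing is kept separate from fetching so it can be checked without network access.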

Llama 3.1 8B will be deprecated on May 27, 2026.
