Models
Llama 3.1 8B
This model excels in speed-critical scenarios like real-time chat, customer service, interactive gaming, and live content generation. Perfect for high-throughput tasks including batch processing, concurrent API requests, and data pipelines.
Context Length
Free Tier8,192 tokens
Paid TiersUp to 32k
Speed
~2200
tokens/sec
Input / Output
Input Formats JSON, plain text
Output FormatsJSON, plain text, structured
Model Notes
Model ID:
llama3.1-8b
Rate Limits
Tier | Requests/min | Input Tokens/min | Output Tokens/min | Daily Tokens |
---|---|---|---|---|
Free | 30 | 60k | - | 1M |
1 | 600 | 600k | 60k | 245M |
2 | 1000 | 1M | 100k | 415M |
Endpoints
Chat Completions
Completions
Features
Streaming
Structured Outputs
Streaming w/ Structured Outputs
Tool Calling
Multi-Turn Tool Calling
Tool Calling w/ Structured Outputs
Need Higher Limits?
Reach out for custom pricing with our Enterprise tier for higher rate limits and dedicated support.
Contact Sales