This model excels in speed-critical scenarios such as real-time chat, customer service, interactive gaming, and live content generation. It is also well suited to high-throughput tasks, including batch processing, concurrent API requests, and data pipelines.
llama3.1-8b
Tier | Requests/min | Input Tokens/min | Output Tokens/min | Daily Tokens |
---|---|---|---|---|
Free | 30 | 60k | 8k/request | 1M |
1 | 600 | 600k | 60k | 245M |
2 | 1000 | 1M | 100k | 415M |
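
One way to stay within these limits is to throttle requests on the client before sending them. The sketch below is a minimal rolling-window limiter that uses the Tier 1 requests/min and input tokens/min values from the table. The class name, the `TIER_1` dictionary, and the 60-second rolling window are all assumptions made for illustration; the table does not say how the service itself measures a minute, so treat this as a conservative client-side guard rather than a mirror of the server's behavior.

```python
import time
from collections import deque

# Tier 1 limits copied from the table above; the dictionary itself is hypothetical.
TIER_1 = {"requests_per_min": 600, "input_tokens_per_min": 600_000}


class SlidingWindowLimiter:
    """Tracks accepted requests and input tokens over a rolling 60-second window."""

    def __init__(self, requests_per_min, input_tokens_per_min, clock=time.monotonic):
        self.requests_per_min = requests_per_min
        self.input_tokens_per_min = input_tokens_per_min
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = deque()  # (timestamp, input_tokens) for each accepted request
        self.tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, input_tokens):
        """Return True and record the request if it fits both budgets, else False."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.events) + 1 > self.requests_per_min:
            return False
        if self.tokens_in_window + input_tokens > self.input_tokens_per_min:
            return False
        self.events.append((now, input_tokens))
        self.tokens_in_window += input_tokens
        return True


limiter = SlidingWindowLimiter(**TIER_1)
if limiter.try_acquire(input_tokens=1_200):
    pass  # safe to send the request here
```

When `try_acquire` returns `False`, the caller can wait briefly and retry, or queue the request. Output-token and daily-token budgets are not tracked in this sketch and would need their own counters.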
Chat Completions
Completions