Context Length

Free Tier: 8,192 tokens
Paid Tiers: up to 64k tokens

Speed

~2,100 tokens/sec

Input / Output

Input Formats: JSON, plain text
Output Formats: JSON, plain text, structured outputs

Model Notes

Model ID: llama3.3-70b
Available through the Hugging Face InferenceClient.
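As a minimal sketch of using the model ID above, the snippet below assembles a chat-completion request whose parameters match the Features list on this card (temperature, top p, max completion tokens). The parameter values and the prompt are illustrative assumptions; the commented-out call shows how the request could be sent through Hugging Face's InferenceClient, which requires an API token.

```python
# Sketch: preparing a chat request for llama3.3-70b. The model ID comes
# from this card; parameter names follow the common chat-completions
# convention and may need adjusting for your client library.

def build_chat_request(prompt: str) -> dict:
    """Assemble chat-completion parameters matching this card's features."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,   # supported per the Features list
        "top_p": 0.9,         # supported per the Features list
        "max_tokens": 256,    # maps to "Max Completion Tokens"
    }

params = build_chat_request("Summarize the rate-limit tiers in one sentence.")

# To send the request with Hugging Face's InferenceClient (needs a token):
# from huggingface_hub import InferenceClient
# client = InferenceClient(model="llama3.3-70b")
# response = client.chat_completion(**params)
# print(response.choices[0].message.content)
```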

Rate Limits

Tier   Requests/min   Input Tokens/min   Output Tokens/min   Daily Tokens
Free   30             60k                -                   1M
1      300            300k               30k                41M
2      600            600k               60k                85M
3      1000           1M                 100k               140M
4      1200           1.2M               120k               190M
5      1450           1.45M              145k               275M
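The per-minute limits in the table above lend themselves to a simple client-side pre-check before dispatching a batch of requests. The sketch below hard-codes the Tier 1 row as an example; the helper name and structure are illustrative, not part of the API.

```python
from dataclasses import dataclass

# Tier limits copied from the rate-limits table above (Tier 1 shown).
@dataclass
class TierLimits:
    requests_per_min: int
    input_tokens_per_min: int
    output_tokens_per_min: int
    daily_tokens: int

TIER_1 = TierLimits(300, 300_000, 30_000, 41_000_000)

def fits_in_minute(n_requests: int, input_toks: int, output_toks: int,
                   limits: TierLimits) -> bool:
    """Check a planned one-minute workload against the per-minute limits."""
    return (n_requests <= limits.requests_per_min
            and input_toks <= limits.input_tokens_per_min
            and output_toks <= limits.output_tokens_per_min)

print(fits_in_minute(100, 250_000, 20_000, TIER_1))  # → True
print(fits_in_minute(400, 250_000, 20_000, TIER_1))  # → False (too many requests)
```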

Endpoints

Chat Completions
Completions

Features

Streaming
Structured Outputs
Streaming w/ Structured Outputs
Tool Calling
Parallel Tool Calling
Top P
Temperature
Max Completion Tokens
Logit Probabilities
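To illustrate how "Streaming w/ Structured Outputs" combines two of the features above, here is a hypothetical request body pairing `stream` with a JSON-schema `response_format`. The field names follow the common chat-completions convention and the schema is invented for the example; both are assumptions, not this API's documented shapes.

```python
import json

# Hypothetical request body combining streaming with structured outputs,
# per the Features list above. Field names and the schema are assumptions.
request_body = {
    "model": "llama3.3-70b",
    "stream": True,
    "messages": [{"role": "user", "content": "Extract the city and country."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "location",
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        },
    },
}

print(json.dumps(request_body, indent=2))
```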

Need Higher Limits?

For higher rate limits, dedicated support, and custom pricing, reach out about our Enterprise tier.

Contact Sales