We now support `qwen-3-32b`.

We now support `llama-4-scout-17b-16e-instruct`.

We now support structured outputs when `strict` is set to true. This feature allows you to enforce consistent JSON outputs for models, which is useful when building applications that need to process AI-generated data programmatically.
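A minimal sketch of what a strict structured-output request might look like over raw HTTP. The `https://api.cerebras.ai/v1` base URL, the OpenAI-style `response_format` payload shape, and the example schema are assumptions for illustration, not details from this entry:

```python
import os
import requests

# Assumed OpenAI-compatible endpoint; check the API reference for the exact URL.
URL = "https://api.cerebras.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"}

# The JSON schema the model's output must conform to when strict is true.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
    "additionalProperties": False,
}

payload = {
    "model": "llama-4-scout-17b-16e-instruct",
    "messages": [{"role": "user", "content": "Name one classic sci-fi film."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "film", "strict": True, "schema": schema},
    },
}

resp = requests.post(URL, headers=HEADERS, json=payload, timeout=30)
resp.raise_for_status()
# With strict enforcement, the content parses as schema-conforming JSON.
print(resp.json()["choices"][0]["message"]["content"])
```
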
We now support `log_probs` and `top_log_probs` in the `chat/completions` endpoint.
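For illustration, a hedged sketch of a request that asks for token log probabilities. The parameter spellings below follow this entry; the endpoint URL and the shape of the returned log-probability data are assumptions:

```python
import os
import requests

URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"}

payload = {
    "model": "qwen-3-32b",
    "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    # Parameter names as spelled in this changelog entry:
    "log_probs": True,   # return the log probability of each generated token
    "top_log_probs": 3,  # also return the 3 most likely alternatives per position
}

resp = requests.post(URL, headers=HEADERS, json=payload, timeout=30)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["message"]["content"])
print(choice.get("logprobs"))  # per-token log-probability data; exact shape may vary
```
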
The `llama3.1-70b` model will be automatically upgraded to `llama-3.3-70b`. Any existing references to `llama3.1-70b` in your code will continue to work during a short-term aliasing period. However, we strongly encourage you to update your references to the `llama-3.3-70b` model as soon as possible, since the aliasing will not be maintained indefinitely.
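One low-risk way to handle the rename is to resolve model names at a single call site. The mapping below comes from this entry; the helper itself is a hypothetical sketch:

```python
# Map retired model names to their upgrades so the rest of the
# codebase only needs to change in one place. (Hypothetical helper.)
MODEL_UPGRADES = {"llama3.1-70b": "llama-3.3-70b"}

def resolve_model(name: str) -> str:
    """Return the current name for a possibly-aliased model."""
    return MODEL_UPGRADES.get(name, name)

assert resolve_model("llama3.1-70b") == "llama-3.3-70b"
assert resolve_model("qwen-3-32b") == "qwen-3-32b"  # unaffected names pass through
```
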
We now support `llama-3.3-70b`, Meta’s newly released model that delivers enhanced performance across popular benchmarks for use cases including chat, coding, instruction following, mathematics, and reasoning. We serve this model at a speed of 2100+ tokens per second.

We now support the `completions` endpoint.
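A sketch of a call to the endpoint, assuming it follows the familiar prompt-in, text-out shape of OpenAI-style completions APIs; the URL and response layout are assumptions:

```python
import os
import requests

URL = "https://api.cerebras.ai/v1/completions"  # assumed base URL and path
HEADERS = {"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"}

payload = {
    "model": "llama-3.3-70b",
    "prompt": "The three primary colors are",  # raw prompt, no chat roles
}

resp = requests.post(URL, headers=HEADERS, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```
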
The `max_tokens` parameter has been renamed to `max_completion_tokens`, to maintain consistency with OpenAI’s syntax.
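The change is mechanical; assuming an OpenAI-compatible chat payload, it amounts to one key rename:

```python
payload = {
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Summarize RFC 2119 in one line."}],
    # Old spelling, now renamed:
    # "max_tokens": 64,
    # New spelling, matching OpenAI's parameter name:
    "max_completion_tokens": 64,
}
```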