Payload optimization is supported on /v1/chat/completions and /v1/completions. Support on Dedicated Endpoints may vary by model.
Encoding Options
The Cerebras API accepts the following request body encodings. The table shows the expected payload size reduction for each option:| Content-Type | Description | Chat Completions* | Completions** |
|---|---|---|---|
application/json | Default JSON encoding | Baseline | Baseline |
application/vnd.msgpack | msgpack binary encoding | up to ~5% | up to ~56% |
application/json + Content-Encoding: gzip | JSON with gzip compression | up to ~98% | up to ~68% |
application/vnd.msgpack + Content-Encoding: gzip | msgpack + gzip | up to ~98% | up to ~69% |
** Measured against a 50k token-ID completions payload (331 KB JSON baseline). You can use msgpack encoding or gzip compression independently, or combine them for maximum compression. Smaller request payloads reduce network transfer time, a contributing factor to TTFT. Actual TTFT improvement will vary, as network transfer is one of several factors that contribute to overall latency.
When to Use Payload Optimization
Optimizing payload size with request compression is most beneficial for:- Long prompts – requests with long system prompts, extensive conversation history, or large code blocks. Gzip compression is the most effective option for these payloads.
- Token-ID completions –
/v1/completionspayloads using integer token arrays benefit from both msgpack encoding and gzip compression - Tool call-heavy payloads – requests with many tool definitions or deeply nested JSON structures. Both msgpack encoding and gzip compression provide savings.
Size reductions depend on payload content. The benchmarks above used a token-ID completions payload, where msgpack’s integer encoding provides the greatest benefit. String-heavy chat payloads may see smaller msgpack reductions, while payloads with deeply nested structures (e.g., tool calls) may see greater savings. Gzip benefits are more consistent across payload types.
msgpack Encoding
msgpack is a binary serialization format that produces smaller payloads than equivalent JSON. To use it, serialize your request body with msgpack and set theContent-Type header to application/vnd.msgpack.
Gzip Compression
You can gzip-compress any request body and set theContent-Encoding: gzip header. This works with both JSON and msgpack payloads.

