Image Inputs

This feature is in Public Preview.

Vision-capable models can understand visual content alongside text — including objects, diagrams, screenshots, and any text that appears within an image (see Limitations for exceptions). Images are sent through the Chat Completions API as base64-encoded data URIs in the messages array.

Currently, image support is only available with gemma-4-31b.

Usage

To send an image, add an image_url object to the content array in a user message. The image must be base64-encoded and passed as a data URI.

Use the encoder in the Token Usage section to convert your image to a base64 data URI. It also shows the estimated token count and encoded payload size.

Single image
Multiple images

from cerebras.cloud.sdk import Cerebras
import os
import base64

client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("screenshot.png")

response = client.chat.completions.create(
    model="gemma-4-31b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one concise sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

import Cerebras from '@cerebras/cerebras_cloud_sdk';
import fs from 'fs';

const client = new Cerebras({
  apiKey: process.env['CEREBRAS_API_KEY'],
});

const base64Image = fs.readFileSync('screenshot.png').toString('base64');

const response = await client.chat.completions.create({
  model: 'gemma-4-31b',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image in one concise sentence.' },
        {
          type: 'image_url',
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);

# Encode image to base64 (macOS/Linux)
BASE64_IMAGE=$(base64 -i screenshot.png)
# Windows PowerShell:
# $BASE64_IMAGE = [Convert]::ToBase64String([IO.File]::ReadAllBytes("screenshot.png"))
# If you run this example in PowerShell, use curl.exe and replace ${BASE64_IMAGE} with $BASE64_IMAGE.

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CEREBRAS_API_KEY}" \
  -d "{
    \"model\": \"gemma-4-31b\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"text\", \"text\": \"Describe this image in one concise sentence.\"},
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"data:image/png;base64,${BASE64_IMAGE}\"
            }
          }
        ]
      }
    ]
  }"

Include up to 5 images in a single request by adding additional image_url content parts to the content array. The model considers all images together when generating its response. Each image counts toward your token usage.

from cerebras.cloud.sdk import Cerebras
import os
import base64

client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image_1 = encode_image("image1.jpeg")
base64_image_2 = encode_image("image2.png")

response = client.chat.completions.create(
    model="gemma-4-31b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image_1}"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image_2}"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)

import Cerebras from '@cerebras/cerebras_cloud_sdk';
import fs from 'fs';

const client = new Cerebras({
  apiKey: process.env['CEREBRAS_API_KEY'],
});

const base64Image1 = fs.readFileSync('image1.jpeg').toString('base64');
const base64Image2 = fs.readFileSync('image2.png').toString('base64');

const response = await client.chat.completions.create({
  model: 'gemma-4-31b',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Compare these two images.' },
        {
          type: 'image_url',
          image_url: {
            url: `data:image/jpeg;base64,${base64Image1}`,
          },
        },
        {
          type: 'image_url',
          image_url: {
            url: `data:image/png;base64,${base64Image2}`,
          },
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);

# Encode images to base64 (macOS/Linux)
BASE64_IMAGE_1=$(base64 -i image1.jpeg)
BASE64_IMAGE_2=$(base64 -i image2.png)

curl https://api.cerebras.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${CEREBRAS_API_KEY}" \
  -d "{
    \"model\": \"gemma-4-31b\",
    \"messages\": [
      {
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"text\", \"text\": \"Compare these two images.\"},
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"data:image/jpeg;base64,${BASE64_IMAGE_1}\"
            }
          },
          {
            \"type\": \"image_url\",
            \"image_url\": {
              \"url\": \"data:image/png;base64,${BASE64_IMAGE_2}\"
            }
          }
        ]
      }
    ]
  }"

Input Requirements

Requirement	Details
Supported formats	PNG (`.png`), JPEG (`.jpeg`, `.jpg`)
Encoding	Base64 data URI (e.g., `data:image/png;base64,...`)
External image URLs	Not supported during Public Preview
Max payload size	`10 MB` total image payload per request ¹
Max images per request	`5` ¹

¹ These limits apply to the shared tier during Public Preview. Higher limits may be available for Dedicated Endpoints.

Token Usage

gemma-4-31b uses the default preprocessing setting of up to 280 image tokens per image. The model preserves image aspect ratio during preprocessing. Depending on the input dimensions, the image may be downscaled or upscaled before tokenization. The processed height and width are then rounded down to the nearest multiple of 48. As a result, token usage depends on the processed image dimensions, not the uploaded file size or original resolution.

Estimate Token Count

Upload an image below to copy its base64 data URI, check the encoded size, and view a token estimate. You can also estimate the token count manually with the following steps:

Start with the input width and height.

Compute the scale factor:

scale = sqrt(645120 / (width × height))

Multiply the width and height by the scale factor.

scaled_width = width × scale
scaled_height = height × scale

Round each processed dimension down to the nearest multiple of 48.

Compute the token count:

image_tokens = (processed_width / 48) × (processed_height / 48)

Cap the result at 280.

This means smaller images do not always use fewer image tokens. For example, a 336 × 226 image is upscaled during preprocessing to 960 × 624, which uses 260 image tokens.

Input resolution	Processed resolution	Image tokens used
336 × 226	960 × 624	260
512 × 512	768 × 768	256
672 × 672	768 × 768	256
1024 × 1024	768 × 768	256
1280 × 720	1056 × 576	264
1920 × 1080	1056 × 576	264
2560 × 1440	1056 × 576	264
3840 × 2160	1056 × 576	264
336 × 480	672 × 960	280
480 × 336	960 × 672	280

To validate image token usage, inspect usage.image_tokens in the API response. This field reports the total number of image tokens used by the request. usage.prompt_tokens includes text tokens, image tokens, and message-formatting tokens. The difference between image and text-only prompt_tokens can include message-formatting tokens and might not match usage.image_tokens. Keep the following in mind:

280 is the maximum image tokens per image for gemma-4-31b on Cerebras.
Compressed file size does not directly determine token count. Processed image dimensions matter more than PNG or JPEG byte size.
Image tokens are included in usage.prompt_tokens and are also reported in usage.image_tokens.
Image tokens occupy part of the model context window, just like text prompt tokens.

Limitations

Medical images — not suitable for interpreting specialized medical images such as CT scans or MRIs. Do not use for medical diagnosis or advice.
Small text — may have difficulty reading small or low-resolution text. Enlarging text within the image before sending can improve results.
Rotated content — may misinterpret text or images that are rotated or upside-down.
Graphs and charts — may struggle to distinguish visual elements that differ only in color or line style, such as solid versus dashed lines.
Spatial reasoning — not reliable for tasks requiring precise spatial localization, such as identifying positions on a map or board game.
Object counting — the model may give approximate counts for objects in images.
Image shape — may perform less accurately on panoramic or fisheye images.
Preprocessing — the model cannot access original filenames or metadata. Images may be resized before analysis — see Token Usage for details.
Accuracy — the model may generate inaccurate descriptions or captions in some scenarios. Verify outputs for high-stakes use cases.
CAPTCHAs — CAPTCHA images are not supported.
Indirect prompt injection — text embedded in an image is included in the model’s prompt context alongside the user’s text. If an image contains adversarial instructions (for example, text that says “ignore all previous instructions”) and the user prompt asks the model to answer based on the image, the model may follow those embedded instructions. Treat image content from untrusted sources as untrusted input, and use a system prompt to constrain the model’s behavior when processing images you don’t control.
Untrusted output — the model may transcribe or describe text from an image verbatim, including HTML, script tags, URLs, or control characters. The API returns this content unmodified. Treat it the same as any other untrusted input before rendering, logging, or executing it in your application.

FAQs

Do I need to resend the image on later turns?

Yes. Cerebras Chat Completions is stateless. If a follow-up request depends on an earlier image, include that image-bearing turn in the conversation history you send with the new request. Continue to include that turn for as long as the model needs the visual context.

Can I generate images?

No, only image input is supported. The model returns text only and does not generate images.

Is prompt caching supported with image inputs?

Yes. Prompt caching can help with repeated images and repeated multimodal context within your organization. Prompt caches are never shared between organizations and remain ephemeral. See Prompt Caching.

Do rate limits change with image support?

No. Image support uses the same rate limit framework as text. The same request and token limits still apply based on your organization and tier. For current details, see Rate Limits.

Do you store image data?

Image inputs are processed as soon as they are received, and the original image payloads are not persisted. After preprocessing, image tokens and image embeddings may be cached ephemerally within your organization to support prompt caching.

Get Started

Models

Capabilities

Dedicated Endpoints

Compatibility

Cloud Console

Resources

Support

Usage

Input Requirements

Token Usage

Estimate Token Count

Limitations

FAQs

​Usage

​Input Requirements

​Token Usage

​Estimate Token Count

​Limitations

​FAQs

Usage

Input Requirements

Token Usage

Estimate Token Count

Limitations

FAQs