{truncateName(meta.name, 40)}
{meta.fmt} · {meta.width}×{meta.height} px · ~{meta.tokens} tokens
File {meta.fileSize} · Encoded {meta.uriSize}
Data URI

        <div className="flex items-center gap-3 mt-2">
          <button onClick={copy} className="flex-1 py-2 px-4 rounded-lg border border-gray-200 dark:border-gray-700 text-sm cursor-pointer bg-transparent hover:bg-gray-50 dark:hover:bg-gray-900 transition-colors">
            {copied ? '✓ Copied' : 'Copy data URI'}
          </button>
        </div>
      </div>;
  }
  return <div role="button" tabIndex={0} aria-label="Upload image to encode as base64 data URI" className={`not-prose rounded-xl border-2 border-dashed p-10 text-center cursor-pointer transition-colors my-4 ${dragging ? 'border-gray-400 bg-gray-50 dark:bg-gray-900' : 'border-gray-200 dark:border-gray-700'}`} onClick={() => inputRef.current && inputRef.current.click()} onKeyDown={e => e.key === 'Enter' && inputRef.current && inputRef.current.click()} onDragOver={e => {
    e.preventDefault();
    setDragging(true);
  }} onDragLeave={() => setDragging(false)} onDrop={e => {
    e.preventDefault();
    setDragging(false);
    process(e.dataTransfer.files[0]);
  }}>
      <div className="flex flex-col items-center gap-3">
        <p className="text-sm text-gray-500 m-0">Drop a PNG or JPEG here</p>
        <button onClick={e => {
    e.stopPropagation();
    inputRef.current && inputRef.current.click();
  }} className="px-4 py-2 text-sm rounded-lg border border-gray-200 dark:border-gray-700 cursor-pointer bg-transparent hover:bg-gray-50 dark:hover:bg-gray-900 transition-colors">
          Browse files
        </button>
        {stage === 'error' && <p className="text-xs text-red-500 m-0">Only PNG and JPEG files are supported.</p>}
      </div>
      <input ref={inputRef} type="file" accept="image/png,image/jpeg,.png,.jpg,.jpeg" className="hidden" onChange={e => process(e.target.files[0])} />
    </div>;
};

<Callout icon="lock" color="rgba(159, 156, 156, 1)" iconType="regular">
  This feature is in [Private Preview](/support/preview-releases). For access or more information, [contact us](https://www.cerebras.ai/contact) or reach out to your account representative.
</Callout>

Vision-capable models can understand visual content alongside text — including objects, diagrams, screenshots, and any text that appears within an image (see [Limitations](#limitations) for exceptions). Images are sent through the [Chat Completions](/api-reference/chat-completions) API as base64-encoded data URIs in the `messages` array.

<Note>
  Currently, image support is only available with `gemma-4-31b`.
</Note>

## Usage

To send an image, add an `image_url` object to the `content` array in a user message. The image must be base64-encoded and passed as a data URI.

<Tip>
  Use the [encoder in the Token Usage section](#estimate-token-count) to convert your image to a base64 data URI. It also shows the estimated token count and encoded payload size.
</Tip>

<Tabs>
  <Tab title="Single image">
    <CodeGroup>
      ```python Python theme={null}
      from cerebras.cloud.sdk import Cerebras
      import os
      import base64

      client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

      def encode_image(image_path):
          with open(image_path, "rb") as image_file:
              return base64.b64encode(image_file.read()).decode("utf-8")

      base64_image = encode_image("screenshot.png")

      response = client.chat.completions.create(
          model="gemma-4-31b",
          messages=[
              {
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "Describe this image in one concise sentence."},
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": f"data:image/png;base64,{base64_image}"
                          },
                      },
                  ],
              }
          ],
      )

      print(response.choices[0].message.content)
      ```

      ```javascript Node.js theme={null}
      import Cerebras from '@cerebras/cerebras_cloud_sdk';
      import fs from 'fs';

      const client = new Cerebras({
        apiKey: process.env['CEREBRAS_API_KEY'],
      });

      const base64Image = fs.readFileSync('screenshot.png').toString('base64');

      const response = await client.chat.completions.create({
        model: 'gemma-4-31b',
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: 'Describe this image in one concise sentence.' },
              {
                type: 'image_url',
                image_url: {
                  url: `data:image/png;base64,${base64Image}`,
                },
              },
            ],
          },
        ],
      });

      console.log(response.choices[0].message.content);
      ```

      ```bash cURL theme={null}
      # Encode image to base64 (macOS/Linux)
      BASE64_IMAGE=$(base64 -i screenshot.png)
      # Windows PowerShell:
      # $BASE64_IMAGE = [Convert]::ToBase64String([IO.File]::ReadAllBytes("screenshot.png"))
      # If you run this example in PowerShell, use curl.exe and replace ${BASE64_IMAGE} with $BASE64_IMAGE.

      curl https://api.cerebras.ai/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${CEREBRAS_API_KEY}" \
        -d "{
          \"model\": \"gemma-4-31b\",
          \"messages\": [
            {
              \"role\": \"user\",
              \"content\": [
                {\"type\": \"text\", \"text\": \"Describe this image in one concise sentence.\"},
                {
                  \"type\": \"image_url\",
                  \"image_url\": {
                    \"url\": \"data:image/png;base64,${BASE64_IMAGE}\"
                  }
                }
              ]
            }
          ]
        }"
      ```
    </CodeGroup>
  </Tab>

  <Tab title="Multiple images">
    Include up to 5 images in a single request by adding additional `image_url` content parts to the `content` array. The model considers all images together when generating its response. Each image counts toward your [token usage](#token-usage).

    <CodeGroup>
      ```python Python theme={null}
      from cerebras.cloud.sdk import Cerebras
      import os
      import base64

      client = Cerebras(api_key=os.environ.get("CEREBRAS_API_KEY"))

      def encode_image(image_path):
          with open(image_path, "rb") as image_file:
              return base64.b64encode(image_file.read()).decode("utf-8")

      base64_image_1 = encode_image("image1.jpeg")
      base64_image_2 = encode_image("image2.png")

      response = client.chat.completions.create(
          model="gemma-4-31b",
          messages=[
              {
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "Compare these two images."},
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": f"data:image/jpeg;base64,{base64_image_1}"
                          },
                      },
                      {
                          "type": "image_url",
                          "image_url": {
                              "url": f"data:image/png;base64,{base64_image_2}"
                          },
                      },
                  ],
              }
          ],
      )

      print(response.choices[0].message.content)
      ```

      ```javascript Node.js theme={null}
      import Cerebras from '@cerebras/cerebras_cloud_sdk';
      import fs from 'fs';

      const client = new Cerebras({
        apiKey: process.env['CEREBRAS_API_KEY'],
      });

      const base64Image1 = fs.readFileSync('image1.jpeg').toString('base64');
      const base64Image2 = fs.readFileSync('image2.png').toString('base64');

      const response = await client.chat.completions.create({
        model: 'gemma-4-31b',
        messages: [
          {
            role: 'user',
            content: [
              { type: 'text', text: 'Compare these two images.' },
              {
                type: 'image_url',
                image_url: {
                  url: `data:image/jpeg;base64,${base64Image1}`,
                },
              },
              {
                type: 'image_url',
                image_url: {
                  url: `data:image/png;base64,${base64Image2}`,
                },
              },
            ],
          },
        ],
      });

      console.log(response.choices[0].message.content);
      ```

      ```bash cURL theme={null}
      # Encode images to base64 (macOS/Linux)
      BASE64_IMAGE_1=$(base64 -i image1.jpeg)
      BASE64_IMAGE_2=$(base64 -i image2.png)

      curl https://api.cerebras.ai/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${CEREBRAS_API_KEY}" \
        -d "{
          \"model\": \"gemma-4-31b\",
          \"messages\": [
            {
              \"role\": \"user\",
              \"content\": [
                {\"type\": \"text\", \"text\": \"Compare these two images.\"},
                {
                  \"type\": \"image_url\",
                  \"image_url\": {
                    \"url\": \"data:image/jpeg;base64,${BASE64_IMAGE_1}\"
                  }
                },
                {
                  \"type\": \"image_url\",
                  \"image_url\": {
                    \"url\": \"data:image/png;base64,${BASE64_IMAGE_2}\"
                  }
                }
              ]
            }
          ]
        }"
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Input Requirements

| Requirement            | Details                                              |
| ---------------------- | ---------------------------------------------------- |
| Supported formats      | PNG (`.png`), JPEG (`.jpeg`, `.jpg`)                 |
| Encoding               | Base64 data URI (e.g., `data:image/png;base64,...`)  |
| External image URLs    | Not supported during Public Preview                  |
| Max payload size       | `10 MB` total image payload per request <sup>1</sup> |
| Max images per request | `5` <sup>1</sup>                                     |

<div className="-mt-3 text-sm text-zinc-600 dark:text-zinc-400">
  <sup>1</sup> These limits apply to the shared tier during Public Preview. Higher limits may be available for [Dedicated Endpoints](/dedicated).
</div>

## Token Usage

`gemma-4-31b` uses the default preprocessing setting of up to 280 image tokens per image. The model preserves image aspect ratio during preprocessing. Depending on the input dimensions, the image may be downscaled or upscaled before tokenization. The processed height and width are then rounded down to the nearest multiple of 48.

As a result, token usage depends on the **processed image dimensions**, not the uploaded file size or original resolution.

### Estimate Token Count

Upload an image below to copy its base64 data URI, check the encoded size, and view a token estimate.

<ImageEncoder />

You can also estimate the token count manually with the following steps:

1. Start with the input width and height.

2. Compute the scale factor:

   ```text theme={null}
   scale = sqrt(645120 / (width × height))
   ```

3. Multiply the width and height by the scale factor.

   ```text theme={null}
   scaled_width = width × scale
   scaled_height = height × scale
   ```

4. Round each processed dimension down to the nearest multiple of 48.

5. Compute the token count:

   ```text theme={null}
   image_tokens = (processed_width / 48) × (processed_height / 48)
   ```

6. Cap the result at 280.

This means smaller images do not always use fewer image tokens. For example, a `336 × 226` image is upscaled during preprocessing to `960 × 624`, which uses `260` image tokens.

| Input resolution | Processed resolution | Image tokens used |
| ---------------- | -------------------- | ----------------- |
| 336 × 226        | 960 × 624            | 260               |
| 512 × 512        | 768 × 768            | 256               |
| 672 × 672        | 768 × 768            | 256               |
| 1024 × 1024      | 768 × 768            | 256               |
| 1280 × 720       | 1056 × 576           | 264               |
| 1920 × 1080      | 1056 × 576           | 264               |
| 2560 × 1440      | 1056 × 576           | 264               |
| 3840 × 2160      | 1056 × 576           | 264               |
| 336 × 480        | 672 × 960            | 280               |
| 480 × 336        | 960 × 672            | 280               |

To validate token usage for a specific request, send two otherwise identical requests — one with the image and one without — and compare the `usage.prompt_tokens` values. Cerebras does not currently return a separate `image_tokens` field.

Keep the following in mind:

* 280 is the maximum image tokens per image for `gemma-4-31b` on Cerebras.
* Compressed file size does not directly determine token count. Processed image dimensions matter more than PNG or JPEG byte size.
* Image tokens are added to text prompt tokens and reported together in `prompt_tokens` in the API response.
* Image tokens occupy part of the model context window, just like text prompt tokens.

## Limitations

* **Medical images** — not suitable for interpreting specialized medical images such as CT scans or MRIs. Do not use for medical diagnosis or advice.
* **Small text** — may have difficulty reading small or low-resolution text. Enlarging text within the image before sending can improve results.
* **Rotated content** — may misinterpret text or images that are rotated or upside-down.
* **Graphs and charts** — may struggle to distinguish visual elements that differ only in color or line style, such as solid versus dashed lines.
* **Spatial reasoning** — not reliable for tasks requiring precise spatial localization, such as identifying positions on a map or board game.
* **Object counting** — the model may give approximate counts for objects in images.
* **Image shape** — may perform less accurately on panoramic or fisheye images.
* **Preprocessing** — the model cannot access original filenames or metadata. Images may be resized before analysis — see [Token Usage](#token-usage) for details.
* **Accuracy** — the model may generate inaccurate descriptions or captions in some scenarios. Verify outputs for high-stakes use cases.
* **CAPTCHAs** — CAPTCHA images are not supported.
* **Indirect prompt injection** — text embedded in an image is included in the model's prompt context alongside the user's text. If an image contains adversarial instructions (for example, text that says "ignore all previous instructions") and the user prompt asks the model to answer based on the image, the model may follow those embedded instructions. Treat image content from untrusted sources as untrusted input, and use a system prompt to constrain the model's behavior when processing images you don't control.
* **Untrusted output** — the model may transcribe or describe text from an image verbatim, including HTML, script tags, URLs, or control characters. The API returns this content unmodified. Treat it the same as any other untrusted input before rendering, logging, or executing it in your application.

## FAQs

<AccordionGroup>
  <Accordion title="Do I need to resend the image on later turns?">
    Yes. Cerebras Chat Completions is stateless. If a follow-up request depends on an earlier image, include that image-bearing turn in the conversation history you send with the new request. Continue to include that turn for as long as the model needs the visual context.
  </Accordion>

  <Accordion title="Can I generate images?">
    No, only image input is supported. The model returns text only and does not generate images.
  </Accordion>

  <Accordion title="Is prompt caching supported with image inputs?">
    Yes. Prompt caching can help with repeated images and repeated multimodal context within your organization. Prompt caches are never shared between organizations and remain ephemeral. See [Prompt Caching](/capabilities/prompt-caching).
  </Accordion>

  <Accordion title="Do rate limits change with image support?">
    No. Image support uses the same rate limit framework as text. The same request and token limits still apply based on your organization and tier. For current details, see [Rate Limits](/support/rate-limits).
  </Accordion>

  <Accordion title="Do you store image data?">
    Image inputs are processed as soon as they are received, and the original image payloads are not persisted. After preprocessing, image tokens and image embeddings may be cached ephemerally within your organization to support prompt caching.
  </Accordion>
</AccordionGroup>