Flowise is an open-source low-code tool for developers to build customized LLM orchestration flows and AI agents. With its intuitive drag-and-drop interface, you can easily create complex AI workflows without writing extensive code. By integrating Cerebras with Flowise, you can leverage the world’s fastest AI inference to power your Flowise applications with ultra-low latency and high throughput.
This guide covers the ChatCerebras v3.0 node, which includes a model dropdown selector and automatic integration tracking. If you’re using an older version, consider updating Flowise to get these enhanced features.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key at cloud.cerebras.ai.
  • Flowise Installation - Install Flowise locally or use Flowise Cloud.
  • Node.js 18 or higher - Required for running Flowise locally.

Install Flowise

Install Flowise via NPM

The easiest way to get started with Flowise is to install it globally using NPM:
npm install -g flowise
Alternatively, you can use Docker:
docker run -d -p 3000:3000 flowiseai/flowise
Start Flowise

Once installed, start Flowise with:
flowise start
This will launch Flowise on http://localhost:3000. Open this URL in your browser to access the Flowise interface.

Configure Cerebras in Flowise

Create a new Chatflow

In the Flowise UI, create a new chatflow to house your Cerebras-powered application:
  1. Click on “Chatflows” in the left sidebar
  2. Click the “+Add New” button
  3. Give your chatflow a descriptive name like “Cerebras Chat Assistant”
Add the ChatCerebras node

Flowise has a dedicated ChatCerebras node for seamless integration:
  1. In the canvas, click the “+” button or drag from the left panel
  2. Search for “ChatCerebras” in the Chat Models category
  3. Drag the ChatCerebras node onto the canvas
Configure the ChatCerebras node

Click on the ChatCerebras node to open its configuration panel and configure the following settings:

Required Settings:
  1. Connect Credential: Click to add your Cerebras API Key
    • If this is your first time, click “Create New”
    • Enter your API key from cloud.cerebras.ai (starts with csk-)
    • Give it a name like “Cerebras API”
    • Click “Add”
  2. Model Name: Select from the dropdown:
    • llama-3.3-70b - Best for complex reasoning and long-form content
    • qwen-3-32b - Balanced performance for general-purpose tasks
    • llama3.1-8b - Fastest model, ideal for simple tasks (default)
    • gpt-oss-120b - Largest model for demanding tasks
    • zai-glm-4.6 - Advanced reasoning and complex problem-solving
Optional Settings (under Additional Parameters):
  • Temperature: Control randomness (0.0 to 1.0, default 0.9)
  • Max Tokens: Maximum response length
  • Top P: Nucleus sampling parameter
  • Streaming: Enable for real-time token generation (default: true)
The ChatCerebras node automatically:
  • Configures the correct API endpoint (https://api.cerebras.ai/v1)
  • Adds the integration tracking header for better support
No manual configuration is needed for either of these.
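Under the hood, these settings correspond to standard OpenAI-compatible chat-completion parameters. As a rough illustration only (the field names below follow the OpenAI-compatible API; this is not Flowise's internal code), the node's configuration maps onto a request body like this:

```python
# Hypothetical sketch: how the ChatCerebras node's settings map onto an
# OpenAI-compatible chat-completions request body.
def build_request_body(model="llama3.1-8b", temperature=0.9,
                       max_tokens=None, top_p=None, streaming=True):
    body = {
        "model": model,              # Model Name dropdown (llama3.1-8b is the default)
        "temperature": temperature,  # Temperature (0.0 to 1.0, default 0.9)
        "stream": streaming,         # Streaming toggle (default: true)
    }
    # Optional parameters are only sent when explicitly set
    if max_tokens is not None:
        body["max_tokens"] = max_tokens  # Max Tokens
    if top_p is not None:
        body["top_p"] = top_p            # Top P (nucleus sampling)
    return body

print(build_request_body(model="llama-3.3-70b", max_tokens=500))
```

This is only meant to make the meaning of each setting concrete; in practice the node builds the request for you.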
Connect additional nodes

Build out your chatflow by adding other nodes to create a complete application:
  1. Add a Prompt Template - Click “+” and search for “Prompt Template” to customize your system prompts
  2. Add Memory (optional) - Search for “Buffer Memory” or “Conversation Buffer Memory” to maintain conversation context
  3. Connect the nodes - Draw connections between nodes by clicking and dragging from output ports to input ports
A basic flow might look like:
Prompt Template → ChatCerebras → Output
Or with memory:
Prompt Template → Buffer Memory → ChatCerebras → Output
Test your chatflow

Once your nodes are connected, test your Cerebras-powered chatflow:
  1. Click the “Save” button in the top right
  2. Click the “Chat” icon to open the test interface
  3. Send a test message like “Hello! What can you help me with?”
  4. You should receive a response from your Cerebras-powered chatflow
If you encounter any errors, check that your API key is correct and the Base URL is set to https://api.cerebras.ai/v1.

Using Cerebras with the Flowise API

Flowise automatically generates REST APIs for your chatflows, allowing you to integrate Cerebras-powered AI into any application.
Get your Chatflow API endpoint

In the Flowise UI:
  1. Open your chatflow
  2. Click the “API” button in the top right
  3. Copy the API endpoint URL (e.g., http://localhost:3000/api/v1/prediction/your-chatflow-id)
Make API requests to your chatflow

You can now call your Cerebras-powered chatflow from any application:
import requests

# Prediction endpoint for your chatflow (replace the ID with your own)
url = "http://localhost:3000/api/v1/prediction/your-chatflow-id"

# "question" is the user message; "overrideConfig" overrides node settings per request
payload = {
    "question": "What is the capital of France?",
    "overrideConfig": {
        "temperature": 0.7
    }
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()
result = response.json()

# The generated answer is returned in the "text" field
print(result["text"])
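The overrideConfig field can carry per-request overrides for the node's settings. As a small sketch, here is a helper function (hypothetical, not part of Flowise or any SDK) that assembles such payloads:

```python
# Hypothetical helper for building Flowise prediction payloads.
# "question" and "overrideConfig" are the fields the prediction API expects.
def build_prediction_payload(question, **overrides):
    payload = {"question": question}
    if overrides:
        # Any keyword arguments become per-request setting overrides
        payload["overrideConfig"] = overrides
    return payload

print(build_prediction_payload("What is the capital of France?", temperature=0.7))
# With no overrides, the overrideConfig field is omitted entirely
print(build_prediction_payload("Hello"))
```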
The chatflow will use Cerebras Inference to generate responses, giving you the speed and performance of Cerebras through Flowise’s convenient API.

Direct Integration with OpenAI SDK

For advanced users who want to use Cerebras directly in custom Flowise nodes or external applications, you can use the OpenAI SDK with Cerebras configuration:
import os
from openai import OpenAI

# Point the OpenAI client at the Cerebras API endpoint
client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1",
    default_headers={
        # Identifies Flowise integration traffic to Cerebras
        "X-Cerebras-3rd-Party-Integration": "flowise"
    }
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # plain model name; the base URL already targets Cerebras
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=500,
    temperature=0.7
)

print(response.choices[0].message.content)

Advanced Configuration

Using Environment Variables

For production deployments, store your Cerebras API key as an environment variable:
export CEREBRAS_API_KEY=your-api-key-here
Then in Flowise, reference it in the ChatCerebras credential configuration using ${CEREBRAS_API_KEY}.

Streaming Responses

To enable streaming responses for real-time output:
  1. In the ChatCerebras node, enable “Streaming”
  2. Your API responses will now stream tokens as they’re generated
  3. This is particularly useful for long-form content generation and provides a better user experience
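When streaming is enabled, the model's output arrives as a sequence of small text deltas that the client concatenates. A minimal sketch of that accumulation step, assuming chunks shaped like the OpenAI SDK's streaming deltas (a real client would iterate over the live stream rather than a list):

```python
# Minimal sketch: accumulating streamed token deltas into a full response.
# The chunk shape mimics OpenAI-style streaming; a real client would read
# these from the live event stream instead of a prepared list.
def accumulate_stream(chunks):
    parts = []
    for chunk in chunks:
        delta = chunk.get("delta", "")
        if delta:
            parts.append(delta)  # use print(delta, end="") for live display
    return "".join(parts)

simulated = [{"delta": "Paris "}, {"delta": "is the "}, {"delta": "capital."}, {}]
print(accumulate_stream(simulated))  # → Paris is the capital.
```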

Using Multiple Cerebras Models

You can create different chatflows for different use cases:
  • Fast responses: Use llama3.1-8b for quick, simple queries
  • Complex reasoning: Use llama-3.3-70b for complex reasoning and long-form content
  • General purpose: Use qwen-3-32b for balanced performance
  • Long context: Use gpt-oss-120b for processing large documents
  • Advanced reasoning: Use zai-glm-4.6 for demanding tasks
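If you route requests programmatically rather than through separate chatflows, the same mapping can live in code. A small sketch (the use-case labels are illustrative, not a Flowise feature; the model names come from the list above):

```python
# Illustrative mapping of use cases to Cerebras model names.
MODEL_FOR_USE_CASE = {
    "fast": "llama3.1-8b",
    "complex_reasoning": "llama-3.3-70b",
    "general": "qwen-3-32b",
    "long_context": "gpt-oss-120b",
    "advanced_reasoning": "zai-glm-4.6",
}

def pick_model(use_case):
    # Fall back to the fast default model for unknown use cases
    return MODEL_FOR_USE_CASE.get(use_case, "llama3.1-8b")

print(pick_model("general"))       # → qwen-3-32b
print(pick_model("unknown-task"))  # → llama3.1-8b
```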

FAQ

Why am I getting an authentication error?

This usually means your Cerebras API key is incorrect or not properly configured:
  1. Verify your API key at cloud.cerebras.ai
  2. Make sure there are no extra spaces or characters in the API key field
  3. If using environment variables, ensure they’re properly loaded
  4. Try regenerating your API key if the issue persists

Why do I get a “model not found” error?

Ensure you’re using the correct model name format:
  • Use llama-3.3-70b
  • Use qwen-3-32b
  • Use llama3.1-8b
  • Use gpt-oss-120b
Refer to the Cerebras models documentation for the complete list of available models.

Why are responses slow?

If you’re experiencing slow responses:
  1. Check your internet connection
  2. Verify the Base URL is set to https://api.cerebras.ai/v1 (not http://)
  3. Try reducing the max_tokens parameter
  4. Consider using a faster model like llama3.1-8b for simpler tasks
  5. Check Cerebras status page for any service issues

Is the integration tracking header added automatically?

Yes! As of ChatCerebras v3.0, the X-Cerebras-3rd-Party-Integration: flowise header is automatically included in all requests. You don’t need to manually configure anything. This header helps Cerebras:
  • Track integration usage and performance
  • Provide better support for Flowise users
  • Identify and resolve integration-specific issues faster

Can I use Cerebras with Flowise Cloud?

Yes! The same configuration works with Flowise Cloud:
  1. Sign up at flowiseai.com
  2. Create a new chatflow
  3. Configure the ChatCerebras node as described above
  4. Your chatflow will use Cerebras Inference in the cloud

How do I switch between models?

Switching models is easy with the dropdown selector:
  1. Click on the ChatCerebras node in your chatflow
  2. Click the “Model Name” dropdown
  3. Select your desired model from the list (each has a description to help you choose)
  4. Save the chatflow
  5. Test with the new model
You can also create multiple chatflows with different models for different use cases. The dropdown makes it easy to see all available options at a glance.

Does Flowise add latency on top of Cerebras inference?

Flowise adds minimal overhead since it primarily orchestrates the workflow. The actual inference is performed directly by Cerebras, so you’ll experience the same ultra-low latency that Cerebras is known for. Any additional latency is typically negligible (< 50ms) and comes from Flowise’s workflow orchestration.
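You can verify that overhead yourself by timing requests end to end. A minimal sketch using Python's standard library (the timed function below is a stand-in so the example runs offline; substitute a real chatflow request):

```python
import time

# Measure wall-clock latency of any callable, e.g. a Flowise prediction request.
def timed(fn, *args, **kwargs):
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for a real request so the example runs without a server
def fake_chatflow_call(question):
    return f"echo: {question}"

result, ms = timed(fake_chatflow_call, "Hello")
print(result, f"({ms:.2f} ms)")
```

Comparing a timed call through Flowise against a timed direct call to the Cerebras API gives a concrete measure of the orchestration overhead.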