This guide covers the ChatCerebras v3.0 node, which includes a model dropdown selector and automatic integration tracking. If you’re using an older version, consider updating Flowise to get these enhanced features.
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key at cloud.cerebras.ai.
- Flowise Installation - Install Flowise locally or use Flowise Cloud.
- Node.js 18 or higher - Required for running Flowise locally.
Install Flowise
1. Install Flowise via NPM
The easiest way to get started with Flowise is to install it globally using NPM. Alternatively, you can use Docker.
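For reference, the typical commands look like this (a sketch; the Docker image name is an assumption based on Flowise's published Docker Hub image, so check the Flowise docs for your platform):

```shell
# Install Flowise globally via NPM (requires Node.js 18+)
npm install -g flowise

# Alternative: run Flowise in Docker instead of installing via NPM
# docker run -d --name flowise -p 3000:3000 flowiseai/flowise
```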
2. Start Flowise
Once installed, start Flowise with npx flowise start. This will launch Flowise on http://localhost:3000. Open this URL in your browser to access the Flowise interface.

Configure Cerebras in Flowise
1. Create a new Chatflow
In the Flowise UI, create a new chatflow to house your Cerebras-powered application:
- Click on “Chatflows” in the left sidebar
- Click the “+Add New” button
- Give your chatflow a descriptive name like “Cerebras Chat Assistant”
2. Add the ChatCerebras node
Flowise has a dedicated ChatCerebras node for seamless integration:
- In the canvas, click the “+” button or drag from the left panel
- Search for “ChatCerebras” in the Chat Models category
- Drag the ChatCerebras node onto the canvas
3. Configure the ChatCerebras node
Click on the ChatCerebras node to open its configuration panel and configure the following settings:

Required Settings:
- Connect Credential: Click to add your Cerebras API Key
  - If this is your first time, click “Create New”
  - Enter your API key from cloud.cerebras.ai (starts with csk-)
  - Give it a name like “Cerebras API”
  - Click “Add”
- Model Name: Select from the dropdown:
  - llama-3.3-70b - Best for complex reasoning and long-form content
  - qwen-3-32b - Balanced performance for general-purpose tasks
  - llama3.1-8b - Fastest model, ideal for simple tasks (default)
  - gpt-oss-120b - Largest model for demanding tasks
  - zai-glm-4.6 - Advanced reasoning and complex problem-solving
- Temperature: Control randomness (0.0 to 1.0, default 0.9)
- Max Tokens: Maximum response length
- Top P: Nucleus sampling parameter
- Streaming: Enable for real-time token generation (default: true)
The ChatCerebras node automatically:
- Configures the correct API endpoint (https://api.cerebras.ai/v1)
- Adds the integration tracking header for better support

No manual configuration needed!
4. Connect additional nodes
Build out your chatflow by adding other nodes to create a complete application:
- Add a Prompt Template - Click “+” and search for “Prompt Template” to customize your system prompts
- Add Memory (optional) - Search for “Buffer Memory” or “Conversation Buffer Memory” to maintain conversation context
- Connect the nodes - Draw connections between nodes by clicking and dragging from output ports to input ports
5. Test your chatflow
Once your nodes are connected, test your Cerebras-powered chatflow:
- Click the “Save” button in the top right
- Click the “Chat” icon to open the test interface
- Send a test message like “Hello! What can you help me with?”
- You should receive a response from your Cerebras-powered chatflow
All requests from the ChatCerebras node go to https://api.cerebras.ai/v1.

Using Cerebras with Flowise API
Flowise automatically generates REST APIs for your chatflows, allowing you to integrate Cerebras-powered AI into any application.

1. Get your Chatflow API endpoint
In the Flowise UI:
- Open your chatflow
- Click the “API” button in the top right
- Copy the API endpoint URL (e.g., http://localhost:3000/api/v1/prediction/your-chatflow-id)
2. Make API requests to your chatflow
You can now call your Cerebras-powered chatflow from any application. The chatflow will use Cerebras Inference to generate responses, giving you the speed and performance of Cerebras through Flowise’s convenient API.
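As an illustrative sketch, here is a minimal Python call to the prediction endpoint. It assumes the requests package, Flowise running locally, and the standard Flowise prediction payload ({"question": ...}); replace your-chatflow-id with the ID copied from the API dialog.

```python
import requests

# Endpoint copied from the Flowise "API" dialog (replace your-chatflow-id)
API_URL = "http://localhost:3000/api/v1/prediction/your-chatflow-id"

def query(payload: dict) -> dict:
    """POST a question to the chatflow and return the JSON response."""
    response = requests.post(API_URL, json=payload)
    response.raise_for_status()
    return response.json()

# Example usage (requires Flowise running locally):
# print(query({"question": "Hello! What can you help me with?"}))
```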
Direct Integration with OpenAI SDK
For advanced users who want to use Cerebras directly in custom Flowise nodes or external applications, you can use the OpenAI SDK with Cerebras configuration.

Advanced Configuration
Using Environment Variables
For production deployments, store your Cerebras API key as an environment variable and reference it as ${CEREBRAS_API_KEY}.
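For example (shell syntax, a sketch; adapt to your deployment platform):

```shell
# Export the key so it can be referenced as ${CEREBRAS_API_KEY}
export CEREBRAS_API_KEY="csk-your-key-here"
```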
Streaming Responses
To enable streaming responses for real-time output:
- In the ChatCerebras node, enable “Streaming”
- Your API responses will now stream tokens as they’re generated
- This is particularly useful for long-form content generation and provides a better user experience
Using Multiple Cerebras Models
You can create different chatflows for different use cases:
- Fast responses: Use llama3.1-8b for quick, simple queries
- Complex reasoning: Use llama-3.3-70b for complex reasoning and long-form content
- General purpose: Use qwen-3-32b for balanced performance
- Long context: Use gpt-oss-120b for processing large documents
- Advanced reasoning: Use zai-glm-4.6 for demanding tasks
Next Steps
- Explore the Flowise documentation to learn about advanced features
- Try different Cerebras models to find the best fit for your use case
- Join the Flowise Discord community for support and inspiration
- Check out Flowise templates for pre-built chatflow examples
- Deploy your chatflow to production using Flowise Cloud
- GLM4.6 migration guide
FAQ
Error: 'Invalid API key' or 401 Unauthorized

Double-check the credential attached to the ChatCerebras node: the key should match the one from cloud.cerebras.ai and start with csk-. Re-create the credential if the error persists.
Error: 'Model not found' or invalid model name

Ensure you’re using the correct model name format:
- Use llama-3.3-70b
- Use qwen-3-32b
- Use llama3.1-8b
- Use gpt-oss-120b
Responses are slow or timing out

If you’re experiencing slow responses:
- Check your internet connection
- Verify the Base URL is set to https://api.cerebras.ai/v1 (not http://)
- Try reducing the max_tokens parameter
- Consider using a faster model like llama3.1-8b for simpler tasks
- Check the Cerebras status page for any service issues
Does the integration tracking header get added automatically?

Yes! As of ChatCerebras v3.0, the X-Cerebras-3rd-Party-Integration: flowise header is automatically included in all requests. You don’t need to manually configure anything.

This header helps Cerebras:
- Track integration usage and performance
- Provide better support for Flowise users
- Identify and resolve integration-specific issues faster
Can I use Cerebras with Flowise Cloud?
Yes! The same configuration works with Flowise Cloud:
- Sign up at flowiseai.com
- Create a new chatflow
- Configure the ChatCerebras node as described above
- Your chatflow will use Cerebras Inference in the cloud
How do I switch between different Cerebras models?
Switching models is easy with the dropdown selector:
- Click on the ChatCerebras node in your chatflow
- Click the “Model Name” dropdown
- Select your desired model from the list (each has a description to help you choose)
- Save the chatflow
- Test with the new model
What additional latency can I expect when using Cerebras through Flowise?
Flowise adds minimal overhead since it primarily orchestrates the workflow. The actual inference is performed directly by Cerebras, so you’ll experience the same ultra-low latency that Cerebras is known for. Any additional latency is typically negligible (< 50ms) and comes from Flowise’s workflow orchestration.

