Skip to main content

What is KiloCode?

KiloCode is an AI-powered autonomous assistant for Visual Studio Code that helps you plan, build, and fix code. It combines the best features of popular coding assistants with unique capabilities, offering multiple interaction modes including architect, code, ask, debug, and orchestrator modes. By integrating KiloCode with Cerebras, you get access to ultra-fast inference speeds that make your coding workflow significantly more responsive and efficient. Learn more at kilocode.ai.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here.
  • Visual Studio Code - Download and install from code.visualstudio.com.
  • KiloCode Extension - Install from their Website.

Configure KiloCode with Cerebras

1

Open KiloCode in VS Code

After installing the KiloCode extension, open the KiloCode panel in VS Code. You’ll typically find it in the sidebar with the KiloCode icon. This panel serves as your command center for interacting with the AI assistant and managing your coding tasks.
2

Configure API provider

Click Use your own API key in the KiloCode panel. This allows you to connect KiloCode to Cerebras’s inference API instead of using the default provider.In the API Provider dropdown, select OpenAI Compatible or Custom (depending on your KiloCode version). KiloCode uses the OpenAI-compatible API format, which Cerebras fully supports.
3

Enter your Cerebras credentials

Configure the following settings to connect to Cerebras:
  • API Key: Enter your Cerebras API key
  • Base URL: https://api.cerebras.ai/v1
  • Model: Choose from available Cerebras models:
    • llama-3.3-70b - Best for complex reasoning, long-form content, and tasks requiring deep understanding
    • qwen-3-32b - Balanced performance for general-purpose applications
    • llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
    • gpt-oss-120b - Largest model for the most demanding tasks
    • zai-glm-4.6 - Advanced 357B parameter model with strong reasoning capabilities
We recommend starting with zai-glm-4.6 or gpt-oss-120b for the best coding assistance experience with the best performance.

Using KiloCode with Cerebras

Example: Building a Calculator App

Here’s a practical example of using KiloCode with Cerebras to build a full-stack calculator application:
1

Provide your task

In the KiloCode panel, enter your prompt describing what you want to build:
Build a basic calculator app with a FastAPI backend that performs 
addition, subtraction, multiplication, and division. The frontend 
should use HTML/CSS/JavaScript to accept input, select an operation, 
and display results.
KiloCode will send this prompt to Cerebras for processing, leveraging the ultra-fast inference to quickly understand your requirements.
2

Review the plan

KiloCode, powered by Cerebras, will analyze your request and present a detailed plan. This might include:
  • Creating the FastAPI backend structure
  • Implementing calculator operations as API endpoints
  • Building the HTML/CSS frontend interface
  • Setting up proper error handling
  • Implementing fetch API calls for frontend-backend communication
Review each step carefully to ensure it aligns with your requirements. You can modify the plan or ask for clarifications before proceeding.
3

Approve and execute

Approve the actions you want KiloCode to execute. The assistant will:
  • Generate the necessary files in your workspace
  • Write the backend and frontend code
  • Set up the project structure with proper organization
  • Provide instructions for installing dependencies and running the application
The KiloCode panel displays executed tasks, token usage, and session information at the top bar, giving you full visibility into the AI’s work and resource consumption.

Verifying the Integration

To confirm that KiloCode is using Cerebras’s API correctly:
  1. Response Speed: Cerebras provides significantly faster inference than typical GPU providers. You should notice near-instant responses for most queries.
  2. Token Usage: Monitor the token counter in the KiloCode panel to track your API usage.
  3. Model Name: Verify that your selected Cerebras model appears in the session info at the top of the panel.
  4. API Logs: Check the KiloCode output logs for successful API connections to api.cerebras.ai.

Advanced Usage

Custom Interaction Modes

You can create custom interaction modes tailored to your specific development needs:
  1. Navigate to the Edit section in KiloCode
  2. Create a new mode or modify existing ones
  3. Define custom prompts, system messages, and behaviors
  4. Save and activate your custom mode for specialized workflows
Custom modes are particularly useful for domain-specific tasks like API development, testing, documentation, or working with specific frameworks.

Working with Large Codebases

When working with large projects, consider these best practices:
  • Use architect mode first to plan changes and understand the impact across your codebase
  • Break down complex tasks into smaller, manageable steps that can be executed incrementally
  • Use orchestrator mode to coordinate multi-file changes while maintaining consistency
  • Review each change before approving execution to maintain code quality
  • Leverage context windows effectively by focusing on relevant files

Debugging Workflow

For effective debugging with KiloCode and Cerebras:
  1. Switch to debug mode for specialized debugging assistance
  2. Provide error messages, stack traces, or describe the unexpected behavior
  3. Let KiloCode analyze the problem using Cerebras’s fast inference
  4. Review suggested fixes and explanations
  5. Apply changes incrementally and test after each modification
  6. Use ask mode to understand why bugs occurred and how to prevent them

Best Practices

Tip: Start with smaller, well-defined tasks to get familiar with KiloCode’s capabilities and how it interacts with Cerebras before tackling larger projects.
  • Be Specific: Provide clear, detailed prompts with context for better results. Include file names, function names, and expected behavior.
  • Review Plans: Always review the assistant’s plan before execution to catch potential issues early.
  • Iterative Approach: Break complex tasks into smaller steps and validate each step before moving forward.
  • Use Appropriate Modes: Select the right interaction mode for your task to get optimized prompts and behaviors.
  • Monitor Token Usage: Keep track of your API usage in the panel to manage costs and understand model efficiency.
  • Leverage Speed: Take advantage of Cerebras’s fast inference to iterate quickly and experiment with different approaches.
  • Provide Feedback: Use the feedback mechanisms in KiloCode to improve future responses.

Frequently Asked Questions

For most coding tasks, we recommend gpt-oss-120b or zai-glm-4.6. These models offer the best code understanding, generation quality, and reasoning capability.Use llama3.3-70b for very simple tasks like code formatting, basic refactoring, or quick questions where speed is the priority.
Cerebras provides ultra-fast inference speeds, typically 10-20x faster than traditional GPU providers. This means:
  • Near-instant responses to your coding queries
  • Faster iteration cycles when generating or refactoring code
  • More responsive debugging and problem-solving
  • Ability to experiment with multiple approaches quickly
  • Reduced waiting time during complex multi-step tasks
Yes, you can change the model in KiloCode’s settings at any time. Go to the API configuration section and select a different model from the dropdown.Different models excel at different tasks, so you might want to use llama-3.3-70b for complex code generation and llama3.1-8b for quick refactoring tasks.
Yes, KiloCode supports streaming responses when using Cerebras models. This means you’ll see the AI’s response appear in real-time as it’s generated, providing an even more responsive experience.Streaming is automatically enabled when you configure KiloCode with Cerebras’s API endpoint.
You can track your API usage in two ways:
  1. In KiloCode: The top bar shows token usage for the current session
  2. Cerebras Dashboard: Visit cloud.cerebras.ai to see detailed usage analytics, including total tokens, requests, and costs
If you exceed your Cerebras API rate limits, you’ll receive an error message in KiloCode. The assistant will pause and notify you of the rate limit.You can:
  • Wait for the rate limit to reset (typically one minute)
  • Upgrade your Cerebras plan for higher rate limits
  • Optimize your prompts to use fewer tokens
Check your current rate limits in the Cerebras dashboard.

Next Steps