Get Started with KiloCode

What is KiloCode?

KiloCode is an AI-powered autonomous assistant for Visual Studio Code that helps you plan, build, and fix code. It combines the best features of popular coding assistants with unique capabilities, offering multiple interaction modes including architect, code, ask, debug, and orchestrator modes. By integrating KiloCode with Cerebras, you get access to ultra-fast inference speeds that make your coding workflow significantly more responsive and efficient. Learn more at kilocode.ai.

Prerequisites

Before you begin, ensure you have:

Cerebras API Key - Get a free API key here.
Visual Studio Code - Download and install from code.visualstudio.com.
KiloCode Extension - Install from their Website.

Configure KiloCode with Cerebras

Open KiloCode in VS Code

After installing the KiloCode extension, open the KiloCode panel in VS Code. You’ll typically find it in the sidebar with the KiloCode icon. This panel serves as your command center for interacting with the AI assistant and managing your coding tasks.

Configure API provider

Click Use your own API key in the KiloCode panel. This allows you to connect KiloCode to Cerebras’s inference API instead of using the default provider.In the API Provider dropdown, select OpenAI Compatible or Custom (depending on your KiloCode version). KiloCode uses the OpenAI-compatible API format, which Cerebras fully supports.

Enter your Cerebras credentials

Configure the following settings to connect to Cerebras:

API Key: Enter your Cerebras API key
Base URL: https://api.cerebras.ai/v1
Model: Choose from available Cerebras models:
- llama3.1-8b - Fastest option for simple tasks and high-throughput scenarios
- gpt-oss-120b - Largest model for the most demanding tasks
- zai-glm-4.7 - Advanced 357B parameter model with strong reasoning capabilities

We recommend starting with zai-glm-4.7 or gpt-oss-120b for the best coding assistance experience with the best performance.

Using KiloCode with Cerebras

Example: Building a Calculator App

Here’s a practical example of using KiloCode with Cerebras to build a full-stack calculator application:

Provide your task

In the KiloCode panel, enter your prompt describing what you want to build:

Build a basic calculator app with a FastAPI backend that performs 
addition, subtraction, multiplication, and division. The frontend 
should use HTML/CSS/JavaScript to accept input, select an operation, 
and display results.

KiloCode will send this prompt to Cerebras for processing, leveraging the ultra-fast inference to quickly understand your requirements.

Review the plan

KiloCode, powered by Cerebras, will analyze your request and present a detailed plan. This might include:

Creating the FastAPI backend structure
Implementing calculator operations as API endpoints
Building the HTML/CSS frontend interface
Setting up proper error handling
Implementing fetch API calls for frontend-backend communication

Review each step carefully to ensure it aligns with your requirements. You can modify the plan or ask for clarifications before proceeding.

Approve and execute

Approve the actions you want KiloCode to execute. The assistant will:

Generate the necessary files in your workspace
Write the backend and frontend code
Set up the project structure with proper organization
Provide instructions for installing dependencies and running the application

The KiloCode panel displays executed tasks, token usage, and session information at the top bar, giving you full visibility into the AI’s work and resource consumption.

Verifying the Integration

To confirm that KiloCode is using Cerebras’s API correctly:

Response Speed: Cerebras provides significantly faster inference than typical GPU providers. You should notice near-instant responses for most queries.
Token Usage: Monitor the token counter in the KiloCode panel to track your API usage.
Model Name: Verify that your selected Cerebras model appears in the session info at the top of the panel.
API Logs: Check the KiloCode output logs for successful API connections to api.cerebras.ai.

Advanced Usage

Custom Interaction Modes

You can create custom interaction modes tailored to your specific development needs:

Navigate to the Edit section in KiloCode
Create a new mode or modify existing ones
Define custom prompts, system messages, and behaviors
Save and activate your custom mode for specialized workflows

Custom modes are particularly useful for domain-specific tasks like API development, testing, documentation, or working with specific frameworks.

Working with Large Codebases

When working with large projects, consider these best practices:

Use architect mode first to plan changes and understand the impact across your codebase
Break down complex tasks into smaller, manageable steps that can be executed incrementally
Use orchestrator mode to coordinate multi-file changes while maintaining consistency
Review each change before approving execution to maintain code quality
Leverage context windows effectively by focusing on relevant files

Debugging Workflow

For effective debugging with KiloCode and Cerebras:

Switch to debug mode for specialized debugging assistance
Provide error messages, stack traces, or describe the unexpected behavior
Let KiloCode analyze the problem using Cerebras’s fast inference
Review suggested fixes and explanations
Apply changes incrementally and test after each modification
Use ask mode to understand why bugs occurred and how to prevent them

Best Practices

Tip: Start with smaller, well-defined tasks to get familiar with KiloCode’s capabilities and how it interacts with Cerebras before tackling larger projects.

Be Specific: Provide clear, detailed prompts with context for better results. Include file names, function names, and expected behavior.
Review Plans: Always review the assistant’s plan before execution to catch potential issues early.
Iterative Approach: Break complex tasks into smaller steps and validate each step before moving forward.
Use Appropriate Modes: Select the right interaction mode for your task to get optimized prompts and behaviors.
Monitor Token Usage: Keep track of your API usage in the panel to manage costs and understand model efficiency.
Leverage Speed: Take advantage of Cerebras’s fast inference to iterate quickly and experiment with different approaches.
Provide Feedback: Use the feedback mechanisms in KiloCode to improve future responses.

Frequently Asked Questions

Which Cerebras model should I use for coding tasks?

For most coding tasks, we recommend gpt-oss-120b or zai-glm-4.7. These models offer the best code understanding, generation quality, and reasoning capability.Use llama3.1-8b for very simple tasks like code formatting, basic refactoring, or quick questions where speed is the priority.

How does Cerebras improve my KiloCode experience?

Cerebras provides ultra-fast inference speeds, typically 10-20x faster than traditional GPU providers. This means:

Near-instant responses to your coding queries
Faster iteration cycles when generating or refactoring code
More responsive debugging and problem-solving
Ability to experiment with multiple approaches quickly
Reduced waiting time during complex multi-step tasks

Can I switch between different Cerebras models?

Yes, you can change the model in KiloCode’s settings at any time. Go to the API configuration section and select a different model from the dropdown.Different models excel at different tasks, so you might want to use gpt-oss-120b for complex code generation and llama3.1-8b for quick refactoring tasks.

Does KiloCode support streaming responses with Cerebras?

Yes, KiloCode supports streaming responses when using Cerebras models. This means you’ll see the AI’s response appear in real-time as it’s generated, providing an even more responsive experience.Streaming is automatically enabled when you configure KiloCode with Cerebras’s API endpoint.

How do I track my Cerebras API usage?

You can track your API usage in two ways:

In KiloCode: The top bar shows token usage for the current session
Cerebras Dashboard: Visit cloud.cerebras.ai to see detailed usage analytics, including total tokens, requests, and costs

What happens if I exceed my API rate limits?

If you exceed your Cerebras API rate limits, you’ll receive an error message in KiloCode. The assistant will pause and notify you of the rate limit.You can:

Wait for the rate limit to reset (typically one minute)
Upgrade your Cerebras plan for higher rate limits
Optimize your prompts to use fewer tokens

Check your current rate limits in the Cerebras dashboard.

Next Steps

Explore the KiloCode documentation for advanced features and customization options
Try different Cerebras models to find the best fit for your workflow and coding style
Experiment with custom interaction modes to optimize for your specific development needs
Join the KiloCode community for tips, support, and sharing best practices
Check out our Model Comparison Guide to understand which Cerebras model works best for different coding tasks
Read about Cerebras’s ultra-fast inference technology to learn how it achieves industry-leading speeds
Want to migrate from GLM4.7? Check out the GLM4.7 migration guide

Get Started

Capabilities

Compatibility

Resources

Support

What is KiloCode?

Prerequisites

Configure KiloCode with Cerebras

Using KiloCode with Cerebras

Example: Building a Calculator App

Verifying the Integration

Advanced Usage

Custom Interaction Modes

Working with Large Codebases

Debugging Workflow

Best Practices

Frequently Asked Questions

Next Steps

Get Started

Capabilities

Compatibility

Resources

Support

​What is KiloCode?

​Prerequisites

​Configure KiloCode with Cerebras

​Using KiloCode with Cerebras

​Example: Building a Calculator App

​Verifying the Integration

​Advanced Usage

​Custom Interaction Modes

​Working with Large Codebases

​Debugging Workflow

​Best Practices

​Frequently Asked Questions

​Next Steps

What is KiloCode?

Prerequisites

Configure KiloCode with Cerebras

Using KiloCode with Cerebras

Example: Building a Calculator App

Verifying the Integration

Advanced Usage

Custom Interaction Modes

Working with Large Codebases

Debugging Workflow

Best Practices

Frequently Asked Questions

Next Steps