CePO (Cerebras Planning & Optimization) is a framework that adds advanced reasoning capabilities to the Llama family of models by leveraging test-time compute. This approach enables Llama to tackle complex reasoning tasks that are difficult for standard one-shot or instruct models.

CePO is implemented on top of Cerebras Inference, which currently serves llama3.3-70b at 2,200 reasoning tokens/s. This level of inference speed makes test-time computation practical for more sophisticated reasoning tasks.
CePO is built on the popular, open-source OptiLLM library. To get started, install OptiLLM along with the latest version of the Cerebras Inference SDK; a sketch of a basic request follows below.
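
For reference, the snippet below shows one way a CePO request might look once OptiLLM is running as a local OpenAI-compatible proxy in front of Cerebras Inference. The port, placeholder API key, and the `cepo-` model-name prefix are assumptions based on OptiLLM's usual conventions rather than this text, so consult the OptiLLM documentation for the exact setup.

```python
# Minimal sketch: calling CePO through a locally running OptiLLM proxy.
# Assumptions (not from the original text): OptiLLM is started separately as an
# OpenAI-compatible proxy on http://localhost:8000/v1 with your Cerebras API key
# configured, and the CePO approach is selected by prefixing the model name
# with "cepo-". Check the OptiLLM README for the exact invocation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # OptiLLM proxy endpoint (assumed default port)
    api_key="optillm",                    # placeholder; the proxy forwards the real key
)

response = client.chat.completions.create(
    model="cepo-llama3.3-70b",  # "cepo-" prefix routes the request through CePO (assumed naming)
    messages=[
        {"role": "user", "content": "A farmer has 17 sheep. All but 9 run away. How many are left?"},
    ],
)

print(response.choices[0].message.content)
```

Because OptiLLM exposes an OpenAI-compatible endpoint, existing client code only needs its base URL and model name changed to route requests through CePO.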