> ## Documentation Index
> Fetch the complete documentation index at: https://inference-docs.cerebras.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Public models

Retrieve details about publicly available Cerebras models without an API key, including context length, pricing, and supported features. This endpoint supports multiple response formats for compatibility with different platforms.

This endpoint is **public** and does not require an API key.

## Supported Formats

The endpoint supports three response formats via the `format` query parameter:

* **Default (Cerebras)** - Native Cerebras format
* **OpenRouter** - OpenRouter-compatible format
* **HuggingFace** - HuggingFace-compatible format

## Query Parameters

* `format` - Response format. Options: `openrouter`, `huggingface`. Omit for the default Cerebras format.

## Response

### Default Format

* `object` - The object type: `list` for list responses, `model` for single-model responses.
* `data` - Array of model objects (only in list responses).
  * `id` - The model identifier (e.g., `gpt-oss-120b`).
  * `object` - The object type, which is always `model`.
  * `created` - The Unix timestamp (in seconds) of when the model was created.
  * `owned_by` - The organization that owns the model.
  * `name` - Human-readable model name.
  * `description` - Description of the model's capabilities and recommended use cases.
  * `hugging_face_id` - The HuggingFace repository identifier for this model.
  * `pricing` - Pricing per token in USD.
    * `prompt` - Cost per prompt token.
    * `completion` - Cost per completion token.
  * `capabilities` - Boolean flags for supported features.
    * `streaming` - Whether streaming is supported.
    * `function_calling` - Whether function calling is supported.
    * `structured_outputs` - Whether structured outputs are supported.
    * `vision` - Whether vision/image input is supported.
    * `json_mode` - Whether JSON mode is supported.
    * `tools` - Whether tool use is supported.
    * `tool_choice` - Whether tool choice is supported.
    * `parallel_tool_calls` - Whether parallel tool calls are supported.
    * `response_format` - Whether response format control is supported.
    * `reasoning` - Whether the model supports reasoning.
  * `supported_parameters` - Boolean flags for supported API parameters.
  * `architecture` - Model architecture details.
    * `modality` - Input/output modality (e.g., `text`).
    * `tokenizer` - Tokenizer type.
    * `instruct_type` - Instruction tuning format.
  * `limits` - Model limits.
    * `max_context_length` - Maximum context window in tokens.
    * `max_completion_tokens` - Maximum completion tokens.
    * `requests_per_minute` - Requests per minute limit (null if unlimited).
    * `tokens_per_minute` - Tokens per minute limit (null if unlimited).
  * `deprecated` - Whether this model is deprecated.
  * `preview` - Whether this model is in preview.
  * `quantization` - Quantization method used for the model.

### OpenRouter Format

* `data` - Array of model objects with extended metadata.
  * `id` - The model identifier.
  * `hugging_face_id` - The HuggingFace repository identifier.
  * `name` - Human-readable model name.
  * `input_modalities` - Supported input types (e.g., `["text"]`).
  * `output_modalities` - Supported output types (e.g., `["text"]`).
  * `context_length` - Maximum context window size in tokens.
  * `max_output_length` - Maximum output length in tokens.
  * `pricing` - Pricing information per token.
    * `prompt` - Cost per prompt token.
    * `completion` - Cost per completion token.
    * `request` - Cost per request (typically `"0"`).
    * `image` - Cost per image token (typically `"0"` for text-only models).
    * `input_cache_read` - Cost per cached input token read (typically `"0"`).
    * `input_cache_write` - Cost per cached input token write (typically `"0"`).
  * `supported_sampling_parameters` - List of supported sampling parameters.
  * `supported_features` - List of supported features (e.g., `["tools", "json_mode", "structured_outputs"]`).

### HuggingFace Format

* `data` - Array of model objects.
  * `id` - The model identifier.
  * `hugging_face_id` - The HuggingFace repository identifier.
  * `object` - The object type, which is always `model`.
  * `created` - The Unix timestamp (in seconds) of when the model was created.
  * `owned_by` - The organization that owns the model.
  * `pricing` - Pricing in USD per million tokens.
    * `input` - Price per million input tokens.
    * `output` - Price per million output tokens.
  * `context_length` - Maximum context window size in tokens.
  * `capabilities` - Boolean flags for supported features.

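
The pricing objects described above use different units: the default and OpenRouter formats quote USD per token as decimal strings, while the HuggingFace format quotes USD per million tokens. The following is a minimal sketch of converting between the two and estimating the cost of a request. The helper names `per_token_to_per_million` and `estimate_cost` are hypothetical and not part of any Cerebras SDK; `Decimal` is used on the assumption that exact decimal arithmetic is preferable to floats for such small prices.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def per_token_to_per_million(price_per_token: str) -> Decimal:
    """Convert a per-token USD price string (default/OpenRouter style)
    to USD per million tokens (HuggingFace style)."""
    return Decimal(price_per_token) * TOKENS_PER_MILLION


def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from a default-format `pricing` object.

    Hypothetical helper: it ignores any per-request or cache pricing.
    """
    return (
        Decimal(pricing["prompt"]) * prompt_tokens
        + Decimal(pricing["completion"]) * completion_tokens
    )


# Values taken from the `gpt-oss-120b` example on this page.
pricing = {"prompt": "0.00000035", "completion": "0.00000075"}
print(per_token_to_per_million(pricing["prompt"]))
print(estimate_cost(pricing, prompt_tokens=10_000, completion_tokens=2_000))
```

With the example prices, the per-token prompt price of `0.00000035` corresponds to the `0.35` per million tokens shown in the HuggingFace format.
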
```bash cURL (Default)
curl -sS 'https://api.cerebras.ai/public/v1/models' | jq
```

```bash cURL (OpenRouter)
curl -sS 'https://api.cerebras.ai/public/v1/models?format=openrouter' | jq
```

```bash cURL (HuggingFace)
curl -sS 'https://api.cerebras.ai/public/v1/models?format=huggingface' | jq
```

```python Python
import httpx

# Default format
response = httpx.get("https://api.cerebras.ai/public/v1/models")
models = response.json()

# OpenRouter format
response = httpx.get(
    "https://api.cerebras.ai/public/v1/models",
    params={"format": "openrouter"}
)
models = response.json()

# HuggingFace format
response = httpx.get(
    "https://api.cerebras.ai/public/v1/models",
    params={"format": "huggingface"}
)
models = response.json()
```

```javascript JavaScript
// Default format
const defaultResponse = await fetch('https://api.cerebras.ai/public/v1/models');
const defaultModels = await defaultResponse.json();

// OpenRouter format
const openRouterResponse = await fetch(
  'https://api.cerebras.ai/public/v1/models?format=openrouter'
);
const openRouterModels = await openRouterResponse.json();

// HuggingFace format
const huggingFaceResponse = await fetch(
  'https://api.cerebras.ai/public/v1/models?format=huggingface'
);
const huggingFaceModels = await huggingFaceResponse.json();
```

```json Default Format
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-120b",
      "object": "model",
      "created": 1754438400,
      "owned_by": "OpenAI",
      "name": "OpenAI GPT OSS",
      "description": "This model excels at efficient reasoning across science, math, and coding applications. It's ideal for real-time coding assistance, processing large documents for Q&A and summarization, agentic research workflows, and regulated on-premises workloads.",
      "hugging_face_id": "openai/gpt-oss-120b",
      "pricing": {
        "prompt": "0.00000035",
        "completion": "0.00000075"
      },
      "capabilities": {
        "streaming": true,
        "function_calling": true,
        "structured_outputs": true,
        "vision": false,
        "json_mode": true,
        "tools": true,
        "tool_choice": true,
        "parallel_tool_calls": false,
        "response_format": true,
        "reasoning": true
      },
      "supported_parameters": {
        "temperature": true,
        "top_p": true,
        "seed": true,
        "stop": true,
        "max_completion_tokens": true,
        "logprobs": true,
        "top_logprobs": true,
        "frequency_penalty": true,
        "presence_penalty": true,
        "logit_bias": true,
        "repetition_penalty": false
      },
      "architecture": {
        "modality": "text",
        "tokenizer": "GPT",
        "instruct_type": "harmony"
      },
      "limits": {
        "max_context_length": 131072,
        "max_completion_tokens": 40960,
        "requests_per_minute": null,
        "tokens_per_minute": null
      },
      "datacenter_locations": [],
      "deprecated": false,
      "preview": false,
      "quantization": "FP16/8 (weights only)"
    }
  ]
}
```

```json OpenRouter Format
{
  "data": [
    {
      "id": "gpt-oss-120b",
      "hugging_face_id": "openai/gpt-oss-120b",
      "name": "OpenAI GPT OSS",
      "created": 1754438400,
      "input_modalities": ["text"],
      "output_modalities": ["text"],
      "quantization": "fp16",
      "context_length": 131072,
      "max_output_length": 40960,
      "pricing": {
        "prompt": "0.00000035",
        "completion": "0.00000075",
        "request": "0",
        "image": "0",
        "input_cache_read": "0",
        "input_cache_write": "0"
      },
      "supported_sampling_parameters": [
        "temperature",
        "top_p",
        "stop",
        "seed",
        "frequency_penalty",
        "presence_penalty",
        "max_tokens",
        "logprobs",
        "top_logprobs",
        "logit_bias"
      ],
      "supported_features": [
        "tools",
        "json_mode",
        "structured_outputs",
        "reasoning"
      ]
    }
  ]
}
```

```json HuggingFace Format
{
  "data": [
    {
      "id": "gpt-oss-120b",
      "hugging_face_id": "openai/gpt-oss-120b",
      "object": "model",
      "created": 1754438400,
      "owned_by": "OpenAI",
      "context_length": 131072,
      "pricing": {
        "input": 0.35,
        "output": 0.75
      },
      "capabilities": {
        "streaming": true,
        "function_calling": true,
        "structured_outputs": true,
        "vision": false
      }
    }
  ]
}
```

## Retrieve Specific Model

You can also retrieve information about a specific model by appending the model ID to the endpoint:

```
GET https://api.cerebras.ai/public/v1/models/{model_id}
```

### Path Parameters

* `model_id` - The ID of the model to retrieve (e.g., `gpt-oss-120b`, `zai-glm-4.7`).

```bash cURL
curl -sS 'https://api.cerebras.ai/public/v1/models/gpt-oss-120b' | jq
```

```python Python
import httpx

model_id = "gpt-oss-120b"
response = httpx.get(f"https://api.cerebras.ai/public/v1/models/{model_id}")
model = response.json()
```

```javascript JavaScript
const modelId = 'gpt-oss-120b';
const response = await fetch(
  `https://api.cerebras.ai/public/v1/models/${modelId}`
);
const model = await response.json();
```

```json Response
{
  "id": "gpt-oss-120b",
  "object": "model",
  "created": 1754438400,
  "owned_by": "OpenAI",
  "name": "OpenAI GPT OSS",
  "description": "This model excels at efficient reasoning across science, math, and coding applications. It's ideal for real-time coding assistance, processing large documents for Q&A and summarization, agentic research workflows, and regulated on-premises workloads.",
  "hugging_face_id": "openai/gpt-oss-120b",
  "pricing": {
    "prompt": "0.00000035",
    "completion": "0.00000075"
  },
  "capabilities": {
    "streaming": true,
    "function_calling": true,
    "structured_outputs": true,
    "vision": false,
    "json_mode": true,
    "tools": true,
    "tool_choice": true,
    "parallel_tool_calls": false,
    "response_format": true,
    "reasoning": true
  },
  "limits": {
    "max_context_length": 131072,
    "max_completion_tokens": 40960,
    "requests_per_minute": null,
    "tokens_per_minute": null
  },
  "deprecated": false,
  "preview": false,
  "quantization": "FP16/8 (weights only)"
}
```

## Use Cases

* **Platform Integration** - Use the OpenRouter or HuggingFace formats to integrate Cerebras models into existing platforms that support these standards.
* **Model Discovery** - Programmatically discover available models and their capabilities without authentication.
* **Pricing Comparison** - Compare pricing across different models using the structured pricing information.
* **Feature Detection** - Check which features (streaming, tools, JSON mode) are supported by each model.
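
The feature-detection use case can be sketched as a filter over a default-format list response. This is an illustration only: `models_supporting` is a hypothetical helper, and the sample payload is trimmed from the `gpt-oss-120b` example earlier on this page so that the snippet runs without a network call. In practice the payload would come from `GET https://api.cerebras.ai/public/v1/models`.

```python
def models_supporting(payload: dict, *features: str) -> list[str]:
    """Return the IDs of models whose `capabilities` flags are true
    for every requested feature (hypothetical helper)."""
    return [
        model["id"]
        for model in payload.get("data", [])
        # A missing flag is treated as "not supported".
        if all(model.get("capabilities", {}).get(name, False) for name in features)
    ]


# Trimmed copy of the default-format example response.
sample_payload = {
    "object": "list",
    "data": [
        {
            "id": "gpt-oss-120b",
            "object": "model",
            "capabilities": {
                "streaming": True,
                "function_calling": True,
                "structured_outputs": True,
                "vision": False,
                "json_mode": True,
                "tools": True,
            },
        }
    ],
}

print(models_supporting(sample_payload, "tools", "json_mode"))
print(models_supporting(sample_payload, "vision"))
```

Treating an absent flag as unsupported is a conservative assumption; a stricter client might prefer to raise an error when an expected capability key is missing.
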