Custom (OpenAI-compatible)

Point Kenaz at any endpoint that speaks the OpenAI Chat Completions API. Useful for self-hosted inference servers, enterprise gateways, and providers that don't have a first-class adapter yet.

Common scenarios

  • Self-hosted vLLM in your own datacenter
  • LiteLLM proxy in front of one or more upstream providers
  • Hosted inference providers (Together AI, Anyscale, Fireworks, Groq, Cerebras, etc.) and enterprise inference gateways; most expose an OpenAI-compatible endpoint
  • A model not yet supported by a first-class adapter — if it speaks OpenAI, Kenaz can talk to it
  • Local-but-not-Ollama: llama.cpp's OpenAI-compatible server (llama-server), LM Studio, Text Generation WebUI, etc.

What you need

  • An endpoint URL for the chat completions API (HTTPS for anything remote; plain HTTP is fine for localhost)
  • (Optional) an API key or bearer token, depending on whether the endpoint requires auth
  • A model identifier the endpoint will accept

Steps

  1. Providers → Add provider → Custom.
  2. Endpoint: the base URL up to and including /v1. Examples:
    • vLLM: https://vllm.your-corp.example/v1
    • LiteLLM: https://litellm.your-corp.example/v1
    • Together AI: https://api.together.xyz/v1
    • Groq: https://api.groq.com/openai/v1
    • LM Studio: http://localhost:1234/v1
  3. API key: paste if required, leave blank if not.
  4. Model: the model identifier the endpoint expects. Custom providers don't auto-discover models; type the exact ID (e.g. meta-llama/Meta-Llama-3.1-70B-Instruct).
  5. Capabilities: Kenaz can't read capability hints from a custom endpoint — toggle Vision / Tool Use / etc. manually based on what the model and serving stack support.
  6. Test → Save.
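
To sanity-check the endpoint outside Kenaz, one chat completion against the same URL, key, and model ID exercises everything steps 2-4 configure. A minimal sketch using the OpenAI Python SDK, with the vLLM example values above standing in for your own:

```python
# Minimal sanity check of an OpenAI-compatible endpoint.
# Substitute your own URL, key, and model ID for the illustrative values.
from openai import OpenAI

client = OpenAI(
    base_url="https://vllm.your-corp.example/v1",  # step 2: base URL up to and including /v1
    api_key="YOUR_KEY",                            # step 3: any non-empty string if the endpoint needs no auth
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # step 4: the exact model ID
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
    max_tokens=10,
)
print(resp.choices[0].message.content)
```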

Notes by provider

vLLM (self-hosted)

vLLM's OpenAI-compatible server: docs.vllm.ai/en/latest/serving/openai_compatible_server.html. Tool use works on vLLM ≥ 0.6 with --enable-auto-tool-choice and a --tool-call-parser value appropriate for the model.
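
Before relying on tool use in a session, it's worth confirming the serving stack actually returns structured tool calls. A minimal sketch with the OpenAI Python SDK; the endpoint, model ID, and get_weather tool are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://vllm.your-corp.example/v1",
    api_key="EMPTY",  # vLLM ignores the key unless the server was started with --api-key
)

# Throwaway tool definition, purely to see whether the model emits a tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# If tool use works end to end, this prints a structured tool call instead of plain text.
print(resp.choices[0].message.tool_calls or resp.choices[0].message.content)
```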

LiteLLM proxy

If your team already has LiteLLM routing to multiple upstreams (cost optimization, fallback, key rotation), pointing Kenaz at the LiteLLM proxy endpoint gives you all of LiteLLM's smarts for free.
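
One detail worth calling out: behind a LiteLLM proxy, the Model field in step 4 is the model_name alias defined in the proxy's config, not the upstream provider's ID, and the API key is a LiteLLM virtual key if the proxy enforces auth. An illustrative sketch (URL, alias, and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://litellm.your-corp.example/v1",
    api_key="sk-your-litellm-virtual-key",  # only if the proxy enforces auth
)

resp = client.chat.completions.create(
    model="prod-llama-70b",  # the model_name alias from the proxy's config, not the upstream ID
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```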

Together AI

api.together.xyz/v1 — broad open-weight model selection (Llama, Qwen, DeepSeek, Mixtral). Good fit when you want open-weights but don't want to host yourself.

Groq

api.groq.com/openai/v1 — extremely fast (custom hardware), limited model selection, mostly Llama variants.

LM Studio (local)

LM Studio is a GUI alternative to Ollama. Its local server exposes http://localhost:1234/v1 by default. No API key required.
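
"No API key required" applies to the server; some OpenAI client libraries still refuse to start without a key string, so pass a placeholder if you test the endpoint by hand. For example, with the OpenAI Python SDK:

```python
from openai import OpenAI

# LM Studio ignores the key, but the SDK wants a non-empty string, so pass anything.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Model IDs are whatever you've loaded in LM Studio; list them to get the exact string.
print([m.id for m in client.models.list().data])
```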

Privacy posture

Whatever your endpoint does. Read the docs for the specific provider. For self-hosted: your privacy is exactly your infrastructure's.

Troubleshooting

  • Test returns 404 Not Found. The endpoint URL is missing the /v1 path or has a trailing slash; try it with and without the trailing slash, and make sure it ends in /v1.
  • Test succeeds but no models are listed. Custom providers don't auto-discover models; type the model ID manually in the field below. If you're unsure of the exact string, the endpoint's /v1/models route lists the IDs it will accept.
  • Tool calls silently fail. The endpoint or the model doesn't support function calling. Kenaz can't tell the difference between "model can't use tools" and "model misformatted the tool call." Disable tools in the session, or pick a different model.
  • Streaming hangs. Some endpoints sit behind reverse proxies that buffer SSE or strip keep-alives. Kenaz sends TCP keepalive probes on long streams (every 1.5 s), but if a proxy aggressively closes idle connections anyway, set Streaming → off in the provider editor.
  • Wrong content types. Older OpenAI-compatible servers accept only the legacy single-string content field and reject the newer content-block format Kenaz sends. Update the server, or use Ollama / a first-class adapter instead.
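
The last two items are easiest to diagnose outside Kenaz. A sketch that reproduces both (URL, key, and model ID are the illustrative values from the steps above): it sends one message in the newer content-block format, then streams a short reply to see whether chunks arrive incrementally or the connection stalls.

```python
from openai import OpenAI

client = OpenAI(base_url="https://vllm.your-corp.example/v1", api_key="YOUR_KEY")
model = "meta-llama/Meta-Llama-3.1-70B-Instruct"

# 1) Content-block format: servers that only accept a plain string here will
#    reject this request, which is the "wrong content types" failure above.
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": [{"type": "text", "text": "Say hi"}]}],
)
print(resp.choices[0].message.content)

# 2) Streaming: if a proxy in the middle buffers or drops the SSE stream,
#    this loop stalls instead of printing tokens as they arrive.
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Count from 1 to 20, one number per line."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```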