acptoapi
OpenAI-compatible HTTP facade. Point any OpenAI SDK at it — requests route to Kilo Code, opencode, or Claude Code CLI by model prefix. Real token-by-token SSE, agentic tool calls preserved, no API keys required locally.
install
```sh
$ npx acptoapi
```

Defaults: the facade listens on :4800, kilo on :4780, opencode on :4790, claude resolved on $PATH. Override with --port, --kilo, --opencode, --claude-bin.
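For example, to move the facade off :4800 and point at a claude binary outside $PATH (values illustrative):

```sh
$ npx acptoapi --port 8080 --claude-bin /usr/local/bin/claude
```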
backends
| backend | setup |
|---|---|
| kilo | run `kilo serve --port 4780` |
| opencode | run `opencode serve --port 4790` |
| claude | install Claude Code CLI (`claude` on PATH), already authenticated |
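Once a backend is up, the facade's /health endpoint (see the endpoints table below) reports backend status. A quick sketch, printing the body raw since its exact shape isn't documented here:

```ts
// GET /health reports backend status; /debug/providers does a live probe
const res = await fetch('http://localhost:4800/health');
console.log(res.status, await res.text());
```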
drop-in openai sdk
```ts
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'http://localhost:4800/v1', apiKey: 'none' });

const stream = await client.chat.completions.create({
  model: 'claude/sonnet',
  messages: [{ role: 'user', content: 'hi' }],
  stream: true,
});

for await (const c of stream) process.stdout.write(c.choices[0]?.delta?.content || '');
```
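Non-streaming requests hit the same endpoint; a sketch reusing the client above, with a free model from the routing table:

```ts
// omit `stream` to get a single, complete response
const resp = await client.chat.completions.create({
  model: 'kilo/kilo-auto/free',
  messages: [{ role: 'user', content: 'hi' }],
});
console.log(resp.choices[0].message.content);
```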
model routing
Model IDs use `<backend>/<model>`. Bare IDs default to kilo.

| id | backend | notes |
|---|---|---|
| kilo/x-ai/grok-code-fast-1:optimized:free | kilo | free |
| kilo/kilo-auto/free | kilo | free |
| opencode/minimax-m2.5-free | opencode | free |
| claude/sonnet | claude | Claude Code CLI, OAuth auth |
| claude/haiku | claude | fast + cheap |
| claude/opus | claude | max capability |
| anthropic/claude-sonnet-4-6 | anthropic | direct API, ANTHROPIC_API_KEY |
| gemini/gemini-2.0-flash | gemini | GEMINI_API_KEY |
| ollama/llama3.2 | ollama | local, OLLAMA_URL |
| bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 | bedrock | AWS SigV4 |
| groq/llama-3.3-70b-versatile | groq | GROQ_API_KEY |
| openrouter/auto | openrouter | OPENROUTER_API_KEY |
| together/... | together | TOGETHER_API_KEY |
| deepseek/deepseek-chat | deepseek | DEEPSEEK_API_KEY |
| xai/grok-2-latest | xai | XAI_API_KEY |
| cerebras/llama-3.3-70b | cerebras | CEREBRAS_API_KEY |
| perplexity/... | perplexity | PERPLEXITY_API_KEY |
| mistral/mistral-large-latest | mistral | MISTRAL_API_KEY |
| fireworks/... | fireworks | FIREWORKS_API_KEY |
| openai/gpt-4o | openai | OPENAI_API_KEY |
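GET /v1/models is probed live, so it shows only what's reachable right now; a quick sketch:

```ts
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'http://localhost:4800/v1', apiKey: 'none' });

// IDs come back in <backend>/<model> form; only reachable backends appear
const models = await client.models.list();
for (const m of models.data) console.log(m.id);
```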
endpoints
| method | path | purpose |
|---|---|---|
| POST | /v1/chat/completions | OpenAI chat (streaming + non-streaming) |
| GET | /v1/models | live-probed model list |
| POST | /v1/messages | Anthropic Messages API drop-in |
| POST | /v1/messages/count_tokens | heuristic token estimator |
| POST | /v1/embeddings | prefix-routed embeddings |
| POST | /v1/images/generations | image gen passthrough |
| POST | /v1/moderations | moderation passthrough |
| POST | /v1/rerank | rerank passthrough (cohere/voyage/together) |
| POST | /v1/audio/speech | TTS passthrough (openai/groq) |
| POST | /v1beta/models/:m:streamGenerateContent | Gemini streaming |
| POST | /v1beta/models/:m:embedContent | Gemini embeddings |
| POST | /v1beta/models/:m:countTokens | Gemini token count |
| GET | /metrics | Prometheus exposition |
| GET | /debug/providers | live backend probe |
| GET | /debug/config | active config (redacted) |
| POST | /debug/translate | echo internal event stream |
| GET | /health | backend status |
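For the Anthropic-style /v1/messages drop-in, a minimal fetch sketch assuming the standard Messages request body (model, max_tokens, messages):

```ts
// standard Anthropic Messages shape against the local facade
const res = await fetch('http://localhost:4800/v1/messages', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    model: 'claude/sonnet',
    max_tokens: 256,
    messages: [{ role: 'user', content: 'hi' }],
  }),
});
console.log(await res.json());
```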
how it works
- messages[] → backend native format (ACP text part / claude `-p` prompt)
- text deltas → OpenAI `delta.content`, streamed one SSE frame per token
- reasoning → `delta.reasoning_content` (routed via `partID` → `part.type`)
- tool calls → OpenAI `delta.tool_calls`, with input JSON accumulated from streamed deltas (see the sketch after this list)
- tool results (claude agentic loop) → folded into `delta.content` with a clear marker
- claude CLI is sandboxed: spawned in tempdir, no tools, no MCP, no skills, no hooks — pure prompt → text
- session end → `finish_reason: stop | length | tool_calls`, followed by `[DONE]`
First chunk arrives within a few ms of the backend's first token. No buffering, no polling.
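On the wire those tool-call fragments follow the standard OpenAI chunk shape, so client-side reassembly is the usual accumulate-by-index loop; a sketch (the tool definition and model choice are illustrative, not part of acptoapi):

```ts
import OpenAI from 'openai';

const client = new OpenAI({ baseURL: 'http://localhost:4800/v1', apiKey: 'none' });

const stream = await client.chat.completions.create({
  model: 'kilo/kilo-auto/free',
  messages: [{ role: 'user', content: 'weather in SF?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather', // hypothetical tool, for illustration only
      parameters: { type: 'object', properties: { city: { type: 'string' } } },
    },
  }],
  stream: true,
});

// each fragment carries an index; the name arrives once, the JSON
// arguments arrive in pieces and concatenate until finish_reason: 'tool_calls'
const calls: Record<number, { name: string; args: string }> = {};
for await (const chunk of stream) {
  const choice = chunk.choices[0];
  for (const tc of choice?.delta?.tool_calls ?? []) {
    const slot = (calls[tc.index] ??= { name: '', args: '' });
    if (tc.function?.name) slot.name = tc.function.name;
    if (tc.function?.arguments) slot.args += tc.function.arguments;
  }
  if (choice?.finish_reason === 'tool_calls') {
    for (const c of Object.values(calls)) console.log(c.name, JSON.parse(c.args));
  }
}
```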