
acptoapi

OpenAI-compatible HTTP facade. Point any OpenAI SDK at it — requests route to Kilo Code, opencode, or Claude Code CLI by model prefix. Real token-by-token SSE, agentic tool calls preserved, no API keys required locally.


install

$ npx acptoapi

Defaults: the facade listens on :4800, expects kilo on :4780 and opencode on :4790, and uses claude from $PATH. Override with --port, --kilo, --opencode, --claude-bin.
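
For example, to move everything off the defaults (assuming --kilo and --opencode take port numbers, matching the defaults above):

$ npx acptoapi --port 5800 --kilo 5780 --opencode 5790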

backends

backend     setup
kilo        run kilo serve --port 4780
opencode    run opencode serve --port 4790
claude      install Claude Code CLI (claude on PATH), already authenticated

drop-in openai sdk

import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'http://localhost:4800/v1', apiKey: 'none' });

const stream = await client.chat.completions.create({
  model: 'claude/sonnet',
  messages: [{ role: 'user', content: 'hi' }],
  stream: true,
});
for await (const c of stream) process.stdout.write(c.choices[0]?.delta?.content || '');
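
A variant of the same loop can surface the reasoning stream. reasoning_content is a non-standard delta field acptoapi adds (see how it works below), so typed SDKs need a loose cast; a minimal sketch:

for await (const c of stream) {
  // reasoning_content is acptoapi's extra field; cast past the SDK's delta type
  const delta = c.choices[0]?.delta as { content?: string; reasoning_content?: string } | undefined;
  if (delta?.reasoning_content) process.stderr.write(delta.reasoning_content);
  if (delta?.content) process.stdout.write(delta.content);
}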

try it live

Requires acptoapi running locally. Paste endpoint, pick model, hit stream.


model routing

Model IDs use <backend>/<model>. Bare IDs default to kilo.

id                                                  backend     notes
kilo/x-ai/grok-code-fast-1:optimized:free           kilo        free
kilo/kilo-auto/free                                 kilo        free
opencode/minimax-m2.5-free                          opencode    free
claude/sonnet                                       claude      Claude Code CLI, OAuth auth
anthropic/claude-sonnet-4-6                         anthropic   direct API, ANTHROPIC_API_KEY
gemini/gemini-2.0-flash                             gemini      GEMINI_API_KEY
ollama/llama3.2                                     ollama      local, OLLAMA_URL
bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0   bedrock     AWS SigV4
groq/llama-3.3-70b-versatile                        groq        GROQ_API_KEY
openrouter/auto                                     openrouter  OPENROUTER_API_KEY
together/...                                        together    TOGETHER_API_KEY
deepseek/deepseek-chat                              deepseek    DEEPSEEK_API_KEY
xai/grok-2-latest                                   xai         XAI_API_KEY
cerebras/llama-3.3-70b                              cerebras    CEREBRAS_API_KEY
perplexity/...                                      perplexity  PERPLEXITY_API_KEY
mistral/mistral-large-latest                        mistral     MISTRAL_API_KEY
fireworks/...                                       fireworks   FIREWORKS_API_KEY
openai/gpt-4o                                       openai      OPENAI_API_KEY
claude/haiku                                        claude      fast + cheap
claude/opus                                         claude      max capability
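
A minimal sketch of the routing rule, illustrative rather than the project's actual code:

// split on the first slash: <backend>/<model>; bare IDs fall through to kilo
function route(id: string): { backend: string; model: string } {
  const slash = id.indexOf('/');
  if (slash === -1) return { backend: 'kilo', model: id };
  return { backend: id.slice(0, slash), model: id.slice(slash + 1) };
}

route('claude/sonnet');    // { backend: 'claude', model: 'sonnet' }
route('some-local-model'); // { backend: 'kilo', model: 'some-local-model' }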

endpoints

method  path                                       purpose
POST    /v1/chat/completions                       OpenAI chat (streaming + non-streaming)
GET     /v1/models                                 live-probed model list
POST    /v1/messages                               Anthropic Messages API drop-in
POST    /v1/messages/count_tokens                  heuristic token estimator
POST    /v1/embeddings                             prefix-routed embeddings
POST    /v1/images/generations                     image gen passthrough
POST    /v1/moderations                            moderation passthrough
POST    /v1/rerank                                 rerank passthrough (cohere/voyage/together)
POST    /v1/audio/speech                           TTS passthrough (openai/groq)
POST    /v1beta/models/:m:streamGenerateContent    Gemini streaming
POST    /v1beta/models/:m:embedContent             Gemini embeddings
POST    /v1beta/models/:m:countTokens              Gemini token count
GET     /metrics                                   Prometheus exposition
GET     /debug/providers                           live backend probe
GET     /debug/config                              active config (redacted)
POST    /debug/translate                           echo internal event stream
GET     /health                                    backend status
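
As one example, /v1/messages accepts Anthropic-style request bodies. A minimal non-streaming probe (field names follow Anthropic's Messages schema):

const res = await fetch('http://localhost:4800/v1/messages', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    model: 'claude/sonnet',
    max_tokens: 64,
    messages: [{ role: 'user', content: 'hi' }],
  }),
});
console.log(await res.json());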

how it works

  • messages[] → backend native format (ACP text part / claude -p prompt)
  • text deltas → OpenAI delta.content, streamed one SSE frame per token
  • reasoning → delta.reasoning_content (routed via partID → part.type)
  • tool calls → OpenAI delta.tool_calls with input JSON accumulated from streamed deltas (client-side accumulation sketched below)
  • tool results (claude agentic loop) → folded into delta.content with a clear marker
  • claude CLI is sandboxed: spawned in tempdir, no tools, no MCP, no skills, no hooks — pure prompt → text
  • session end → finish_reason: stop | length | tool_calls, followed by [DONE]

First chunk arrives within a few ms of the backend's first token. No buffering, no polling.
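
Because tool-call arguments arrive as incremental JSON string fragments, a client reassembles them per index before parsing. A minimal sketch against the stream from the SDK example above:

const calls: Record<number, { name?: string; args: string }> = {};
for await (const chunk of stream) {
  const choice = chunk.choices[0];
  for (const tc of choice?.delta?.tool_calls ?? []) {
    const slot = (calls[tc.index] ??= { args: '' }); // one slot per tool-call index
    if (tc.function?.name) slot.name = tc.function.name;
    slot.args += tc.function?.arguments ?? '';       // JSON accumulates across frames
  }
  if (choice?.finish_reason === 'tool_calls') {
    // each args string is now complete JSON and safe to parse
    for (const c of Object.values(calls)) console.log(c.name, JSON.parse(c.args));
  }
}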