// diagen v0.1.x · probably emerging 🌀
// project · live · v0.1.x

diagen

real-time voice bot. it hears you, thinks, talks back, moves its face. that's the whole pitch.

try the demo → · read the source · or: clone and run it
// receipt
status: live · works on our laptop
stt: whisper · on-device
llm: llama3.2-1b · qwen3.5-0.8b · cuda
tts: omnivoice · pocket-tts wasm
face: audio2face + vrm 0.0/1.0
ships: offline · buildless · lfs-distributed
license: mit
first commit: 2025
// pipeline

voice in. voice out. one loop.

the mic hears you. whisper reads the words. llama writes a reply. omnivoice speaks it. audio2face moves the mouth. takes about a second end-to-end on a warm gpu.

01 · in: mic · 48k stereo · discord or browser
02 · stt: whisper · on-device · base/tiny
03 · llm: llama.cpp · gguf · cuda · streaming
04 · tts: omnivoice · 24k · voice clone · cuda
05 · out: a2f + vrm · visemes · blendshapes · face
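
a minimal sketch of that loop, assuming each stage is a function that takes the previous stage's output. the stage bodies are stubs, `runTurn` and the stage list are made-up names rather than diagen's actual api, and the real stages would be async. the per-stage timer is there because the end-to-end latency is the number worth watching.

```javascript
// hypothetical sketch: names and stage bodies are placeholders, not diagen's real code.
// each stage takes the previous stage's output and returns its own.
const stages = [
  ['stt', (audio) => `heard ${audio.length} samples`],        // whisper would go here
  ['llm', (text) => `reply to: ${text}`],                      // llama.cpp would go here
  ['tts', (text) => new Float32Array(text.length)],            // omnivoice would go here
  ['face', (speech) => ({ speech, frames: speech.length })],   // audio2face would go here
];

// run one turn: feed mic audio through every stage in order,
// recording how long each stage took in milliseconds.
function runTurn(micAudio, stageList = stages) {
  const timings = {};
  let value = micAudio;
  for (const [name, fn] of stageList) {
    const start = performance.now();
    value = fn(value);
    timings[name] = performance.now() - start;
  }
  return { output: value, timings };
}

const turn = runTurn(new Float32Array(480));
console.log(Object.keys(turn.timings), turn.output.frames);
```

summing `turn.timings` gives the end-to-end figure for one turn.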
// features

one system. every moving part.

speech-to-text
whisper, on your hardware
openai whisper via transformers.js. no api call. model cached locally. latency scales with the model you pick.
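
the pipeline lists the mic at 48k stereo, and whisper models take 16 kHz mono, so something has to convert in between. a sketch of one crude way to do it, assuming interleaved float samples. `stereo48kToMono16k` is a hypothetical helper; how diagen itself resamples isn't stated here.

```javascript
// assumption: input is interleaved stereo float pcm at 48 kHz ([l0, r0, l1, r1, ...]).
// average the two channels, then average every 3 mono samples into 1
// (48000 / 16000 = 3). the averaging is a crude low-pass filter;
// a proper resampler would sound better.
function stereo48kToMono16k(interleaved) {
  const frames = Math.floor(interleaved.length / 2);
  const outLength = Math.floor(frames / 3);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    let sum = 0;
    for (let j = 0; j < 3; j++) {
      const frame = i * 3 + j;
      sum += (interleaved[frame * 2] + interleaved[frame * 2 + 1]) / 2;
    }
    out[i] = sum / 3;
  }
  return out;
}

// one second of 48k stereo silence in, one second of 16k mono out
const mono = stereo48kToMono16k(new Float32Array(48000 * 2));
console.log(mono.length);
```
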
text generation
llama.cpp in-process
no ollama server. node-llama-cpp loads a gguf straight into the diagen process. cuda if you have it.
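
the pipeline lists the llm as streaming. one common way to keep tts busy while text is still arriving is to cut the stream at sentence boundaries and speak each sentence as it completes. a sketch under that assumption: `createSentenceChunker` is a hypothetical helper, not part of node-llama-cpp or diagen, and the boundary rule is deliberately naive (it will split on abbreviations like "e.g. ").

```javascript
// hypothetical helper: buffers streamed text and emits whole sentences.
function createSentenceChunker(onSentence) {
  let buffer = '';
  return {
    // call with each text chunk as the model produces it
    push(chunk) {
      buffer += chunk;
      let match;
      // a sentence ends at . ! or ? followed by whitespace
      while ((match = buffer.match(/^(.*?[.!?])\s+/s)) !== null) {
        onSentence(match[1].trim());
        buffer = buffer.slice(match[0].length);
      }
    },
    // call once when generation finishes to flush what's left
    end() {
      const rest = buffer.trim();
      if (rest) onSentence(rest);
      buffer = '';
    },
  };
}

const spoken = [];
const chunker = createSentenceChunker((sentence) => spoken.push(sentence));
for (const chunk of ['hi th', 'ere. how are', ' you? fine']) chunker.push(chunk);
chunker.end();
console.log(spoken);
```
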
text-to-speech
omnivoice + pocket-tts
server side: omnivoice, with voice cloning from a wav reference. browser side: pocket-tts compiled to wasm.
facial animation
audio2face, self-hosted
audio → blendshapes. no cloud. the vrm mouth moves with the speech without a special mesh.
avatar
vrm 0.0 and 1.0
drop any vrm in. cleetus ships with the repo. the mouth, eyes, brows all route through the blendshape table.
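
vrm 0.0 and 1.0 name the mouth shapes differently: 0.0 uses the presets a, i, u, e, o, and 1.0 uses aa, ih, ou, ee, oh. a sketch of a routing table that takes one set of vowel weights and emits the right names for either version. `routeVisemes`, the neutral vowel ids, and the input format are assumptions for illustration; diagen's actual blendshape table isn't shown here.

```javascript
// mouth preset names per vrm spec version, keyed by a neutral vowel id.
const MOUTH_PRESETS = {
  A: { '0.0': 'a', '1.0': 'aa' },
  I: { '0.0': 'i', '1.0': 'ih' },
  U: { '0.0': 'u', '1.0': 'ou' },
  E: { '0.0': 'e', '1.0': 'ee' },
  O: { '0.0': 'o', '1.0': 'oh' },
};

// hypothetical: rename vowel weights to the target version's preset names,
// clamping weights to [0, 1] and dropping ids the table doesn't know.
function routeVisemes(weights, vrmVersion) {
  const routed = {};
  for (const [id, weight] of Object.entries(weights)) {
    const preset = MOUTH_PRESETS[id];
    if (!preset || !(vrmVersion in preset)) continue;
    routed[preset[vrmVersion]] = Math.min(1, Math.max(0, weight));
  }
  return routed;
}

const frame = { A: 0.8, O: 1.7, X: 0.5 };
console.log(routeVisemes(frame, '1.0'));
console.log(routeVisemes(frame, '0.0'));
```
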
distribution
git lfs for the weights
clone the repo, run git lfs pull, everything's there. no curl, no huggingface cli, no hunting.
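
if git lfs pull hasn't run yet, the weight files on disk are small text pointers rather than real weights. a sketch of a check for that, based on the git lfs pointer format (a `version` line, an `oid sha256:` line, a `size` line). `parseLfsPointer` is a hypothetical helper and the oid below is fake; whether diagen does a check like this is not stated.

```javascript
// returns { oid, size } if the text is a git lfs pointer, otherwise null.
function parseLfsPointer(text) {
  if (!text.startsWith('version https://git-lfs.github.com/spec/v1')) return null;
  const oid = text.match(/^oid sha256:([0-9a-f]{64})$/m);
  const size = text.match(/^size (\d+)$/m);
  if (!oid || !size) return null;
  return { oid: oid[1], size: Number(size[1]) };
}

// a made-up pointer for illustration
const pointer = [
  'version https://git-lfs.github.com/spec/v1',
  `oid sha256:${'ab'.repeat(32)}`,
  'size 12345',
  '',
].join('\n');

console.log(parseLfsPointer(pointer));
console.log(parseLfsPointer('GGUF binary data'));
```

to use it on a real file, read the first couple hundred bytes as text and pass them in. a non-null result means the weights still need pulling.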
// stack

what it's made of

runtime: node · bun
stt: xenova whisper
llm: node-llama-cpp
tts (server): omnivoice
tts (browser): pocket-tts wasm
audio2face: onnx runtime
avatar: three + pixiv/three-vrm
voice i/o: discord.js + dispipe
ui: webjsx · ripple-ui
// quickstart

clone and run it

weights live in git lfs. first pull is ~2gb. after that it's npm install once, then just npm start.

# clone and fetch weights (needs git-lfs installed)
git clone https://github.com/AnEntrypoint/diagen
cd diagen
git lfs pull

# optional: discord bot
echo "DISCORD_TOKEN=your-token-here" > .env

# install and start
npm install
npm start