Streaming Protocol
OpenHuman supports two streaming transports for real-time animation: a WebSocket binary protocol for low-latency bidirectional streams (live lip sync, mocap, remote control), and an HTTP chunked stream for pre-recorded or server-push scenarios. Both are consumed by the same StreamingClient API on the client.
When to Use Each Transport
| | WebSocket | HTTP Chunked |
|---|---|---|
| Latency target | < 50ms end-to-end | 100–500ms typical |
| Direction | Bidirectional | Server → client only |
| Best for | Live AI TTS lip sync, real-time mocap, remote puppeteering | Pre-recorded speech playback, server-sequenced animations |
| Reconnect | Manual or automatic | Automatic (fetch retry) |
| Binary frames | ✅ Native | ✅ Via ReadableStream |
| JSON frames | ✅ Supported | ✅ Newline-delimited |
Architecture Overview
Transport A - WebSocket Binary Stream
Connecting
import { OpenHuman, StreamingClient } from "@openhuman/sdk"
const human = await OpenHuman.load("character.ohb", canvas)
const client = new StreamingClient({
transport: "websocket",
url: "wss://your-server.example.com/animation-stream",
jitterBuffer: 80, // ms - smooths latency spikes (default: 80)
reconnect: true, // auto-reconnect on drop (default: true)
reconnectDelay: 1000, // ms between reconnect attempts (default: 1000)
})
// Attach to the character - poses are applied automatically each frame
client.attach(human)
// Open the connection
await client.connect()
Once attach() is called, every frame arriving from the server is pulled from the jitter buffer and applied to the character with no additional code needed.
Binary Frame Format
Each WebSocket message is a single binary frame (ArrayBuffer). The layout is a tightly-packed binary struct:
Offset Size Type Field
────── ──── ────── ──────────────────────────────────────────────
0 4 f32 timestamp - server clock (seconds)
4 2 u16 jointCount - number of joints in this frame
6 2 u16 facsCount - number of FACS weights (0 or 52)
8 n×32 f32×8 joints[] - joint data (see below)
8 + n×32 m×2 i16×m facs[] - quantised FACS weights
Joint data - 8 × f32 = 32 bytes per joint:
Offset Field
────── ─────────────────────────────────────────
0–11 position vec3 (x, y, z) - world space, metres
12–27 rotation quaternion (x, y, z, w) - unit quaternion
28–31 scale f32 - uniform scale (1.0 = no scale)
FACS weights are transmitted as 16-bit signed integers quantised over [-32 768, 32 767] mapping to [-1.0, 1.0]. The engine dequantises on receipt:
weight_f32 = facs_i16 / 32767.0
This cuts FACS bandwidth by 50% compared to sending raw f32 values (104 bytes vs 208 bytes per frame for 52 targets).
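To make the layout concrete, here is a minimal decoder sketch for a single frame. It is an illustration and not the SDK's internal parser: the function name `decodeFrame` is hypothetical, little-endian byte order is assumed because that is what the reference server in this document writes, and clamping the single out-of-range value (-32768) to -1.0 is a choice made for this example.

```javascript
// Hypothetical helper: decode one binary frame (ArrayBuffer) into typed arrays.
// Assumes little-endian fields, as written by the reference server in this document.
function decodeFrame(buffer) {
  const view = new DataView(buffer)

  // 8-byte header
  const timestamp = view.getFloat32(0, true)
  const jointCount = view.getUint16(4, true)
  const facsCount = view.getUint16(6, true)

  // Reject frames that are shorter than the header claims
  const expected = 8 + jointCount * 32 + facsCount * 2
  if (buffer.byteLength < expected) {
    throw new RangeError(`frame truncated: got ${buffer.byteLength} bytes, need ${expected}`)
  }

  // 8 floats per joint: position xyz, rotation xyzw, uniform scale
  const joints = new Float32Array(jointCount * 8)
  let offset = 8
  for (let i = 0; i < joints.length; i++) {
    joints[i] = view.getFloat32(offset, true)
    offset += 4
  }

  // Dequantise i16 -> f32; -32768 would land just below -1.0, so clamp it (example choice)
  const facs = new Float32Array(facsCount)
  for (let i = 0; i < facsCount; i++) {
    facs[i] = Math.max(-1, view.getInt16(offset, true) / 32767)
    offset += 2
  }

  return { timestamp, joints, facs }
}
```

The returned `joints` array is flattened in the same order as the wire format, so joint `k` occupies indices `k * 8` through `k * 8 + 7`.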
Frame Size Reference
| Content | Bytes per frame |
|---|---|
| Header only | 8 |
| 256 joints, no FACS | 8 + 256 × 32 = 8 200 B |
| FACS only, no joints | 8 + 52 × 2 = 112 B |
| 256 joints + 52 FACS | 8 + 256 × 32 + 52 × 2 = 8 304 B |
At 30 fps with 256 joints + FACS, raw bandwidth is approximately 2 Mbps. Enable WebSocket per-message deflate on your server to cut this by ~60%.
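The arithmetic behind these figures can be checked with a short sketch. The helper names `frameBytes` and `bandwidthMbps` are made up for this example; the result covers the uncompressed payload only and ignores WebSocket framing and TCP/TLS overhead.

```javascript
// Sizes taken from the binary frame layout described above
const HEADER_BYTES = 8
const BYTES_PER_JOINT = 32 // 8 × f32
const BYTES_PER_FACS = 2   // 1 × i16

// Hypothetical helper: payload size of one frame in bytes
function frameBytes(jointCount, facsCount) {
  return HEADER_BYTES + jointCount * BYTES_PER_JOINT + facsCount * BYTES_PER_FACS
}

// Hypothetical helper: raw payload bandwidth in megabits per second
function bandwidthMbps(jointCount, facsCount, fps) {
  return (frameBytes(jointCount, facsCount) * 8 * fps) / 1e6
}

console.log(frameBytes(256, 52))                    // 8304
console.log(bandwidthMbps(256, 52, 30).toFixed(2))  // "1.99"
console.log(bandwidthMbps(0, 52, 30).toFixed(3))    // "0.027" (FACS-only lip sync)
```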
Reference Server - Node.js
A minimal WebSocket server that streams poses at 30 fps:
import { WebSocketServer } from "ws"
const wss = new WebSocketServer({ port: 8080 })
const JOINT_COUNT = 256
const FACS_COUNT = 52
const FRAME_BYTES = 8 + JOINT_COUNT * 32 + FACS_COUNT * 2
wss.on("connection", (ws) => {
console.log("Client connected")
const interval = setInterval(() => {
const buf = new ArrayBuffer(FRAME_BYTES)
const view = new DataView(buf)
let offset = 0
// Header
view.setFloat32(offset, performance.now() / 1000, true)
offset += 4
view.setUint16(offset, JOINT_COUNT, true)
offset += 2
view.setUint16(offset, FACS_COUNT, true)
offset += 2
// Joint data - replace with your actual pose source
for (let i = 0; i < JOINT_COUNT; i++) {
view.setFloat32(offset, 0, true) // position.x
view.setFloat32(offset + 4, 0, true) // position.y
view.setFloat32(offset + 8, 0, true) // position.z
view.setFloat32(offset + 12, 0, true) // rotation.x
view.setFloat32(offset + 16, 0, true) // rotation.y
view.setFloat32(offset + 20, 0, true) // rotation.z
view.setFloat32(offset + 24, 1, true) // rotation.w (identity)
view.setFloat32(offset + 28, 1, true) // scale
offset += 32
}
// FACS weights - replace with your TTS / phoneme output
for (let i = 0; i < FACS_COUNT; i++) {
const weight = 0.0 // float in range 0.0–1.0
view.setInt16(offset, Math.round(weight * 32767), true)
offset += 2
}
ws.send(buf)
}, 1000 / 30) // 30 fps
ws.on("close", () => clearInterval(interval))
})
console.log("Streaming server on ws://localhost:8080")
Transport B - HTTP Chunked Stream
HTTP chunked streaming uses the browser ReadableStream API - no persistent connection required. Ideal for server-sequenced animation produced alongside AI-generated audio.
Connecting
import { OpenHuman, StreamingClient } from "@openhuman/sdk"
const human = await OpenHuman.load("character.ohb", canvas)
const client = new StreamingClient({
transport: "http",
url: "https://your-server.example.com/animation-stream",
format: "ndjson", // 'ndjson' (default) or 'binary'
})
client.attach(human)
await client.connect()
Newline-delimited JSON (NDJSON) format
Each chunk is a UTF-8 JSON object terminated by \n - one object per animation frame:
{"t":1.033,"facs":[0,0,0,0,0,0,0,0,0.41,0,0,0,0,0,0,0,0,0,0,0,0,0,0.61,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
{"t":1.066,"facs":[0,0,0,0,0,0,0,0,0.55,0,0,0,0,0,0,0,0,0,0,0,0,0,0.71,0,0.3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
| Field | Type | Description |
|---|---|---|
| t | number | Server timestamp in seconds |
| joints | number[] | Optional - flattened joint array (8 floats × jointCount) |
| facs | number[] | Optional - 52 FACS weights in canonical order |
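HTTP chunk boundaries do not necessarily line up with line boundaries, so a consumer has to buffer partial lines. The sketch below shows one way to do that if you ever need to read the NDJSON stream yourself; `createNdjsonParser` is a hypothetical helper and not part of the SDK, which handles this internally when `format: "ndjson"` is set.

```javascript
// Hypothetical helper: turn arbitrary byte chunks into parsed NDJSON frames.
function createNdjsonParser(onFrame) {
  // stream: true keeps multi-byte UTF-8 sequences intact across chunk boundaries
  const decoder = new TextDecoder("utf-8")
  let pending = ""

  return function push(chunk) {
    pending += decoder.decode(chunk, { stream: true })
    const lines = pending.split("\n")
    // The last element is an incomplete line (or "" if the chunk ended on \n)
    pending = lines.pop()
    for (const line of lines) {
      if (line.trim() === "") continue // tolerate blank keep-alive lines
      onFrame(JSON.parse(line))
    }
  }
}
```

In a browser, chunks for `push` could come from `response.body.getReader()`, which yields `Uint8Array` values.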
Binary chunked format
For lower CPU overhead, set format: 'binary'. The server writes length-prefixed binary frames into the HTTP response body. Each frame is identical to the WebSocket binary format, prefixed with a 4-byte u32 frame length:
[frameLength: u32][timestamp: f32][jointCount: u16][facsCount: u16][joints...][facs...]
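Because a single HTTP chunk may contain a fraction of a frame or several frames, the receiver has to reassemble frames from the length prefix. The following is a sketch under two assumptions that this page does not state: the `frameLength` prefix is little-endian (like the other fields written by the reference server), and it counts the bytes that follow the prefix, not the prefix itself. `createFrameSplitter` is a hypothetical name.

```javascript
// Hypothetical helper: reassemble length-prefixed frames from arbitrary byte chunks.
// ASSUMPTIONS: u32 prefix is little-endian and excludes its own 4 bytes.
function createFrameSplitter(onFrame) {
  let pending = new Uint8Array(0)

  return function push(chunk) {
    // Append the new chunk to whatever was left over from the previous call
    const merged = new Uint8Array(pending.length + chunk.length)
    merged.set(pending, 0)
    merged.set(chunk, pending.length)

    const view = new DataView(merged.buffer)
    let offset = 0
    while (merged.length - offset >= 4) {
      const frameLength = view.getUint32(offset, true)
      // Stop if the frame body has not fully arrived yet
      if (merged.length - offset - 4 < frameLength) break
      // slice() copies, so each frame gets its own ArrayBuffer
      onFrame(merged.slice(offset + 4, offset + 4 + frameLength).buffer)
      offset += 4 + frameLength
    }

    // Keep the unconsumed tail for the next chunk
    pending = merged.slice(offset)
  }
}
```

Each `ArrayBuffer` passed to `onFrame` then has the same layout as one WebSocket binary frame.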
Reference Server - Node.js
import http from "http"
http.createServer((req, res) => {
if (req.url !== "/animation-stream") return res.end()
res.writeHead(200, {
"Content-Type": "application/x-ndjson",
"Transfer-Encoding": "chunked",
"Access-Control-Allow-Origin": "*",
})
let t = 0
const interval = setInterval(() => {
const facs = new Array(52).fill(0)
facs[22] = Math.sin(t) * 0.5 + 0.5 // jawOpen oscillates (demo)
res.write(JSON.stringify({ t, facs }) + "\n")
t += 1 / 30
}, 1000 / 30)
// Listen on the response: its "close" event fires when the client disconnects
res.on("close", () => clearInterval(interval))
}).listen(8080, () => console.log("HTTP stream server on http://localhost:8080"))
Jitter Buffer
Network packets rarely arrive at a perfectly uniform interval. The jitter buffer absorbs bursts and gaps so the engine always has a frame ready at render time.
Server sends: ──▮──▮▮───▮──▮▮▮──▮── (irregular)
Jitter buffer: ──▮──▮──▮──▮──▮──▮── (uniform 33 ms output)
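The idea in the diagram can be sketched as a small playout queue. This is an illustration of the general technique, not the SDK's internal implementation: the class name and its methods are hypothetical, and the sketch leaves out frame-age limits and extrapolation. Each frame is scheduled at its server timestamp plus a fixed offset, chosen so that the first frame plays one target depth after it arrives.

```javascript
// Hypothetical sketch of a playout (jitter) buffer.
class JitterBuffer {
  constructor(targetDepthMs = 80) {
    this.targetDepthMs = targetDepthMs
    this.queue = []       // entries: { frame, playAt }, sorted by playAt
    this.offsetMs = null  // maps server time (ms) to local playout time (ms)
  }

  // frame.timestamp is server time in seconds; nowMs is the local clock in ms
  push(frame, nowMs) {
    const serverMs = frame.timestamp * 1000
    if (this.offsetMs === null) {
      // The first frame defines the mapping: play it targetDepthMs from now
      this.offsetMs = nowMs + this.targetDepthMs - serverMs
    }
    this.queue.push({ frame, playAt: serverMs + this.offsetMs })
    this.queue.sort((a, b) => a.playAt - b.playAt)
  }

  // Returns the newest frame that is due at nowMs, or null on underrun
  pop(nowMs) {
    let due = null
    while (this.queue.length > 0 && this.queue[0].playAt <= nowMs) {
      due = this.queue.shift().frame
    }
    return due
  }
}
```

Frames that arrive in a burst wait in the queue until their scheduled time, and a frame that arrives late still plays on schedule as long as the delay is smaller than the target depth.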
Configuration
const client = new StreamingClient({
transport: "websocket",
url: "wss://...",
jitterBuffer: 80, // target buffer depth in ms (default: 80)
maxBuffer: 300, // drop frames older than this ms (default: 300)
extrapolate: true, // predict pose on buffer underrun (default: true)
})
| Option | Type | Default | Description |
|---|---|---|---|
| jitterBuffer | number | 80 | Target depth in ms. Higher = smoother but more latency. |
| maxBuffer | number | 300 | Discard frames older than this many ms. Prevents unbounded memory growth on stalls. |
| extrapolate | boolean | true | Linearly extrapolate the last known pose from its velocity rather than freezing when the buffer runs dry. |
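Linear extrapolation from velocity can be sketched as follows. This shows the general technique only and makes no claim about how the engine implements it; the function name `extrapolate` and the `{ timestamp, values }` sample shape are assumptions made for this example.

```javascript
// Hypothetical sketch: predict values at time t from the two most recent samples.
// prev and last are { timestamp: seconds, values: array-like of numbers }.
function extrapolate(prev, last, t) {
  const dt = last.timestamp - prev.timestamp
  // Without a usable velocity estimate, hold the last known values
  if (dt <= 0) return Float32Array.from(last.values)

  // k = how far past the last sample we are, in units of the sample interval
  const k = (t - last.timestamp) / dt
  return Float32Array.from(last.values, (v, i) => v + (v - prev.values[i]) * k)
}

const prev = { timestamp: 0, values: [0, 1] }
const last = { timestamp: 1, values: [0.5, 1] }
console.log(Array.from(extrapolate(prev, last, 1.5))) // [0.75, 1]
```

Applied component-wise to rotations, this leaves quaternions slightly off unit length, so they would need renormalising afterwards. Predictions also drift quickly, so it is reasonable to cap how far past the last sample you extrapolate.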
Tuning guide
| Network condition | Recommended jitterBuffer |
|---|---|
| Local / same datacenter | 40–60 ms |
| Cross-region (< 100 ms RTT) | 80 ms (default) |
| Intercontinental (100–300 ms RTT) | 120–160 ms |
| Unreliable mobile network | 200 ms |
Setting jitterBuffer lower than your actual network jitter causes frequent buffer underruns, visible as momentary pose freezes. If you see stuttering, increase the value before looking for other causes.
Facial-Only Streaming (Lip Sync)
For AI TTS lip sync you typically only need FACS weights - no joint data required. Set jointCount = 0 in server frames and the engine skips the skinning upload, saving ~8 KB per frame.
const client = new StreamingClient({
transport: "websocket",
url: "wss://tts-server.example.com/lipsync",
mode: "facs", // skip joint processing entirely
smoothing: 0.7, // exponential moving average α (default: 0.7)
jitterBuffer: 60,
})
client.attach(human)
await client.connect()
The smoothing parameter applies an exponential moving average to all incoming FACS weights before forwarding them to human.morph:
smoothed[i] = α × previous[i] + (1 − α) × incoming[i]
| smoothing | Effect |
|---|---|
| 0.0 | No smoothing - raw values applied instantly |
| 0.5 | Moderate smoothing, fast response |
| 0.7 | Default - natural speech feel |
| 0.9 | Heavy smoothing, very soft transitions |
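The formula above translates directly into code. The sketch below is a standalone illustration, useful if you apply poses manually and want the same filtering; `createFacsSmoother` is a hypothetical name, and seeding the filter with the first frame is an assumption, since this page does not say how the engine initialises it.

```javascript
// Hypothetical helper: exponential moving average over FACS weight arrays.
function createFacsSmoother(alpha = 0.7) {
  let previous = null

  return function smooth(incoming) {
    // Assumption: the first frame passes through unchanged and seeds the filter
    if (previous === null) {
      previous = Float32Array.from(incoming)
      return previous
    }
    const out = new Float32Array(incoming.length)
    for (let i = 0; i < incoming.length; i++) {
      // smoothed[i] = α × previous[i] + (1 − α) × incoming[i]
      out[i] = alpha * previous[i] + (1 - alpha) * incoming[i]
    }
    previous = out
    return out
  }
}

const smooth = createFacsSmoother(0.5)
smooth([0, 1])
console.log(Array.from(smooth([1, 1]))) // [0.5, 1]
console.log(Array.from(smooth([1, 1]))) // [0.75, 1]
```

With α = 0 the output equals the input, and with α = 1 the output never leaves the first frame, which matches the "0 = off, 1 = frozen" description in the options table.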
Events
client.on("connected", () => console.log("Stream connected"))
client.on("disconnected", ({ code, reason }) => console.warn("Stream dropped:", reason))
client.on("reconnecting", ({ attempt }) => console.log("Reconnect attempt", attempt))
client.on("frame", (pose) => {
// pose: { timestamp, joints?: Float32Array, facs?: Float32Array }
// Fired after jitter buffer output - before GPU upload
})
client.on("bufferUnderrun", () => console.warn("Jitter buffer ran dry"))
client.on("bufferOverflow", () => console.warn("Buffer full - frames dropped"))
Manual Pose Application
If you need to pre-process pose data before it reaches the character (retargeting, blending, filtering), skip client.attach() and handle frames yourself:
// Do NOT call client.attach(human)
client.on("frame", ({ joints, facs }) => {
if (joints) {
// Retarget from a different skeleton before applying
const retargeted = myRetargeter.apply(joints)
human.skeleton.setPose(retargeted)
}
if (facs) {
// Blend 50/50 with a local procedural expression
const blended = facs.map((w, i) => w * 0.5 + localFacs[i] * 0.5)
human.morph.setFromArray(blended)
}
})
await client.connect()
Complete Example - AI TTS Avatar
A full end-to-end integration connecting an AI speech backend to a live avatar:
import { OpenHuman, StreamingClient } from "@openhuman/sdk"
// 1. Load character
const human = await OpenHuman.load("character.ohb", canvas)
human.animation.play("idle")
// 2. Create lip sync streaming client
const lipsync = new StreamingClient({
transport: "websocket",
url: "wss://tts-backend.example.com/lipsync",
mode: "facs",
jitterBuffer: 80,
smoothing: 0.7,
reconnect: true,
})
lipsync.attach(human)
// 3. Wire UI → TTS → stream
const input = document.getElementById("user-input")
const btn = document.getElementById("send-btn")
btn.addEventListener("click", async () => {
const text = input.value.trim()
if (!text) return
// Transition to talking state
human.animation.crossFadeTo("talk", 0.3)
human.morph.applyPreset("neutral")
// POST to TTS backend - it begins streaming FACS over WebSocket
await fetch("https://tts-backend.example.com/speak", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ text }),
})
})
// 4. Return to idle when speech ends
lipsync.on("speechEnd", () => {
human.animation.crossFadeTo("idle", 0.5)
human.morph.animateTo({}, { duration: 0.3 })
})
// 5. Diagnostics
lipsync.on("bufferUnderrun", () => {
console.warn("Lip sync buffer underrun - consider increasing jitterBuffer")
})
setInterval(() => {
const s = lipsync.getStats()
console.log(`Buffer: ${s.bufferDepth}ms | Latency: ${s.estimatedLatency}ms | FPS: ${s.fps}`)
}, 2000)
await lipsync.connect()
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Pose freezes briefly every few seconds | Jitter buffer underruns | Increase jitterBuffer to 120–160 ms |
| Visible lag between audio and lip movement | Buffer too deep | Decrease jitterBuffer to 40–60 ms |
| Character snaps between poses | smoothing too low | Increase smoothing to 0.7–0.9 |
| WebSocket drops after ~30 s | Proxy / load-balancer timeout | Send a ping frame every 20 s or configure keep-alive |
| High CPU at 60 fps | Parsing large JSON frames each tick | Switch from NDJSON to the binary format (format: 'binary') |
| bufferOverflow warnings | Server sending faster than realtime | Throttle server to target fps |
| CORS error on HTTP stream | Missing response headers | Add Access-Control-Allow-Origin: * to your stream response |
API Reference
StreamingClient options
| Option | Type | Default | Description |
|---|---|---|---|
| transport | 'websocket' \| 'http' | 'websocket' | Network transport |
| url | string | - | wss:// or https:// endpoint (required) |
| format | 'binary' \| 'ndjson' | 'binary' for WS, 'ndjson' for HTTP | Frame encoding |
| mode | 'full' \| 'facs' | 'full' | 'facs' skips joint processing for lip-sync-only streams |
| jitterBuffer | number | 80 | Buffer depth in ms |
| maxBuffer | number | 300 | Max frame age in ms before discard |
| extrapolate | boolean | true | Predict pose on buffer underrun |
| smoothing | number | 0.7 | EMA α for FACS weights (0 = off, 1 = frozen) |
| reconnect | boolean | true | Auto-reconnect on disconnect |
| reconnectDelay | number | 1000 | Ms between reconnect attempts |
StreamingClient methods
| Method | Signature | Description |
|---|---|---|
| attach | (human: OpenHuman) | Bind to a character - poses applied automatically each frame |
| connect | () → Promise<void> | Open the connection |
| disconnect | () | Close the connection and flush the buffer |
| pause | () | Pause consuming from the jitter buffer (character holds last pose) |
| resume | () | Resume consuming |
| on | (event, handler) | Subscribe to a lifecycle event |
| getStats | () → StreamStats | Return latency, buffer depth, and frame rate diagnostics |
StreamStats object
interface StreamStats {
connected: boolean
bufferDepth: number // current jitter buffer depth in ms
framesReceived: number // total frames received since connect
framesDropped: number // frames discarded (too old or overflow)
estimatedLatency: number // ms - server timestamp vs local clock delta
fps: number // observed incoming frame rate
}
Next Steps
- Facial Blendshapes - understand the 52 FACS targets your server should drive
- Animation Graph - combine streaming poses with state-machine body animation
- Embed API Reference - expose the streaming avatar via the <open-human> web component