Streaming Protocol

OpenHuman supports two streaming transports for real-time animation: a WebSocket binary protocol for low-latency bidirectional streams (live lip sync, mocap, remote control), and an HTTP chunked stream for pre-recorded or server-push scenarios. Both are consumed by the same StreamingClient API on the client.

When to Use Each Transport

	WebSocket	HTTP Chunked
Latency target	< 50 ms end-to-end	100–500 ms typical
Direction	Bidirectional	Server → client only
Best for	Live AI TTS lip sync, real-time mocap, remote puppeteering	Pre-recorded speech playback, server-sequenced animations
Reconnect	Manual or automatic	Automatic (fetch retry)
Binary frames	✅ Native	✅ Via `ReadableStream`
JSON frames	✅ Supported	✅ Newline-delimited

Architecture Overview

Transport A - WebSocket Binary Stream

Connecting

main.js

import { OpenHuman, StreamingClient } from "@openhuman/sdk"
 
const human = await OpenHuman.load("character.ohb", canvas)
 
const client = new StreamingClient({
    transport: "websocket",
    url: "wss://your-server.example.com/animation-stream",
    jitterBuffer: 80, // ms - smooths latency spikes (default: 80)
    reconnect: true, // auto-reconnect on drop (default: true)
    reconnectDelay: 1000, // ms between reconnect attempts (default: 1000)
})
 
// Attach to the character - poses are applied automatically each frame
client.attach(human)
 
// Open the connection
await client.connect()

Once attach() is called, every frame arriving from the server is pulled from the jitter buffer and applied to the character with no additional code needed.

Binary Frame Format

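Each WebSocket message is a single binary frame (ArrayBuffer). The layout is a tightly packed struct with no padding between fields; the reference server below writes every multi-byte field in little-endian byte order: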

Offset     Size      Type       Field
──────     ────      ──────     ──────────────────────────────────────────────
0          4         f32        timestamp     - server clock (seconds)
4          2         u16        jointCount    - number of joints in this frame
6          2         u16        facsCount     - number of FACS weights (0 or 52)
8          n×32      f32×8      joints[]      - joint data (see below)
8 + n×32   m×2       i16×m      facs[]        - quantised FACS weights

Joint data - 8 × f32 = 32 bytes per joint:

Offset   Field
──────   ─────────────────────────────────────────
0–11     position    vec3 (x, y, z) - world space, metres
12–27    rotation    quaternion (x, y, z, w) - unit quaternion
28–31    scale       f32 - uniform scale (1.0 = no scale)

FACS weights are transmitted as 16-bit signed integers. A weight in [-1.0, 1.0] is multiplied by 32 767 and rounded, so the integer range actually in use is [-32 767, 32 767]. The engine dequantises on receipt:

weight_f32 = facs_i16 / 32767.0

This cuts FACS bandwidth by 50% compared to sending raw f32 values (104 bytes vs 208 bytes per frame for 52 targets).
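
When debugging a server it can help to decode a frame by hand. The following is a minimal sketch of a decoder for the layout above, not the engine's own parser. It assumes little-endian fields (as the reference server writes them) and an `ArrayBuffer` input; the name `decodeFrame` is hypothetical.

```javascript
const HEADER_BYTES = 8
const FLOATS_PER_JOINT = 8 // position (3) + rotation (4) + scale (1)

// Decode one binary frame into typed arrays.
// Assumption: every multi-byte field is little-endian.
function decodeFrame(buffer) {
    const view = new DataView(buffer)
    if (view.byteLength < HEADER_BYTES) {
        throw new RangeError("Frame is shorter than the 8-byte header")
    }

    const timestamp = view.getFloat32(0, true)
    const jointCount = view.getUint16(4, true)
    const facsCount = view.getUint16(6, true)

    const expected = HEADER_BYTES + jointCount * FLOATS_PER_JOINT * 4 + facsCount * 2
    if (view.byteLength < expected) {
        throw new RangeError(`Truncated frame: need ${expected} bytes, got ${view.byteLength}`)
    }

    // Joints: 8 consecutive f32 values per joint, kept flattened
    let offset = HEADER_BYTES
    const joints = new Float32Array(jointCount * FLOATS_PER_JOINT)
    for (let i = 0; i < joints.length; i++) {
        joints[i] = view.getFloat32(offset, true)
        offset += 4
    }

    // FACS: i16 values dequantised with weight = value / 32767
    const facs = new Float32Array(facsCount)
    for (let i = 0; i < facsCount; i++) {
        facs[i] = view.getInt16(offset, true) / 32767
        offset += 2
    }

    return { timestamp, jointCount, facsCount, joints, facs }
}
```

The length check runs before any payload is read, so a truncated message fails with a clear error instead of an out-of-bounds `DataView` access.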

Frame Size Reference

Content	Bytes per frame
Header only	8
256 joints, no FACS	8 + 256 × 32 = 8 200 B
FACS only, no joints	8 + 52 × 2 = 112 B
256 joints + 52 FACS	8 + 256 × 32 + 52 × 2 = 8 304 B

At 30 fps with 256 joints + FACS, raw bandwidth is approximately 2 Mbps. Enable WebSocket per-message deflate on your server to cut this by ~60%.
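
The same arithmetic can be written as two small helpers, which is handy when sizing a stream for a different skeleton or frame rate. This is only a sketch of the formula in the table; `frameBytes` and `bandwidthMbps` are hypothetical names, not SDK functions.

```javascript
// Size of one binary frame: 8-byte header + 32 bytes per joint + 2 bytes per FACS weight
function frameBytes(jointCount, facsCount) {
    return 8 + jointCount * 32 + facsCount * 2
}

// Uncompressed bandwidth in megabits per second (1 Mbps = 1,000,000 bits/s)
function bandwidthMbps(jointCount, facsCount, fps) {
    return (frameBytes(jointCount, facsCount) * fps * 8) / 1_000_000
}

console.log(frameBytes(256, 52)) // 8304
console.log(bandwidthMbps(256, 52, 30).toFixed(2)) // "1.99"
```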

Reference Server - Node.js

A minimal WebSocket server that streams poses at 30 fps:

server/stream-server.js

import { WebSocketServer } from "ws"
 
const wss = new WebSocketServer({ port: 8080 })
 
const JOINT_COUNT = 256
const FACS_COUNT = 52
const FRAME_BYTES = 8 + JOINT_COUNT * 32 + FACS_COUNT * 2
 
wss.on("connection", (ws) => {
    console.log("Client connected")
 
    const interval = setInterval(() => {
        const buf = new ArrayBuffer(FRAME_BYTES)
        const view = new DataView(buf)
        let offset = 0
 
        // Header
        view.setFloat32(offset, performance.now() / 1000, true)
        offset += 4
        view.setUint16(offset, JOINT_COUNT, true)
        offset += 2
        view.setUint16(offset, FACS_COUNT, true)
        offset += 2
 
        // Joint data - replace with your actual pose source
        for (let i = 0; i < JOINT_COUNT; i++) {
            view.setFloat32(offset, 0, true) // position.x
            view.setFloat32(offset + 4, 0, true) // position.y
            view.setFloat32(offset + 8, 0, true) // position.z
            view.setFloat32(offset + 12, 0, true) // rotation.x
            view.setFloat32(offset + 16, 0, true) // rotation.y
            view.setFloat32(offset + 20, 0, true) // rotation.z
            view.setFloat32(offset + 24, 1, true) // rotation.w (identity)
            view.setFloat32(offset + 28, 1, true) // scale
            offset += 32
        }
 
        // FACS weights - replace with your TTS / phoneme output
        for (let i = 0; i < FACS_COUNT; i++) {
            const weight = 0.0 // float in range 0.0–1.0
            view.setInt16(offset, Math.round(weight * 32767), true)
            offset += 2
        }
 
        ws.send(buf)
    }, 1000 / 30) // 30 fps
 
    ws.on("close", () => clearInterval(interval))
})
 
console.log("Streaming server on ws://localhost:8080")
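
One detail worth guarding against when you replace the placeholder weights: `DataView.setInt16` does not clamp, so a weight slightly above 1.0 (for example from an overshooting filter) wraps around to a negative integer. A small clamping helper avoids this. The sketch below is an illustration only; `quantiseFacs` is a hypothetical name and mapping non-finite input to 0 is an assumption.

```javascript
// Quantise a FACS weight to a signed 16-bit integer, clamping to [-1, 1] first.
// Assumption: non-finite input (NaN, Infinity) is treated as 0.
function quantiseFacs(weight) {
    if (!Number.isFinite(weight)) return 0
    const clamped = Math.min(1, Math.max(-1, weight))
    return Math.round(clamped * 32767)
}
```

In the reference server the FACS loop body would then read `view.setInt16(offset, quantiseFacs(weight), true)`.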

Transport B - HTTP Chunked Stream

HTTP chunked streaming reads a single long-lived HTTP response through the browser ReadableStream API, so no WebSocket upgrade is required. It is ideal for server-sequenced animation produced alongside AI-generated audio.

Connecting

main.js

import { OpenHuman, StreamingClient } from "@openhuman/sdk"
 
const human = await OpenHuman.load("character.ohb", canvas)
 
const client = new StreamingClient({
    transport: "http",
    url: "https://your-server.example.com/animation-stream",
    format: "ndjson", // 'ndjson' (default) or 'binary'
})
 
client.attach(human)
await client.connect()

Newline-delimited JSON (NDJSON) format

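The stream is a sequence of UTF-8 JSON objects, each terminated by \n, one object per animation frame. Transport-level chunk boundaries need not coincide with line breaks, so a consumer splits on \n rather than on chunks. Two example frames: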

{"t":1.033,"facs":[0,0,0,0,0,0,0,0,0.41,0,0,0,0,0,0,0,0,0,0,0,0,0,0.61,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
{"t":1.066,"facs":[0,0,0,0,0,0,0,0,0.55,0,0,0,0,0,0,0,0,0,0,0,0,0,0.71,0,0.3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}

Field	Type	Description
`t`	`number`	Server timestamp in seconds
`joints`	`number[]`	Optional - flattened joint array (8 floats × jointCount)
`facs`	`number[]`	Optional - 52 FACS weights in canonical order
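
StreamingClient handles NDJSON parsing for you. If you consume the stream yourself (for logging or testing), the main pitfall is that a network chunk can end in the middle of a line. The sketch below buffers the incomplete tail between chunks; `createNdjsonParser` is a hypothetical helper, not part of the SDK.

```javascript
// Incremental NDJSON parser: feed it decoded text chunks of any size,
// it calls onFrame once per complete line.
function createNdjsonParser(onFrame) {
    let pending = "" // text after the last newline seen so far

    return {
        push(text) {
            pending += text
            const lines = pending.split("\n")
            pending = lines.pop() // incomplete last line (or "" if text ended with \n)
            for (const line of lines) {
                if (line.trim() === "") continue // tolerate blank lines
                onFrame(JSON.parse(line))
            }
        },
    }
}
```

In a browser, the text chunks could come from `response.body.pipeThrough(new TextDecoderStream())`, which also takes care of multi-byte UTF-8 sequences split across chunks.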

Binary chunked format

For lower CPU overhead, set format: 'binary'. The server writes length-prefixed binary frames into the HTTP response body. Each frame is identical to the WebSocket binary format, prefixed with a 4-byte u32 frame length:

[frameLength: u32][timestamp: f32][jointCount: u16][facsCount: u16][joints...][facs...]
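
The length prefix exists because an HTTP body is a plain byte stream with no message boundaries. A reader therefore has to reassemble frames from arbitrarily sized chunks. The following sketch shows one way to do that. It assumes the prefix is little-endian and counts only the bytes that follow it (the text above does not specify either point), and `createFrameSplitter` is a hypothetical name.

```javascript
// Reassemble length-prefixed frames from arbitrarily sized Uint8Array chunks.
// Assumptions: u32 prefix is little-endian and excludes its own 4 bytes.
function createFrameSplitter(onFrame) {
    let pending = new Uint8Array(0) // bytes not yet consumed

    return {
        push(chunk) {
            // Append the new chunk to whatever was left over
            const merged = new Uint8Array(pending.length + chunk.length)
            merged.set(pending, 0)
            merged.set(chunk, pending.length)

            const view = new DataView(merged.buffer)
            let offset = 0
            while (merged.length - offset >= 4) {
                const frameLength = view.getUint32(offset, true)
                if (merged.length - offset - 4 < frameLength) break // frame incomplete
                const start = offset + 4
                // Hand each frame over as its own ArrayBuffer copy
                onFrame(merged.buffer.slice(start, start + frameLength))
                offset = start + frameLength
            }
            pending = merged.slice(offset)
        },
    }
}
```

Each emitted `ArrayBuffer` has the same layout as a WebSocket frame, so the same decoding logic applies to both transports.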

Reference Server - Node.js

server/http-stream-server.js

import http from "http"
 
http.createServer((req, res) => {
    if (req.url !== "/animation-stream") {
        res.writeHead(404) // unknown path: don't answer with an empty 200
        return res.end()
    }
 
    res.writeHead(200, {
        "Content-Type": "application/x-ndjson",
        "Transfer-Encoding": "chunked",
        "Access-Control-Allow-Origin": "*",
    })
 
    let t = 0
    const interval = setInterval(() => {
        const facs = new Array(52).fill(0)
        facs[22] = Math.sin(t) * 0.5 + 0.5 // jawOpen oscillates (demo)
 
        res.write(JSON.stringify({ t, facs }) + "\n")
        t += 1 / 30
    }, 1000 / 30)
 
    req.on("close", () => clearInterval(interval))
}).listen(8080, () => console.log("HTTP stream server on http://localhost:8080"))

Jitter Buffer

Network packets rarely arrive at a perfectly uniform interval. The jitter buffer absorbs bursts and gaps so the engine always has a frame ready at render time.

Server sends:   ──▮──▮▮───▮──▮▮▮──▮──   (irregular)
Jitter buffer:  ──▮──▮──▮──▮──▮──▮──    (uniform 33 ms output)
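
To make the idea concrete, here is a deliberately simplified jitter buffer. It is a sketch of the general technique, not the engine's implementation: it pins the server-to-local clock offset on the first frame, then releases each frame once it is `targetDepthMs` old. The class name, option names, and the `{ timestamp }` frame shape (seconds) are assumptions for illustration.

```javascript
class JitterBuffer {
    constructor({ targetDepthMs = 80, maxBufferMs = 300 } = {}) {
        this.targetDepthMs = targetDepthMs
        this.maxBufferMs = maxBufferMs
        this.frames = [] // kept sorted by server timestamp
        this.clockOffsetMs = null // local clock minus server clock, set on first frame
    }

    // Store a frame on arrival. nowMs is the local clock in milliseconds.
    push(frame, nowMs) {
        if (this.clockOffsetMs === null) {
            this.clockOffsetMs = nowMs - frame.timestamp * 1000
        }
        this.frames.push(frame)
        this.frames.sort((a, b) => a.timestamp - b.timestamp)
    }

    // Return the newest frame that is due at nowMs, or null if none is due.
    pop(nowMs) {
        if (this.clockOffsetMs === null) return null
        // Server time that should be on screen right now
        const dueServerMs = nowMs - this.clockOffsetMs - this.targetDepthMs
        let due = null
        while (this.frames.length > 0 && this.frames[0].timestamp * 1000 <= dueServerMs) {
            const frame = this.frames.shift()
            // Frames that are too stale are discarded instead of displayed
            if (dueServerMs - frame.timestamp * 1000 <= this.maxBufferMs) due = frame
        }
        return due
    }
}
```

A `null` from `pop()` after playback has started corresponds to a buffer underrun: the renderer has to hold or predict the pose until the next frame arrives.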

Configuration

const client = new StreamingClient({
    transport: "websocket",
    url: "wss://...",
    jitterBuffer: 80, // target buffer depth in ms (default: 80)
    maxBuffer: 300, // drop frames older than this ms (default: 300)
    extrapolate: true, // predict pose on buffer underrun (default: true)
})

Option	Type	Default	Description
`jitterBuffer`	`number`	`80`	Target depth in ms. Higher = smoother but more latency.
`maxBuffer`	`number`	`300`	Discard frames older than this many ms. Prevents unbounded memory growth on stalls.
`extrapolate`	`boolean`	`true`	Linearly extrapolate the last known pose from its velocity rather than freezing when the buffer runs dry.
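
As an illustration of what linear extrapolation from velocity means, the sketch below projects each value forward using the difference between the last two frames. It is a generic example under assumed names (`extrapolatePose`, frames shaped as `{ timestamp, values }`), not the engine's actual predictor.

```javascript
// Linearly extrapolate a flat array of values past the last received frame.
// prev and last are { timestamp (seconds), values (array-like of numbers) }.
function extrapolatePose(prev, last, targetTime) {
    const dt = last.timestamp - prev.timestamp
    // Without a usable time delta there is no velocity: hold the last pose
    if (dt <= 0) return Float32Array.from(last.values)

    const k = (targetTime - last.timestamp) / dt
    const out = new Float32Array(last.values.length)
    for (let i = 0; i < out.length; i++) {
        const delta = last.values[i] - prev.values[i] // change over one frame interval
        out[i] = last.values[i] + delta * k
    }
    return out
}
```

Applied naively to a real pose this needs two follow-up steps: quaternion components should be renormalised to unit length, and FACS weights should be clamped back into their valid range, because linear projection preserves neither.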

Tuning guide

Network condition	Recommended `jitterBuffer`
Local / same datacenter	`40–60 ms`
Cross-region (< 100 ms RTT)	`80 ms` (default)
Intercontinental (100–300 ms RTT)	`120–160 ms`
Unreliable mobile network	`200 ms`

Setting jitterBuffer lower than your actual network jitter causes frequent buffer underruns, visible as momentary pose freezes. If you see stuttering, increase the value before looking for other causes.

Facial-Only Streaming (Lip Sync)

For AI TTS lip sync you typically need only FACS weights, not joint data. Set jointCount = 0 in server frames and the engine skips the skinning upload, saving ~8 KB per frame for a 256-joint skeleton (256 × 32 = 8 192 bytes).

const client = new StreamingClient({
    transport: "websocket",
    url: "wss://tts-server.example.com/lipsync",
    mode: "facs", // skip joint processing entirely
    smoothing: 0.7, // exponential moving average α (default: 0.7)
    jitterBuffer: 60,
})
 
client.attach(human)
await client.connect()

The smoothing parameter applies an exponential moving average to all incoming FACS weights before forwarding them to human.morph:

smoothed[i] = α × previous[i] + (1 − α) × incoming[i]

`smoothing`	Effect
`0.0`	No smoothing - raw values applied instantly
`0.5`	Moderate smoothing, fast response
`0.7`	Default - natural speech feel
`0.9`	Heavy smoothing, very soft transitions
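
If you apply poses manually and want the same filtering, the formula translates directly into code. This is a small sketch of the EMA as written above; `smoothFacs` is a hypothetical helper, and `previous` is assumed to be the previous smoothed output, not the previous raw input.

```javascript
// Exponential moving average over FACS weights:
// smoothed[i] = alpha * previous[i] + (1 - alpha) * incoming[i]
function smoothFacs(previous, incoming, alpha) {
    const smoothed = new Float32Array(incoming.length)
    for (let i = 0; i < incoming.length; i++) {
        smoothed[i] = alpha * previous[i] + (1 - alpha) * incoming[i]
    }
    return smoothed
}

// Usage: carry the smoothed result forward as the next call's `previous`
let state = new Float32Array(52)
state = smoothFacs(state, new Float32Array(52).fill(1), 0.5)
console.log(state[0]) // 0.5
```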

Events

client.on("connected", () => console.log("Stream connected"))
client.on("disconnected", ({ code, reason }) => console.warn("Stream dropped:", reason))
client.on("reconnecting", ({ attempt }) => console.log("Reconnect attempt", attempt))
client.on("frame", (pose) => {
    // pose: { timestamp, joints?: Float32Array, facs?: Float32Array }
    // Fired after jitter buffer output - before GPU upload
})
client.on("bufferUnderrun", () => console.warn("Jitter buffer ran dry"))
client.on("bufferOverflow", () => console.warn("Buffer full - frames dropped"))

Manual Pose Application

If you need to pre-process pose data before it reaches the character (retargeting, blending, filtering), skip client.attach() and handle frames yourself:

// Do NOT call client.attach(human)
 
client.on("frame", ({ joints, facs }) => {
    if (joints) {
        // Retarget from a different skeleton before applying
        const retargeted = myRetargeter.apply(joints)
        human.skeleton.setPose(retargeted)
    }
 
    if (facs) {
        // Blend 50/50 with a local procedural expression
        const blended = facs.map((w, i) => w * 0.5 + localFacs[i] * 0.5)
        human.morph.setFromArray(blended)
    }
})
 
await client.connect()

Complete Example - AI TTS Avatar

A full end-to-end integration connecting an AI speech backend to a live avatar:

ai-avatar.js

import { OpenHuman, StreamingClient } from "@openhuman/sdk"
 
// 1. Load character
const human = await OpenHuman.load("character.ohb", canvas)
human.animation.play("idle")
 
// 2. Create lip sync streaming client
const lipsync = new StreamingClient({
    transport: "websocket",
    url: "wss://tts-backend.example.com/lipsync",
    mode: "facs",
    jitterBuffer: 80,
    smoothing: 0.7,
    reconnect: true,
})
 
lipsync.attach(human)
 
// 3. Wire UI → TTS → stream
const input = document.getElementById("user-input")
const btn = document.getElementById("send-btn")
 
btn.addEventListener("click", async () => {
    const text = input.value.trim()
    if (!text) return
 
    // Transition to talking state
    human.animation.crossFadeTo("talk", 0.3)
    human.morph.applyPreset("neutral")
 
    // POST to TTS backend - it begins streaming FACS over WebSocket
    await fetch("https://tts-backend.example.com/speak", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text }),
    })
})
 
// 4. Return to idle when speech ends
lipsync.on("speechEnd", () => {
    human.animation.crossFadeTo("idle", 0.5)
    human.morph.animateTo({}, { duration: 0.3 })
})
 
// 5. Diagnostics
lipsync.on("bufferUnderrun", () => {
    console.warn("Lip sync buffer underrun - consider increasing jitterBuffer")
})
 
setInterval(() => {
    const s = lipsync.getStats()
    console.log(`Buffer: ${s.bufferDepth}ms | Latency: ${s.estimatedLatency}ms | FPS: ${s.fps}`)
}, 2000)
 
await lipsync.connect()

Troubleshooting

Symptom	Likely cause	Fix
Pose freezes briefly every few seconds	Jitter buffer underruns	Increase `jitterBuffer` to `120–160 ms`
Visible lag between audio and lip movement	Buffer too deep	Decrease `jitterBuffer` to `40–60 ms`
Character snaps between poses	`smoothing` too low	Increase `smoothing` to `0.7–0.9`
WebSocket drops after ~30 s	Proxy / load-balancer timeout	Send a ping frame every 20 s or configure keep-alive
High CPU at 60 fps	Parsing large JSON frames each tick	Switch from NDJSON to the binary format (`format: 'binary'`)
`bufferOverflow` warnings	Server sending faster than realtime	Throttle server to target fps
CORS error on HTTP stream	Missing response headers	Add `Access-Control-Allow-Origin: *` to your stream response

API Reference

`StreamingClient` options

Option	Type	Default	Description
`transport`	`'websocket' | 'http'`	`'websocket'`	Network transport
`url`	`string`	-	`wss://` or `https://` endpoint (required)
`format`	`'binary' | 'ndjson'`	`'binary'` for WS, `'ndjson'` for HTTP	Frame encoding
`mode`	`'full' | 'facs'`	`'full'`	`'facs'` skips joint processing for lip-sync-only streams
`jitterBuffer`	`number`	`80`	Buffer depth in ms
`maxBuffer`	`number`	`300`	Max frame age in ms before discard
`extrapolate`	`boolean`	`true`	Predict pose on buffer underrun
`smoothing`	`number`	`0.7`	EMA α for FACS weights (`0` = off, `1` = frozen)
`reconnect`	`boolean`	`true`	Auto-reconnect on disconnect
`reconnectDelay`	`number`	`1000`	Ms between reconnect attempts

`StreamingClient` methods

Method	Signature	Description
`attach`	`(human: OpenHuman)`	Bind to a character - poses applied automatically each frame
`connect`	`() → Promise<void>`	Open the connection
`disconnect`	`()`	Close the connection and flush the buffer
`pause`	`()`	Pause consuming from the jitter buffer (character holds last pose)
`resume`	`()`	Resume consuming
`on`	`(event, handler)`	Subscribe to a lifecycle event
`getStats`	`() → StreamStats`	Return latency, buffer depth, and frame rate diagnostics

`StreamStats` object

interface StreamStats {
    connected: boolean
    bufferDepth: number // current jitter buffer depth in ms
    framesReceived: number // total frames received since connect
    framesDropped: number // frames discarded (too old or overflow)
    estimatedLatency: number // ms - server timestamp vs local clock delta
    fps: number // observed incoming frame rate
}

Next Steps

Facial Blendshapes - understand the 52 FACS targets your server should drive
Animation Graph - combine streaming poses with state-machine body animation
Embed API Reference - expose the streaming avatar via the <open-human> web component
