OpenHuman - Architecture Overview

System Layers

Core Modules

1. Loader

Responsibility: Parse .ohb bundle files, upload assets to the GPU.

Reads the binary .ohb archive (header + chunk list)
Decodes KTX2 textures → uploads as WebGLTexture
Parses glTF mesh data → uploads as WebGLBuffer (VBO + IBO)
Extracts skeleton, morph targets, and animation clips
Emits character:loaded event when GPU upload is complete

Key classes: BundleParser, TextureUploader, GeometryUploader
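
The first Loader step (header + chunk list) can be sketched as follows. This is a minimal illustration, not the actual `BundleParser`: the magic string, field widths, byte order, and the `parseBundleHeader` name are all assumptions, since the real layout is defined in the .ohb Bundle Format Spec.

```typescript
// Hypothetical .ohb header layout, for illustration only (not the real spec):
//   bytes 0-3   magic "OHB1" (ASCII)
//   bytes 4-5   version     (uint16, little-endian)
//   bytes 6-7   flags       (uint16, little-endian)
//   bytes 8-11  chunk count (uint32, little-endian)
// followed by one 12-byte entry per chunk: type, offset, length (all uint32).

interface ChunkEntry {
  type: number;
  offset: number;
  length: number;
}

interface BundleHeader {
  version: number;
  flags: number;
  chunks: ChunkEntry[];
}

const HEADER_SIZE = 12;
const ENTRY_SIZE = 12;

function parseBundleHeader(buffer: ArrayBuffer): BundleHeader {
  if (buffer.byteLength < HEADER_SIZE) {
    throw new Error("bundle is too small to contain a header");
  }
  const view = new DataView(buffer);
  const magic = String.fromCharCode(
    view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3)
  );
  if (magic !== "OHB1") {
    throw new Error("unexpected magic: " + magic);
  }
  const version = view.getUint16(4, true);
  const flags = view.getUint16(6, true);
  const chunkCount = view.getUint32(8, true);
  if (buffer.byteLength < HEADER_SIZE + chunkCount * ENTRY_SIZE) {
    throw new Error("chunk list is truncated");
  }
  const chunks: ChunkEntry[] = [];
  for (let i = 0; i < chunkCount; i++) {
    const base = HEADER_SIZE + i * ENTRY_SIZE;
    const entry: ChunkEntry = {
      type: view.getUint32(base, true),
      offset: view.getUint32(base + 4, true),
      length: view.getUint32(base + 8, true),
    };
    // Reject entries that point past the end of the file.
    if (entry.offset + entry.length > buffer.byteLength) {
      throw new Error("chunk " + i + " is out of bounds");
    }
    chunks.push(entry);
  }
  return { version, flags, chunks };
}
```

Validating offsets before touching chunk payloads keeps a corrupt or truncated download from surfacing later as a confusing GPU upload error.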

2. Animator

Responsibility: Manage animation state and produce per-frame pose data.

Implements a state machine with idle/talk/gesture states
Supports blend trees for smooth transition between clips
Evaluates 52 FACS morph target weights per frame
Outputs a PoseFrame: joint transforms + blendshape weights
Receives external poses from StreamingClient and merges with local state

Key classes: AnimationGraph, StateMachine, BlendTree, MorphController
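
A rough sketch of how a state change and a blend could interact is shown below. It is not the `StateMachine` or `BlendTree` implementation: `CrossFadeStateMachine`, `blendPoses`, and the fixed fade duration are hypothetical, and joints are blended as flat numeric channels for brevity (a real animator would typically interpolate rotations as quaternions).

```typescript
type AnimState = "idle" | "talk" | "gesture";

interface PoseFrame {
  joints: Float32Array;       // flattened joint channels (assumed layout)
  morphWeights: Float32Array; // one weight per morph target
}

// Linear interpolation across two equally sized channel arrays.
function lerpChannels(a: Float32Array, b: Float32Array, t: number): Float32Array {
  if (a.length !== b.length) {
    throw new Error("channel count mismatch");
  }
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] + (b[i] - a[i]) * t;
  }
  return out;
}

function blendPoses(from: PoseFrame, to: PoseFrame, t: number): PoseFrame {
  return {
    joints: lerpChannels(from.joints, to.joints, t),
    morphWeights: lerpChannels(from.morphWeights, to.morphWeights, t),
  };
}

// Tracks the current transition and reports a blend factor in [0, 1].
class CrossFadeStateMachine {
  private from: AnimState = "idle";
  private to: AnimState = "idle";
  private elapsed = 0;
  private readonly fadeSeconds: number;

  constructor(fadeSeconds: number) {
    this.fadeSeconds = fadeSeconds;
  }

  request(next: AnimState): void {
    if (next === this.to) return;
    // Simplification: an interrupted fade restarts from the previous target.
    this.from = this.to;
    this.to = next;
    this.elapsed = 0;
  }

  tick(deltaSeconds: number): { from: AnimState; to: AnimState; t: number } {
    this.elapsed = Math.min(this.elapsed + deltaSeconds, this.fadeSeconds);
    const t = this.fadeSeconds > 0 ? this.elapsed / this.fadeSeconds : 1;
    return { from: this.from, to: this.to, t };
  }
}
```

Each frame, the caller would sample the `from` and `to` clips and pass both poses with `t` to `blendPoses` to obtain the output `PoseFrame`.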

3. StreamingClient

Responsibility: Receive real-time animation data over the network.

Opens a WebSocket (or HTTP chunked) connection to an animation server
Decodes binary frames: 16-bit quantized joint data → Float32Array
Maintains a jitter buffer to smooth latency spikes
Pushes decoded PoseFrame objects into the Animator's input queue

Key classes: StreamingClient, JitterBuffer, FrameDecoder
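
The decode and buffering steps above can be sketched as follows. The actual wire format lives in the Streaming Protocol Spec; this example assumes unsigned 16-bit values mapped linearly onto a known [min, max] range and a sequence number per frame. `quantize16`, `dequantize16`, and `JitterBufferSketch` are hypothetical names.

```typescript
// Map floats in [min, max] to unsigned 16-bit integers (sender side).
function quantize16(values: Float32Array, min: number, max: number): Uint16Array {
  const out = new Uint16Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const clamped = Math.min(max, Math.max(min, values[i]));
    out[i] = Math.round(((clamped - min) / (max - min)) * 65535);
  }
  return out;
}

// Inverse mapping (receiver side): 16-bit integers back to Float32Array.
function dequantize16(quantized: Uint16Array, min: number, max: number): Float32Array {
  const scale = (max - min) / 65535;
  const out = new Float32Array(quantized.length);
  for (let i = 0; i < quantized.length; i++) {
    out[i] = min + quantized[i] * scale;
  }
  return out;
}

// Holds a few frames before playback starts so late or reordered
// packets can still be played in sequence order.
class JitterBufferSketch<T> {
  private readonly targetDepth: number;
  private queue: { seq: number; frame: T }[] = [];
  private primed = false;
  private lastSeq = -Infinity;

  constructor(targetDepth: number) {
    this.targetDepth = targetDepth;
  }

  push(seq: number, frame: T): void {
    if (seq <= this.lastSeq) return; // too late, already played past it
    this.queue.push({ seq, frame });
    this.queue.sort((a, b) => a.seq - b.seq);
  }

  // Returns the next frame, or undefined while buffering or on underrun.
  pop(): T | undefined {
    if (!this.primed) {
      if (this.queue.length < this.targetDepth) return undefined;
      this.primed = true;
    }
    const next = this.queue.shift();
    if (next === undefined) {
      this.primed = false; // underrun: refill before resuming playback
      return undefined;
    }
    this.lastSeq = next.seq;
    return next.frame;
  }
}
```

When `pop()` returns `undefined`, a consumer would normally hold the previous pose rather than snap to a default one.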

4. Render Engine

Responsibility: Take a PoseFrame + scene state → produce final pixels.

Four sub-systems execute in order each frame:

Sub-system	Role
Geometry Pipeline	GPU skinning (transform feedback), frustum culling
Shadow Map	Render depth from light POV, PCF filtering
Material System	PBR shading, SSS pass, draw calls
Post-Process Stack	Bloom → DoF → ACES tonemapping → FXAA
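
As an illustration of the tonemapping step in the Post-Process Stack, the sketch below evaluates a fitted ACES-style curve on the CPU. This document does not say which ACES variant OpenHuman uses; the widely circulated rational fit by Krzysztof Narkowicz is assumed here, and in a real renderer the same expression would run per pixel in a fragment shader. `acesFilmic` and `tonemapRgb` are hypothetical names.

```typescript
// Fitted ACES filmic curve (Narkowicz approximation, assumed variant).
// Maps a non-negative HDR channel value to the [0, 1] LDR range.
function acesFilmic(x: number): number {
  const a = 2.51;
  const b = 0.03;
  const c = 2.43;
  const d = 0.59;
  const e = 0.14;
  const mapped = (x * (a * x + b)) / (x * (c * x + d) + e);
  return Math.min(1, Math.max(0, mapped));
}

// Apply the curve independently to each channel of a linear HDR color.
function tonemapRgb(hdr: [number, number, number]): [number, number, number] {
  return [acesFilmic(hdr[0]), acesFilmic(hdr[1]), acesFilmic(hdr[2])];
}
```

Tonemapping runs after bloom and depth of field so that those effects operate on HDR values, and before FXAA, which works on the LDR result.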

5. WebGL 2.0 Context Manager

Responsibility: Own and manage the raw WebGL context.

Created once at new OpenHuman({ canvas }) initialization
Manages WebGL extension detection and capability flags
Handles context loss/restore events
Provides a thin abstraction layer (GpuDevice) used by all sub-systems - the raw WebGL2RenderingContext is never exposed in the public API
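
Context loss handling can be sketched as below. This is not the real `GpuDevice`: `GpuDeviceSketch`, `CanvasLike`, and `onRestore` are hypothetical, and the canvas is reduced to a minimal interface so the example runs outside a browser. The event names `webglcontextlost` and `webglcontextrestored` are the standard WebGL ones, and calling `preventDefault()` on the lost event is what allows the browser to restore the context later.

```typescript
interface ContextEventLike {
  preventDefault(): void;
}

// Minimal stand-in for the canvas element (assumption for this sketch).
interface CanvasLike {
  addEventListener(
    type: "webglcontextlost" | "webglcontextrestored",
    listener: (event: ContextEventLike) => void
  ): void;
}

class GpuDeviceSketch {
  private lost = false;
  private readonly restoreCallbacks: (() => void)[] = [];

  constructor(canvas: CanvasLike) {
    canvas.addEventListener("webglcontextlost", (event) => {
      // Without preventDefault() the context is never restored.
      event.preventDefault();
      this.lost = true;
    });
    canvas.addEventListener("webglcontextrestored", () => {
      this.lost = false;
      // All GPU resources are gone after a loss, so every sub-system
      // has to recreate its buffers, textures, and programs here.
      for (const callback of this.restoreCallbacks) {
        callback();
      }
    });
  }

  isLost(): boolean {
    return this.lost;
  }

  onRestore(callback: () => void): void {
    this.restoreCallbacks.push(callback);
  }
}
```

Sub-systems would check `isLost()` before issuing draw calls and register their re-upload logic through `onRestore`.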

Data Flow: Asset Loading

flowchart TD
    A[".ohb file on disk / CDN"] --> B["BundleParser"]

    B --> B1["Header<br/>version, chunk count, flags"]
    B --> B2["Chunk[0]<br/>glTF mesh (binary)"]
    B --> B3["Chunk[1]<br/>KTX2 textures<br/>(albedo, normal, ORM, emissive)"]
    B --> B4["Chunk[2]<br/>Skeleton<br/>(joint hierarchy + bind pose)"]
    B --> B5["Chunk[3]<br/>Morph targets<br/>(52 FACS delta buffers)"]
    B --> B6["Chunk[4]<br/>Animation clips<br/>(idle, talk, blink, ...)"]

    B --> C["GPU Upload<br/>(async, chunked to avoid frame drops)"]

    C --> C1["WebGLBuffer<br/>vertex / index data"]
    C --> C2["WebGLTexture<br/>KTX2 compressed textures"]
    C --> C3["Float32Array<br/>morph target deltas<br/>(kept in JS heap)"]

    C --> D["CharacterInstance<br/>(ready to render)"]
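
The "async, chunked" upload step can be approximated by slicing a large buffer and sending one slice per frame. The sketch below assumes a fixed byte budget per frame; `planUploadSlices` and `uploadInSlices` are hypothetical helpers, and the actual upload and scheduling functions are passed in by the caller.

```typescript
interface UploadSlice {
  offset: number;
  length: number;
}

// Split `totalBytes` into consecutive slices of at most `maxBytesPerFrame`.
function planUploadSlices(totalBytes: number, maxBytesPerFrame: number): UploadSlice[] {
  if (maxBytesPerFrame <= 0) {
    throw new Error("maxBytesPerFrame must be positive");
  }
  const slices: UploadSlice[] = [];
  for (let offset = 0; offset < totalBytes; offset += maxBytesPerFrame) {
    slices.push({ offset, length: Math.min(maxBytesPerFrame, totalBytes - offset) });
  }
  return slices;
}

// Upload one slice, then yield until the next frame before continuing.
// `uploadSlice` would wrap the actual GPU call (for example a
// bufferSubData-style partial update); `nextFrame` schedules the
// continuation.
function uploadInSlices(
  slices: UploadSlice[],
  uploadSlice: (slice: UploadSlice) => void,
  nextFrame: (callback: () => void) => void,
  done: () => void
): void {
  let index = 0;
  const step = (): void => {
    if (index >= slices.length) {
      done();
      return;
    }
    uploadSlice(slices[index]);
    index++;
    nextFrame(step);
  };
  step();
}
```

In a browser, `requestAnimationFrame` could be passed as `nextFrame`, so each slice lands in its own frame and the render loop keeps its time budget.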

Data Flow: Per-Frame Render Loop

flowchart TD
    A["requestAnimationFrame callback (typically 60fps)"]

    A --> B["1. AnimationGraph.tick(deltaTime)"]

    B --> B1["Evaluate state machine"]
    B --> B2["Sample animation clips"]
    B --> B3["Blend morph weights"]
    B --> B4["Merge streaming pose (if connected)"]
    B --> B5["Output: PoseFrame { joints[], morphWeights[] }"]

    B5 --> C["2. GeometryPipeline.skin(PoseFrame)"]

    C --> C1["GPU skinning via Transform Feedback"]
    C --> C2["Output: skinned vertex buffer"]

    C2 --> D["3. ShadowMap.render()"]

    D --> D1["Render character depth from key light"]
    D --> D2["Output: shadow depth texture"]

    D2 --> E["4. MaterialSystem.render()"]

    E --> E1["PBR shading pass"]
    E --> E2["SSS accumulation pass"]
    E --> E3["Composite to HDR framebuffer"]

    E3 --> F["5. PostProcessStack.render()"]

    F --> F1["Bloom (threshold → blur → composite)"]
    F --> F2["Depth of Field (CoC map → bokeh blur)"]
    F --> F3["ACES tonemapping → LDR"]
    F --> F4["FXAA anti-aliasing"]

    F4 --> G["6. Blit to canvas (final output)"]
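
The six steps above can be wired together roughly as follows. This is a sketch of the ordering only: the `FrameStages` interface, `makeFrameCallback`, and the delta clamp are assumptions, and each stage is a caller-supplied function rather than the real sub-system.

```typescript
interface PoseFrame {
  joints: Float32Array;
  morphWeights: Float32Array;
}

// One callback per step in the flowchart (names are illustrative).
interface FrameStages {
  tick(deltaSeconds: number): PoseFrame; // 1. AnimationGraph.tick
  skin(pose: PoseFrame): void;           // 2. GeometryPipeline.skin
  shadow(): void;                        // 3. ShadowMap.render
  material(): void;                      // 4. MaterialSystem.render
  postProcess(): void;                   // 5. PostProcessStack.render
  blit(): void;                          // 6. Blit to canvas
}

// Builds a callback suitable for requestAnimationFrame, which passes a
// timestamp in milliseconds. The delta is clamped so that a long pause
// (for example a backgrounded tab) does not produce a huge animation step.
function makeFrameCallback(
  stages: FrameStages,
  maxDeltaSeconds: number = 0.1
): (timestampMs: number) => void {
  let lastTimestampMs: number | null = null;
  return (timestampMs: number): void => {
    const deltaSeconds =
      lastTimestampMs === null
        ? 0
        : Math.min((timestampMs - lastTimestampMs) / 1000, maxDeltaSeconds);
    lastTimestampMs = timestampMs;

    const pose = stages.tick(deltaSeconds);
    stages.skin(pose);
    stages.shadow();
    stages.material();
    stages.postProcess();
    stages.blit();
  };
}
```

In a browser the returned function would be re-registered with `requestAnimationFrame` at the end of every call to keep the loop running.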

Threading Model

OpenHuman drives rendering from a single main thread by default and can offload parsing and decoding work to Web Workers:

Task	Thread	Notes
Render loop	Main thread	`requestAnimationFrame`
Asset parsing	Web Worker	Offloaded via `BundleParser` worker
WebSocket I/O	Main thread	Browser handles I/O async
Frame decoding	Web Worker	`FrameDecoder` runs in worker, posts `PoseFrame`
GPU commands	Main thread	WebGL calls are issued on the main thread unless `OffscreenCanvas` is enabled

OffscreenCanvas support (Chrome only): pass offscreen: true to new OpenHuman() to move the render loop to a dedicated worker thread. See GPU Optimization Guide for details.

Public API Surface

The SDK exposes a minimal API surface. All internal sub-systems are private.

class OpenHuman {
    // Lifecycle
    constructor(config: OpenHumanConfig)
    loadCharacter(url: string): Promise<void>
    destroy(): void
 
    // Playback
    play(animation: string, options?: PlayOptions): void
    stop(): void
    applyPose(pose: PoseFrame): void
 
    // Morphs
    setMorphWeight(name: string, weight: number): void
    setMorphWeights(weights: Record<string, number>): void
 
    // Configuration
    setQuality(quality: "high" | "medium" | "low"): void
    setFPS(fps: number): void
 
    // Events
    on(event: string, handler: Function): void
    off(event: string, handler: Function): void
 
    // Debug
    getStats(): RenderStats
}
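
A short usage sketch against this surface is shown below. It assumes that the `character:loaded` event is delivered through `on()`, that morph weights are expected in the [0, 1] range, and that morph names such as `jawOpen` exist; none of these are confirmed here. `OpenHumanLike` and `startCharacter` are hypothetical helpers that rely only on the methods listed above.

```typescript
// Subset of the OpenHuman class that this sketch depends on.
interface OpenHumanLike {
  loadCharacter(url: string): Promise<void>;
  play(animation: string): void;
  setMorphWeights(weights: Record<string, number>): void;
  on(event: string, handler: () => void): void;
}

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Starts loading a character and, once it is on the GPU, applies an
// initial expression and begins playing the given animation.
function startCharacter(
  human: OpenHumanLike,
  url: string,
  animation: string,
  expression: Record<string, number>
): Promise<void> {
  human.on("character:loaded", () => {
    const clamped: Record<string, number> = {};
    for (const name of Object.keys(expression)) {
      // Defensive clamp; the valid weight range is an assumption.
      clamped[name] = clamp01(expression[name]);
    }
    human.setMorphWeights(clamped);
    human.play(animation);
  });
  return human.loadCharacter(url);
}
```

Registering the handler before calling `loadCharacter` avoids missing the event if loading completes quickly, for example from a cached bundle.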

Key Design Decisions

Why pure WebGL 2.0 (no Three.js)? Three.js and Babylon.js are general-purpose engines with significant overhead (scene graph, physics, audio, etc.) that OpenHuman doesn't need. A purpose-built renderer for digital humans allows tighter control over the render pipeline, SSS implementation, and GPU memory layout - resulting in a ≤200KB bundle vs. 500KB+ for a general engine.

Why .ohb instead of raw glTF? The .ohb format pre-processes and pre-optimizes assets for the OpenHuman pipeline: KTX2 textures are already in GPU-native compressed formats, morph target deltas are pre-computed, and the skeleton is already in OpenHuman's joint order. This removes most runtime conversion work and shortens load times.

Why 16-bit quantization for streaming? Full 32-bit floats for all joints would require ~2KB per frame, or ~120KB/s per character at 60fps. 16-bit quantization halves this to ~60KB/s, with imperceptible quality loss for animation data within human joint range-of-motion limits.
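
The bandwidth figures can be reproduced with a few lines of arithmetic. The channel count of 500 is an assumption chosen to match the ~2KB figure (2000 bytes / 4 bytes per float), using 1KB = 1000 bytes; the step-size helper assumes a linear mapping over the full 16-bit range.

```typescript
// Bytes per second for one character's pose stream.
function streamBytesPerSecond(
  channelCount: number,
  bytesPerChannel: number,
  framesPerSecond: number
): number {
  return channelCount * bytesPerChannel * framesPerSecond;
}

// Smallest representable change when a range is spread over 16 bits.
function quantizationStep(range: number): number {
  return range / 65535;
}

const CHANNELS = 500; // assumed: ~2KB per frame at 4 bytes per value
const FPS = 60;

const float32Rate = streamBytesPerSecond(CHANNELS, 4, FPS); // 120000 B/s
const uint16Rate = streamBytesPerSecond(CHANNELS, 2, FPS);  // 60000 B/s

// A joint angle spanning a full 360 degrees still resolves to
// roughly 0.0055 degrees per step at 16 bits.
const stepDegrees = quantizationStep(360);
```

Since real joints move through much less than 360 degrees, the effective angular resolution is finer still, which is consistent with the claim that the loss is imperceptible.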

Next Steps

Render Pipeline Deep Dive - detailed pass-by-pass breakdown
.ohb Bundle Format Spec - binary layout and chunk types
Animation Graph Reference - state machine configuration
Streaming Protocol Spec - WebSocket frame format
GPU Optimization Guide - memory budgets and profiling