OpenHuman - Render Pipeline Deep Dive


Overview: Pass Execution Order

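Each frame, the engine executes the following passes in sequence: CPU Phase → GPU Skinning Pass → Shadow Pass → G-Pass → Lighting & Composite Pass → Post-Process Stack → Blit to Canvas. The numbered sections below cover each stage in order.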


1. CPU Phase

1.1 Animation Graph Tick

Before any GPU work, the CPU evaluates the animation state machine for the current frame:

  • Sample active animation clips at current time
  • Compute blend weights between clips
  • Evaluate morph target weights (52 FACS values, [0.0 … 1.0])
  • Merge streaming pose from JitterBuffer if StreamingClient is connected
  • Output a PoseFrame:
interface PoseFrame {
    joints: Float32Array // 4x4 matrices, one per joint (flattened)
    morphWeights: Float32Array // 52 FACS weights
    timestamp: number
}
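
As a rough illustration of the blend step, the sketch below cross-fades the morph weights of two clips and clamps the result to the documented [0.0, 1.0] range before packing a PoseFrame. The helper names (blendMorphWeights, makePoseFrame) and the identity-matrix initialisation are assumptions made for this example; they are not part of the OpenHuman SDK.

```typescript
interface PoseFrame {
    joints: Float32Array;       // 4x4 matrices, one per joint (flattened)
    morphWeights: Float32Array; // 52 FACS weights
    timestamp: number;
}

const MORPH_COUNT = 52;

// Linearly cross-fade two sets of morph weights; t = 0 gives a, t = 1 gives b.
function blendMorphWeights(a: Float32Array, b: Float32Array, t: number): Float32Array {
    const out = new Float32Array(MORPH_COUNT);
    for (let i = 0; i < MORPH_COUNT; i++) {
        const w = a[i] * (1 - t) + b[i] * t;
        out[i] = Math.min(1, Math.max(0, w)); // keep weights inside [0, 1]
    }
    return out;
}

// Hypothetical helper: a PoseFrame whose joints are all identity matrices.
function makePoseFrame(jointCount: number, morphWeights: Float32Array, timestamp: number): PoseFrame {
    const joints = new Float32Array(jointCount * 16);
    for (let j = 0; j < jointCount; j++) {
        for (let d = 0; d < 4; d++) joints[j * 16 + d * 5] = 1; // diagonal entries 0, 5, 10, 15
    }
    return { joints, morphWeights, timestamp };
}
```

Joint matrices are deliberately not blended component-wise here; blending rotations usually requires decomposing them first, which is outside the scope of this sketch.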

1.2 Frustum Culling

The engine tests each mesh LOD group against the camera frustum using axis-aligned bounding boxes (AABB). For a single digital human character this rarely culls anything, but becomes relevant when multiple characters are in the scene. Culled meshes skip all GPU passes for that frame.
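
A minimal sketch of an AABB-versus-frustum test follows, assuming the frustum is given as planes whose normals point inward (a point p is inside a plane when n·p + d ≥ 0). The type names and plane convention are assumptions for illustration; the engine's actual culling code is not shown in this document.

```typescript
type Vec3 = [number, number, number];

// Plane n·p + d = 0, with the normal pointing into the frustum.
interface Plane { normal: Vec3; d: number; }
interface AABB { min: Vec3; max: Vec3; }

// Returns false only when the box lies entirely outside at least one plane.
// The test is conservative: a few boxes near frustum corners may pass
// even though they are not visible.
function aabbIntersectsFrustum(box: AABB, planes: Plane[]): boolean {
    for (const { normal, d } of planes) {
        // Pick the box corner farthest along the plane normal.
        const px = normal[0] >= 0 ? box.max[0] : box.min[0];
        const py = normal[1] >= 0 ? box.max[1] : box.min[1];
        const pz = normal[2] >= 0 ? box.max[2] : box.min[2];
        if (normal[0] * px + normal[1] * py + normal[2] * pz + d < 0) {
            return false; // even the best corner is behind this plane
        }
    }
    return true;
}
```

A mesh whose box fails this test can skip every GPU pass for the frame, as described above.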


2. GPU Skinning Pass

WebGL API: TRANSFORM_FEEDBACK (WebGL 2.0)

GPU skinning transforms the character's bind-pose vertex buffer into a posed vertex buffer using the joint matrix palette from the PoseFrame.

Inputs

  • Bind-pose VBO: vec3 position, vec3 normal, vec4 tangent
  • Joint index buffer: uvec4 joints (up to 4 joints per vertex)
  • Weight buffer: vec4 weights (normalized, sum = 1.0)
  • Joint matrix palette: uniform mat4 u_JointMatrices[MAX_JOINTS]

Process

// Vertex shader (simplified)
vec4 skinnedPos =
    weights.x * (u_JointMatrices[joints.x] * vec4(position, 1.0)) +
    weights.y * (u_JointMatrices[joints.y] * vec4(position, 1.0)) +
    weights.z * (u_JointMatrices[joints.z] * vec4(position, 1.0)) +
    weights.w * (u_JointMatrices[joints.w] * vec4(position, 1.0));

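Normals are transformed by the inverse-transpose of the blended joint matrix (its upper-left 3×3) so that they stay perpendicular to the surface under non-uniform scaling. Tangents lie in the surface, so they are transformed by the joint matrix itself; only the normal needs the inverse-transpose.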
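
For checking shader output on the CPU, the position blend above can be mirrored in a few lines. This is a reference sketch only, assuming the palette is a flattened array of column-major 4x4 matrices (the usual WebGL layout); the function names are hypothetical.

```typescript
type Vec3 = [number, number, number];
type Vec4 = [number, number, number, number];

// Transform a point by joint j of a flattened, column-major mat4 palette.
function transformPoint(palette: Float32Array, j: number, p: Vec3): Vec3 {
    const m = j * 16;
    return [
        palette[m + 0] * p[0] + palette[m + 4] * p[1] + palette[m + 8] * p[2] + palette[m + 12],
        palette[m + 1] * p[0] + palette[m + 5] * p[1] + palette[m + 9] * p[2] + palette[m + 13],
        palette[m + 2] * p[0] + palette[m + 6] * p[1] + palette[m + 10] * p[2] + palette[m + 14],
    ];
}

// Linear blend skinning: weighted sum of the point transformed by each joint.
function skinPosition(palette: Float32Array, joints: Vec4, weights: Vec4, position: Vec3): Vec3 {
    const out: Vec3 = [0, 0, 0];
    for (let k = 0; k < 4; k++) {
        if (weights[k] === 0) continue; // unused influence slot
        const p = transformPoint(palette, joints[k], position);
        out[0] += weights[k] * p[0];
        out[1] += weights[k] * p[1];
        out[2] += weights[k] * p[2];
    }
    return out;
}
```

With one identity joint and one joint translated by +2 on X, equal weights move the point (1, 0, 0) to (2, 0, 0), halfway between the two transformed positions.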

Morph Target Application

After joint skinning, morph target deltas are applied additively:

vec3 morphedPos = skinnedPos.xyz;
for (int i = 0; i < 52; i++) {
    morphedPos += u_MorphWeights[i] * a_MorphDelta[i];
}
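
The shader loop above is simplified: WebGL 2 only guarantees 16 vertex attributes, so 52 per-vertex delta attributes cannot be bound directly, and implementations commonly fetch deltas from a texture or buffer instead. How OpenHuman stores its deltas is not stated here. The CPU sketch below only mirrors the arithmetic for a single vertex; the function name and the zero-weight skip are assumptions.

```typescript
type Vec3 = [number, number, number];

// Add each morph delta to the base position, scaled by its weight.
function applyMorphTargets(base: Vec3, deltas: Vec3[], weights: Float32Array): Vec3 {
    const out: Vec3 = [base[0], base[1], base[2]];
    const count = Math.min(deltas.length, weights.length);
    for (let i = 0; i < count; i++) {
        const w = weights[i];
        if (w === 0) continue; // inactive targets contribute nothing
        out[0] += w * deltas[i][0];
        out[1] += w * deltas[i][1];
        out[2] += w * deltas[i][2];
    }
    return out;
}
```

Skipping zero weights is a common optimisation because most of the 52 FACS targets are inactive in any given frame.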

Output

  • Skinned + morphed vertex buffer written via Transform Feedback
  • This buffer is used as input for all subsequent render passes
  • No CPU readback - data stays on the GPU
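
The capture step can be sketched with standard WebGL 2 transform feedback calls. The function below is an assumption about how such a pass might be issued, not OpenHuman's actual code; it is typed against a small structural subset of WebGL2RenderingContext so it can be exercised with a mock, and a real context satisfies the same interface.

```typescript
// Structural subset of WebGL2RenderingContext that this sketch relies on.
interface TransformFeedbackGL {
    readonly TRANSFORM_FEEDBACK: number;
    readonly TRANSFORM_FEEDBACK_BUFFER: number;
    readonly RASTERIZER_DISCARD: number;
    readonly POINTS: number;
    bindTransformFeedback(target: number, tf: unknown): void;
    bindBufferBase(target: number, index: number, buffer: unknown): void;
    enable(cap: number): void;
    disable(cap: number): void;
    beginTransformFeedback(primitiveMode: number): void;
    drawArrays(mode: number, first: number, count: number): void;
    endTransformFeedback(): void;
}

// Runs the currently bound skinning program over vertexCount vertices and
// captures its output varyings into outputBuffer. Assumes the program was
// linked after gl.transformFeedbackVaryings(...) and that its VAO is bound.
function runSkinningPass(gl: TransformFeedbackGL, tf: unknown, outputBuffer: unknown, vertexCount: number): void {
    gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
    gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, outputBuffer);
    gl.enable(gl.RASTERIZER_DISCARD);      // vertex stage only, no fragments
    gl.beginTransformFeedback(gl.POINTS);  // one output record per vertex
    gl.drawArrays(gl.POINTS, 0, vertexCount);
    gl.endTransformFeedback();
    gl.disable(gl.RASTERIZER_DISCARD);
    // Unbind so the buffer can be used as a vertex input by later passes.
    gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, null);
    gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);
}
```

With a real context, tf would come from gl.createTransformFeedback(), and the varyings to capture would be declared with gl.transformFeedbackVaryings(program, names, gl.INTERLEAVED_ATTRIBS) before linking; the varying names themselves depend on the shader and are not given in this document.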

3. Shadow Pass

Technique: PCF (Percentage Closer Filtering) soft shadows

Shadow Map Generation

The engine renders the character's depth from the key light's perspective into a dedicated WebGLTexture (depth texture, 2048×2048 by default).

Key Light
    │
    │  Orthographic projection (covers character bounds)
    ▼
Depth Framebuffer (2048×2048)
    └── depth attachment: DEPTH_COMPONENT24

Shadow Map Sampling (PCF)

During the main render pass, the shadow map is sampled using a 3×3 PCF kernel to produce soft shadow edges:

float shadow = 0.0;
vec2 texelSize = 1.0 / vec2(2048.0);
for (int x = -1; x <= 1; x++) {
    for (int y = -1; y <= 1; y++) {
        float pcfDepth = texture(u_ShadowMap,
            projCoords.xy + vec2(x, y) * texelSize).r;
        shadow += currentDepth - bias > pcfDepth ? 1.0 : 0.0;
    }
}
shadow /= 9.0;

The shadow bias is computed per fragment from the angle between the surface normal and the light direction: it grows as the surface turns away from the light, which avoids shadow acne on curved surfaces.
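
One common way to derive such an angle-dependent bias is shown below. The formula and the minBias / maxBias defaults are assumptions chosen for illustration; the document does not state the values OpenHuman uses.

```typescript
// Bias grows as the surface turns away from the light (N·L approaches 0),
// and never drops below minBias for surfaces facing the light directly.
function shadowBias(nDotL: number, minBias = 0.0005, maxBias = 0.005): number {
    const clamped = Math.min(1, Math.max(0, nDotL));
    return Math.max(maxBias * (1 - clamped), minBias);
}
```

The result would be used as the bias term in the PCF comparison (currentDepth - bias > pcfDepth).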

Quality vs Performance

Quality Preset   Shadow Map Size   PCF Kernel
high             2048×2048         3×3 (9 samples)
medium           1024×1024         2×2 (4 samples)
low              512×512           1×1 (hard shadow)

4. G-Pass (Geometry Pass)

The G-Pass renders the character's surface properties into an HDR framebuffer. OpenHuman uses a forward rendering approach (not deferred), optimized for a small, known number of lights (3-point lighting setup).

4.1 Opaque PBR Pass

Inputs (texture maps per material):

Texture    Format              Contents
Albedo     RGBA8 (KTX2 BC7)    Base color + alpha
Normal     RG8 (KTX2 BC5)      Tangent-space normals (Z reconstructed)
ORM        RGB8 (KTX2 BC7)     Occlusion (R), Roughness (G), Metallic (B)
Emissive   RGBA8 (KTX2 BC7)    Emissive color (HDR encoded)

PBR shading model: Cook-Torrance microfacet BRDF

  • Diffuse: Lambertian diffuse (modulated by 1 - metallic)
  • Specular: GGX distribution, Smith geometry term, Schlick Fresnel
  • Occlusion: Baked AO multiplied into diffuse contribution

Output written to HDR_COLOR attachment (RGBA16F).
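
The three specular terms and the diffuse weighting can be written out as a scalar (single colour channel) reference. This is a sketch of one widely used formulation, not the engine's shader: the roughness-squared alpha, the (roughness + 1)² / 8 remap in the geometry term, and the dielectric F0 of 0.04 are common conventions assumed here.

```typescript
// GGX (Trowbridge-Reitz) normal distribution with alpha = roughness^2.
function distributionGGX(nDotH: number, roughness: number): number {
    const a2 = Math.pow(roughness * roughness, 2);
    const denom = nDotH * nDotH * (a2 - 1) + 1;
    return a2 / (Math.PI * denom * denom);
}

// Schlick-GGX term for one direction, using the direct-lighting remap of k.
function geometrySchlickGGX(nDotX: number, roughness: number): number {
    const k = Math.pow(roughness + 1, 2) / 8;
    return nDotX / (nDotX * (1 - k) + k);
}

// Smith geometry term: masking (view) times shadowing (light).
function geometrySmith(nDotV: number, nDotL: number, roughness: number): number {
    return geometrySchlickGGX(nDotV, roughness) * geometrySchlickGGX(nDotL, roughness);
}

// Schlick's Fresnel approximation.
function fresnelSchlick(cosTheta: number, f0: number): number {
    return f0 + (1 - f0) * Math.pow(1 - cosTheta, 5);
}

interface SurfaceSample {
    albedo: number;    // one linear colour channel
    metallic: number;
    roughness: number;
    ao: number;        // baked occlusion from the ORM map
}

// Outgoing radiance for one light of unit intensity.
function cookTorrance(s: SurfaceSample, nDotL: number, nDotV: number, nDotH: number, vDotH: number): number {
    if (nDotL <= 0 || nDotV <= 0) return 0;
    const f0 = 0.04 * (1 - s.metallic) + s.albedo * s.metallic; // assumed F0 blend
    const F = fresnelSchlick(vDotH, f0);
    const D = distributionGGX(nDotH, s.roughness);
    const G = geometrySmith(nDotV, nDotL, s.roughness);
    const specular = (D * G * F) / (4 * nDotV * nDotL);
    // Lambertian diffuse, modulated by (1 - metallic) and baked AO.
    const diffuse = ((1 - F) * (1 - s.metallic) * s.albedo / Math.PI) * s.ao;
    return (diffuse + specular) * nDotL;
}
```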

4.2 SSS Accumulation Pass

For skin materials, a separate SSS accumulation buffer (RGB16F) is written in this pass. It stores the raw diffuse irradiance before any scattering, which is then blurred and composited in Pass 5.3.

Only mesh surfaces tagged with material.sss = true (set in the .ohb bundle) write to this buffer.


5. Lighting & Composite Pass

5.1 Direct Lighting

OpenHuman uses a fixed 3-point lighting model (key, fill, rim), configured via the SDK:

human.setLighting({
    key: { direction: [-1, -1, -1], color: "#fff5e6", intensity: 1.0 },
    fill: { direction: [1, -0.5, -1], color: "#e6f0ff", intensity: 0.4 },
    rim: { direction: [0, 0, 1], color: "#ffffff", intensity: 0.6 },
})

Each light contributes diffuse + specular using the Cook-Torrance BRDF evaluated in the fragment shader. The key light applies shadow attenuation from the shadow map (Pass 3).

5.2 Image-Based Lighting (IBL)

Diffuse IBL: A pre-convolved irradiance cubemap (stored in the .ohb bundle or provided externally) is sampled using the surface world-space normal.

Specular IBL: A pre-filtered environment map (split-sum approximation) is sampled using the reflection vector and roughness level (mip-mapped).

// Specular IBL (split-sum approximation)
vec3 F  = fresnelSchlickRoughness(NdotV, F0, roughness);
vec2 brdf = texture(u_BRDFLut, vec2(NdotV, roughness)).rg;
vec3 prefilteredColor = textureLod(u_EnvMap, R,
    roughness * MAX_REFLECTION_LOD).rgb;
vec3 specular = prefilteredColor * (F * brdf.x + brdf.y);
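
The snippet above calls fresnelSchlickRoughness without defining it. A commonly used definition is sketched below for a single colour channel, together with the split-sum combination; treat it as an assumption about what the helper does rather than the engine's exact code.

```typescript
// Schlick Fresnel with a roughness-dependent upper bound, so rough surfaces
// do not become fully reflective at grazing angles.
function fresnelSchlickRoughness(nDotV: number, f0: number, roughness: number): number {
    return f0 + (Math.max(1 - roughness, f0) - f0) * Math.pow(1 - nDotV, 5);
}

// Mip level to sample in the pre-filtered environment map.
function reflectionLod(roughness: number, maxReflectionLod: number): number {
    return roughness * maxReflectionLod;
}

// Split-sum: pre-filtered radiance times (F * scale + bias) from the BRDF LUT.
function specularIBL(prefiltered: number, f: number, brdfScale: number, brdfBias: number): number {
    return prefiltered * (f * brdfScale + brdfBias);
}
```

Here brdfScale and brdfBias correspond to brdf.x and brdf.y in the shader.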

5.3 SSS Composite

The SSS accumulation buffer (from Pass 4.2) is blurred using a separable Gaussian blur with a skin-tuned kernel (wider in red channel, narrower in blue channel, to approximate real scattering distances in human skin tissue).

The blurred SSS buffer is then composited over the direct diffuse contribution:

vec3 diffuse = mix(directDiffuse, sssBlurred, u_SSSStrength);

Default u_SSSStrength: 0.6 (tunable via human.setSSS({ strength: 0.6 })).
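
To illustrate the per-channel kernel idea, the sketch below builds normalized 1D Gaussian weights and uses a wider sigma for red than for blue. The sigma values and names are hypothetical; the document does not list the engine's actual kernel.

```typescript
// Normalized 1D Gaussian weights for taps at offsets -radius..+radius.
// A separable blur applies these once horizontally and once vertically.
function gaussianKernel(radius: number, sigma: number): number[] {
    const weights: number[] = [];
    let sum = 0;
    for (let i = -radius; i <= radius; i++) {
        const w = Math.exp(-(i * i) / (2 * sigma * sigma));
        weights.push(w);
        sum += w;
    }
    return weights.map((w) => w / sum);
}

// Hypothetical per-channel spreads (in texels): red scatters farthest in skin.
const SSS_SIGMA = { r: 3.0, g: 1.5, b: 0.75 };

// Same blend as the GLSL mix(): 0 keeps direct diffuse, 1 uses only the blur.
function compositeSSS(directDiffuse: number, sssBlurred: number, strength = 0.6): number {
    return directDiffuse * (1 - strength) + sssBlurred * strength;
}
```

A wider sigma spreads weight away from the centre tap, which is why the red kernel softens shading boundaries more than the blue one.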


6. Post-Process Stack

All post-process passes operate on screen-space fullscreen quads. Inputs and outputs are ping-pong framebuffers (RGBA16F for HDR passes, RGBA8 after tonemapping).

6.1 Bloom

Algorithm: Dual-Kawase blur (more efficient than Gaussian at large radii)

HDR color buffer
    │
    ▼ Threshold pass (keep pixels above luminance threshold)
Bright regions buffer
    │
    ▼ Downsample × 4 (half resolution each step)
    ▼ Dual-Kawase blur (down + up passes)
    ▼ Upsample × 4 (additive composite at each level)
    │
    ▼ Additive composite onto HDR buffer
Final HDR + Bloom

Tunable parameters:

human.setBloom({
    threshold: 0.9, // luminance threshold to extract bright regions
    intensity: 0.15, // bloom strength (additive)
    radius: 0.4, // blur spread (0.0 – 1.0)
})
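
The threshold step and the size of each downsampled level can be sketched as follows. The Rec. 709 luminance weights and the rounding-down of odd sizes are assumptions; the document only states that four levels are used, each at half the previous resolution.

```typescript
type Rgb = [number, number, number];

// Relative luminance of a linear RGB colour (Rec. 709 weights, assumed).
function luminance([r, g, b]: Rgb): number {
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Threshold pass: keep only pixels brighter than the bloom threshold.
function brightPass(color: Rgb, threshold = 0.9): Rgb {
    return luminance(color) > threshold ? color : [0, 0, 0];
}

// Sizes of the bloom mip chain, starting at half the canvas resolution.
function bloomMipSizes(width: number, height: number, levels = 4): Array<[number, number]> {
    const sizes: Array<[number, number]> = [];
    let w = width;
    let h = height;
    for (let i = 0; i < levels; i++) {
        w = Math.max(1, Math.floor(w / 2));
        h = Math.max(1, Math.floor(h / 2));
        sizes.push([w, h]);
    }
    return sizes;
}
```

For a 1920×1080 canvas this yields 960×540, 480×270, 240×135 and 120×67.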

6.2 Depth of Field

Algorithm: Circle of Confusion (CoC) map → radial bokeh blur

Only applied to the background - the character mesh is always in focus. A depth test against the character's depth buffer determines the CoC radius per pixel.

DoF is disabled by default: enabling the post-process stack (postProcess: true) does not turn it on, because dof defaults to false within the stack. Enable it explicitly:

human.setDoF({
    enabled: true,
    focalDistance: 1.2, // meters from camera
    focalRange: 0.3, // sharp zone radius around focal point
    maxBlur: 8.0, // max CoC radius in pixels
})
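
How the three parameters could map a pixel's depth to a CoC radius is sketched below. The linear ramp, which reaches maxBlur one focalRange beyond the sharp zone, is purely an assumption for illustration; the document does not specify the engine's falloff curve.

```typescript
// Circle-of-confusion radius in pixels for a pixel at the given depth (meters).
function cocRadius(depth: number, focalDistance: number, focalRange: number, maxBlur: number): number {
    const distance = Math.abs(depth - focalDistance);
    if (distance <= focalRange) return 0; // inside the sharp zone
    // Hypothetical falloff: ramp linearly up to maxBlur.
    const t = Math.min(1, (distance - focalRange) / focalRange);
    return t * maxBlur;
}
```

With the settings above (focalDistance 1.2, focalRange 0.3, maxBlur 8.0), a background pixel at 10 m gets the full 8-pixel radius under this assumed ramp, while anything between 0.9 m and 1.5 m stays sharp.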

6.3 ACES Tonemapping

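Converts the HDR (RGBA16F) buffer to LDR (RGBA8) using a fitted approximation of the ACES filmic tonemapping curve (the constants below match the widely used fit by Krzysztof Narkowicz), which gives a look similar to the filmic tonemapper in Unreal Engine 4+.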

vec3 ACESFilm(vec3 x) {
    float a = 2.51;
    float b = 0.03;
    float c = 2.43;
    float d = 0.59;
    float e = 0.14;
    return clamp((x*(a*x+b))/(x*(c*x+d)+e), 0.0, 1.0);
}

After tonemapping, the linear result is gamma-encoded for display using pow(color, 1.0/2.2), a common approximation of the sRGB transfer function.
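
A direct TypeScript port of the curve and the gamma step is handy for checking expected output values per channel. The function names and the 8-bit quantization helper are assumptions added for this example.

```typescript
// Same rational curve and constants as the GLSL ACESFilm above, per channel.
function acesFilm(x: number): number {
    const a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
    const y = (x * (a * x + b)) / (x * (c * x + d) + e);
    return Math.min(1, Math.max(0, y));
}

// Gamma encode with exponent 1/2.2, as in the shader.
function gammaEncode(x: number): number {
    return Math.pow(x, 1 / 2.2);
}

// Hypothetical helper: HDR channel value to the 8-bit value stored in RGBA8.
function toDisplay8(hdr: number): number {
    return Math.round(gammaEncode(acesFilm(hdr)) * 255);
}
```

An HDR value of 1.0 maps to about 0.80 before gamma encoding, which shows how the curve compresses highlights rather than clipping them at 1.0.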

6.4 FXAA

Algorithm: FXAA 3.11 (NVIDIA) - single-pass edge-detect anti-aliasing on the LDR output buffer.

FXAA detects luminance edges and blends pixels along the edge direction. It is a post-process technique (operates on the final LDR image) and requires no MSAA.

Quality vs Performance:

Quality Preset   FXAA Quality              Notes
high             FXAA_QUALITY__PRESET 29   Highest quality, ~0.3ms
medium           FXAA_QUALITY__PRESET 15   Balanced, ~0.15ms
low              Disabled                  No AA

7. Blit to Canvas

The final LDR (RGBA8) framebuffer is blitted to the canvas backbuffer using gl.blitFramebuffer(). The canvas is then presented to the browser compositor for display.
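
A minimal sketch of the blit follows, typed against a structural subset of WebGL2RenderingContext so it can be run with a mock. It assumes the context was created with antialias: false, since OpenGL ES 3.0 does not allow blitting into a multisampled draw framebuffer; the function name is hypothetical.

```typescript
// Structural subset of WebGL2RenderingContext used by this sketch.
interface BlitGL {
    readonly READ_FRAMEBUFFER: number;
    readonly DRAW_FRAMEBUFFER: number;
    readonly COLOR_BUFFER_BIT: number;
    readonly NEAREST: number;
    bindFramebuffer(target: number, framebuffer: unknown): void;
    blitFramebuffer(
        srcX0: number, srcY0: number, srcX1: number, srcY1: number,
        dstX0: number, dstY0: number, dstX1: number, dstY1: number,
        mask: number, filter: number): void;
}

// Copy the LDR framebuffer 1:1 into the canvas backbuffer.
function blitToCanvas(gl: BlitGL, ldrFramebuffer: unknown, width: number, height: number): void {
    gl.bindFramebuffer(gl.READ_FRAMEBUFFER, ldrFramebuffer);
    gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null); // null selects the canvas backbuffer
    gl.blitFramebuffer(
        0, 0, width, height,
        0, 0, width, height,
        gl.COLOR_BUFFER_BIT, gl.NEAREST);
}
```

When the internal resolution differs from the canvas size, the destination rectangle would use the canvas dimensions and gl.LINEAR would be the natural filter for the upscale.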


Framebuffer Summary

Buffer             Format              Size                  Lifetime
Shadow depth       DEPTH_COMPONENT24   2048×2048             Persistent
HDR color          RGBA16F             Canvas size           Persistent
SSS accumulation   RGB16F              Canvas size           Persistent
Post ping-pong A   RGBA16F             Canvas size           Persistent
Post ping-pong B   RGBA16F             Canvas size           Persistent
LDR output         RGBA8               Canvas size           Persistent
Bloom mip chain    RGBA16F             ½ canvas × 4 levels   Persistent

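Total VRAM for framebuffers at 1080p (~2MP):

  • RGBA16F @ 2MP = ~16MB per buffer (8 bytes per pixel)
  • The five full-resolution buffers in the table (four HDR, one LDR), the bloom chain, and the shadow map add up to roughly 100MB of VRAM

On quality: 'low', the engine renders internally at 50% resolution on each axis and upscales to canvas size. That is one quarter of the pixels, which cuts framebuffer VRAM to roughly a quarter (~25MB).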
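
The estimate can be reproduced with a short calculation over the buffers in the Framebuffer Summary table. The byte sizes are assumptions: RGB16F is counted as 8 bytes per pixel and DEPTH_COMPONENT24 as 4, on the basis that drivers commonly pad both; actual allocations vary by GPU.

```typescript
// Bytes needed for one buffer of the given size.
function bufferBytes(width: number, height: number, bytesPerPixel: number): number {
    return width * height * bytesPerPixel;
}

// Rough framebuffer VRAM total for a given canvas size (high quality preset).
function estimateFramebufferBytes(width: number, height: number): number {
    const hdr = 4 * bufferBytes(width, height, 8); // HDR color, SSS, ping-pong A and B
    const ldr = bufferBytes(width, height, 4);     // RGBA8 output
    const shadow = bufferBytes(2048, 2048, 4);     // shadow depth map
    let bloom = 0;
    let w = width;
    let h = height;
    for (let level = 0; level < 4; level++) {      // half resolution per level
        w = Math.max(1, Math.floor(w / 2));
        h = Math.max(1, Math.floor(h / 2));
        bloom += bufferBytes(w, h, 8);
    }
    return hdr + ldr + shadow + bloom;
}
```

At 1920×1080 a single RGBA16F buffer is 16,588,800 bytes, and the total under these assumptions comes to about 97MB, in line with the ~100MB figure.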


GPU Timing Budget (Target: 60fps = 16.67ms/frame)

Pass           High Quality   Medium     Low
GPU Skinning   ~0.4ms         ~0.3ms     ~0.2ms
Shadow Map     ~0.8ms         ~0.5ms     ~0.2ms
G-Pass (PBR)   ~2.5ms         ~1.8ms     ~1.0ms
SSS            ~1.2ms         ~0.8ms     disabled
Lighting       ~0.6ms         ~0.5ms     ~0.3ms
Bloom          ~0.8ms         ~0.5ms     disabled
DoF            ~0.6ms         disabled   disabled
ACES + FXAA    ~0.3ms         ~0.2ms     ~0.1ms
Total GPU      ~7.2ms         ~4.6ms     ~1.8ms

CPU overhead (JS): ~1–2ms per frame for animation evaluation. Remaining budget is available for application logic.


Next Steps