OpenHuman - Render Pipeline Deep Dive

Overview: Pass Execution Order

Each frame, the engine executes the following passes in sequence:

1. CPU Phase

1.1 Animation Graph Tick

Before any GPU work, the CPU evaluates the animation state machine for the current frame:

Sample active animation clips at current time
Compute blend weights between clips
Evaluate morph target weights (52 FACS values, [0.0 … 1.0])
Merge streaming pose from JitterBuffer if StreamingClient is connected
Output a PoseFrame:

interface PoseFrame {
    joints: Float32Array // 4x4 matrices, one per joint (flattened)
    morphWeights: Float32Array // 52 FACS weights
    timestamp: number
}
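As a rough illustration of the blend and output steps above, the sketch below blends two sets of FACS weights, clamps them to [0, 1], and packs the result into a PoseFrame. The helper names `blendMorphWeights` and `makeIdentityPoseFrame` are hypothetical and not part of the OpenHuman SDK; the 16-floats-per-joint layout is an assumption based on the "flattened 4x4" comment.

```typescript
// Hypothetical helpers for illustration only; not part of the OpenHuman SDK.
interface PoseFrame {
    joints: Float32Array;       // 4x4 matrices, one per joint (flattened, 16 floats each)
    morphWeights: Float32Array; // 52 FACS weights
    timestamp: number;
}

const FACS_COUNT = 52;

// Linearly blend two morph-weight sets (t = 0 gives a, t = 1 gives b)
// and clamp every result to the documented [0, 1] range.
function blendMorphWeights(a: Float32Array, b: Float32Array, t: number): Float32Array {
    if (a.length !== FACS_COUNT || b.length !== FACS_COUNT) {
        throw new Error(`expected ${FACS_COUNT} morph weights`);
    }
    const out = new Float32Array(FACS_COUNT);
    for (let i = 0; i < FACS_COUNT; i++) {
        const w = a[i] * (1 - t) + b[i] * t;
        out[i] = Math.min(1, Math.max(0, w));
    }
    return out;
}

// Build a PoseFrame whose joints are all identity matrices.
function makeIdentityPoseFrame(
    jointCount: number,
    morphWeights: Float32Array,
    timestamp: number,
): PoseFrame {
    const joints = new Float32Array(jointCount * 16);
    for (let j = 0; j < jointCount; j++) {
        // The diagonal of a flattened 4x4 matrix sits at offsets 0, 5, 10, 15.
        joints[j * 16 + 0] = 1;
        joints[j * 16 + 5] = 1;
        joints[j * 16 + 10] = 1;
        joints[j * 16 + 15] = 1;
    }
    return { joints, morphWeights, timestamp };
}
```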

1.2 Frustum Culling

The engine tests each mesh LOD group against the camera frustum using axis-aligned bounding boxes (AABBs). For a single digital human character this rarely culls anything, but it becomes relevant when multiple characters are in the scene. Culled meshes skip all GPU passes for that frame.
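A minimal sketch of an AABB-versus-frustum test, assuming the frustum is given as six planes whose normals point inward (a point is inside when `ax + by + cz + d >= 0`). The types and the function name `isAabbOutsideFrustum` are illustrative and not taken from the engine.

```typescript
type Vec3 = [number, number, number];
// Plane coefficients [a, b, c, d]; inside means a*x + b*y + c*z + d >= 0.
type Plane = [number, number, number, number];

interface Aabb {
    min: Vec3;
    max: Vec3;
}

// Returns true when the box lies entirely outside at least one plane.
// This is the conservative "positive vertex" test: for each plane only the
// corner farthest along the plane normal needs to be checked.
function isAabbOutsideFrustum(box: Aabb, planes: Plane[]): boolean {
    for (const [a, b, c, d] of planes) {
        const px = a >= 0 ? box.max[0] : box.min[0];
        const py = b >= 0 ? box.max[1] : box.min[1];
        const pz = c >= 0 ? box.max[2] : box.min[2];
        if (a * px + b * py + c * pz + d < 0) {
            return true; // even the most favorable corner is behind this plane
        }
    }
    return false; // intersecting or fully inside (may include false positives)
}
```

The test is conservative: it never culls a visible box, but a box near a frustum corner can occasionally pass even though it is not visible.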

2. GPU Skinning Pass

WebGL API: TRANSFORM_FEEDBACK (WebGL 2.0)

GPU skinning transforms the character's bind-pose vertex buffer into a posed vertex buffer using the joint matrix palette from the PoseFrame.

Inputs

Bind-pose VBO: vec3 position, vec3 normal, vec4 tangent
Joint index buffer: uvec4 joints (up to 4 joints per vertex)
Weight buffer: vec4 weights (normalized, sum = 1.0)
Joint matrix palette: uniform mat4 u_JointMatrices[MAX_JOINTS]

Process

// Vertex shader (simplified)
vec4 skinnedPos =
    weights.x * (u_JointMatrices[joints.x] * vec4(position, 1.0)) +
    weights.y * (u_JointMatrices[joints.y] * vec4(position, 1.0)) +
    weights.z * (u_JointMatrices[joints.z] * vec4(position, 1.0)) +
    weights.w * (u_JointMatrices[joints.w] * vec4(position, 1.0));

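Normals are transformed by the inverse-transpose of the blended joint matrix so that they stay perpendicular to the surface under non-uniform scaling. Tangents are directions lying in the surface, so they are transformed by the upper-left 3×3 of the joint matrix itself (tangent.w, the handedness sign, is passed through unchanged). Both are renormalized afterwards.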
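For debugging, it can help to have a CPU reference of the same linear blend skinning formula to compare against the GPU output. The sketch below assumes column-major 4x4 matrices packed 16 floats per joint; the function names are hypothetical.

```typescript
type Vec3 = [number, number, number];
type Vec4 = [number, number, number, number];

// Transform a point (w = 1) by the column-major 4x4 matrix that starts at
// `offset` in the flattened palette.
function transformPoint(m: Float32Array, offset: number, p: Vec3): Vec3 {
    const [x, y, z] = p;
    return [
        m[offset + 0] * x + m[offset + 4] * y + m[offset + 8] * z + m[offset + 12],
        m[offset + 1] * x + m[offset + 5] * y + m[offset + 9] * z + m[offset + 13],
        m[offset + 2] * x + m[offset + 6] * y + m[offset + 10] * z + m[offset + 14],
    ];
}

// CPU reference for the vertex shader above: a weighted sum of the position
// transformed by each of the (up to) four influencing joints.
function skinPosition(palette: Float32Array, joints: Vec4, weights: Vec4, position: Vec3): Vec3 {
    const out: Vec3 = [0, 0, 0];
    for (let k = 0; k < 4; k++) {
        const w = weights[k];
        if (w === 0) continue; // unused influence slot
        const p = transformPoint(palette, joints[k] * 16, position);
        out[0] += w * p[0];
        out[1] += w * p[1];
        out[2] += w * p[2];
    }
    return out;
}
```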

Morph Target Application

After joint skinning, morph target deltas are applied additively:

vec3 morphedPos = skinnedPos.xyz;
for (int i = 0; i < 52; i++) {
    morphedPos += u_MorphWeights[i] * a_MorphDelta[i];
}

Output

Skinned + morphed vertex buffer written via Transform Feedback
This buffer is used as input for all subsequent render passes
No CPU readback - data stays on the GPU

3. Shadow Pass

Technique: PCF (Percentage Closer Filtering) soft shadows

Shadow Map Generation

The engine renders the character's depth from the key light's perspective into a dedicated WebGLTexture (depth texture, 2048×2048 by default).

Key Light
    │
    │  Orthographic projection (covers character bounds)
    ▼
Depth Framebuffer (2048×2048)
    └── depth attachment: DEPTH_COMPONENT24

Shadow Map Sampling (PCF)

During the main render pass, the shadow map is sampled using a 3×3 PCF kernel to produce soft shadow edges:

float shadow = 0.0;
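// Derive the texel size from the bound texture so the kernel also works
// with the 1024 and 512 shadow maps used by the lower quality presets.
vec2 texelSize = 1.0 / vec2(textureSize(u_ShadowMap, 0));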
for (int x = -1; x <= 1; x++) {
    for (int y = -1; y <= 1; y++) {
        float pcfDepth = texture(u_ShadowMap,
            projCoords.xy + vec2(x, y) * texelSize).r;
        shadow += currentDepth - bias > pcfDepth ? 1.0 : 0.0;
    }
}
shadow /= 9.0;

Shadow bias is computed dynamically from the angle between the surface normal and the light direction: surfaces at grazing angles to the light receive a larger bias, which avoids shadow acne on curved surfaces.
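One common way to compute such a bias is to scale it by `1 - N·L` and clamp it to a minimum. The sketch below shows that formula on the CPU; the constants 0.005 and 0.0005 are illustrative assumptions and not OpenHuman's actual values, and `shadowBias` is a hypothetical name.

```typescript
type Vec3 = [number, number, number];

function dot3(a: Vec3, b: Vec3): number {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Slope-scaled shadow bias. `normal` and `toLight` must be unit vectors, and
// `toLight` points from the surface toward the light.
// The default constants are illustrative, not engine values.
function shadowBias(normal: Vec3, toLight: Vec3, minBias = 0.0005, maxBias = 0.005): number {
    const nDotL = Math.max(0, dot3(normal, toLight));
    // Facing the light (nDotL = 1) gives minBias; grazing angles approach maxBias.
    return Math.max(maxBias * (1 - nDotL), minBias);
}
```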

Quality vs Performance

Quality Preset	Shadow Map Size	PCF Kernel
`high`	2048×2048	3×3 (9 samples)
`medium`	1024×1024	2×2 (4 samples)
`low`	512×512	1×1 (hard shadow)

4. G-Pass (Geometry Pass)

The G-Pass renders the character's surface properties into an HDR framebuffer. OpenHuman uses a forward rendering approach (not deferred), optimized for a small, known number of lights (3-point lighting setup).

4.1 Opaque PBR Pass

Inputs (texture maps per material):

Texture	Format	Contents
Albedo	RGBA8 (KTX2 BC7)	Base color + alpha
Normal	RG8 (KTX2 BC5)	Tangent-space normals (Z reconstructed)
ORM	RGB8 (KTX2 BC7)	Occlusion (R), Roughness (G), Metallic (B)
Emissive	RGBA8 (KTX2 BC7)	Emissive color (HDR encoded)

PBR shading model: Cook-Torrance microfacet BRDF

Diffuse: Lambertian diffuse (modulated by 1 - metallic)
Specular: GGX distribution, Smith geometry term, Schlick Fresnel
Occlusion: Baked AO multiplied into diffuse contribution

Output written to HDR_COLOR attachment (RGBA16F).
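The specular terms listed above can be sketched as scalar functions for reference. This is a CPU illustration, not the engine's shader source. The source only says "Smith geometry term", so the Schlick-GGX form with the direct-lighting remap `k = (roughness + 1)² / 8` is an assumption, as are all function names.

```typescript
// GGX / Trowbridge-Reitz normal distribution function.
function distributionGGX(nDotH: number, roughness: number): number {
    const a = roughness * roughness;
    const a2 = a * a;
    const d = nDotH * nDotH * (a2 - 1) + 1;
    return a2 / (Math.PI * d * d);
}

// Schlick-GGX geometry term for one direction (view or light).
// The k remapping below is the common direct-lighting variant (assumption).
function geometrySchlickGGX(nDotX: number, roughness: number): number {
    const r = roughness + 1;
    const k = (r * r) / 8;
    return nDotX / (nDotX * (1 - k) + k);
}

// Smith geometry term: product of the view and light shadowing terms.
function geometrySmith(nDotV: number, nDotL: number, roughness: number): number {
    return geometrySchlickGGX(nDotV, roughness) * geometrySchlickGGX(nDotL, roughness);
}

// Schlick's Fresnel approximation for a scalar F0.
function fresnelSchlick(cosTheta: number, f0: number): number {
    return f0 + (1 - f0) * Math.pow(1 - cosTheta, 5);
}

// Cook-Torrance specular term: D * G * F / (4 * N.L * N.V).
function specularBrdf(
    nDotL: number, nDotV: number, nDotH: number, vDotH: number,
    roughness: number, f0: number,
): number {
    const D = distributionGGX(nDotH, roughness);
    const G = geometrySmith(nDotV, nDotL, roughness);
    const F = fresnelSchlick(vDotH, f0);
    return (D * G * F) / Math.max(4 * nDotL * nDotV, 1e-4); // guard against divide-by-zero
}
```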

4.2 SSS Accumulation Pass

For skin materials, a separate SSS accumulation buffer (RGB16F) is written in this pass. It stores the raw diffuse irradiance before any scattering, which is then blurred and composited in Pass 5.3.

Only mesh surfaces tagged with material.sss = true (set in the .ohb bundle) write to this buffer.

5. Lighting & Composite Pass

5.1 Direct Lighting

OpenHuman uses a fixed 3-point lighting model (key, fill, rim), configured via the SDK:

human.setLighting({
    key: { direction: [-1, -1, -1], color: "#fff5e6", intensity: 1.0 },
    fill: { direction: [1, -0.5, -1], color: "#e6f0ff", intensity: 0.4 },
    rim: { direction: [0, 0, 1], color: "#ffffff", intensity: 0.6 },
})

Each light contributes diffuse + specular using the Cook-Torrance BRDF evaluated in the fragment shader. The key light applies shadow attenuation from the shadow map (Pass 3).

5.2 Image-Based Lighting (IBL)

Diffuse IBL: A pre-convolved irradiance cubemap (stored in the .ohb bundle or provided externally) is sampled using the surface world-space normal.

Specular IBL: A pre-filtered environment map (split-sum approximation) is sampled using the reflection vector and roughness level (mip-mapped).

// Specular IBL (split-sum approximation)
vec3 F  = fresnelSchlickRoughness(NdotV, F0, roughness);
vec2 brdf = texture(u_BRDFLut, vec2(NdotV, roughness)).rg;
vec3 prefilteredColor = textureLod(u_EnvMap, R,
    roughness * MAX_REFLECTION_LOD).rgb;
vec3 specular = prefilteredColor * (F * brdf.x + brdf.y);

5.3 SSS Composite

The SSS accumulation buffer (from Pass 4.2) is blurred using a separable Gaussian blur with a skin-tuned kernel (wider in red channel, narrower in blue channel, to approximate real scattering distances in human skin tissue).

The blurred SSS buffer is then composited over the direct diffuse contribution:

vec3 diffuse = mix(directDiffuse, sssBlurred, u_SSSStrength);

Default u_SSSStrength: 0.6 (tunable via human.setSSS({ strength: 0.6 })).
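To illustrate the per-channel kernel idea, the sketch below builds normalized 1D Gaussian weights with a different standard deviation for each color channel. The sigma values and the radius are placeholders chosen for illustration; they are not the engine's skin-tuned kernel.

```typescript
// Normalized 1D Gaussian weights for taps at offsets -radius..+radius.
function gaussianKernel(radius: number, sigma: number): number[] {
    const weights: number[] = [];
    let sum = 0;
    for (let i = -radius; i <= radius; i++) {
        const w = Math.exp(-(i * i) / (2 * sigma * sigma));
        weights.push(w);
        sum += w;
    }
    return weights.map((w) => w / sum); // weights sum to 1, so the blur preserves energy
}

// Placeholder sigmas (in texels): red scatters farthest, blue least.
const sssSigmas = { r: 3.0, g: 1.5, b: 0.75 };

const sssKernels = {
    r: gaussianKernel(6, sssSigmas.r),
    g: gaussianKernel(6, sssSigmas.g),
    b: gaussianKernel(6, sssSigmas.b),
};
```

Because the blur is separable, the same weights can be applied in a horizontal pass and then a vertical pass, which costs 2 × 13 taps per pixel here instead of 13 × 13.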

6. Post-Process Stack

All post-process passes operate on screen-space fullscreen quads. Inputs and outputs are ping-pong framebuffers (RGBA16F for HDR passes, RGBA8 after tonemapping).

6.1 Bloom

Algorithm: Dual-Kawase blur (more efficient than Gaussian at large radii)

HDR color buffer
    │
    ▼ Threshold pass (keep pixels above luminance threshold)
Bright regions buffer
    │
    ▼ Downsample × 4 (half resolution each step)
    ▼ Dual-Kawase blur (down + up passes)
    ▼ Upsample × 4 (additive composite at each level)
    │
    ▼ Additive composite onto HDR buffer
Final HDR + Bloom
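The downsample chain in the diagram can be sized with a few lines of arithmetic. The sketch below assumes the first level is half the canvas size (matching the "½ canvas × 4 levels" entry in the framebuffer summary) and that odd dimensions round down; both the rounding rule and the name `bloomMipSizes` are assumptions.

```typescript
interface MipSize {
    width: number;
    height: number;
}

// Sizes of the bloom downsample chain: each level halves the previous one,
// starting from half the canvas resolution.
function bloomMipSizes(canvasWidth: number, canvasHeight: number, levels = 4): MipSize[] {
    const sizes: MipSize[] = [];
    let w = canvasWidth;
    let h = canvasHeight;
    for (let i = 0; i < levels; i++) {
        w = Math.max(1, Math.floor(w / 2)); // never shrink below one texel
        h = Math.max(1, Math.floor(h / 2));
        sizes.push({ width: w, height: h });
    }
    return sizes;
}
```

For a 1920×1080 canvas this yields 960×540, 480×270, 240×135, and 120×67.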

Tunable parameters:

human.setBloom({
    threshold: 0.9, // luminance threshold to extract bright regions
    intensity: 0.15, // bloom strength (additive)
    radius: 0.4, // blur spread (0.0 – 1.0)
})

6.2 Depth of Field

Algorithm: Circle of Confusion (CoC) map → radial bokeh blur

Only applied to the background - the character mesh is always in focus. A depth test against the character's depth buffer determines the CoC radius per pixel.

DoF is disabled by default: postProcess: true enables the post-process stack as a whole, but dof defaults to false within it. Enable it explicitly:

human.setDoF({
    enabled: true,
    focalDistance: 1.2, // meters from camera
    focalRange: 0.3, // sharp zone radius around focal point
    maxBlur: 8.0, // max CoC radius in pixels
})
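The exact mapping from depth to CoC radius is not specified here. One plausible reading of these parameters is sketched below: pixels within focalRange of focalDistance stay sharp, and blur then ramps linearly up to maxBlur over one further focalRange. The ramp shape and the name `cocRadiusPx` are assumptions, not the engine's documented behavior.

```typescript
interface DofSettings {
    focalDistance: number; // meters from camera
    focalRange: number;    // sharp zone radius around the focal point, in meters
    maxBlur: number;       // max CoC radius in pixels
}

// Assumed mapping: zero blur inside the sharp zone, then a linear ramp that
// reaches maxBlur one focalRange beyond the edge of the sharp zone.
function cocRadiusPx(depthMeters: number, s: DofSettings): number {
    const distFromFocus = Math.abs(depthMeters - s.focalDistance);
    if (s.focalRange <= 0) {
        return distFromFocus > 0 ? s.maxBlur : 0; // degenerate sharp zone
    }
    const t = (distFromFocus - s.focalRange) / s.focalRange;
    return Math.min(1, Math.max(0, t)) * s.maxBlur;
}
```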

6.3 ACES Tonemapping

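Converts the HDR (RGBA16F) buffer to LDR (RGBA8) using an ACES filmic tonemapping curve. The shader below is the compact curve fit of ACES published by Krzysztof Narkowicz; Unreal Engine 4+ uses the same family of ACES-based filmic curves by default.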

vec3 ACESFilm(vec3 x) {
    float a = 2.51;
    float b = 0.03;
    float c = 2.43;
    float d = 0.59;
    float e = 0.14;
    return clamp((x*(a*x+b))/(x*(c*x+d)+e), 0.0, 1.0);
}

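After tonemapping, the linear color is gamma-encoded for display with pow(color, vec3(1.0 / 2.2)), a common approximation of the piecewise sRGB transfer function. (GLSL requires both arguments of pow to have the same type, hence the vec3 exponent.)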
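A scalar TypeScript port of the curve is handy for checking expected output values offline, for example when tuning bloom thresholds. The sketch below mirrors the GLSL constants above and the 1/2.2 gamma step; `tonemapToByte` is a hypothetical helper name.

```typescript
// Per-channel ACES curve fit, same constants as the GLSL ACESFilm above.
function acesFilm(x: number): number {
    const a = 2.51;
    const b = 0.03;
    const c = 2.43;
    const d = 0.59;
    const e = 0.14;
    const y = (x * (a * x + b)) / (x * (c * x + d) + e);
    return Math.min(1, Math.max(0, y)); // clamp to [0, 1]
}

// Gamma 2.2 approximation of the linear-to-sRGB encode.
function linearToGamma(c: number): number {
    return Math.pow(c, 1 / 2.2);
}

// Full LDR conversion for one channel: tonemap, gamma-encode, quantize to 8 bits.
function tonemapToByte(hdr: number): number {
    return Math.round(linearToGamma(acesFilm(hdr)) * 255);
}
```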

6.4 FXAA

Algorithm: FXAA 3.11 (NVIDIA) - single-pass edge-detection anti-aliasing applied to the LDR output buffer.

FXAA detects luminance edges and blends pixels along the edge direction. Because it is a post-process technique that operates on the final LDR image, it requires no MSAA render targets.

Quality vs Performance:

Quality Preset	FXAA Quality	Notes
`high`	FXAA_QUALITY__PRESET 29	High quality, ~0.3ms
`medium`	FXAA_QUALITY__PRESET 15	Balanced, ~0.15ms
`low`	Disabled	No AA

7. Blit to Canvas

The final LDR (RGBA8) framebuffer is blitted to the canvas backbuffer using gl.blitFramebuffer(). The canvas is then presented to the browser compositor for display.

Framebuffer Summary

Buffer	Format	Size	Lifetime
Shadow depth	`DEPTH_COMPONENT24`	2048×2048	Persistent
HDR color	`RGBA16F`	Canvas size	Persistent
SSS accumulation	`RGB16F`	Canvas size	Persistent
Post ping-pong A	`RGBA16F`	Canvas size	Persistent
Post ping-pong B	`RGBA16F`	Canvas size	Persistent
LDR output	`RGBA8`	Canvas size	Persistent
Bloom mip chain	`RGBA16F`	½ canvas × 4 levels	Persistent

Total VRAM for framebuffers at 1080p (~2MP):

RGBA16F (8 bytes/pixel) @ 2MP ≈ 16MB per buffer
4 full-res HDR buffers + LDR output + shadow map + bloom chain ≈ 90–100MB VRAM for framebuffers

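On quality: 'low', the engine renders internally at 50% resolution and upscales to canvas size. Assuming the 50% applies to each axis, the pixel count drops to a quarter, which cuts canvas-sized framebuffer VRAM to roughly a quarter as well (~25MB).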
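The estimate can be reproduced from the Framebuffer Summary table with a short calculation. The sketch below assumes DEPTH_COMPONENT24 occupies 32 bits per texel and RGB16F is tightly packed at 6 bytes per pixel; real drivers may pad either format, so treat the result as a rough figure. The function name is hypothetical.

```typescript
// Bytes per pixel for the formats in the framebuffer summary.
// Assumptions: 24-bit depth is stored in 32 bits; RGB16F is not padded.
const BYTES_PER_PIXEL = { RGBA16F: 8, RGB16F: 6, RGBA8: 4, DEPTH_COMPONENT24: 4 };

// Rough framebuffer VRAM estimate, in bytes, for a given internal resolution.
function estimateFramebufferBytes(width: number, height: number, shadowSize = 2048): number {
    const fullRes = width * height;
    let total = 0;
    total += 3 * fullRes * BYTES_PER_PIXEL.RGBA16F; // HDR color + post ping-pong A and B
    total += fullRes * BYTES_PER_PIXEL.RGB16F;      // SSS accumulation
    total += fullRes * BYTES_PER_PIXEL.RGBA8;       // LDR output
    total += shadowSize * shadowSize * BYTES_PER_PIXEL.DEPTH_COMPONENT24; // shadow depth

    // Bloom chain: 4 levels starting at half resolution, halving each time.
    let w = width;
    let h = height;
    for (let level = 0; level < 4; level++) {
        w = Math.max(1, Math.floor(w / 2));
        h = Math.max(1, Math.floor(h / 2));
        total += w * h * BYTES_PER_PIXEL.RGBA16F;
    }
    return total;
}
```

For 1920×1080 with the default 2048×2048 shadow map this comes to about 92.8MB (decimal megabytes). Calling it with 960×540 and a 512×512 shadow map gives about 20MB, before accounting for the passes that the low preset disables.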

GPU Timing Budget (Target: 60fps = 16.67ms/frame)

Pass	High Quality	Medium	Low
GPU Skinning	~0.4ms	~0.3ms	~0.2ms
Shadow Map	~0.8ms	~0.5ms	~0.2ms
G-Pass (PBR)	~2.5ms	~1.8ms	~1.0ms
SSS	~1.2ms	~0.8ms	disabled
Lighting	~0.6ms	~0.5ms	~0.3ms
Bloom	~0.8ms	~0.5ms	disabled
DoF	~0.6ms	disabled	disabled
ACES + FXAA	~0.3ms	~0.2ms	~0.1ms
Total GPU	~7.2ms	~4.6ms	~1.8ms

CPU overhead (JS): ~1–2ms per frame for animation evaluation. Remaining budget is available for application logic.

Next Steps

Shader System Docs - GLSL source structure, PBR uniforms, custom shader API
SSS Implementation Details - kernel weights, channel-specific blur radii
GPU Optimization Guide - profiling, memory budgets, mobile tuning
Post-Process Configuration Reference - all tunable parameters