chore: optimize avatar bone matrix calculation pipeline #7604

Open
dalkia wants to merge 10 commits into dev from chore/opti-transform-job
@dalkia (Collaborator) commented Mar 17, 2026

Pull Request Description

What does this PR change?

Improves the avatar bone matrix calculation pipeline with three key optimizations, plus fixes a critical bug causing infinite avatar re-instantiation.

1. TransformAccessArray for parallel bone gathering

Bone localToWorldMatrix reads are now performed on worker threads via TransformAccessArray + IJobParallelForTransform (dedicated BoneGatherJob and AvatarRootGatherJob), instead of iterating every bone per avatar on the main thread. This eliminates a significant main-thread bottleneck that scaled linearly with avatar count — 62 Transform.localToWorldMatrix calls per avatar, each involving a managed-to-native transition.

2. Dedicated main player pipeline to unblock InterpolateCharacterSystem

The main player avatar is processed in its own separate small pipeline (62 bones + 1 root) that schedules and completes immediately within StartAvatarMatricesCalculationSystem. This releases the TransformAccessArray lock on the main player's transforms before InterpolateCharacterSystem runs in ChangeCharacterPositionGroup, preventing the job system from blocking the main thread when the character controller needs to write to the player transform hierarchy. Remote avatars use a separate batched pipeline with deferred completion in PreRenderingSystemGroup.

3. Per-avatar Burst-optimized bone calculation (from PR #7230)

BoneMatrixCalculationJob is now parallelized per avatar rather than per bone. Each job task loads the avatar matrix once and loops over 62 bones using math.mul on float4x4 natively, which gives Burst a tight sequential range it can auto-vectorize. This also eliminates Matrix4x4 ↔ float4x4 casts throughout the pipeline. Additionally, TransformAccessArrays are rebuilt lazily (only when avatars are added/removed), and the register-once pattern (RegisterAvatar / RegisterMainPlayerAvatar) avoids redundant per-frame main-thread work.
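
The per-avatar loop described above can be sketched outside Unity as follows. This is only an illustration under stated assumptions: it uses `System.Numerics.Matrix4x4` in place of `float4x4`, `Parallel.For` in place of `IJobParallelFor` and Burst, and the names `BoneMatrixSketch`, `CalculateAvatar`, and `CalculateAll` are hypothetical. The multiplication order is also an assumption, not taken from the PR.

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

// Unity-free sketch: one parallel task per avatar, each looping over its own
// contiguous slice of 62 bone matrices. All names here are hypothetical.
public static class BoneMatrixSketch
{
    public const int BonesPerAvatar = 62; // bone count quoted in the PR description

    // Stand-in for the per-avatar job body.
    public static void CalculateAvatar(
        int avatarIndex,
        Matrix4x4[] avatarMatrices,
        Matrix4x4[] flatBoneMatrices,
        Matrix4x4[] results)
    {
        // The avatar matrix is loaded once and reused for every bone.
        Matrix4x4 avatarMatrix = avatarMatrices[avatarIndex];
        int start = avatarIndex * BonesPerAvatar;

        // Tight sequential range over this avatar's bones.
        for (int i = start; i < start + BonesPerAvatar; i++)
            results[i] = flatBoneMatrices[i] * avatarMatrix; // order is an assumption
    }

    // Parallel.For stands in for scheduling a parallel job with one index per avatar.
    public static Matrix4x4[] CalculateAll(Matrix4x4[] avatarMatrices, Matrix4x4[] flatBoneMatrices)
    {
        var results = new Matrix4x4[flatBoneMatrices.Length];
        Parallel.For(0, avatarMatrices.Length,
            a => CalculateAvatar(a, avatarMatrices, flatBoneMatrices, results));
        return results;
    }
}
```

Because each task writes only to its own slice of `results`, the tasks need no synchronization between them.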

4. Structural cleanup: pipeline separation into dedicated classes

AvatarTransformMatrixJobWrapper has been split into three files for clarity:

  • MainPlayerPipeline — Dedicated single-avatar pipeline. Registered once at avatar creation and never released (bones are stable for the lifetime of the game). No dummy transform fallback needed.
  • RemoteAvatarPipeline — Batched pipeline for all remote avatars with dynamic resizing, index recycling, and deferred completion.
  • AvatarTransformMatrixJobWrapper — Thin orchestrator that owns both pipelines and the shared dummy transform.

The main player pipeline is now skipped during wearable changes (ReleaseAvatar no longer tears down the main player registration), since the skeleton and bone transforms persist across wearable swaps.

5. In-place TAA slot updates to eliminate rebuild bottleneck

RemoteAvatarPipeline now uses flat backing arrays (flatBones, flatRoots) pre-filled with dummyTransform, replacing the jagged Transform[][] + Transform[] arrays. When avatars are registered or released, TAA slots are updated in-place via TransformAccessArray[index] = transform (O(62) per avatar) instead of triggering a full rebuild of all slots (O(N×62)). Full TAA rebuilds now only occur on capacity growth (rare). This eliminates the main-thread stall that occurred every frame during bulk avatar instantiation — previously, destroying 500 avatars and reinstantiating them caused ~50 consecutive frames of 10-20ms stalls from handle.Complete() + full TAA reconstruction.
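
The flat-array bookkeeping can be illustrated with a small Unity-free sketch. It assumes a generic `T` in place of `Transform`, and the class `FlatSlotTable`, its members, and the doubling growth policy are hypothetical, not the PR's actual implementation. It shows slots being overwritten in place on register and release, with a full rebuild counted only when capacity grows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a flat backing array with in-place slot updates
// and index recycling. T stands in for Unity's Transform.
public sealed class FlatSlotTable<T>
{
    public const int BonesPerAvatar = 62; // bone count quoted in the PR description

    private readonly T dummy;
    private readonly Stack<int> freeIndices = new Stack<int>();
    private T[] flatBones;
    private int nextIndex;

    // Counts the expensive path: reallocating and refilling every slot.
    public int FullRebuilds { get; private set; }

    public int Capacity => flatBones.Length / BonesPerAvatar;

    public FlatSlotTable(int initialCapacity, T dummy)
    {
        this.dummy = dummy;
        flatBones = new T[initialCapacity * BonesPerAvatar];
        Array.Fill(flatBones, dummy); // every unused slot points at the dummy
    }

    // Reads one bone slot of one avatar.
    public T this[int avatarIndex, int bone] => flatBones[avatarIndex * BonesPerAvatar + bone];

    // Writes 62 slots in place; grows only when no free index fits.
    public int Register(T[] bones)
    {
        if (bones.Length != BonesPerAvatar)
            throw new ArgumentException("Expected exactly 62 bones.", nameof(bones));

        int index = freeIndices.Count > 0 ? freeIndices.Pop() : nextIndex++;
        if (index >= Capacity) Grow();

        Array.Copy(bones, 0, flatBones, index * BonesPerAvatar, BonesPerAvatar);
        return index;
    }

    // Resets the avatar's slots to the dummy and recycles its index.
    public void Release(int index)
    {
        Array.Fill(flatBones, dummy, index * BonesPerAvatar, BonesPerAvatar);
        freeIndices.Push(index);
    }

    // Full rebuild: the only path whose cost depends on the total slot count.
    private void Grow()
    {
        var grown = new T[Math.Max(1, Capacity) * 2 * BonesPerAvatar];
        Array.Fill(grown, dummy);
        Array.Copy(flatBones, grown, flatBones.Length);
        flatBones = grown;
        FullRebuilds++;
    }
}
```

In this sketch, releasing and re-registering avatars within the existing capacity never increments `FullRebuilds`, which mirrors the bulk destroy and reinstantiate scenario described above.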

6. Fix: Profile.IsDirty never reset — infinite avatar re-instantiation

Bug fix: AvatarLoaderSystem checked profile.IsDirty to trigger avatar shape updates, but never reset it to false after consuming it. This caused an infinite loop:

  1. AvatarLoaderSystem sees profile.IsDirty == true, sets avatarShapeComponent.IsDirty = true
  2. AvatarInstantiatorSystem re-instantiates the avatar (full material/wearable/skinning setup), sets avatarShapeComponent.IsDirty = false
  3. Next frame: AvatarLoaderSystem sees profile.IsDirty is still true → goto 1

Every avatar that ever received a profile update was being fully re-instantiated every frame indefinitely, gated only by the frame time budget. This masked itself as "normal avatar overhead" but was consuming the entire instantiation budget on redundant work. Fixed by resetting profile.IsDirty = false in AvatarLoaderSystem after consuming it, with shared logic extracted into ApplyProfileToAvatarShape.
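
The loop and its fix can be reproduced with a minimal simulation. The types and method names below (`Profile`, `AvatarShape`, `LoaderUpdate`, `InstantiatorUpdate`, `CountInstantiations`) are simplified, hypothetical stand-ins for the real systems, and the `resetAfterConsume` switch exists only to compare the old and new behavior side by side.

```csharp
using System;

// Simplified, hypothetical stand-ins for the real profile and avatar shape types.
public sealed class Profile { public bool IsDirty; }
public sealed class AvatarShape { public bool IsDirty; }

public static class DirtyFlagSketch
{
    // Stand-in for the loader step: consumes the profile's dirty flag.
    public static void LoaderUpdate(Profile profile, AvatarShape shape, bool resetAfterConsume)
    {
        if (!profile.IsDirty) return;

        shape.IsDirty = true;

        // The fix: clear the flag once it has been consumed.
        if (resetAfterConsume) profile.IsDirty = false;
    }

    // Stand-in for the instantiator step: returns true when it re-instantiates.
    public static bool InstantiatorUpdate(AvatarShape shape)
    {
        if (!shape.IsDirty) return false;

        shape.IsDirty = false;
        return true;
    }

    // Runs both steps for a number of frames after a single profile update
    // and counts how many times the avatar was re-instantiated.
    public static int CountInstantiations(int frames, bool resetAfterConsume)
    {
        var profile = new Profile { IsDirty = true };
        var shape = new AvatarShape();
        int count = 0;

        for (int frame = 0; frame < frames; frame++)
        {
            LoaderUpdate(profile, shape, resetAfterConsume);
            if (InstantiatorUpdate(shape)) count++;
        }

        return count;
    }
}
```

Without the reset, the simulated avatar is re-instantiated on every frame; with it, a single profile update produces a single instantiation.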

Files changed:

  • BoneMatrixCalculationJob.cs — IJobParallelFor per-avatar with float4x4/math.mul
  • TransformGatherJobs.cs — New BoneGatherJob + AvatarRootGatherJob (IJobParallelForTransform)
  • AvatarTransformMatrixJobWrapper.cs — Thin orchestrator delegating to the two pipeline classes, passes dummyTransform to RemoteAvatarPipeline
  • MainPlayerPipeline.cs — Dedicated main player bone matrix pipeline (register-once, immediate completion)
  • RemoteAvatarPipeline.cs — Batched remote avatar pipeline with flat backing arrays and in-place TAA updates
  • AvatarTransformMatrixComponent.cs — Added IsMainPlayer flag for pipeline routing
  • StartAvatarMatricesCalculationSystem.cs — Split query: PlayerComponent → main player pipeline, others → remote
  • FinishAvatarMatricesCalculationSystem.cs — Routes to correct result array based on IsMainPlayer
  • ReleaseAvatar.cs — Main player pipeline is never released on wearable change
  • AvatarInstantiatorSystem.cs — Passes releaseFromPipeline: false for main player wearable changes
  • AvatarLoaderSystem.cs — Reset profile.IsDirty = false after consuming, extracted ApplyProfileToAvatarShape

Performance comparison

With 400 avatars in an isolated world, there are clear gains. Left is dev, right is this PR.

image

InterpolateCharacterSystem doesn't show the bottleneck originally reported in this issue. The difference between dev and this branch is negligible.

image

Test Instructions

Test Steps

  1. Enter world and confirm your own avatar renders correctly with all wearables
  2. Move to a populated area with 10+ other avatars visible
  3. Verify all remote avatars render and animate correctly
  4. Teleport between parcels — confirm no avatar flickering or missing meshes
  5. Walk around continuously — verify character movement is smooth with no microstuttering
  6. Change your wearables — confirm avatar re-instantiates correctly (and only once, not every frame)
  7. Verify emotes still play correctly on both local and remote avatars
  8. Teleport to an empty world. Instantiate 300 avatars through the debug menu. Force them all to be male; if the random combination is left on, you may run out of memory unintentionally.
  9. Delete some of them. Delete all of them. Reinstantiate. They should all appear correctly without massive frame hiccups
  10. Verify that after all avatars are instantiated, AvatarInstantiatorSystem and AvatarLoaderSystem show negligible cost in the Profiler (no repeated re-instantiation)

Additional Testing Notes

  • Compare Profiler traces before/after: main-thread time in StartAvatarMatricesCalculationSystem should be significantly reduced
  • Pay special attention to character movement smoothness — the main player pipeline separation specifically targets stutter caused by job system locks
  • Test with both low (1-5) and high (50+) avatar counts to verify both pipelines perform well
  • Edge case: rapidly teleporting in/out of crowded areas (avatar registration/release churn)

Quality Checklist

  • Changes have been tested locally
  • Documentation has been updated (if required)
  • Performance impact has been considered
  • For SDK features: Test scene is included

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dalkia dalkia requested review from a team as code owners March 17, 2026 15:29
@NickKhalow (Contributor) left a comment

Great work!

@DafGreco left a comment

✔️ PR reviewed and approved by QA on both platforms, following the test instructions and covering both happy and unhappy paths.

Regression checks were performed for this ticket to verify that the normal flow works as expected:

  • [✔️] Backpack and wearables in world
  • [✔️] Emotes in world and in backpack
  • [✔️] Teleport with map/coordinates/Jump In
  • [✔️] Chat and multiplayer
  • [✔️] Profile card
  • [✔️] Camera
  • [✔️] Skybox

The prod environment and this PR were compared against each other: as expected, the PR shows higher FPS than prod when instantiating 300 avatars.
All points of the PR have also been checked, and no issues were found that would block merging. 🚀

Image Image

@DafGreco DafGreco self-requested a review March 23, 2026 16:09