|
| 1 | +# Hybrid / Hierarchical Context Parallel |
| 2 | + |
| 3 | +This page covers the stable Bridge-facing meaning of hierarchical context |
| 4 | +parallelism, especially the `a2a+p2p` transport path and |
| 5 | +`hierarchical_context_parallel_sizes`. |
| 6 | + |
| 7 | +For operational setup, code anchors, and verification commands, see |
| 8 | +`skills/perf-techniques/hybrid-context-parallel.md`. |
| 9 | + |
| 10 | +## What It Is |
| 11 | + |
| 12 | +In upstream Megatron-Core, `cp_comm_type="a2a+p2p"` plus |
| 13 | +`hierarchical_context_parallel_sizes` enables a hierarchical context-parallel |
| 14 | +transport path. This is the Bridge-relevant form of hierarchical context |
| 15 | +parallelism. |
| 16 | + |
| 17 | +Do not confuse this with the upstream boolean
| 18 | +`hybrid_context_parallel`, which is a different feature for balancing packed
| 19 | +or variable-length workloads. The two concepts are not interchangeable, even
| 20 | +though their names are similar.
| 21 | + |
| 22 | +## When to Use It |
| 23 | + |
| 24 | +Hierarchical context parallelism is relevant when: |
| 25 | + |
| 26 | +- plain context parallelism is already required |
| 27 | +- larger CP sizes make flat `p2p` less attractive |
| 28 | +- you specifically want the hierarchical `a2a+p2p` transport path |
| 29 | + |
| 30 | +Treat it as an advanced feature, not a default recommendation.
| 31 | + |
| 32 | +## Stable Bridge Limitation |
| 33 | + |
| 34 | +The most important Bridge-specific limitation is that hierarchical context |
| 35 | +parallelism is currently supported only on the MPU initialization path. |
| 36 | + |
| 37 | +In practice, that means: |
| 38 | + |
| 39 | +- `dist.use_decentralized_pg=False` is the supported Bridge path |
| 40 | +- the decentralized process-group path should not be assumed to materialize HCP |
| 41 | + groups |
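
As a minimal sketch of the settings named on this page, the plain dictionary below groups them in one place and checks whether hierarchical CP is being requested on the supported path. The dictionary layout, the example values `8` and `[2, 4]`, and the `uses_supported_hcp_path` helper are hypothetical illustrations, not Bridge or Megatron-Core APIs; only the field names come from this page.

```python
# Hypothetical, framework-free view of the relevant settings.
# The values 8 and [2, 4] are illustrative only.
settings = {
    "cp_comm_type": "a2a+p2p",
    "context_parallel_size": 8,
    "hierarchical_context_parallel_sizes": [2, 4],
    "dist.use_decentralized_pg": False,
}


def uses_supported_hcp_path(cfg: dict) -> bool:
    """Return True when a2a+p2p is requested on the MPU initialization path."""
    wants_hcp = cfg.get("cp_comm_type") == "a2a+p2p"
    on_mpu_path = cfg.get("dist.use_decentralized_pg") is False
    return wants_hcp and on_mpu_path
```

A check of this kind is only a guard against an unsupported combination; it does not confirm that HCP groups were actually created at runtime.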
| 42 | + |
| 43 | +## Stable Constraints |
| 44 | + |
| 45 | +The durable constraints are: |
| 46 | + |
| 47 | +- the product of the entries in `hierarchical_context_parallel_sizes` must
| 48 | + equal `context_parallel_size`
| 49 | +- the usual CP sequence-length divisibility rules still apply |
| 50 | +- the installed Transformer Engine version must support `a2a+p2p`
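
The multiplicative constraint can be checked before launch. The sketch below assumes a hypothetical helper name, `check_hierarchical_cp_sizes`, and is not taken from Bridge or Megatron-Core; it only encodes the rule that the level sizes multiply to the total CP size.

```python
import math
from typing import Sequence


def check_hierarchical_cp_sizes(
    context_parallel_size: int, hierarchical_sizes: Sequence[int]
) -> None:
    """Raise ValueError unless the level sizes multiply to the total CP size."""
    if not hierarchical_sizes:
        raise ValueError("hierarchical_context_parallel_sizes must not be empty")
    if any(size < 1 for size in hierarchical_sizes):
        raise ValueError("every hierarchical CP level size must be >= 1")
    product = math.prod(hierarchical_sizes)
    if product != context_parallel_size:
        raise ValueError(
            f"product of {list(hierarchical_sizes)} is {product}, "
            f"expected context_parallel_size={context_parallel_size}"
        )


# Illustrative values: 2 * 4 == 8, so this call passes silently.
check_hierarchical_cp_sizes(8, [2, 4])
```

The sequence-length divisibility rule and the Transformer Engine requirement are not covered by this check and still need to be validated separately.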
| 51 | + |
| 52 | +## Recommendation Level |
| 53 | + |
| 54 | +Use hierarchical context parallelism in Bridge only when you intentionally want |
| 55 | +that transport path and are prepared to validate execution-path details. It
| 56 | +should not yet be presented as safe across all Bridge initialization modes,
| 57 | +because only the MPU initialization path is currently supported.
| 58 | + |
| 59 | +## Related Docs |
| 60 | + |
| 61 | +- `docs/performance-guide.md` |
| 62 | +- `docs/training/communication-overlap.md` |
| 63 | +- `skills/perf-techniques/hybrid-context-parallel.md` |
| 64 | +- `knowledge/techniques/hybrid_context_parallel.yaml` |