Commit e681fe4

added unified attention docs and removed test file
1 parent da74834 commit e681fe4

File tree

2 files changed: +18 -131 lines


docs/source/en/training/distributed_inference.md

Lines changed: 18 additions & 1 deletion
@@ -332,4 +332,21 @@ transformer = AutoModel.from_pretrained(
 pipeline = DiffusionPipeline.from_pretrained(
     CKPT_ID, transformer=transformer, torch_dtype=torch.bfloat16,
 ).to(device)
-```
+```
+
+### Unified Attention
+
+[Unified Sequence Parallelism](https://huggingface.co/papers/2405.07719) combines Ring Attention and Ulysses Attention into a single approach for efficient long-sequence processing. It first applies Ulysses's *all-to-all* communication to redistribute attention heads and sequence tokens, then runs Ring Attention over the redistributed data, and finally reverses the *all-to-all* to restore the original layout.
+
+This hybrid approach leverages the strengths of both methods:
+
+- **Ulysses Attention** efficiently parallelizes across attention heads.
+- **Ring Attention** handles very long sequences with minimal memory overhead.
+- Together, they enable 2D parallelization across both the head and sequence dimensions.
+
+[`ContextParallelConfig`] supports Unified Attention when both `ulysses_degree` and `ring_degree` are specified. The total number of devices used is `ulysses_degree * ring_degree`, arranged in a 2D grid where the Ulysses and Ring groups are orthogonal (non-overlapping).
+
+Pass a [`ContextParallelConfig`] with both `ulysses_degree` and `ring_degree` set to values greater than 1 to [`~ModelMixin.enable_parallelism`]:
+
+```py
+pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2, ring_degree=2))
+```
+
+Use Unified Attention when there are enough devices to arrange in a 2D grid (at least 4).
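To make the 2D grid described in the added docs concrete, here is a minimal illustrative sketch (not part of the diff, and not diffusers API; the helper name is hypothetical) of how `ulysses_degree * ring_degree` ranks split into orthogonal Ulysses and Ring groups:

```python
def build_2d_groups(ulysses_degree: int, ring_degree: int):
    """Arrange ranks 0..N-1 in a (ring_degree x ulysses_degree) grid.

    Each row is a Ulysses group (all-to-all across heads); each column
    is a Ring group (ring passes across sequence chunks). The groups are
    orthogonal: any two distinct ranks share at most one group.
    """
    world_size = ulysses_degree * ring_degree
    ranks = list(range(world_size))
    # Rows: consecutive ranks form a Ulysses group.
    ulysses_groups = [ranks[i * ulysses_degree:(i + 1) * ulysses_degree]
                      for i in range(ring_degree)]
    # Columns: strided ranks form a Ring group.
    ring_groups = [ranks[j::ulysses_degree] for j in range(ulysses_degree)]
    return ulysses_groups, ring_groups

# With ulysses_degree=2 and ring_degree=2 (the example config above),
# 4 devices form a 2x2 grid:
u, r = build_2d_groups(ulysses_degree=2, ring_degree=2)
print(u)  # [[0, 1], [2, 3]]
print(r)  # [[0, 2], [1, 3]]
```

This mirrors why at least 4 devices are needed: with either degree equal to 1 the grid degenerates into plain Ulysses or plain Ring attention.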

tests/others/test_unified_sp_attention.py

Lines changed: 0 additions & 130 deletions
This file was deleted.
