
Commit 09ed899

Update hpc.md (#2261)
1 parent 3608fad commit 09ed899

1 file changed: +11 -0 lines changed


jsoc/gsoc/hpc.md

Lines changed: 11 additions & 0 deletions
@@ -2,6 +2,17 @@
Julia is emerging as a serious tool for technical computing and is ideally suited for the ever-growing needs of big data analytics. This set of proposed projects addresses specific areas for improvement in analytics algorithms and distributed data management.

## Dynamic Scheduling for Mixture of Experts using Dagger.jl

**Difficulty:** Hard (350h)

Dynamic scheduling for Mixture of Experts (MoE) layers in large language models faces significant challenges due to the irregular computation patterns induced by expert routing, which lead to load imbalance, underutilization of compute resources, and high communication overhead. Each token in an MoE layer is routed to only a subset of experts, producing varying batch sizes and an uneven workload distribution across experts. Traditional static scheduling does not handle these dynamic task assignments efficiently. Using Dagger.jl, we can implement a dynamic, task-based scheduling system that assigns tokens to experts based on real-time compute availability, ensuring a more balanced workload. Dagger's asynchronous scheduler enables efficient parallel execution by dynamically distributing tasks across multiple devices or compute units, improving GPU utilization and reducing bottlenecks. Further optimizations such as load-balancing algorithms, soft routing mechanisms, and fine-grained task prioritization could be applied to maximize resource utilization and minimize execution time. Solving these scheduling problems will improve not only performance but also scalability, making MoE models more efficient and better suited for large-scale deployment.
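As a rough illustration of this task-based approach (a hedged sketch, not part of the original proposal), the snippet below groups tokens by their routed expert and launches one Dagger task per group with `Dagger.@spawn`. The expert functions, the router, and the array shapes are placeholder assumptions; only `Dagger.@spawn` and `fetch` are taken from Dagger.jl's API.

```julia
# Hedged sketch: dynamic, per-expert task scheduling with Dagger.jl.
# The experts, router, and shapes below are toy placeholders.
using Dagger

# Toy "experts": each expert is a small dense transform over its token block.
make_expert(W) = x -> tanh.(W * x)
experts = [make_expert(randn(8, 8)) for _ in 1:4]

# Toy router: map each token (a column) to one expert index.
route(token) = 1 + mod(round(Int, 100 * sum(abs, token)), length(experts))

function moe_forward(tokens::Matrix{Float64})
    # Group token indices by routed expert; group sizes are irregular by design.
    groups = Dict{Int,Vector{Int}}()
    for j in axes(tokens, 2)
        push!(get!(groups, route(view(tokens, :, j)), Int[]), j)
    end

    # Spawn one Dagger task per non-empty expert group. The scheduler places
    # tasks on available threads/workers as they free up, rather than binding
    # experts to fixed devices up front.
    tasks = Dict{Int,Any}()
    for (e, idxs) in groups
        f = experts[e]
        block = tokens[:, idxs]
        tasks[e] = Dagger.@spawn f(block)
    end

    # Gather per-expert outputs back into token order.
    out = similar(tokens)
    for (e, idxs) in groups
        out[:, idxs] .= fetch(tasks[e])
    end
    return out
end

moe_forward(randn(8, 32))  # 32 tokens of width 8, unevenly spread across 4 experts
```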
**Skills:** Familiarity with GPU programming, representing execution models as DAGs, and experience with Flux.jl and CUDA.jl

**Mentors:** [Julian Samaroo](https://github.com/jpsamaroo) and [Rabab Alomairy](https://github.com/Rabab53)

## Distributed Training

**Difficulty:** Hard (350h)
