Commit cd50ba9
Docs(Plugins): Refactor plugins to be a submenu under Scheduler

- Refactored Plugins to be a submenu under Scheduler menu
- Migrated existing plugin content to separate files
- Created plugins-overview.md as the parent menu item
- Split 11 plugins into separate documentation files:
  * Gang, Binpack, Priority, DRF, Proportion
  * Task-topology, Predicates, Nodeorder
  * SLA, TDM, Numa-aware
- Applied to both English and Chinese documentation
- Removed old plugins.md files
- Fixed blog post dateday format issues

Structure: Scheduler > Plugins (submenu) > Individual Plugin Pages
1 parent be7dac3 commit cd50ba9

33 files changed, +3168 -359 lines changed

content/en/blog/Meet Cloud Native Batch Computing with Volcano in AI & Big Data Scenarios.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ date = 2024-03-08
 lastmod = 2024-03-08
 datemonth = "Mar"
 dateyear = "2024"
-dateday = 08
+dateday = "08"

 draft = false # Is this a draft? true/false
 toc = true # Show table of contents? true/false

content/en/blog/Volcano-1.11.0-release.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ date = 2025-02-07
 lastmod = 2025-02-07
 datemonth = "Feb"
 dateyear = "2025"
-dateday = 07
+dateday = "07"

 draft = false # Is this a draft? true/false
 toc = true # Show table of contents? true/false

content/en/blog/how-volcano-boosts-distributed-training-and-inference-performance.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ date = 2025-04-01
 lastmod = 2025-04-01
 datemonth = "Apr"
 dateyear = "2025"
-dateday = 01
+dateday = "01"

 draft = false # Is this a draft? true/false
 toc = true # Show table of contents? true/false

content/en/docs/binpack.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+++
title = "Binpack"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "Binpack"
[menu.docs]
  parent = "plugins"
  weight = 5
+++

## Overview

The goal of the Binpack scheduling algorithm is to fill existing nodes as much as possible (trying not to allocate to empty nodes). Concretely, the Binpack plugin scores each node that can accommodate a task, with higher scores indicating higher resource utilization after placement. By filling up nodes and consolidating application workloads onto a subset of nodes, Binpack works well with the Kubernetes cluster's node auto-scaling functionality.

## How It Works

The Binpack algorithm is injected into the Volcano scheduler as a plugin and is applied during the node selection stage for Pods. When calculating the Binpack score, the scheduler considers each resource requested by the Pod and averages the per-resource utilization according to the weights configured for each resource.

Key characteristics:

- **Resource weight**: Each resource type (CPU, memory, GPU, etc.) can carry a different weight in the scoring calculation, depending on the value configured by the administrator.
- **Plugin weight**: The Binpack plugin itself is also assigned a weight, so its score can be balanced against the scores produced by other plugins.
- **NodeOrderFn**: The plugin implements NodeOrderFn to score nodes based on how fully utilized they would be after placing the task (see the sketch below).
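
The exact formula lives in the Volcano scheduler source; the following is only a minimal Go sketch of the weighted-averaging idea described above, with assumed names (`binpackScore`, the resource maps) and an assumed final scaling by the plugin weight:

```go
package main

import "fmt"

// binpackScore sketches the weighted scoring idea: for every configured
// resource, compute how full the node would be after placing the task,
// weight the per-resource ratios, average them, and scale by the plugin
// weight. Function and parameter names are illustrative assumptions.
func binpackScore(request, used, allocatable, weight map[string]float64, pluginWeight float64) float64 {
	var weighted, weightSum float64
	for res, w := range weight {
		capacity := allocatable[res]
		if capacity <= 0 || w <= 0 {
			continue // resource not present on the node or not weighted
		}
		ratio := (used[res] + request[res]) / capacity
		if ratio > 1 {
			return 0 // the task does not fit on this node
		}
		weighted += ratio * w
		weightSum += w
	}
	if weightSum == 0 {
		return 0
	}
	// Fuller nodes score higher, so workloads are consolidated.
	return weighted / weightSum * pluginWeight
}

func main() {
	score := binpackScore(
		map[string]float64{"cpu": 2, "memory": 8},   // task request
		map[string]float64{"cpu": 6, "memory": 16},  // already allocated on the node
		map[string]float64{"cpu": 16, "memory": 64}, // node allocatable
		map[string]float64{"cpu": 1, "memory": 1},   // binpack.cpu / binpack.memory
		10,                                          // binpack.weight
	)
	fmt.Printf("node score: %.2f\n", score) // 4.38 for this half-full node
}
```

Under this sketch, two otherwise identical nodes that would be 80% and 20% full after placement would score roughly 8 and 2 with `binpack.weight: 10`, so the fuller node is preferred.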

## Scenario

The Binpack algorithm is beneficial for small jobs, which it packs tightly so that nodes are filled as much as possible:

### Big Data Scenarios

Single query jobs in big data processing benefit from Binpack, which consolidates workloads and maximizes resource utilization on active nodes.

### E-commerce High Concurrency

Order-generation jobs in e-commerce flash-sale scenarios can leverage Binpack to use available resources efficiently during peak load.

### AI Inference

Single identification jobs in AI inference scenarios benefit from consolidated scheduling, which reduces resource fragmentation.

### Internet Services

High-concurrency Internet services benefit from Binpack because it reduces fragmentation within nodes and keeps larger blocks of resources free on idle machines for Pods with bigger resource requests, maximizing the utilization of idle resources in the cluster.

## Configuration

The Binpack plugin is configured in the scheduler ConfigMap with optional weight parameters:

```yaml
tiers:
- plugins:
  - name: binpack
    arguments:
      binpack.weight: 10
      binpack.cpu: 1
      binpack.memory: 1
      binpack.resources: nvidia.com/gpu
      binpack.resources.nvidia.com/gpu: 2
```

### Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `binpack.weight` | Overall weight of the Binpack plugin score | 1 |
| `binpack.cpu` | Weight of the CPU resource in scoring | 1 |
| `binpack.memory` | Weight of the memory resource in scoring | 1 |
| `binpack.resources` | Additional resources to consider | - |
| `binpack.resources.<resource>` | Weight of the specified resource type | 1 |

## Example

Here's an example scheduler configuration that uses Binpack to prioritize filling nodes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
    - plugins:
      - name: predicates
      - name: nodeorder
      - name: binpack
        arguments:
          binpack.weight: 10
          binpack.cpu: 2
          binpack.memory: 1
```

In this configuration, the Binpack plugin is given a weight of 10, and CPU is weighted twice as much as memory in the scoring calculation.

content/en/docs/drf.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+++
title = "DRF"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "DRF"
[menu.docs]
  parent = "plugins"
  weight = 7
+++

{{<figure library="1" src="drfjob.png" title="DRF Plugin">}}

## Overview

The full name of the DRF scheduling algorithm is **Dominant Resource Fairness**, a scheduling algorithm based on a container group's dominant resource. The dominant resource of a container group is the resource whose requested amount represents the largest percentage of the cluster's total for that resource.

The DRF algorithm selects the container group with the smallest dominant-resource share for priority scheduling. This accommodates more jobs and prevents a single resource-heavy job from starving a large number of smaller jobs. In an environment where many types of resources coexist, DRF satisfies the fair-allocation principle as far as possible.

## How It Works

The DRF plugin:

1. **Identifies the dominant resource**: For each job, it determines which resource (CPU, memory, GPU, etc.) represents the largest share of the cluster's total
2. **Calculates the share value**: It computes each job's share value based on its dominant-resource usage
3. **Prioritizes the lower share**: Jobs with lower share values (using less of their dominant resource) get higher scheduling priority

Key functions implemented:

- **JobOrderFn**: Orders jobs by their dominant-resource share, giving priority to jobs with smaller shares
- **PreemptableFn**: Determines whether a job can be preempted, based on resource-fairness calculations

The plugin compares the total resources allocated to the preemptor and to the candidate victim tasks, and triggers preemption only when the preemptor task holds fewer resources.

## Scenario

The DRF scheduling algorithm prioritizes the overall throughput of businesses in the cluster and is suitable for batch-processing scenarios:

### AI Training

Individual AI training jobs benefit from DRF because it ensures fair resource allocation across multiple training workloads.

### Big Data Processing

Individual big data computation and query jobs can share resources fairly with other workloads in the cluster.

### Mixed Resource Workloads

In environments with diverse resource requirements (CPU-intensive, memory-intensive, and GPU-intensive jobs), DRF ensures fair allocation across all resource dimensions.

## Configuration

The DRF plugin is configured in the scheduler ConfigMap:

```yaml
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
```

## Example

Consider a cluster with the following resources:

- 100 CPUs
- 400 GB Memory

And two jobs:

- **Job A**: Each task requires 2 CPUs and 8 GB Memory
- **Job B**: Each task requires 1 CPU and 32 GB Memory

For Job A:

- CPU share per task: 2/100 = 2%
- Memory share per task: 8/400 = 2%
- Dominant resource: CPU and Memory are equal (2%)

For Job B:

- CPU share per task: 1/100 = 1%
- Memory share per task: 32/400 = 8%
- Dominant resource: Memory (8%)

With DRF, Job A would be scheduled first because its dominant resource share (2%) is smaller than Job B's (8%). This ensures that neither job can monopolize the cluster by requesting large amounts of a single resource.
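
The arithmetic above can be checked with a few lines of Go. This is only an illustrative sketch of the dominant-share idea; the function name `dominantShare` and the data layout are assumptions, not the Volcano implementation:

```go
package main

import "fmt"

// dominantShare sketches the DRF idea: a job's share is the largest ratio
// of its requested resources to the cluster total across all resource types.
func dominantShare(requested, clusterTotal map[string]float64) float64 {
	share := 0.0
	for res, total := range clusterTotal {
		if total <= 0 {
			continue
		}
		if s := requested[res] / total; s > share {
			share = s
		}
	}
	return share
}

func main() {
	cluster := map[string]float64{"cpu": 100, "memory": 400}

	jobA := map[string]float64{"cpu": 2, "memory": 8}  // one task of Job A
	jobB := map[string]float64{"cpu": 1, "memory": 32} // one task of Job B

	a, b := dominantShare(jobA, cluster), dominantShare(jobB, cluster)
	fmt.Printf("Job A dominant share: %.0f%%\n", a*100) // 2%
	fmt.Printf("Job B dominant share: %.0f%%\n", b*100) // 8%

	// A JobOrderFn built on this share would place Job A first,
	// because its dominant share is smaller.
	if a < b {
		fmt.Println("Job A is scheduled first")
	}
}
```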

### VolcanoJob Example

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: drf-example-job
spec:
  schedulerName: volcano
  minAvailable: 2
  tasks:
  - replicas: 2
    name: worker
    template:
      spec:
        containers:
        - name: worker
          image: busybox
          resources:
            requests:
              cpu: "2"
              memory: "8Gi"
            limits:
              cpu: "2"
              memory: "8Gi"
```

content/en/docs/gang.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+++
title = "Gang"

date = 2021-05-13
lastmod = 2025-11-11

draft = false # Is this a draft? true/false
toc = true # Show table of contents? true/false
type = "docs" # Do not modify.

# Add menu entry to sidebar.
linktitle = "Gang"
[menu.docs]
  parent = "plugins"
  weight = 4
+++

{{<figure library="1" src="gang.png" title="Gang Plugin">}}

## Overview

The Gang scheduling strategy is one of the core scheduling algorithms of the Volcano scheduler. It satisfies the "all or nothing" requirement during scheduling and avoids wasting cluster resources by dispatching Pods arbitrarily. The Gang scheduler checks whether the number of scheduled Pods in a Job meets the minimum number required to run; only when that minimum is satisfied is the scheduling action executed for all Pods in the Job, otherwise it is not executed.

## How It Works

The Gang plugin treats tasks that have not yet reached a ready state (ready states include Binding, Bound, Running, Allocated, Succeeded, and Pipelined) as having higher priority. It checks whether the resources available to the queue, after attempting to evict some Pods and reclaim their resources, can satisfy the resources required to run the job's `minAvailable` Pods. If so, the Gang plugin proceeds with scheduling.

Key functions implemented by the Gang plugin:

- **JobReadyFn**: Checks whether a job has enough resources to meet its `minAvailable` requirement (see the sketch after this list)
- **JobPipelinedFn**: Checks whether a job can be pipelined
- **JobValidFn**: Validates whether a job's gang constraint is satisfied
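
As a minimal sketch of the JobReadyFn idea, assuming a simplified set of task states (not the actual Volcano types), the gang constraint reduces to comparing the number of placed tasks against `minAvailable`:

```go
package main

import "fmt"

// TaskStatus is a simplified stand-in for Volcano's task states.
type TaskStatus string

const (
	Pending   TaskStatus = "Pending"
	Allocated TaskStatus = "Allocated"
	Bound     TaskStatus = "Bound"
	Running   TaskStatus = "Running"
)

// jobReady sketches the gang "all or nothing" check: the job may proceed
// only when at least minAvailable of its tasks have been placed.
// This mirrors the JobReadyFn concept; it is not the actual implementation.
func jobReady(statuses []TaskStatus, minAvailable int) bool {
	ready := 0
	for _, s := range statuses {
		switch s {
		case Allocated, Bound, Running:
			ready++
		}
	}
	return ready >= minAvailable
}

func main() {
	// 1 ps + 2 workers with minAvailable = 3: only two tasks placed so far.
	tasks := []TaskStatus{Allocated, Allocated, Pending}
	fmt.Println(jobReady(tasks, 3)) // false -> none of the Pods are dispatched yet

	tasks[2] = Allocated
	fmt.Println(jobReady(tasks, 3)) // true -> the whole gang can be scheduled
}
```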

## Scenario

The Gang scheduling algorithm, built on the container-group concept, is well suited to scenarios that require multi-process collaboration:

### AI and Deep Learning

AI scenarios often involve complex pipelines including data ingestion, data analysis, data splitting, training, serving, and logging. These require a group of containers to work together, making them a good fit for the container-group-based Gang scheduling strategy.

### MPI and HPC

Multi-threaded parallel computing under the MPI framework is also suitable for Gang scheduling, because the master and slave processes need to work together. Containers within a container group are highly correlated and may contend for resources; scheduling and allocating them as a whole effectively avoids deadlock.

### Resource Efficiency

When cluster resources are insufficient, the Gang scheduling strategy significantly improves cluster resource utilization by preventing partial job allocations that would leave resources idle while waiting for the remaining tasks.

## Configuration

The Gang plugin is typically enabled by default and configured in the scheduler ConfigMap:

```yaml
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance
```

## Example

Here's an example of a VolcanoJob that uses Gang scheduling:

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: tensorflow-job
spec:
  minAvailable: 3   # Gang constraint: at least 3 pods must be schedulable
  schedulerName: volcano
  tasks:
  - replicas: 1
    name: ps
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
  - replicas: 2
    name: worker
    template:
      spec:
        containers:
        - name: tensorflow
          image: tensorflow/tensorflow:latest
```

In this example, the job will only be scheduled if all 3 pods (1 ps + 2 workers) can be allocated resources simultaneously.

0 commit comments
