Commit cff54fc

[NVIDIA#8948][feat] Support custom sharding config (NVIDIA#9143)
Signed-off-by: greg-kwasniewski1 <[email protected]>
1 parent bc355ea commit cff54fc

File tree: 9 files changed, +249 -160 lines

docs/source/torch/auto_deploy/advanced/expert_configurations.md

Lines changed: 79 additions & 0 deletions
@@ -153,6 +153,85 @@ python build_and_run_ad.py \
  --args.world-size=8 # CLI override beats both YAML configs

## Sharding configuration

The `detect_sharding` transform automatically detects and applies sharding strategies to the model. It supports multiple sharding sources and dimensions, allowing flexible configuration for different model architectures and parallelism strategies.

### Configuration Parameters

The `detect_sharding` transform accepts the following configuration parameters:

#### `simple_shard_only` (bool, default: `false`)

When set to `true`, forces simple sharding (row-wise sharding with all-gather) for all linear layers, bypassing more sophisticated column/row sharding strategies. This is useful when you want a uniform sharding approach across all layers or when debugging sharding issues.
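
For example, a minimal override that forces simple sharding everywhere (a sketch using the same `args.transforms.detect_sharding` nesting as the manual example further below) could look like this:

```yaml
args:
  transforms:
    detect_sharding:
      # Row-wise shard + all-gather every linear layer, regardless of
      # what the heuristics would otherwise choose.
      simple_shard_only: true
```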

#### `sharding_source` (list, default: `['manual', 'factory', 'heuristic']`)

Specifies the priority order of sharding sources. The order matters: if multiple sources try to apply sharding to the same layer, only the first one in the list will be applied. The available sources are:

- **`'manual'`**: Uses manually provided sharding configuration via the `manual_config` parameter
- **`'factory'`**: Uses factory-provided sharding configuration (e.g., from HuggingFace model configs)
- **`'heuristic'`**: Uses automatic heuristic-based sharding detection based on layer patterns

Example: If both `manual` and `heuristic` try to apply sharding to layer L, only the `manual` transformation will be applied since it appears first in the list.
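
As an illustration, the following sketch (an assumed but plausible combination given the semantics above) prefers the manual plan and falls back to heuristics, skipping factory-provided plans entirely:

```yaml
args:
  transforms:
    detect_sharding:
      # 'manual' wins for any layer it covers; layers it does not cover
      # fall back to heuristic detection. 'factory' is not consulted.
      sharding_source: ['manual', 'heuristic']
```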

#### `support_partial_config` (bool, default: `true`)

When `true`, allows partial sharding configurations in which not every layer needs to be specified in the manual or factory config. Layers that are not explicitly configured are handled by heuristic sharding or left unsharded. When `false`, the configuration must specify all layers; otherwise it is invalidated and skipped.
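
For example, with partial configs enabled you could manually shard only the attention projections and leave everything else to the heuristics. This is a sketch that reuses the layer names from the full example later in this section; the `head_dim` value is purely illustrative:

```yaml
args:
  transforms:
    detect_sharding:
      support_partial_config: true
      manual_config:
        head_dim: 128  # illustrative value; set to match your model
        tp_plan:
          # Only the attention projections are listed; all remaining
          # layers are handled by heuristic sharding or left unsharded.
          q_proj: colwise
          k_proj: colwise
          v_proj: colwise
          o_proj: rowwise
```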

#### `sharding_dims` (list, default: `['tp', 'ep', 'bmm']`)

Specifies which sharding dimensions to apply during heuristic sharding. The available dimensions are:

- **`'tp'`**: Tensor parallelism - applies column/row sharding for standard transformer layers
- **`'ep'`**: Expert parallelism - shards experts across ranks for Mixture-of-Experts (MoE) models
- **`'bmm'`**: Batch matrix multiplication sharding - shards batch matrix multiplication operations
- **`'ssm'`**: State space model sharding - applies specialized sharding for Mamba/SSM layers

You can enable multiple dimensions simultaneously. For example, `['tp', 'ep']` will apply both tensor parallelism and expert parallelism.
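
As a concrete sketch, restricting heuristic sharding to tensor and expert parallelism would look roughly like this:

```yaml
args:
  transforms:
    detect_sharding:
      # Apply TP and EP heuristics; skip BMM and SSM sharding.
      sharding_dims: ['tp', 'ep']
```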

#### `requires_shape_prop` (bool, default: `true`)

Whether shape propagation is required before applying this transform. Shape propagation enables the transform to make informed decisions about sharding strategies based on tensor dimensions.

### Manual TP Sharding Configuration

For advanced users, you can provide a manual sharding configuration. An example of such a setting:

```yaml
args:
  transforms:
    detect_sharding:
      manual_config:
        head_dim: 128
        tp_plan:
          # mamba SSM layers
          in_proj: mamba
          out_proj: rowwise
          # attention layers
          q_proj: colwise
          k_proj: colwise
          v_proj: colwise
          o_proj: rowwise
          # NOTE: for performance reasons, consider not sharding the
          # following layers at all. Commenting them out will replicate
          # them across ranks.
          # MLP and shared experts in MoE layers
          gate_proj: colwise
          up_proj: colwise
          down_proj: rowwise
          # MoLE: latent projections: simple shard
          fc1_latent_proj: gather
          fc2_latent_proj: gather
```

The `tp_plan` dictionary maps layer names (using module paths with wildcard `*` support) to sharding strategies:

- **`colwise`**: Column-wise sharding (splits the weight matrix along columns)
- **`rowwise`**: Row-wise sharding (splits the weight matrix along rows)
- **`mamba`**: Specialized sharding for Mamba SSM layers
- **`gather`**: Simple shard with row-wise sharding and an all-gather operation
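
Because layer names support wildcard `*` matching on module paths, a plan can also target layers by pattern. The module paths below are hypothetical and only illustrate the matching syntax:

```yaml
args:
  transforms:
    detect_sharding:
      manual_config:
        tp_plan:
          # Hypothetical module paths: shard every attention projection
          # in every decoder layer.
          "model.layers.*.self_attn.q_proj": colwise
          "model.layers.*.self_attn.k_proj": colwise
          "model.layers.*.self_attn.v_proj": colwise
          "model.layers.*.self_attn.o_proj": rowwise
          # Simple-shard (row-wise + all-gather) everything in the MLP blocks.
          "model.layers.*.mlp.*": gather
```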

## Built-in Default Configuration

Both `AutoDeployConfig` and `LlmArgs` classes automatically load a built-in `default.yaml` configuration file that provides defaults for the AutoDeploy inference optimizer pipeline. This file is specified in the `_get_config_dict()` function in `tensorrt_llm._torch.auto_deploy.llm_args` and defines default transform configurations for graph optimization stages.

tensorrt_llm/_torch/auto_deploy/config/default.yaml

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ transforms:
   detect_sharding:
     stage: sharding
     simple_shard_only: false
-    sharding_source: ['factory','heuristic']
+    sharding_source: ['manual', 'factory', 'heuristic']
     support_partial_config: true
     sharding_dims: ['tp', 'ep', 'bmm']
     allreduce_strategy: 'AUTO'

tensorrt_llm/_torch/auto_deploy/transform/interface.py

Lines changed: 4 additions & 2 deletions
@@ -24,7 +24,7 @@
     run_shape_prop,
 )
 from ..utils.logger import ad_logger
-from ..utils.sharding_utils import ShardingConfig
+from ..utils.sharding_utils import ShardingTransformContainer


 class TransformError(Exception):
@@ -61,7 +61,9 @@ def __lt__(self, other):
 class SharedConfig(BaseModel):
     """Global config shared between multiple transforms in the inference optimizer."""

-    sharding_config: ShardingConfig = Field(default_factory=ShardingConfig)
+    sharding_transform_container: ShardingTransformContainer = Field(
+        default_factory=ShardingTransformContainer
+    )
     local_rank: int = Field(default=0)
     world_size: int = Field(default=1)
