
[AutoDeploy][Bug]: half/full precision MoE cutlass kernel invocation is missing style+activation arguments #9338


Description


To work around (WAR) the issue, select the Triton backend for fuse_moe in default.yaml:

  fuse_moe:
    stage: post_load_fusion
    enabled: true
    backend: triton

System Info

All

Who can help?

@nzmora-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

trtllm-bench --model nvidia/NVIDIA-Nemotron-Nano-31B-A3-v3 throughput --dataset tmp/nemotron_128_128_256.inp --warmup 0 --backend _autodeploy --max_batch_size 256 --extra_llm_api_options ~/llm_args_ad.yaml --tp=1


File "/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_modelopt/users/gkwasniewski/dev/TensorRT-LLM/tensorrt_llm/_torch/custom_ops/torch_custom_ops.py", line 224, in fused_moe
    output = run_moe(input, token_selected_experts, token_final_scales,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: fc1_expert_weights inter size must be 2 times fc2_expert_weights inter size.

Expected behavior

The benchmark should run without throwing an exception.

Actual behavior

Throws the RuntimeError shown in the traceback above.

Additional notes

The CUTLASS MoE kernel is invoked with the default activation function (silu) instead of the activation function used by the model: the MLP style and activation arguments are missing from the kernel invocation.
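For intuition on why the shape check fires: gated activations such as silu (SwiGLU) fuse the gate and up projections into fc1, so fc1's inter size must be twice fc2's, while a non-gated activation keeps them equal. A minimal sketch of that invariant (the function name and activation strings below are illustrative assumptions, not TRT-LLM's real API):

```python
def expected_fc1_inter_size(fc2_inter_size: int, activation: str) -> int:
    """Inter size a MoE kernel expects for fc1, given the activation.

    Gated activations (e.g. silu/SwiGLU) concatenate the gate and up
    projections into fc1, doubling its inter size; non-gated ones do not.
    Names and activation strings here are illustrative, not TRT-LLM's API.
    """
    gated = activation in ("silu", "swiglu", "geglu")
    return 2 * fc2_inter_size if gated else fc2_inter_size


# If the kernel defaults to silu but the checkpoint uses a non-gated
# activation, the kernel's expected fc1 inter size (2x) does not match
# the weights' actual layout (1x), and the runtime check fails.
assert expected_fc1_inter_size(4096, "silu") == 8192
assert expected_fc1_inter_size(4096, "relu2") == 4096
```

Passing the model's actual activation (and MLP style) through to the kernel would make the check consistent with the checkpoint's weight shapes.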

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata



Labels

  • AutoDeploy: <NV> AutoDeploy Backend
  • Customized kernels: <NV> Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
  • bug: Something isn't working
  • triaged: Issue has been triaged by maintainers


Status

Done
