[WIP] [tx] Torch definition for qwen3 and LoRA #649
Conversation
skyrl-tx/tx/torch/layers/lora.py (Outdated)
A = torch.empty(*shape_A, dtype=dtype, device=device)
B = torch.zeros(*shape_B, dtype=dtype, device=device)
nn.init.kaiming_uniform_(A, a=math.sqrt(5))  # He-uniform A
self.lora_A = nn.Parameter(A, requires_grad=True)
I wonder if we can / if it makes sense to have something a little more like our Param class so we can do Param(*shape, init=..., sharding=...) going forward (we can also experiment with this as a follow-up / later). The imperative nn.init.kaiming_uniform_ initialization is slightly ugly :D
Gave this a shot!
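A minimal sketch of what such a declarative helper could look like. The Param name and the init= keyword follow the reviewer's suggestion; the helper signatures and shapes below are illustrative assumptions rather than the repo's actual API, and sharding is omitted:

```python
import math
import torch
from torch import nn

def kaiming_uniform(shape, *, dtype, device):
    # He-uniform init, equivalent to nn.init.kaiming_uniform_(t, a=math.sqrt(5))
    t = torch.empty(*shape, dtype=dtype, device=device)
    nn.init.kaiming_uniform_(t, a=math.sqrt(5))
    return t

def zeros(shape, *, dtype, device):
    return torch.zeros(*shape, dtype=dtype, device=device)

def Param(*shape, init, dtype, device, requires_grad=True):
    # Hypothetical declarative parameter factory: shape plus init fn in one call.
    return nn.Parameter(init(shape, dtype=dtype, device=device), requires_grad=requires_grad)

# Illustrative usage inside a LoRA layer (adapters=4, rank=8, features=64):
lora_A = Param(4, 8, 64, init=kaiming_uniform, dtype=torch.float32, device="cpu")
lora_B = Param(4, 64, 8, init=zeros, dtype=torch.float32, device="cpu")
```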
if x.dim() != 3:
    raise ValueError("x must be [B, T, in_features].")
B, T, in_features = x.shape
We will need to make sure this can support #511
Yes, it won't right now, but it will be updated to support embedding-layer adapters.
skyrl-tx/tx/torch/layers/lora.py (Outdated)
in_features: int,
out_features: int,
*,
max_lora_adapters: int = 0,
It would probably be best if we get rid of all the defaults here going forward (I know the current code has them, but I don't think it is a good idea; it can only lead to errors if somebody forgets to pass the parameter and the default was not the right value).
Good point, this is a good chance to clean it up. I set all optional arguments to default to None, but otherwise removed the defaults.
skyrl-tx/tx/torch/layers/util.py (Outdated)
sorted_adapter_indices = None if adapter_indices is None else adapter_indices[sort_idx]

# Compute group sizes (minlength guarantees output length)
sorted_indices = indices[sort_idx]
Hmm, this is not needed, right? E.g. bincount is permutation-invariant, so we can just pass indices in there, and otherwise we don't need sorted_indices.
Oh good catch
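A quick illustration of the point, using hypothetical tensors: bincount is permutation-invariant, so pre-sorting is unnecessary for computing group sizes.

```python
import torch

indices = torch.tensor([2, 0, 2, 1, 0, 2])
sort_idx = torch.argsort(indices)

# Group sizes are the same whether or not we sort first.
sizes_unsorted = torch.bincount(indices, minlength=4)          # tensor([2, 1, 3, 0])
sizes_sorted = torch.bincount(indices[sort_idx], minlength=4)  # tensor([2, 1, 3, 0])
assert torch.equal(sizes_unsorted, sizes_sorted)
```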
sorted_adapter_indices: Adapter indices sorted with tokens (or None if not provided)
"""
# Sort by group index
sort_idx = torch.argsort(indices)
Let's call this sort_indices so it is consistent with unsort_indices below and makes clear it holds multiple indices (or alternatively sort_perm and unsort_perm for "permutation", if you prefer)?
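For reference, the usual pairing of the two permutations, with the names from the suggestion above (this is only a sketch of the idiom, not the file's actual code):

```python
import torch

indices = torch.tensor([2, 0, 1, 0])

sort_indices = torch.argsort(indices)         # permutation that sorts tokens by group
unsort_indices = torch.argsort(sort_indices)  # inverse permutation restoring original order

assert torch.equal(indices[sort_indices][unsort_indices], indices)
```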
updated_cache = (k, v)

# Attention (causal only during prefill, GQA handled via repeat)
You should be able to just set enable_gqa=True in scaled_dot_product_attention, right?
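A hedged sketch of what that would look like. enable_gqa is available in recent PyTorch releases of F.scaled_dot_product_attention; the tensor names and shapes below are illustrative, not the PR's code:

```python
import torch
import torch.nn.functional as F

B, T, head_dim = 2, 16, 64
num_q_heads, num_kv_heads = 8, 2  # GQA: 4 query heads share each kv head

q = torch.randn(B, num_q_heads, T, head_dim)
k = torch.randn(B, num_kv_heads, T, head_dim)
v = torch.randn(B, num_kv_heads, T, head_dim)

# With enable_gqa=True, SDPA broadcasts the kv heads internally,
# so the explicit repeat of k and v can be dropped.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True, enable_gqa=True)
assert out.shape == (B, num_q_heads, T, head_dim)
```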
def __init__(self, config: Qwen3Config, *, dtype: torch.dtype):
    super().__init__()
    self.config = config
    max_lora_adapters = getattr(config, "max_lora_adapters", 0)
Since #636, you don't need these any more; you can just do config.{max_lora_adapters, max_lora_rank}, and it is probably easiest to just inline it in the Qwen3DecoderLayer constructor below.
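Roughly what that simplification looks like; the config stand-in below is only there to make the snippet self-contained, and the field names follow the comment above:

```python
from dataclasses import dataclass

@dataclass
class ConfigStub:  # stand-in with only the fields relevant here
    max_lora_adapters: int = 4
    max_lora_rank: int = 8

config = ConfigStub()

# Before: defensive getattr with a fallback default.
max_lora_adapters = getattr(config, "max_lora_adapters", 0)

# After #636 the config always carries these fields, so they can be read
# directly (and inlined at the call site in the Qwen3DecoderLayer constructor).
max_lora_adapters = config.max_lora_adapters
max_lora_rank = config.max_lora_rank
```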
# Load all safetensors files
state_dict = {}
from pathlib import Path
Let's move this to the top?
pcmoritz left a comment
Looks great! Before merging it, let's move it to tx/extra/torch/{....} and tests/extra/torch/... until we replace the jax model definitions, to make it clear to contributors that this is not the main code path yet and to avoid confusion?
I also made some small comments :)
Implements a torch definition of Qwen3 and multi-LoRA.
The intent was to closely match the interface and implementation of the jax-based code, with exceptions for differences in the canonical torch library (e.g., the interface to attention is a little different with tensor dimensions in a different order).
This PR focuses on dense Qwen3 with LoRA applied to the linear layers. This PR does not add support for Qwen3 MoE or LoRA in the embedding or expert layers.
There are also performance improvements not included here. For example, apply_lora loops over adapter indices and performs individual matmuls instead of a ragged dot/grouped gemm. These will be addressed in follow-up PRs.
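For context, a minimal sketch of the kind of per-adapter loop described above; the names, shapes, and scaling argument are illustrative assumptions, not the PR's actual apply_lora:

```python
import torch

def apply_lora(x, lora_A, lora_B, adapter_indices, scaling=1.0):
    """Add per-token LoRA deltas by looping over the adapters present in the batch.

    x:               [N, in_features]   (tokens flattened over batch and sequence)
    lora_A:          [num_adapters, rank, in_features]
    lora_B:          [num_adapters, out_features, rank]
    adapter_indices: [N] adapter id per token
    """
    out = torch.zeros(x.shape[0], lora_B.shape[1], dtype=x.dtype, device=x.device)
    for adapter in torch.unique(adapter_indices):
        mask = adapter_indices == adapter
        # Two small matmuls per adapter; a ragged/grouped gemm would fuse these.
        delta = (x[mask] @ lora_A[adapter].T) @ lora_B[adapter].T
        out[mask] = scaling * delta
    return out
```

A grouped or ragged gemm would compute all adapters' contributions in a single kernel instead of one pair of matmuls per adapter, which is the follow-up the description refers to.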