| 1 | +# Twists (Permutation Symmetries) in twisteRL |
| 2 | + |
| 3 | +Twists are twisteRL's way to describe exact permutation symmetries that exist inside an environment. |
| 4 | +Instead of training a policy to rediscover that symmetries exist (for example, that swapping qubits |
| 5 | +across a symmetric coupling map produces an equivalent observation/action space), the environment |
| 6 | +hands the policy explicit permutations that it can use for data augmentation or symmetry-aware heads. |
| 7 | +By repeatedly doing and undoing these permutations you also reduce the chance of deadlocks and gain a |
| 8 | +lightweight form of regularization because the agent sees equivalent states under many orderings. |
| 9 | + |
| 10 | +## Where Twists Are Used |
| 11 | +- Every environment implements the `twisterl::rl::env::Env` trait. The trait includes a `twists` |
| 12 | + method that returns `(Vec<Vec<usize>>, Vec<Vec<usize>>)` representing valid permutations on the |
| 13 | + flattened observation array and matching permutations on the discrete action space |
| 14 | + (`rust/src/rl/env.rs:33`). |
| 15 | +- When an environment is instantiated from Python via `prepare_algorithm`, twisteRL immediately calls |
| 16 | + `env.twists()` and forwards the returned permutations to the policy constructor |
| 17 | + (`src/twisterl/utils.py:120`). The policy can then symmetrize logits, average values, or augment |
| 18 | + rollouts without extra environment queries. |
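As an illustration of the augmentation use case above, here is a minimal sketch of how one recorded transition could be replayed under every twist. The helper name `twist_transition` is hypothetical, and the indexing convention is an assumption rather than something twisteRL documents here: both permutations are treated as gathers, so `twisted[i] = original[perm[i]]`.

```python
from typing import List, Sequence, Tuple


def twist_transition(
    obs: Sequence[float],
    action: int,
    obs_perm: Sequence[int],
    act_perm: Sequence[int],
) -> Tuple[List[float], int]:
    """Return the (observation, action) pair as seen under one twist.

    Assumed convention (not confirmed by twisteRL): both permutations are
    gathers, so twisted[i] = original[perm[i]].
    """
    twisted_obs = [obs[i] for i in obs_perm]
    # Under a gather, original action `a` lands at the index j with act_perm[j] == a.
    twisted_action = list(act_perm).index(action)
    return twisted_obs, twisted_action


# Toy twists: the identity plus one non-trivial symmetry.
obs_perms = [[0, 1, 2], [2, 0, 1]]
act_perms = [[0, 1], [1, 0]]

# One recorded transition becomes one training sample per twist.
augmented = [
    twist_transition([10.0, 20.0, 30.0], 0, op, ap)
    for op, ap in zip(obs_perms, act_perms)
]
```

If the real convention turns out to be a scatter (`twisted[perm[i]] = original[i]`), the two index expressions swap roles. The pairing of `obs_perms[i]` with `act_perms[i]` stays the same either way.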
| 19 | + |
| 20 | +## Data Contract |
| 21 | +1. **Observation permutations (`obs_perms`)** are expressed in the same flattened index space |
| 22 | + produced by the environment’s `observe()` method. Each permutation covers every index exactly once. |
| 23 | +2. **Action permutations (`act_perms`)** must use the same ordering as `obs_perms`: twisteRL |
| 24 | + assumes `act_perms[i]` describes how to remap actions when `obs_perms[i]` is applied. |
| 25 | +3. The length of the two permutation lists must match (`len(obs_perms) == len(act_perms)`), and the |
| 26 | + first permutation should usually be the identity so policies have a canonical ordering to fall back to. |
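The contract above can be checked mechanically. The following is a minimal sketch; `validate_twists` and `starts_with_identity` are hypothetical helpers written for this page, not part of twisteRL.

```python
from typing import Sequence


def validate_twists(
    obs_perms: Sequence[Sequence[int]],
    act_perms: Sequence[Sequence[int]],
    obs_len: int,
    num_actions: int,
) -> None:
    """Raise ValueError if the twist lists violate the data contract."""
    if len(obs_perms) != len(act_perms):
        raise ValueError("obs_perms and act_perms must have the same length")
    for i, (obs_perm, act_perm) in enumerate(zip(obs_perms, act_perms)):
        # A permutation covers every index exactly once.
        if sorted(obs_perm) != list(range(obs_len)):
            raise ValueError(f"obs_perms[{i}] is not a permutation of range({obs_len})")
        if sorted(act_perm) != list(range(num_actions)):
            raise ValueError(f"act_perms[{i}] is not a permutation of range({num_actions})")


def starts_with_identity(perms: Sequence[Sequence[int]]) -> bool:
    """True if the first permutation leaves every index in place."""
    return len(perms) > 0 and list(perms[0]) == list(range(len(perms[0])))


# Passes silently: two twists over 3 observation slots and 2 actions.
validate_twists([[0, 1, 2], [2, 0, 1]], [[0, 1], [1, 0]], obs_len=3, num_actions=2)
```

The identity check is kept separate because the contract only says the first permutation should usually be the identity, so it is a recommendation rather than a hard error.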
| 27 | + |
| 28 | +## Implementing Twists in Rust Environments |
| 29 | +1. **Compute permutations once** when the environment is constructed. Store the resulting vectors on |
| 30 | + the struct so you can reuse them without recomputing each step. |
| 31 | +2. **Return cached permutations** from the `twists` method by cloning or otherwise referencing the |
| 32 | + stored vectors. This keeps the call cheap even when policies request twists frequently. |
| 33 | +3. **Gate twists behind a config flag**. Consider exposing a `use_perms` or `add_perms` flag so users can |
| 34 | + disable symmetries if they want to benchmark raw performance or compare against non-symmetric runs. |
| 35 | + |
| 36 | +### Tips for new envs |
| 37 | +- If your observation is multi-dimensional, decide on a consistent flattening order and reuse it in |
| 38 | + `observe()`, `obs_shape()`, and permutation computation. |
| 39 | +- Keep the permutation list short: only add a symmetry when it actually preserves the transition dynamics; |
| 40 | + incorrect permutations can destabilize training. |
| 41 | +- Store permutations on the struct instead of recomputing them each `twists()` call to avoid extra |
| 42 | + allocations during training. |
| 43 | + |
| 44 | +## Implementing Twists in Python Environments |
| 45 | +Python environments exposed through `PyEnv` can mirror the same pattern: |
| 46 | + |
| 47 | +1. **Detect graph/device symmetries** using domain-specific tooling. Capture any permutation that |
| 48 | + leaves the transition structure unchanged. |
| 49 | +2. **Sample a permutation for every observation** if you want trajectories to naturally explore each |
| 50 | + orbit; this mimics the way many structured environments randomize qubit or tile order. |
| 51 | +3. **Expose action permutations** through the PyO3 wrapper so the policy receives matching |
| 52 | + permutations. When porting a Python env to Rust, copy the action/observation permutation lists into |
| 53 | + the Rust struct and return them from `twists()`. |
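A minimal sketch of the compute-once, return-cached pattern in Python follows. `RingEnv` is a toy class invented for illustration and does not reproduce the real `PyEnv` interface. The `use_perms` flag mirrors the config toggle suggested for Rust environments.

```python
from typing import List, Tuple

Perms = List[List[int]]


class RingEnv:
    """Toy environment: n bits on a ring, action i flips bit i.

    Hypothetical class for illustration; it is not the real PyEnv interface.
    """

    def __init__(self, n: int, use_perms: bool = True) -> None:
        self.n = n
        self.state = [0] * n
        # Rotations of the ring are exact symmetries; compute them once here.
        shifts = range(n) if use_perms else range(1)
        self._obs_perms: Perms = [[(i + k) % n for i in range(n)] for k in shifts]
        # Action i targets bit i, so actions permute exactly like observations.
        self._act_perms: Perms = [list(p) for p in self._obs_perms]

    def observe(self) -> List[int]:
        return list(self.state)

    def num_actions(self) -> int:
        return self.n

    def twists(self) -> Tuple[Perms, Perms]:
        # Return the cached lists; shift k = 0 makes the first entry the identity.
        return self._obs_perms, self._act_perms


env = RingEnv(4)
obs_perms, act_perms = env.twists()
```

With `use_perms=False` only the identity is returned, which gives a baseline run that skips symmetry handling without touching the policy code.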
| 54 | + |
| 55 | +## Verifying Your Twists |
| 56 | +1. Call `env.twists()` from Python and check that each observation permutation is a rearrangement of |
| 57 | + `range(len(observe()))` and each action permutation is a rearrangement of `range(num_actions())`. |
| 58 | +2. Run a short training job with and without permutations enabled. If permutations are correct you |
| 59 | + should see either faster convergence or identical performance; regressions usually mean the |
| 60 | + action and observation permutations are misaligned. |
| 61 | +3. For debugging, temporarily limit the permutation list to `[identity]` and re-enable additional |
| 62 | + symmetries one at a time. |
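Before spending a training run on it, misalignment can often be caught on a small state space by checking that a twist commutes with the transition function. The sketch below uses a toy bit-flip environment; `flip_step` and `twist_preserves_dynamics` are hypothetical names. The gather convention is again an assumption: `twist(s)[i] = s[obs_perm[i]]`, with twisted action `j` standing for original action `act_perm[j]`.

```python
from itertools import product
from typing import Callable, List, Sequence

State = List[int]


def flip_step(state: State, action: int) -> State:
    """Toy dynamics: action i flips bit i."""
    nxt = list(state)
    nxt[action] ^= 1
    return nxt


def twist_preserves_dynamics(
    step: Callable[[State, int], State],
    obs_perm: Sequence[int],
    act_perm: Sequence[int],
    states: Sequence[State],
) -> bool:
    """Check step(twist(s), j) == twist(step(s, act_perm[j])) for all s, j.

    Assumes a gather convention: twist(s)[i] = s[obs_perm[i]], and twisted
    action j corresponds to original action act_perm[j].
    """

    def twist(s: State) -> State:
        return [s[i] for i in obs_perm]

    return all(
        step(twist(s), j) == twist(step(list(s), act_perm[j]))
        for s in states
        for j in range(len(act_perm))
    )


# Enumerate all 3-bit states and test a rotation of the three positions.
states = [list(bits) for bits in product([0, 1], repeat=3)]
rotation = [1, 2, 0]
aligned = twist_preserves_dynamics(flip_step, rotation, rotation, states)
misaligned = twist_preserves_dynamics(flip_step, rotation, [0, 1, 2], states)
```

Here `aligned` is `True` and `misaligned` is `False`. Pairing the rotation with an identity action permutation is exactly the kind of mismatch that would otherwise surface only as a training regression.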
| 63 | + |
| 64 | +By explicitly documenting and exposing twists, twisteRL policies gain symmetry awareness for free, |
| 65 | +leading to higher data efficiency on structured problems such as puzzle solvers and quantum circuit |
| 66 | +optimization. |