Add metrics-based rewards, symmetry permutations, and random inversions #8

Merged: victor-villar merged 20 commits into main from vv-metrics on Jan 8, 2026
Conversation

@victor-villar (Collaborator)

Adds metrics-based reward shaping, symmetry permutations, and optional random inversions to all three gym environments. Updates twisterl to 0.3.0.

  • Metrics tracking: CNOTs, layers, and gates, with configurable penalty weights
  • Symmetries: auto-computed observation/action permutations for data augmentation (a toy illustration follows this list)
  • Inversions: optional random state inversions during training
  • Backward compatible: new parameters are optional and default to true
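To make the symmetry/augmentation idea concrete, here is a self-contained toy sketch; the permutation values and the `permute` helper are illustrative only, not this crate's API:

```rust
// Toy illustration of symmetry-based data augmentation. The permutations and
// the `permute` helper are hypothetical; the real envs auto-compute obs_perms
// and act_perms per environment.
fn permute<T: Clone>(xs: &[T], perm: &[usize]) -> Vec<T> {
    perm.iter().map(|&i| xs[i].clone()).collect()
}

fn main() {
    let obs = vec![1, 0, 0, 1];      // illustrative observation bits
    let obs_perm = vec![2, 3, 0, 1]; // e.g. relabel qubit 0 <-> qubit 1
    let act_perm = vec![1, 0];       // e.g. CX(0,1) <-> CX(1,0)

    let aug_obs = permute(&obs, &obs_perm);
    let aug_action = act_perm[0]; // original action was index 0

    // The permuted (observation, action) pair is an equally valid sample.
    assert_eq!(aug_obs, vec![0, 1, 1, 0]);
    assert_eq!(aug_action, 1);
}
```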

@cbjuan (Member) left a comment

A few comments; the review was made in collaboration with an LLM to fill some of the gaps in my Rust knowledge.

```rust
env.depth = env.max_depth;

env.step(0);
assert!(!env.solved());
```
@cbjuan (Member)

With `add_inverts=true`, each `step()` has a 50% chance of inverting the state. These assertions can randomly fail, right?
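One way to keep such a test deterministic is to disable the randomness explicitly. The sketch below mocks the environment (`ToyEnv` is a stand-in; only the `add_inverts` flag comes from this PR):

```rust
// Self-contained sketch of the determinism point: with inversions disabled,
// the CX-twice test cannot flake. ToyEnv is a stand-in, not the real env.
struct ToyEnv { cx_count: u32, add_inverts: bool }

impl ToyEnv {
    fn step(&mut self, _action: usize) {
        self.cx_count += 1;
        if self.add_inverts {
            // The real envs would flip the state with 50% probability here.
        }
    }
    // Two applications of CX(0,1) cancel out, so even parity means solved.
    fn solved(&self) -> bool { self.cx_count % 2 == 0 }
}

fn main() {
    let mut env = ToyEnv { cx_count: 0, add_inverts: false };
    env.step(0); // apply CX(0, 1)
    assert!(!env.solved());
    env.step(0); // CX is self-inverse
    assert!(env.solved());
}
```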

@victor-villar (Collaborator, Author)

It works because we create an empty circuit, add a CX(0,1), and then another CX(0,1), which basically "solves" the circuit. The inverted circuit is the same. In any case, these were small tests that do not add much value, so I've removed them for now.
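(For reference, CX really is its own inverse on computational basis states, which is what makes the double application "solve" the circuit; a quick standalone check:)

```rust
// CX flips the target bit iff the control bit is 1; applying it twice
// therefore returns every basis state to itself.
fn cx((control, target): (u8, u8)) -> (u8, u8) {
    (control, target ^ control)
}

fn main() {
    for c in 0..2u8 {
        for t in 0..2u8 {
            assert_eq!(cx(cx((c, t))), (c, t));
        }
    }
}
```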

```rust
assert!(!env.solved());

env.step(0);
assert!(env.solved());
```
@cbjuan (Member)

Same as before

```rust
        if self.depth == 0 { -0.5 } else { -0.5 / (self.max_depth as f32) }
    }
}

fn reward(&self) -> f32 { self.reward_value }
```
@cbjuan (Member)
This should be documented, as it's a potential breaking change for anyone already using the library.

@victor-villar (Collaborator, Author)

hmm, you mean the way we compute the rewards?

@cbjuan (Member)

yes

Comment on lines +357 to +359
```rust
fn twists(&self) -> (Vec<Vec<usize>>, Vec<Vec<usize>>) {
    (self.obs_perms.clone(), self.act_perms.clone())
}
```
@cbjuan (Member)

If called frequently, this is wasteful. Consider returning `&[Vec<usize>]` or caching.
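A sketch of the borrowing variant suggested above; whether it is possible depends on the trait that `twists` implements (the surrounding `Env` struct here is a stand-in):

```rust
// Sketch of the suggested allocation-free variant: hand out borrowed slices
// instead of cloning both permutation tables on every call.
struct Env { obs_perms: Vec<Vec<usize>>, act_perms: Vec<Vec<usize>> }

impl Env {
    fn twists(&self) -> (&[Vec<usize>], &[Vec<usize>]) {
        (&self.obs_perms, &self.act_perms)
    }
}
```

Note that if `twists` is part of a trait that requires owned return values, changing the signature is itself a breaking change; caching the clones would be the non-breaking alternative.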

@cbjuan (Member) left a comment

It looks much better than before. A couple of things:

  1. I found an issue with indentation (added in the review comments).
  2. The solution is not cleared on reset: the solution and solution_inv vectors are not cleared in reset(), so if you reset and run again, the old solution data remains. Is this expected? It affects clifford.rs, permutation.rs, and linear_function.rs. (A sketch of a possible fix follows this list.)
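A minimal sketch of the fix being requested, assuming the fields look as described; everything beyond the `solution`/`solution_inv` names is illustrative:

```rust
// Sketch of the fix: clear the accumulated solution when the env is reset.
// Field names (solution, solution_inv) come from the discussion above; the
// surrounding structure of reset() is assumed, not the crate's actual code.
struct Env { solution: Vec<usize>, solution_inv: Vec<usize> }

impl Env {
    fn reset(&mut self) {
        self.solution.clear();
        self.solution_inv.clear();
        // ...then re-randomize the state as the envs already do...
    }
}
```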

Comment on lines +300 to +310
```rust
let mut penalty = 0.0f32;

if let Some(gate) = self.gateset.get(action).cloned() {
    let previous = self.metrics_values.clone();
    self.metrics.apply_gate(&gate);
    let new_metrics = self.metrics.snapshot();
    penalty = new_metrics.weighted_delta(&previous, &self.metrics_weights);
    self.metrics_values = new_metrics;

    self.apply_gate_to_state(&gate);
}
```
@cbjuan (Member)

There is an indentation issue on line 302.

Suggested change:

```rust
let mut penalty = 0.0f32;
if let Some(gate) = self.gateset.get(action).cloned() {
    let previous = self.metrics_values.clone();
    self.metrics.apply_gate(&gate);
    let new_metrics = self.metrics.snapshot();
    penalty = new_metrics.weighted_delta(&previous, &self.metrics_weights);
    self.metrics_values = new_metrics;
    self.apply_gate_to_state(&gate);
}
```
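For context, a plausible self-contained shape for `weighted_delta` as the snippet uses it; the struct layout is assumed, and only the call pattern comes from the diff:

```rust
// Illustrative sketch of a weighted metrics delta, matching how the snippet
// above uses it: penalty = sum over metrics of weight * (new - previous).
// The struct layout is an assumption; the method name comes from the diff.
struct Metrics { cnots: f32, layers: f32, gates: f32 }

impl Metrics {
    fn weighted_delta(&self, previous: &Metrics, w: &Metrics) -> f32 {
        w.cnots * (self.cnots - previous.cnots)
            + w.layers * (self.layers - previous.layers)
            + w.gates * (self.gates - previous.gates)
    }
}

fn main() {
    let prev = Metrics { cnots: 3.0, layers: 2.0, gates: 5.0 };
    let new = Metrics { cnots: 4.0, layers: 2.0, gates: 6.0 };
    let weights = Metrics { cnots: 0.1, layers: 0.05, gates: 0.01 };
    // One extra CNOT and one extra gate: 0.1 * 1 + 0.05 * 0 + 0.01 * 1 = 0.11
    assert!((new.weighted_delta(&prev, &weights) - 0.11).abs() < 1e-6);
}
```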

@victor-villar (Collaborator, Author)

> The solution is not cleared on reset: the solution and solution_inv vectors are not cleared in reset(), so if you reset and run again, the old solution data remains. Is this expected? It affects clifford.rs, permutation.rs, and linear_function.rs.

It does not affect functionality in training, where we use reset, but it could definitely be a bug at inference if someone reuses the env and sets a new state with set_state. I'll fix it in both functions, good catch.

@cbjuan (Member) left a comment

Thank you!!

@victor-villar merged commit 98688a5 into main on Jan 8, 2026.