Add metrics-based rewards, symmetry permutations, and random inversions #8

victor-villar merged 20 commits into main from
Conversation
cbjuan left a comment

A few comments; this review was made in collaboration with an LLM to fill part of the gaps I have in my Rust knowledge.
rust/src/envs/linear_function.rs (outdated)

```rust
env.depth = env.max_depth;
env.step(0);
assert!(!env.solved());
```
With add_inverts=true, each step() has a 50% chance of inverting the state. The assertions can randomly fail, right?
It works because we create an empty circuit, add a CX(0,1), and then another CX(0,1), which basically "solves" the circuit. The inverted circuit is the same. In any case, these small tests did not add much value, so I've removed them for now.
rust/src/envs/linear_function.rs (outdated)

```rust
assert!(!env.solved());
env.step(0);
assert!(env.solved());
```
```rust
if self.depth == 0 { -0.5 } else { -0.5 / (self.max_depth as f32) }

fn reward(&self) -> f32 { self.reward_value }
```
This should be documented, as it's a potential breaking change if the library is being used.
Hmm, you mean the way we compute the rewards?
```rust
fn twists(&self) -> (Vec<Vec<usize>>, Vec<Vec<usize>>) {
    (self.obs_perms.clone(), self.act_perms.clone())
}
```
If this is called frequently, the cloning is wasteful. Consider returning `&[Vec<usize>]` or caching.
cbjuan left a comment

It looks much better than before. A couple of things:
- I found an issue with indentation (added in the review comments).
- Also, the solution is not cleared on reset: the `solution` and `solution_inv` vectors are not cleared in `reset()`. This means if you reset and run again, the old solution data remains. Is this expected? It affects `clifford.rs`, `permutation.rs`, and `linear_function.rs`.
```rust
let mut penalty = 0.0f32;

if let Some(gate) = self.gateset.get(action).cloned() {
    let previous = self.metrics_values.clone();
    self.metrics.apply_gate(&gate);
    let new_metrics = self.metrics.snapshot();
    penalty = new_metrics.weighted_delta(&previous, &self.metrics_weights);
    self.metrics_values = new_metrics;

    self.apply_gate_to_state(&gate);
}
```
There is an indentation issue in line 302.
Suggested change:

```rust
let mut penalty = 0.0f32;
if let Some(gate) = self.gateset.get(action).cloned() {
    let previous = self.metrics_values.clone();
    self.metrics.apply_gate(&gate);
    let new_metrics = self.metrics.snapshot();
    penalty = new_metrics.weighted_delta(&previous, &self.metrics_weights);
    self.metrics_values = new_metrics;
    self.apply_gate_to_state(&gate);
}
```
It does not affect functionality in training, where we use reset, but it could definitely be a bug in inference if someone reuses the env and sets a new state with
Adds metrics-based reward shaping, symmetry permutations, and optional random inversions to all three gym environments. Updates twisterl to 0.3.0.