Conversation

@dgottlieb
Member

@dgottlieb dgottlieb commented Oct 17, 2025

This is not in a final "ready to merge" state, but I need agreement on the details of how to blend in new solutions. What I think I've found, and what matters more than anything else, is that cbirrt can behave very erratically. When running against main on scene 9 with max ik solutions changed to 3, we need "only" 65 rrt iterations to get an answer. Change the number of solutions to 4 and it now takes 249 rrt iterations.

The only difference is that the new node generated is the new "optimal" node.

I think a better first step might be to add a "performance" test that isolates IK generation from cbirrt, where we can feed cbirrt different subsets of the same IK solutions. Each subset would always include an actual "good" solution, which is not necessarily IK's "optimal" node. The goal is just to understand what the deviations are.
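One way such a test could pick its inputs is sketched below. It enumerates every fixed-size subset of solution indices that contains one known-good index. The function name `subsetsIncluding` and the idea of addressing IK solutions by index are assumptions made for illustration; nothing here is taken from the armplanning package.

```go
package main

import "fmt"

// subsetsIncluding returns every size-k subset of the indices [0, n) that
// contains the index good. Each subset is in ascending order.
// Hypothetical helper: the indices stand in for pregenerated IK solutions.
func subsetsIncluding(n, k, good int) [][]int {
	if k < 1 || k > n || good < 0 || good >= n {
		return nil
	}
	var out [][]int
	cur := make([]int, 0, k)
	var rec func(next int, hasGood bool)
	rec = func(next int, hasGood bool) {
		if len(cur) == k {
			if hasGood {
				// Copy cur, since it is reused by later recursion steps.
				out = append(out, append([]int(nil), cur...))
			}
			return
		}
		for i := next; i < n; i++ {
			cur = append(cur, i)
			rec(i+1, hasGood || i == good)
			cur = cur[:len(cur)-1]
		}
	}
	rec(0, false)
	return out
}

func main() {
	// 4 hypothetical IK solutions, seeds of size 3, solution 2 is the known-good one.
	for _, s := range subsetsIncluding(4, 3, 2) {
		fmt.Println(s)
	}
}
```

Each subset could then seed a separate cbirrt run, with the iteration count recorded per subset to see how much the outcome varies.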

Edit: scene 9 is no longer relevant. It now solves in 1 rrt iteration.

What this patch functionally does now is let us start cbirrt with fewer IK solutions. We can have confidence that IK will continue to generate solutions. And if none of the original IK solutions are sufficient, we should eventually discover any necessary IK solution that the pre-patch code would have found.

Timing results from wine crazy touch 1 and 2 are improved because we now return 10 solutions instead of waiting for 1 full second to return a few dozen.

New timings:

crazy-touch1
Solution node: 2 Live? false
PlanMotion:                                            	Calls:     1	Total time: 930.240174ms 	Average time: 930.240174ms
    planSingleGoal:                                    	Calls:     1	Total time: 929.82524ms  	Average time: 929.82524ms
      initRRTSolutions:                                	Calls:     1	Total time: 389.86644ms  	Average time: 389.86644ms
crazy-touch2
Solution node: 2 Live? false
PlanMotion:                                            	Calls:     1	Total time: 843.591565ms 	Average time: 843.591565ms
    planSingleGoal:                                    	Calls:     1	Total time: 842.939328ms 	Average time: 842.939328ms
      initRRTSolutions:                                	Calls:     1	Total time: 448.929819ms 	Average time: 448.929819ms

Old timings:

crazy-touch1
PlanMotion:                                            	Calls:     1	Total time: 1.590587635s 	Average time: 1.590587635s
    planSingleGoal:                                    	Calls:     1	Total time: 1.590112747s 	Average time: 1.590112747s
      initRRTSolutions:                                	Calls:     1	Total time: 1.041093889s 	Average time: 1.041093889s
crazy-touch2
PlanMotion:                                            	Calls:     1	Total time: 1.591505683s 	Average time: 1.591505683s
    planSingleGoal:                                    	Calls:     1	Total time: 1.59102758s  	Average time: 1.59102758s
      initRRTSolutions:                                	Calls:     1	Total time: 1.061294959s 	Average time: 1.061294959s

For optimized cases that don't go through cbirrt, we had to take care to not introduce a regression. Specifically wine-adjust.json has 34 goals, none of which fall into cbirrt. There is an overhead to create (and cleanup/wait on) goroutines that are producing IK results. Each cleanup/wait takes ~2ms. It was important to batch up all the waiting at the top-level plan manager code. The batched code also saw an improvement with wine-adjust.json.

New:

wine-adjust
PlanMotion:                                          	Calls:     1	Total time: 160.962217ms 	Average time: 160.962217ms
    planSingleGoal:                                  	Calls:    34	Total time: 158.64754ms  	Average time: 4.666104ms
      initRRTSolutions:                              	Calls:    34	Total time: 87.320242ms  	Average time: 2.568242ms

Old:

wine-adjust
PlanMotion:                                          	Calls:     1	Total time: 208.094765ms 	Average time: 208.094765ms
    planSingleGoal:                                  	Calls:    34	Total time: 206.474914ms 	Average time: 6.072791ms
      initRRTSolutions:                              	Calls:    34	Total time: 156.733521ms 	Average time: 4.609809ms

For a comparison -- waiting for each planSingleGoal (rather than at PlanMotion), we get the following profile:

wine-adjust
PlanMotion:                                          	Calls:     1	Total time: 307.02622ms  	Average time: 307.02622ms
    planSingleGoal:                                  	Calls:    34	Total time: 305.789156ms 	Average time: 8.993798ms
      initRRTSolutions:                              	Calls:    34	Total time: 107.554055ms 	Average time: 3.163354ms
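The difference between waiting inside each planSingleGoal and batching the waits at PlanMotion can be sketched roughly as follows. The `generator` type, `startGenerator`, `planAll`, and the simulated 2ms drain are all stand-ins invented for this sketch; they are not the real background IK code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// generator is a hypothetical stand-in for a background IK producer.
type generator struct {
	cancel context.CancelFunc
	done   chan struct{}
}

func startGenerator(parent context.Context) *generator {
	ctx, cancel := context.WithCancel(parent)
	g := &generator{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(g.done)
		<-ctx.Done()
		time.Sleep(2 * time.Millisecond) // simulated cleanup cost after being told to stop
	}()
	return g
}

// Stop asks the generator to finish without blocking the caller.
func (g *generator) Stop() { g.cancel() }

// Wait blocks until the generator goroutine has fully exited.
func (g *generator) Wait() { <-g.done }

// planAll runs numGoals goals and returns how many generators were drained.
// With batch=false it waits after every goal. With batch=true it only stops
// each generator per goal and does all of the waiting once at the end.
func planAll(numGoals int, batch bool) (drained int) {
	var pending []*generator
	for i := 0; i < numGoals; i++ {
		g := startGenerator(context.Background())
		// ... solve the goal using the generator's output ...
		g.Stop()
		if batch {
			pending = append(pending, g)
			continue
		}
		g.Wait()
		drained++
	}
	for _, g := range pending {
		g.Wait()
		drained++
	}
	return drained
}

func main() {
	for _, batch := range []bool{false, true} {
		start := time.Now()
		n := planAll(34, batch)
		fmt.Printf("batch=%v drained=%d elapsed=%v\n", batch, n, time.Since(start))
	}
}
```

In the batched mode the generators drain concurrently while later goals are being planned, so the final waits mostly find them already finished.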

@dgottlieb dgottlieb requested a review from erh October 17, 2025 16:04
@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Oct 17, 2025
}
// constrainNear will ensure path between oldNear and newNear satisfies constraints along the way
- near = &node{inputs: newNear}
+ near = &node{name: int(nodeNameCounter.Add(1)), inputs: newNear}
Member Author

Not intended to be part of the final solution, but I found giving nodes a "name" to be useful: for instance, to verify whether the goal node we eventually reached was a pregenerated IK solution or a live one fed in midway.
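A minimal sketch of the naming idea, assuming `nodeNameCounter` is an `atomic.Int64` from `sync/atomic` (the diff only shows `nodeNameCounter.Add(1)`). The `live` field and the `newNode` helper are hypothetical additions for illustration, and the printed line only imitates the "Solution node" output quoted in this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// nodeNameCounter hands out unique, increasing names across goroutines.
// Assumed to be an atomic.Int64; the PR diff does not show its declaration.
var nodeNameCounter atomic.Int64

type node struct {
	name   int64
	inputs []float64
	live   bool // hypothetical: true if the node was fed in after cbirrt started
}

// newNode is a hypothetical constructor that stamps each node with a name.
func newNode(inputs []float64, live bool) *node {
	return &node{name: nodeNameCounter.Add(1), inputs: inputs, live: live}
}

func main() {
	seed := newNode([]float64{0.1, 0.2}, false)
	late := newNode([]float64{0.3, 0.4}, true)
	for _, n := range []*node{seed, late} {
		fmt.Printf("Solution node: %d Live? %v\n", n.name, n.live)
	}
}
```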

@@ -0,0 +1,2033 @@
{
Member Author

Probably need to add some test somewhere. Will move this or whatever we land on to the armplanning/data directory if we keep this PR open.

"planner_options": {
"goal_metric_type": "squared_norm",
"arc_length_tolerance": 0,
"max_ik_solutions": 10,
Member Author

Playing with this was useful for observing cbirrt

rrtMaps.goalMap[newGoal] = nil

// Readjust the target to give the new solution a chance to succeed.
target, err = mp.sample(newGoal, iterNum)
Member Author

This was the "unexpected" part of adding live IK solutions to cbirrt. Without this step of re-assigning the target, I was never able to see a new solution succeed at getting picked.

But if we do this too often, we waste time not advancing existing solutions. Some of which are probably perfectly fine. Hence the iterNum%20 at the top of the conditional.

Mostly just need feedback on whether this general idea is acceptable (for now) @erh or if I should be doing something substantially different here.
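The throttled re-targeting described above can be sketched as a loop that polls for a live goal only every 20th iteration. Everything here (`goal`, `run`, `retargetEvery`, and the choice to check when `iterNum%20 == 0`) is an assumption for illustration; the real conditional in cBiRRT.go may be shaped differently.

```go
package main

import "fmt"

// goal is a hypothetical stand-in for a goal-tree node produced by IK.
type goal struct{ name int }

// retargetEvery mirrors the iterNum%20 check mentioned in the comment.
const retargetEvery = 20

// run simulates maxIter planner iterations. Every retargetEvery iterations it
// does a non-blocking receive on live and, if a new goal arrived, switches the
// target to it so the new solution gets a chance to be extended toward.
func run(maxIter int, initial goal, live <-chan goal) (target goal, retargets int) {
	target = initial
	for iterNum := 0; iterNum < maxIter; iterNum++ {
		if iterNum%retargetEvery == 0 {
			select {
			case g := <-live:
				target = g
				retargets++
			default:
				// Nothing new; keep advancing the existing solutions.
			}
		}
		// ... extend both trees toward target ...
	}
	return target, retargets
}

func main() {
	live := make(chan goal, 3)
	live <- goal{name: 11}
	live <- goal{name: 12}
	target, n := run(45, goal{name: 1}, live)
	fmt.Println(target.name, n)
}
```

A larger interval would favor existing solutions, while a smaller one would give live solutions more chances at the cost of churn.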

@dgottlieb
Member Author

dgottlieb commented Nov 9, 2025

@erh Merged in main. Still needs cleanup (e.g. test file in top-level directory, undo node name stuff). Also the example request I was using re: adding more IK solutions is no longer relevant:

2025-11-09T17:35:18.477Z	DEBUG	cmd-plan	armplanning/cBiRRT.go:127	iteration: 0 target: &{0xc11b8b1b90 [0.6363898089819203 1.0078702187027526 0.8074065607095622 -0.2444791853253294 -0.6363929723811589 -1.570807186733885]} target name: 12
2025-11-09T17:35:18.626Z	DEBUG	cmd-plan	armplanning/cBiRRT.go:184	CBiRRT found solution after 0 iterations in 149.682547ms
Solution node: 10 Live? false

Let me know if you have another example in mind I should run against/add as a test


// Number of IK solutions that should be generated before stopping.
- defaultSolutionsToSeed = 100
+ defaultSolutionsToSeed = 10
Member

should we just remove this option entirely?
i hate all of these.

Member Author

Done

}

type node struct {
name int
Member

make this int64?

Member Author

done


func (sss *solutionSolvingState) computeGoodCost(goal referenceframe.FrameSystemPoses) ([]float64, float64, error) {
- ratios, err := inputChangeRatio(sss.psc.motionChains, sss.psc.start, sss.psc.pc.fs,
+ ratios, err := inputChangeRatio(sss.psc.motionChains, sss.psc.start.Clone(), sss.psc.pc.fs,
Member

why do you need this? should only be transient

Member Author

Sorry, should have highlighted this. Can see for yourself by checking out the parent commit 5e3793c311bb29a60b96a297664ddb8f8a63c840^ and running TestBadSpray1 with -race. It goes absolutely insane. It gets so bad the stacks get messed up (methods invoked from incorrect lines -- I imagine data race is stack walking while execution is happening).

Long story, I can just document for now. But if you'd like me to go down one of the alternative directions I mention, happy to make a code change too. Or I can push this so you can play with it while I immediately start on a code change.

Unfortunately understanding this is a bit annoying. The short description of the race:

  • Consider a case where we have two waypoints to shoot for:
  • The first waypoint is IK + cbirrt solved.
    • Because IK is now concurrent, we've told it to stop, but it hasn't quite drained yet [1]
  • Start solving the second "segment"
    • Which calls computeGoodCost, which gets the "same" schema
    • But getting a schema right now always recomputes [2]

So now, the IK reading of linear-inputs and cross referencing with its schema is racing with the next segment writing to the schema object.

This goes back to that one point in time where we had linear inputs, but also still had the linearized frame system. And I mentioned how the linearized frame system has a bit of a different lifetime. I didn't know how I wanted to marry the two. This data race error has given me a path to try.

[1] I first had this "stop, but don't wait" as a "stop, and yes, wait". That also avoids the race, because we wait on background IK before we start the next segment. The problem is that in some cases (the ~100-200 waypoint plan request examples), the 1-2ms wait (if I remember correctly) on each waypoint finish really added up, specifically in the case where we had already ended IK because of a great solution and bypassed cbirrt.

[2] Right now, the linearized FS stuff has an API lifecycle of:

  1. Create linearized from based on the frame names in the PlanRequest request.Configuration.
  2. Call GetSchema(framesystem), which iterates the frame system and attaches frames for their specific limits.
    a. In addition to adding in all missing frames
    b. I expect these are always non-moving frames. Maybe a throwback to pose -> pose motion planning?
  3. Initialize IK that "computes good costs" and "input change ratios" using the limits we attached
  4. Run IK that just cares about calling fs.Transform(linearizedInputs) to get distance costs to feed back to nlopt.

Why I didn't reach straight for "cache the schema":
I don't believe we do this in our code, but a seemingly innocent usage of this "re-usable library data structure" in a single thread would be to:

  • Put("frame1", inputs1)
  • Do IK <-> GetSchema
  • Put("frame2", inputs2) // prior schema is invalidated
  • Do more IK <-> GetSchema

So instead of caching and leaving a sharp edge, or having GetSchema set some linearizedInputs.readOnly = true bit, or having .Put bump some logical timestamp that invalidates prior schema objects, I just went the clone route.

But what I think I want to try (ideally after this PR, but up to you) is to just remove limits information from the schema altogether. And have Jog (and other related methods) just pass in the frame system that should be consulted for specific limits. Rather than expecting LinearizedInputs to have been hydrated with that information.
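The clone route can be illustrated with a small sketch, assuming a simplified `linearInputs` type that only holds per-frame values. The real LinearizedInputs and its schema are more involved, and `Put` and `Clone` here are illustrative rather than the actual API. The point is only that a reader holding a clone is unaffected by later writes to the original.

```go
package main

import "fmt"

// linearInputs is a hypothetical, simplified stand-in for the real
// linearized-inputs type; it only stores per-frame input values.
type linearInputs struct {
	values map[string][]float64
}

// Put stores a private copy of in under frame, mutating the receiver.
func (li *linearInputs) Put(frame string, in []float64) {
	li.values[frame] = append([]float64(nil), in...)
}

// Clone returns a deep copy, so later Puts on the original are not visible
// to whoever holds the clone.
func (li *linearInputs) Clone() *linearInputs {
	out := &linearInputs{values: make(map[string][]float64, len(li.values))}
	for k, v := range li.values {
		out.values[k] = append([]float64(nil), v...)
	}
	return out
}

func main() {
	start := &linearInputs{values: map[string][]float64{}}
	start.Put("frame1", []float64{0.1, 0.2})

	// What a still-draining background IK reader would hold.
	snapshot := start.Clone()

	// The next segment writes to the original while IK may still be reading.
	start.Put("frame2", []float64{0.3})

	fmt.Println(len(start.values), len(snapshot.values))
}
```

The cost is one copy per segment, which trades a little allocation for not needing a read-only bit or an invalidation timestamp.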

}
}

return &node{name: int(nodeNameCounter.Add(1)), inputs: step, cost: sss.psc.pc.configurationDistanceFunc(stepArc)}
Member

you're computing cost twice

Member Author

oops


// return bool is if we should stop because we're done.
func (sss *solutionSolvingState) process(ctx context.Context, stepSolution *ik.Solution,
) bool {
Member

i changed this api around a bit, do you like your version better?

Member Author

I'm not tied to this breakdown. It's certainly necessary to have the part of process that accepts a solution without writing it to the internal array of solutions. Because for live IK, we need to shove the solution node onto a channel.

But I'm not sure if the processCorrectness and processSimilarity both need to exist? I'm not sure what I was thinking there. Something tells me that maybe stepArc was being used in processSimilarity in addition to some other API call/log line? Or maybe more likely is that, originally, I wasn't using processSimilarity for the live solutions, for simplicity of managing that slice that getSolutions returns. And then revisited that decision.

But it's certainly not the case anymore -- processCorrectness and processSimilarity are always called in pairs. Happy to condense these two "functional" processes into a single one.

Or definitely let me know if you were homing in on a different detail between your API and mine.

}
}

func (bgGen *backgroundGenerator) StopAndWait() {
Member

just call .Stop() and then .Wait()

Member Author

sure

return
}

step := solvingState.toInputs(ctx, solution)
Member

with the API change i had made, you can just call process()

Member Author

I'm not sure if I follow. The output here is pushing myNode onto a channel.

process's output (here and on main) is to add to the solution state slice.

Additionally process will evaluate + optionally set node.checkPath which I don't think has an impact for live solutions.
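To make the distinction concrete, here is a rough sketch of one shared acceptance check feeding two different outputs: a slice for seed solutions and a channel for live ones. The names `solvingState`, `check`, and `processLive`, the squared-norm cost, and the similarity tolerance are all invented for this sketch and are not the PR's actual API or cost function.

```go
package main

import (
	"fmt"
	"math"
)

type node struct {
	inputs []float64
	cost   float64
}

// solvingState is a hypothetical, reduced version of the solution state.
type solvingState struct {
	maxCost   float64
	solutions []*node
}

// sameInputs reports whether two input vectors match within a small tolerance.
func sameInputs(a, b []float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(a[i]-b[i]) > 1e-6 {
			return false
		}
	}
	return true
}

// check is the shared acceptance step: a stand-in cost test followed by a
// similarity test against stored solutions. The cost is computed once.
func (s *solvingState) check(inputs []float64) (*node, bool) {
	cost := 0.0
	for _, v := range inputs {
		cost += v * v
	}
	if cost > s.maxCost {
		return nil, false
	}
	for _, prev := range s.solutions {
		if sameInputs(prev.inputs, inputs) {
			return nil, false
		}
	}
	return &node{inputs: inputs, cost: cost}, true
}

// process is the seed-phase output: accepted nodes go into the slice.
func (s *solvingState) process(inputs []float64) bool {
	n, ok := s.check(inputs)
	if ok {
		s.solutions = append(s.solutions, n)
	}
	return ok
}

// processLive is the live output: accepted nodes are pushed onto a channel
// instead of the slice. It drops the node if the consumer is not ready.
func (s *solvingState) processLive(inputs []float64, out chan<- *node) bool {
	n, ok := s.check(inputs)
	if !ok {
		return false
	}
	select {
	case out <- n:
		return true
	default:
		return false
	}
}

func main() {
	s := &solvingState{maxCost: 10}
	fmt.Println(s.process([]float64{1, 2}))
	fmt.Println(s.process([]float64{1, 2}))
	live := make(chan *node, 1)
	fmt.Println(s.processLive([]float64{0.5, 0.5}, live))
}
```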

@dgottlieb dgottlieb requested a review from erh November 17, 2025 21:29
@github-actions
Contributor

Availability

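| Scene # | viamrobotics:main | dgottlieb:live-ik-solutions-to-cbirrt | Percent Improvement | Health |
|---|---|---|---|---|
| 1 | 100% | 100% | 0% | |
| 2 | 100% | 100% | 0% | |
| 3 | 100% | 100% | 0% | |
| 4 | 100% | 100% | 0% | |
| 5 | 100% | 100% | 0% | |
| 6 | 100% | 100% | 0% | |
| 7 | 100% | 60% | -40% | |
| 8 | 100% | 100% | 0% | |
| 9 | 100% | 100% | 0% | |
| 10 | 100% | 100% | 0% | |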

Quality

| Scene # | viamrobotics:main | dgottlieb:live-ik-solutions-to-cbirrt | Percent Improvement | Probability of Improvement | Health |
|---|---|---|---|---|---|
| 1 | 1.31±0.00 | 1.31±0.00 | -0% | 50% | |
| 2 | 0.90±0.00 | 0.91±0.01 | -1% | 21% | |
| 3 | 6.65±0.60 | 6.40±1.42 | 4% | 56% | |
| 4 | 3.23±0.41 | 3.61±1.52 | -12% | 40% | |
| 5 | 8.10±2.32 | 8.15±1.63 | -1% | 49% | |
| 6 | 8.58±2.93 | 10.86±3.36 | -27% | 30% | |
| 7 | 5.79±2.75 | 6.23±4.02 | -8% | 46% | |
| 8 | 0.90±0.00 | 0.91±0.01 | -1% | 21% | |
| 9 | 4.20±0.14 | 5.56±1.97 | -32% | 25% | |
| 10 | 12.84±0.41 | 12.84±0.41 | -0% | 50% | |

Performance

| Scene # | viamrobotics:main | dgottlieb:live-ik-solutions-to-cbirrt | Percent Improvement | Probability of Improvement | Health |
|---|---|---|---|---|---|
| 1 | 0.02±0.00 | 0.03±0.01 | -21% | 28% | |
| 2 | 0.05±0.00 | 0.05±0.00 | 2% | 65% | |
| 3 | 0.07±0.01 | 0.10±0.05 | -41% | 27% | |
| 4 | 1.29±0.08 | 0.56±0.12 | 57% | 100% | |
| 5 | 1.79±0.48 | 1.59±0.39 | 11% | 63% | |
| 6 | 1.91±0.68 | 2.38±1.29 | -24% | 37% | |
| 7 | 2.42±0.71 | 1.88±0.87 | 22% | 69% | |
| 8 | 0.05±0.00 | 0.06±0.02 | -32% | 23% | |
| 9 | 2.21±0.15 | 1.96±1.48 | 11% | 57% | |
| 10 | 6.43±1.10 | 6.30±1.01 | 2% | 54% | |

The above data was generated by running scenes defined in the motion-testing repository
The SHA1 for viamrobotics:main is: b39abd91b2d7c568703e39f08cf352167db9a609
The SHA1 for dgottlieb:live-ik-solutions-to-cbirrt is: b39abd91b2d7c568703e39f08cf352167db9a609

  • 10 samples were taken for each scene
