Recursion with PostgreSQL Follow-Up 2 - CYCLE Detection and Processing Millions of Nodes #5
Replies: 1 comment 4 replies
-
I cannot understand some of your claims:
You function
The level=5 row has visited node_10 only once, the other row that gave id=node_10 is on a different path (directly from 8->10, not via 5->7). Your manual implementation seems to treat this differently: It's not excluding just cycles, but generally visiting each node only once (don't have the time to check why node_10/node_1 is still included twice when I run your manual function). For reference, your manually verified test set:
|
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Yesterday, I wrote another article for my Recursion with PostgreSQL series: Recursion with PostgreSQL Follow-Up 1 - Performances .
My conclusion was that the CTE recursion method was severely lacking a way out of the infinite loops - and then I proceeded to write a version that defined a maximum recursion depth as a parameter:
And that seemed to me as a significant limitation on the part of PostgreSQL's recursive functionality, a sort of deal breaker in my book.
Then, I proceeded with a classic functional recursion (as opposed to this CTE recursion), and I tried to figure out how much it could be optimized.
Oh, much little did I know...
During the day, @xocolatl (Vik Fearing) left this comment that pointed out that PostgreSQL already has a feature called Cycle Detection which was exactly to solution for my problem.
And Vik was kind enough to even write a full working version for me (using the example from the first article).
What can I say? I always felt that three days of debugging and testing could save 30 minutes of reading the documentation. And I'm also amazed that there is always something new to learn in SQL, even after many decades of use.
Anyway, the CYCLE feature was exactly the right solution. Documentation:
The way it works syntax-wise is to add at the end of recursive CTE this:
CYCLE [field_to_check_for_cycle] SET is_cycle USING path
, wherepath
is a newly generated field containing a path to check, but you may call it or use it as you wish.The
is_cycle
is also a new field with a fixed name of the boolean type that will tell us if we are in a cycle or not. And all we have to do at the end is to check if we are in the cycle withwhere not is_cycle
, and that is it.So, given that example, Vik provided the function
select_nodes_recursive_cte
, will now look like this:New Problem
After I tested this version on my standard test fixture:
I expect to see the following manually checked results:
However, that is NOT what the new function with CYCLE returns. This version adds another (unexpected) recursion level (level 5 with
node_10
):The node
node_10
-node_1
has already appeared in level 3 and, therefore, in my opinion - shouldn't be included in the results.After some investigation, my best assumption is that the PostgreSQL algorithm, when it decides that the row is a cycle row - includes that row also into the results and then exists cycle.
And again, the only way to exclude this field that I've found is to use the MIN aggregate in the final query:
After this small tweak - the results from both functions
select_nodes_recursive_cte
(recursive CTE with CYCLE) andselect_nodes_recursive_function
(my optimized classic function recursion version) - are identical.The next step is to do some serious performance testing.
However, I'm still not completely sure about this last node: Maybe it would be a better idea to leave
select_nodes_recursive_cte
as-is (without MIN aggregate) and to modifyselect_nodes_recursive_function
to behave identically as CTE with CYCLE version. That means that before exiting recursion, we must include that last node.So the final version
select_nodes_recursive_function
looks like this:As we can see, before we exit the recursion with
return
- we insert the current nodes with the current level.Both functions are returning identical results now, and we are finally ready for serious performance testing.
Performance Testing
select_nodes_recursive_function
select_nodes_recursive_cte
ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3ERROR
3big_tree_test
table. That is the number of tree connections.big_tree_test
table. That is the number of unique tree nodes.ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp226.8386": No space left on device
. I have 400GB free on my laptop. In later tests, when I ran this function on a larger, the connection was being reset. So, something very strange is going on with recursive CTE execution.Optimized function
select_nodes_recursive_function
starts to seriously degrade in performance just after a couple of million tree connections and after a million unique tree nodes.For the last performance tests, I tried to create an index on the tree table:
Results are approximately the same:
Most likely, indexes were never used.
Perhaps someone has a better idea?
Conclusion
Makeup what you will, it is what it is.
I still hold the position that recursive CTE queries are confusing, at least to me.
Anyhow, if someone is using them and starts to experience performance issues, maybe there is a better way?
This approach I demonstrated in this series may be the solution for processing a huge number of tree nodes.
Or maybe not.
Anyhow, I said last time that I'd focus on testing and debugging these functions now, I guess it will have to wait for another article in this series.
UPDATE 1:
I was suggested to use UNION ALL instead of just UNION in the CTE version. Unfortunately, no difference. On a small dataset, it will return identical results, it will be slow on a medium dataset, and it will crash on a big dataset.
I still don't know what to think of it.
Either the PostgreSQL CTE implementation is bad or ... I really don't know.
Next, I'll try to play around with that CTE version execution plan to try to figure it out.
UPDATE 2
Here is the execution plan for the CTE version (with UNION ALL, which should be better because it doesn't look for duplicates):
This
-> Recursive Union (cost=0.00..73.68 rows=645 width=76) (actual time=0.012..65812.549 rows=18633629 loops=1)
, this has some crazy number of rows and execution time.The
big_tree_test
has only 127 records with 49 unique id's.I'm baffled.
Beta Was this translation helpful? Give feedback.
All reactions