[Small Feature] Arrangement Caching Polyline Traits#5594
[Small Feature] Arrangement Caching Polyline Traits#5594
Conversation
|
@efifogel this is just a draft of small feature: before going further (documenting and everything), I'd like to have your input. This PR is based on #5094 as it follows the same logic of using X-monotone polylines to make boolean operations faster. New traitsThe current implementation of
Profiling shows that this copies are very costly, especially when dealing with large polylines. The principle of this PR is to introduce an alternative traits called In practise, this implementation makes boolean operations run up to 2.8x faster on large data sets, and never seems to be counter-productive. Of course, compared to
For these reasons, I don't think it should replace the existing I'm not quite sure about the name: I used There might also be room for improvement in terms of implementation: I tried to avoid duplicating code as much as possible by making Use in free functionsConsidering it always performs better than I did not update the Insertion of PWH in GPSOne unrelated change but which I still added with this branch is a change in the function By slightly modifying This makes the insertion of the PWH about 10x faster. This also gives me the intuition that the Surface Sweep in itself has a very big overhead… For a reason I'm not sure I understand, the polygon validation algorithm fails with this new StatusThe feature is developed and functional. All tests of the BSO2 testsuite compile and run without error (except for the So far it's not documented. Please let me know what's your opinion on this, if you see anything to change (naming, organisation, API, etc.) and how we can integrate it. |
|
Here are some comments after a quick review.
First, I assume that using Handles, which in turn use reference counting
for points and curves, is still inefficient, because we copy many handles;
do you agree?
The idea behind this traits is to use something similar to the concept of
Handle, but this time applied to entire subcurves.
1. The second limitation is severe, but at least the range retained in
Leightweight_polyline_2 is a const reference, so it cannot be changed after
the curve has been constructed, implying that it cannot be changed after
the curve has been inserted into the arrangement, for example, which is
good. Is there any advantage using a shared pointer for the range (and not
just a pointer---line 52)
2. The implementation of Lightweight_polyline_2 should not construct and
use kernel objects.
Statements like the one below should be eliminated
auto compare_x_2 = Kernel().compare_x_2_object();
All kernel based operations should move to the traits itself.
3. In the example, why not insert the non-x-monotone curve directly?
4. Line 183 of Lightweight_polyline_2.h: idx is not used.
5. I think that push back() in a loop for a vector can be replaced with
something more efficient and elegant, such as an initializer list;
see
a. Lines 606, 666, 1127, 1143 of Arr_polycurve_traits_2.h
b. Line 475 of Arr_lightweight_polyline_traits_2.h
6. The test of traits should also test the new traits
7. I think that the word lightweight gives the wrong impression. (I guess
you feel the same).
We do have Arr_segment_traits_2 and Arr_non_caching_segment_traits_2.
How about Arr_caching_polyline_traits_2 ?
8. Where and how is the line of an Extreme_point (the second element) used?
I have a small problem with Line 289:
l = std::make_shared<Line_2>(a, b);
because there is a hidden use of a kernel object, but I need to understand
better other things.
____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
…On Thu, 8 Apr 2021 at 15:40, Simon Giraudot ***@***.***> wrote:
@efifogel <https://github.com/efifogel> this is just a draft of small
feature: before going further (documenting and everything), I'd like to
have your input.
This PR is based on #5094 <#5094> as it
follows the same logic of using X-monotone polylines to make boolean
operations faster.
New traits
The current implementation of Arr_polyline_traits_2 copies chunks of
points in most major steps:
- when making a curve X-monotone, all input points are copied into new
containers
- when splitting a polyline, again, all input points are copied into
the new created polylines
Profiling shows that this copies are very costly, especially when dealing
with large polylines. The principle of this PR is to introduce an
alternative traits called Arr_lightweight_polyline_traits_2 that relies
on pairs of iterators to perform shallow copies of chunks of points (with
the addition of hanging extreme points that might not belong to the
original range of points: points created by intersecting 2 polylines for
example).
In practise, this implementation makes boolean operations run up to 2.8x
faster on large data sets, and never seems to be counter-productive.
Of course, compared to Arr_polyline_traits_2, this traits has two
limitations:
- the traits need to be templated by the range, which means the type
of the traits will be different depending on whether your input is a
CGAL::Polygon_2, a std::list<Point_2>, etc.
- the range needs to exist in memory *outside* of the traits: you
cannot generate curves "on-the-fly", you always need a valid reference to
the support (like a CGAL::Polygon_2)
For these reasons, I don't think it should replace the existing
Arr_polyline_traits_2 but just be an alternative traits.
I'm not quite sure about the name: I used
Arr_lightweight_polyline_traits_2 as my main objective was indeed to make
a traits that manipulates "lightweight" objects (pairs of iterators instead
of deep copies), but you may have a better idea.
There might also be room for improvement in terms of implementation: I
tried to avoid duplicating code as much as possible by making
Arr_lightweight_polyline_traits_2 inherit from
Arr_polycurve_basic_traits_2, but it may be possible to make it better
than what I did. I had to slightly modify Arr_polycurve_basic_traits_2 to
make the polyline type template (the core of my implementation of
lightweight polylines is in Arr_geometry_traits/Lightweight_polyline.h).
Use in free functions
Considering it always performs better than Arr_polyline_traits_2 in my
experiments, I also modified the behavior introduced by #5094
<#5094> to use
Arr_lightweight_polyline_traits_2 by default. The limitations are not a
problem here, as we always have the same range type (CGAL::Polygon_2 –
even for PWH for which we have one CGAL::Polygon_2 per outer boundary +
holes) and the polygons do exist in memory outside of the boolean
operations.
I did not update the test_general_polygon_constructions test that thus
fails (as it uses the original traits). It will require quite a lot of
changes as the API is quite different (and because of the fixed type of the
traits which means there will be a different type for different cases).
Insertion of PWH in GPS
One unrelated change but which I still added with this branch is a change
in the function General_polygon_set_2::insert(const Polygon_with_holes&):
I noticed that this function was significantly slower than the function to
insert a simple CGAL::Polygon_2. It seems that this comes from the fact
insert(PWH) uses a sweep while insert(Polygon_2) just does one "locate"
and directly updated the arrangement.
By slightly modifying insert(Polygon_2), we can use it to insert a hole
(instead of marking the new face creates as in, we mark it as out if we
insert a hole), and then just do insert(Polygon_2) with each hole of the
PWH.
This makes the insertion of the PWH about 10x faster. This also gives me
the intuition that the Surface Sweep in itself has a very big overhead…
For a reason I'm not sure I understand, the polygon validation algorithm
fails with this new insert(PWH) version, so I added a boolean parameter
to still make is possible to use the sweep.
Status
The feature is developed and functional.
All tests of the BSO2 testsuite compile and run without error (except for
the test_general_polygon_constructions test I was mentioning earlier).
So far it's not documented.
Please let me know what's your opinion on this, if you see anything to
change (naming, organisation, API, etc.) and how we can integrate it.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5594 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABVBNOG4DS33JUCDVF6N3NDTHWP3TANCNFSM42SZ33UQ>
.
|
Indeed. Copying a range of I must also add that CGAL Handles are not that lightweight: I did a test copying a large vector of EPECK points, and comparing that with copying pointers and copying reference wrappers. The results:
Users have the ownership on the range, so they are responsible for the memory management of the range. Ideally, this pointer in particular would be a const ref, but it would make the object not default constructible (among other limitations), so using a pointer is easier.
Sounds good! I'll use that.
Exactly the same as a line of an input point: this is similar to the line cache mechanism of the The extreme points are only there to handle the fact that some points cannot come from the input range (points created by intersection), but that is an implementation detail of the lightweight polycurve: from an external point of view, there's no difference between an input point and an extreme points (both also embed the cached line support of the following segment). Let's imagine you have a polyline only composed of input points: At some point, this polyline is split between (Thanks also for the other remarks I did not reply to, I'll integrate the changes.) |
|
Now, a kernel is not constructed locally, but every curve stores an
additional pointer.
I was hoping that the use of the kernel could be eliminated.
The Caching_polyline_2 is just a space holder; that is, all geometric
operations should be conducted by the traits.
I see that m_traits (the kernel) is used in the following places:
1. is_directed_right()
2. is_vertical()
3. is_x_monotone()
4. compute_direction()
5. line()
(1) it is used in an assertion, so it can be removed.
(2, 3) can be replaced with flags (just like m_is_directed_right
(4) used in constructors, so it can be passed in as a parameter computed by
the traits.
BW, constructors should be made private and friends of the traits (I think).
Users should only use the traits functors to construct polylines (as done
in the example).
(5) This is the culprit.
Consider the code that starts at Line 288:
line(const Index& index, const Point_2& a, const Point_2& b)
If I understand it right, the construction of the (underlying) line is
deferred to the time it is actually needed, similarly to the way segments
are treated in Arr_segment_traits_2.
There, BW, I did not raise the issue of locally constructing kernel
objects, because it was spread all over.
I would resolve (1-4) as described above.
Regarding (5) I see 3 options:
1. Construct a kernel locally
2. Save a reference to a kernel (as currently done)
3. Construct a line directly without a kernel: Line_2 l(p, q);
Do you see other options?
If not, I would go for (2) but instead of keeping a reference to the kernel
I would keep a reference to the Construct_line_2 object of the kernel (to
prevent abusing the kernel further).
Have you looked into the tests?
I have a request: please remove the spaces before '(' in functions calls or
function definitions (keep then though after if, while, for, etc.). We have
it without a space elsewhere.
____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
…On Tue, 13 Apr 2021 at 13:57, Simon Giraudot ***@***.***> wrote:
First, I assume that using Handles, which in turn use reference counting
for points and curves, is still inefficient, because we copy many handles;
do you agree?
Indeed. Copying a range of n objects, however lightweight they are, is
O(n), while copying a pair of iterators is O(1).
I must also add that CGAL Handles are not that lightweight: I did a test
copying a large vector of EPECK points
<https://gist.github.com/sgiraudot/8b4d35b5859e7a5b290c22c4221b9a77>, and
comparing that with copying pointers and copying reference wrappers. The
results:
Copies = 2.63811
Pointers = 0.945099 (2.79136x faster)
Refs = 0.997299 (2.64525% faster)
Is there any advantage using a shared pointer for the range (and not
just a pointer---line 52)
Users have the ownership on the range, so they are responsible for the
memory management of the range. Ideally, this pointer in particular would
be a const ref, but it would make the object not default constructible
(among other limitations), so using a pointer is easier.
We do have Arr_segment_traits_2 and Arr_non_caching_segment_traits_2.
How about Arr_caching_polyline_traits_2 ?
Sounds good! I'll use that.
1. Where and how is the line of an Extreme_point (the second element)
used?
Exactly the same as a line of an input point: this is similar to the line
cache mechanism of the Arr_segment_traits_2.
The extreme points are only there to handle the fact that some points
cannot come from the input range (points created by intersection), but that
is an implementation detail of the lightweight polycurve: from an external
point of view, there's no difference between an input point and an extreme
points (both also embed the cached line support of the following segment).
Let's imagine you have a polyline only composed of input points:
Polyline A
p0 l0 p1 l1 p2 l2 p3 l3 p4 (l4 not used)
+------------+------------+------------+------------+
At some point, this polyline is split between p1 and p2, an extreme point
pE (shared) is created:
Polyline B Polyline C
p0 l0 p1 l1 pE pE lE p2 l2 p3 l3 p4 (l4 not used)
+------------+-------+ +------+------------+------------+
(Thanks also for the other remarks I did not reply to, I'll integrate the
changes.)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5594 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABVBNOE4MJXDFDNIGWYFHDLTIQPQVANCNFSM42SZ33UQ>
.
|
|
ok, there is a better and preferred option and it's trivial.
The member function line() of the polyline curve is called only from the
subcurve traits, so the code that computes the line in case it hasn't been
computed yet, should move to the traits and then set in the polyline curve
record by the traits.
One needs to introduce set_line(... line) in the polyline curve record to
enable the setting.
____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
…On Mon, 19 Apr 2021 at 15:05, Efi Fogel ***@***.***> wrote:
Now, a kernel is not constructed locally, but every curve stores an
additional pointer.
I was hoping that the use of the kernel could be eliminated.
The Caching_polyline_2 is just a space holder; that is, all geometric
operations should be conducted by the traits.
I see that m_traits (the kernel) is used in the following places:
1. is_directed_right()
2. is_vertical()
3. is_x_monotone()
4. compute_direction()
5. line()
(1) it is used in an assertion, so it can be removed.
(2, 3) can be replaced with flags (just like m_is_directed_right
(4) used in constructors, so it can be passed in as a parameter computed
by the traits.
BW, constructors should be made private and friends of the traits (I
think).
Users should only use the traits functors to construct polylines (as done
in the example).
(5) This is the culprit.
Consider the code that starts at Line 288:
line(const Index& index, const Point_2& a, const Point_2& b)
If I understand it right, the construction of the (underlying) line is
deferred to the time it is actually needed, similarly to the way segments
are treated in Arr_segment_traits_2.
There, BW, I did not raise the issue of locally constructing kernel
objects, because it was spread all over.
I would resolve (1-4) as described above.
Regarding (5) I see 3 options:
1. Construct a kernel locally
2. Save a reference to a kernel (as currently done)
3. Construct a line directly without a kernel: Line_2 l(p, q);
Do you see other options?
If not, I would go for (2) but instead of keeping a reference to the
kernel I would keep a reference to the Construct_line_2 object of the
kernel (to prevent abusing the kernel further).
Have you looked into the tests?
I have a request: please remove the spaces before '(' in functions calls
or function definitions (keep then though after if, while, for, etc.). We
have it without a space elsewhere.
____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
On Tue, 13 Apr 2021 at 13:57, Simon Giraudot ***@***.***>
wrote:
> First, I assume that using Handles, which in turn use reference counting
> for points and curves, is still inefficient, because we copy many handles;
> do you agree?
>
> Indeed. Copying a range of n objects, however lightweight they are, is
> O(n), while copying a pair of iterators is O(1).
>
> I must also add that CGAL Handles are not that lightweight: I did a test
> copying a large vector of EPECK points
> <https://gist.github.com/sgiraudot/8b4d35b5859e7a5b290c22c4221b9a77>,
> and comparing that with copying pointers and copying reference wrappers.
> The results:
>
> Copies = 2.63811
> Pointers = 0.945099 (2.79136x faster)
> Refs = 0.997299 (2.64525% faster)
>
> Is there any advantage using a shared pointer for the range (and not
> just a pointer---line 52)
>
> Users have the ownership on the range, so they are responsible for the
> memory management of the range. Ideally, this pointer in particular would
> be a const ref, but it would make the object not default constructible
> (among other limitations), so using a pointer is easier.
>
> We do have Arr_segment_traits_2 and Arr_non_caching_segment_traits_2.
> How about Arr_caching_polyline_traits_2 ?
>
> Sounds good! I'll use that.
>
>
> 1. Where and how is the line of an Extreme_point (the second element)
> used?
>
> Exactly the same as a line of an input point: this is similar to the line
> cache mechanism of the Arr_segment_traits_2.
>
> The extreme points are only there to handle the fact that some points
> cannot come from the input range (points created by intersection), but that
> is an implementation detail of the lightweight polycurve: from an external
> point of view, there's no difference between an input point and an extreme
> points (both also embed the cached line support of the following segment).
>
> Let's imagine you have a polyline only composed of input points:
>
> Polyline A
>
> p0 l0 p1 l1 p2 l2 p3 l3 p4 (l4 not used)
> +------------+------------+------------+------------+
>
> At some point, this polyline is split between p1 and p2, an extreme
> point pE (shared) is created:
>
> Polyline B Polyline C
>
> p0 l0 p1 l1 pE pE lE p2 l2 p3 l3 p4 (l4 not used)
> +------------+-------+ +------+------------+------------+
>
> (Thanks also for the other remarks I did not reply to, I'll integrate the
> changes.)
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#5594 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ABVBNOE4MJXDFDNIGWYFHDLTIQPQVANCNFSM42SZ33UQ>
> .
>
|
STL_Extension/include/CGAL/Single.h
Outdated
| public: | ||
|
|
||
| typedef T* iterator; | ||
| typedef T* const_iterator; |
There was a problem hiding this comment.
missing const? + non-const begin/end
|
The documentation is built. It will be available, after a few minutes, here : https://cgal.github.io/5594/v1/Manual/index.html |
|
I've fixed the test infrastructure.
Now, we have some real failures.
So I pass the ball to you, Simon.
For example, the split test fails because the precondition in line 175 of
Arr_caching_polyline_traits_2.h does not catch a violation (the split point
does not lie on the curve).
____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
…On Tue, 4 May 2021 at 10:00, github-actions[bot] ***@***.***> wrote:
The documentation is built. It will be available, after a few minutes,
here : https://cgal.github.io/5594/v1/Manual/index.html
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5594 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABVBNOBQIUYPVSWO3IWUDTDTL6LPNANCNFSM42SZ33UQ>
.
|
|
Alright, I managed to fix some tests: Now there are still tests that fail, but I think here we reach the limits of this caching mechanism:
In that case, tests would expect 1 output curve when computing the intersection, but this is impossible with caching traits as it would mean this curve would have to alternate between two separate ranges. Which is why I output segments one by one using extreme points only: the output is correct (in the sense that it does return overlapping X-monotone curves) but suboptimal. This should trigger any problem in practise, but tests that expect a specific unique polyline can never pass.
|
|
Merge is used by the demo. (Naturally, the demo does not use the
caching-polyline traits.)
Consider the following scenario:
Add a curve c1.
Add another c2 the intersect c1 in its interior.
Now remove the curve(s) originated from c2.
c1 is still split, so you can merge it.
1. The way you suggest to implement Are_merrgeable would suffice in this
case.
It is not mandatory though, and I see that this functor is not tested by
the test_traits test for caching polylines anyway.
2. The Split functor, I think, is failing to detect the case where the
split point does not lie on the curve.
The call on line 171 returns 0 instead of Traits::INVALID_INDEX.
std::size_t i = m_traits.locate(xcv, p);
3. I understand the issue with make_x_monotone and merge.
Simply suppress these two tests: Add the tokens INTERSECT and MERGE to line
1033 in the file
Arrangement_on_surface_2/test/Arrangement_on_surface_2/cgal_test.cmake
…____ _ ____ _
/_____/_) o /__________ __ //
(____ ( ( ( (_/ (_/-(-'_(/
_/
On Tue, 11 May 2021 at 16:32, Simon Giraudot ***@***.***> wrote:
Alright, I managed to fix some tests: Split_2 was not well implemented as
I never tested the cases where the split point was an existing point of the
curve, it should be fixed now (the split test passes). Also, the default
value for duplicate_first was set to true, which indeed led to problems
as all tested curves were wrong compared to the expected values of the
testsuite.
Now there are still tests that fail, but I think here we reach the limits
of this caching mechanism:
- overlaps are handled in a suboptimal way, by outputting segments one
by one using only extreme points: the reason is that overlaps cannot be
handled correctly with caching traits. Picture for example 2 overlapping
polylines for which the vertices along the overlap alternate:
CV1 = x--------x--------------x----x------------------x-----x
CV2 = x---x-----------x----------------------x--------------x
OUT = x---x----x------x-------x----x---------x--------x-----x
In that case, tests would expect 1 output curve when computing the
intersection, but this is impossible with caching traits as it would mean
this curve would have to alternate between two separate ranges. Which is
why I output segments one by one using extreme points only: the output is
correct (in the sense that it does return overlapping X-monotone curves)
but suboptimal. This should trigger any problem in practise, but tests that
expect a specific unique polyline can never pass.
- the merge test does not pass either, for a similar reason. I did not
implement Are_mergeable_2 and Merge_2: in the case of caching
polylines, 2 polylines would only be mergeable if they relied on the same
range and share a common indexed point. But in that case, there's no reason
why these polylines would be separated in the first place… Two polylines
relying on separate range can never be merged even if they share a common
geometric point (which is what is expected, from what I understand). I
don't know in what cases these functors are used, but it seems to me
caching traits simply cannot support them.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5594 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABVBNOFWJWASD5UE45VUJKTTNEWVLANCNFSM42SZ33UQ>
.
|
|
Alright, I tried to do a partial implementation of merge / are mergeable, but it appears it's not sufficient for the tests in the testsuite (there are cases were curves not from the same range are tested for merging), so I did as you suggest and disabled the tests. I also fixed the missing precondition in The |
|
right. I think you can suprees the "assertion" test as well.
…On Mon, May 17, 2021, 13:05 Simon Giraudot ***@***.***> wrote:
Alright, I tried to do a partial implementation of merge / are mergeable,
but it appears it's not sufficient for the tests in the testsuite (there
are cases were curves not from the same range are tested for merging), so I
did as you suggest and disabled the tests.
I also fixed the missing precondition in Split_2, so the split tests in
the assertions test pass.
The assertions tests still fails on other parts: from what I understand,
it expects a precondition violation on merge tests but get an assertion
violation. But I guess the merge test should not be run at all?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5594 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABVBNOAL4DSZBAEGE2HBJVDTODS7DANCNFSM42SZ33UQ>
.
|
Rationale
The current implementation of
Arr_polyline_traits_2copies chunks of points in most major steps:Profiling shows that this copies are very costly, especially when dealing with large polylines. The principle of this PR is to introduce an alternative traits called
Arr_caching_polyline_traits_2that relies on pairs of iterators to perform shallow copies of chunks of points (with the addition of hanging extreme points that might not belong to the original range of points: points created by intersecting 2 polylines for example).In practise, this implementation makes boolean operations run up to 2.8x faster on large data sets, and never seems to be counter-productive.
Summary of API changes
Arr_caching_polyline_traits_2is providedLicense and copyright ownership
(no change)
CHANGES.md
TODO
Submission
Status