ClusterData crashes with large graphs representing point clouds #2610

QuanticDisaster · 2021-05-20T11:06:34Z


Hello,

I have a problem with the ClusterData class, which crashes every time I work with large datasets.
I am working on point clouds and thought of using it to cluster my data to keep computation time reasonable. However, as soon as I work with large point clouds (here 2.6 million points), the kernel crashes every time during METIS partitioning unless I use a very low num_parts argument (fewer than 10).

In the paper the class is based on (https://arxiv.org/pdf/1905.07953.pdf), the authors partition the Amazon2M dataset into 15,000 parts, for instance, which is far more than what I am trying to do; hence my confusion.

Is there a problem with the way I'm using ClusterData?

Thx in advance

QuanticDisaster · 2021-05-20T11:34:54Z


Never mind, it seems the problem came from the graph's feature tensor: due to a wrong manipulation, I had x=[3] instead of x=[number of nodes, 3] in the Data object, which caused the process to hang and then crash.
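A cheap guard against this kind of mistake is to check, before calling ClusterData, that every node-level attribute has the node count as its first dimension. The helper below is a hypothetical sketch (not part of PyG) that works on plain shape tuples such as tuple(data.x.shape). Because PyG may infer num_nodes from x itself, the expected count is better taken from an independent source, e.g. data.pos.size(0) or int(data.edge_index.max()) + 1.

```python
def check_node_attr(name, shape, num_nodes):
    """Raise ValueError unless `shape` starts with the node count.

    Hypothetical helper: `shape` is a tuple such as tuple(data.x.shape) and
    `num_nodes` should come from an independent count (e.g. data.pos.size(0)).
    """
    if len(shape) == 0 or shape[0] != num_nodes:
        raise ValueError(
            f"{name} has shape {list(shape)}, expected first dimension {num_nodes}"
        )


# The broken case from this thread: x=[3] instead of x=[num_nodes, 3].
try:
    check_node_attr("x", (3,), 1000)
except ValueError as err:
    print(err)  # x has shape [3], expected first dimension 1000

check_node_attr("x", (1000, 3), 1000)  # correct shape: passes silently
```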


QuanticDisaster · 2021-06-17T08:48:38Z


I am reopening this discussion as I am now facing more or less the same problem.

I can indeed partition some large point clouds (6,000,000 nodes, 16 neighbors in the kNN graph, and roughly 16 features + XYZ) into 2000 parts with a batch size of 2 or 10. However, the METIS partitioning is sometimes extremely inconsistent and can crash on smaller point clouds (800,000 nodes). This is a bit hard to reproduce as it doesn't seem to be systematic: the same point cloud with the same parameters can crash twice and work the third time.

What happens is that ClusterData prints

Data :  Data(edge_index=[2, 3562736], pos=[222671, 3], x=[222671, 16], y=[222671])
n_parts :  700
Computing METIS partitioning...

then computes for a bit and stops completely.
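The sizes in this log match the kNN construction described above: the edge count is exactly 16 times the node count, so every point contributes the same number of edges, and 700 parts leave roughly 318 nodes per part. A quick check, using only plain arithmetic on the logged numbers:

```python
num_nodes = 222_671      # from pos=[222671, 3]
num_edges = 3_562_736    # from edge_index=[2, 3562736] (directed edges)
num_parts = 700          # n_parts passed to ClusterData

avg_out_degree = num_edges / num_nodes
nodes_per_part = num_nodes / num_parts

print(avg_out_degree)          # 16.0, i.e. k = 16 neighbors per point
print(round(nodes_per_part))   # 318
```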

In Jupyter, the busy indicator (*) shows the cell running for a few seconds, then disappears. On Colab (where it seems to crash most often for me), it reports that the kernel crashed. Spyder just says it encountered a problem.
Moreover, it crashes the entire kernel, forcing me to rerun everything from scratch.

Is there any way to avoid this issue? If not, is there any alternative?


rusty1s Jun 17, 2021
Maintainer

I have actually never tested running METIS on such a big graph, especially not in combination with kNN graph construction. I think there are better alternatives than METIS for "clustering" a point cloud. My guess is that if your points are evenly distributed, METIS might fail to find reasonable clusters.

An alternative might be to select the points within fixed voxels for mini-batching.

QuanticDisaster Jun 22, 2021
Author

Indeed, the point clouds I am working with can have homogeneous density. Since I am only linking each point to its nearest neighbors, the edges and their weights don't carry much meaningful structure for METIS.

OK, I will look into it. In this topic (#2547) you mention GraphSAINT and NeighborSampler.
I am not sure about the differences between the three. Do you think they are likely to face the same problem as METIS, or are they suited to my problem?

rusty1s Jun 23, 2021
Maintainer

I think that creating mini-batches of point clouds is not really a problem, in contrast to creating mini-batches of arbitrary graphs (where you want to preserve a meaningful neighborhood). In general, point clouds are split into mini-batches in a pre-processing step, by clustering points into voxels of appropriate size. While PyG does not provide a helper function for that, you can easily create such a split yourself. For example, to divide points lying in the interval [-1, 1]^3 into 8 clusters:

from itertools import product
from torch_geometric.data import Data

# Split the points of `data` in [-1, 1]^3 into 8 unit-cube voxels (half-open bounds).
x, y, z = data.pos[:, 0], data.pos[:, 1], data.pos[:, 2]
data_list = []
for x_start, y_start, z_start in product(range(-1, 1), range(-1, 1), range(-1, 1)):
    mask = ((x >= x_start) & (x < x_start + 1) & (y >= y_start) & (y < y_start + 1)
            & (z >= z_start) & (z < z_start + 1))
    data_list.append(Data(x=data.x[mask], y=data.y[mask], pos=data.pos[mask]))
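The loop above hard-codes eight unit cubes. A minimal generalization to an arbitrary voxel size is to compute an integer voxel key per point by floor division and group point indices by that key. The sketch below uses plain Python lists for clarity; `voxel_partition` and `voxel_size` are illustrative names, not PyG API. With tensors, the same key can be computed with torch.floor(data.pos / voxel_size).

```python
import math
from collections import defaultdict


def voxel_partition(pos, voxel_size):
    """Group point indices by the cubic voxel of side `voxel_size` they fall into.

    `pos` is a sequence of (x, y, z) coordinates. Returns a dict mapping an
    integer voxel key (i, j, k) to the list of point indices inside that voxel.
    Every point lands in exactly one voxel, including points on the upper bound.
    """
    voxels = defaultdict(list)
    for idx, (x, y, z) in enumerate(pos):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append(idx)
    return dict(voxels)


pos = [(-0.5, -0.5, -0.5), (0.2, 0.3, 0.9), (0.7, 0.1, 0.4), (1.0, 1.0, 1.0)]
parts = voxel_partition(pos, voxel_size=1.0)
print(parts)
# {(-1, -1, -1): [0], (0, 0, 0): [1, 2], (1, 1, 1): [3]}
```

Each index list can then play the role of the boolean mask in the loop above (index data.x, data.y and data.pos with a tensor of those indices). Very sparse voxels may be worth merging or discarding; that choice depends on the dataset.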

rusty1s Jun 25, 2021
Maintainer

Did you delete your previous message? Feel free to create a new discussion based on your issue :)
