Contributor

shink commented Feb 24, 2025

shink self-assigned this Feb 24, 2025
Contributor Author

shink commented Feb 25, 2025

Something went wrong when building torch_npu; see: https://github.com/cosdt/pytorch-integration-tests/actions/runs/13497218910

Could you please provide some logs from the runner cluster? Thanks @cllouud

Contributor Author

shink commented Feb 25, 2025

Contributor Author

shink commented Mar 3, 2025

The root cause is a make failure caused by insufficient memory. multiprocessing.cpu_count() returns the CPU count of the worker node, which is large (192 in this case), so the build passes -j 192 to make:

build_args = ['-j', str(multiprocessing.cpu_count())]

https://github.com/Ascend/pytorch/blob/0bdf8f94245753ad02731b84b788013f0ad7f5a7/setup.py#L366

Contributor Author

shink commented Mar 3, 2025

We should allow users to customize the make parallelism; everything should be OK once this PR is merged.
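For illustration, here is a minimal sketch of what a configurable job count could look like. The MAX_JOBS environment variable and the get_build_args helper are assumptions made for this example; they are not confirmed names from the torch_npu setup.py.

```python
import multiprocessing
import os


def get_build_args(env=None):
    """Build the make arguments, honoring a hypothetical MAX_JOBS override.

    MAX_JOBS is an assumed variable name, used here only for illustration.
    """
    env = os.environ if env is None else env
    default_jobs = multiprocessing.cpu_count()

    raw = env.get("MAX_JOBS", "").strip()
    try:
        # Use the override when it parses as an integer, else the CPU count.
        jobs = int(raw) if raw else default_jobs
    except ValueError:
        jobs = default_jobs

    # Never pass a job count below 1 to make.
    jobs = max(1, jobs)
    return ["-j", str(jobs)]
```

With something like this in place, a CI job on a 192-core worker could export MAX_JOBS=8 before building, and local builds without the variable would keep the current behavior.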

Contributor Author

shink commented Mar 12, 2025

With runs-on: ubuntu-latest, multiprocessing.cpu_count() returns 4.

shink merged commit e4fcfc0 into main Mar 13, 2025
shink deleted the ci/runner branch March 13, 2025 03:58

3 participants