I built PhoenixOS in the container provided in the documentation and then tried to test checkpoint/restore (C/R) of the ResNet example program in the same container, but there seems to be a problem with API forwarding.
I first ran the following command in one terminal; its output is as follows:
~# pos_cli --start --target daemon
POS Log >>>>>>>>>> PhOS Workspace <<<<<<<<<<
_____ _ _ ____ _____
| __ \| | (_) / __ \ / ____|
| |__) | |__ ___ ___ _ __ ___ _| | | | (___
| ___/| '_ \ / _ \ / _ \ '_ \| \ \/ / | | |\___ \
| | | | | | (_) | __/ | | | |> <| |__| |____) |
|_| |_| |_|\___/ \___|_| |_|_/_/\_\\____/|_____/
POS Log PhoenixOS workspace created, welcome!
+00:00:00.420194 INFO: waiting for RPC requests...
Cache Optimization: Enabled!
Async Optimization: Enabled!
Handler Optimization: Enabled!
xpu remote address: localhost
create shm buffer
I then ran the following command in a second terminal:
~/examples/resnet# env $phos python3 train.py
+00:00:00.629521 ERROR: error registering fatbin: 32621 in cpu-client.c:478
+00:00:00.629555 ERROR: error registering function: 496557680 in cpu-client.c:437
+00:00:00.681239 INFO: api-call-cnt: 0
+00:00:00.681268 INFO: memcpy-cnt: 0
----client_total_infos----
api 1: count 1, client_total_time 65336.000000
api 2: count 1, client_total_time 34162.000000
free(): invalid pointer
Aborted (core dumped)
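To make the failing calls easier to pick out of longer runs, the client output can be filtered for ERROR lines. The following is only a minimal sketch: the line format (`+HH:MM:SS.micro LEVEL: message in file.c:line`) is inferred from the few lines shown above, and `parse_log` and `errors` are hypothetical helper names, not part of PhoenixOS or cricket.

```python
import re

# Line format assumed from the output above:
#   +HH:MM:SS.micro LEVEL: message [in file.c:line]
LINE_RE = re.compile(
    r"^\+(?P<ts>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
# Optional trailing source location, e.g. " in cpu-client.c:478"
LOC_RE = re.compile(r"\s+in\s+(?P<file>[\w.\-]+):(?P<line>\d+)$")


def parse_log(text):
    """Return one dict per timestamped log line; other lines are skipped."""
    records = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # e.g. "free(): invalid pointer" has no timestamp
        rec = {
            "ts": m.group("ts"),
            "level": m.group("level"),
            "msg": m.group("msg"),
            "file": None,
            "line": None,
        }
        loc = LOC_RE.search(rec["msg"])
        if loc:
            rec["file"] = loc.group("file")
            rec["line"] = int(loc.group("line"))
            rec["msg"] = rec["msg"][: loc.start()]
        records.append(rec)
    return records


def errors(records):
    """Keep only the records logged at ERROR level."""
    return [r for r in records if r["level"] == "ERROR"]


if __name__ == "__main__":
    sample = """\
+00:00:00.629521 ERROR: error registering fatbin: 32621 in cpu-client.c:478
+00:00:00.629555 ERROR: error registering function: 496557680 in cpu-client.c:437
+00:00:00.681239 INFO: api-call-cnt: 0
free(): invalid pointer
"""
    for rec in errors(parse_log(sample)):
        print(rec["file"], rec["line"], rec["msg"])
```

Applied to the output above, this leaves the two registration failures reported from cpu-client.c, which is where I would start looking.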
Output of nvidia-smi (run inside the container):
~/examples/resnet# nvidia-smi
Sat Nov 9 14:00:19 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:B8:00.0 Off |                  Off |
| 30%   30C    P8              4W /  450W |     782MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:D8:00.0 Off |                  Off |
| 30%   32C    P8              9W /  450W |     782MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     83346      C   cricket-rpc-server                              0MiB |
|    1   N/A  N/A     83346      C   cricket-rpc-server                              0MiB |
+-----------------------------------------------------------------------------------------+
Some details about my environment (executed in container):
~/examples/resnet# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
~/examples/resnet# python3
Python 3.8.10 (default, Sep 11 2024, 16:02:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.13.0+cu117'
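The versions above do not all refer to the same CUDA release: nvcc reports 11.3, the PyTorch wheel is tagged cu117, and the driver reports 12.4. I have not confirmed whether this is related to the fatbin registration error. As a rough sketch for putting the numbers side by side, the helpers below (`nvcc_release`, `torch_cuda_tag`, and `summarize` are hypothetical names) parse the strings shown above; the `+cuXYZ` tag is assumed to be the major version followed by a one-digit minor version.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def torch_cuda_tag(torch_version):
    """Extract (major, minor) from a wheel tag such as '1.13.0+cu117'.

    Assumes the tag is 'cu' + major + a single-digit minor; returns None
    for builds without such a tag (for example CPU-only builds).
    """
    m = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def summarize(nvcc_output, torch_version, driver_cuda):
    """Collect the three CUDA versions and flag a toolkit/wheel difference."""
    toolkit = nvcc_release(nvcc_output)
    wheel = torch_cuda_tag(torch_version)
    major, minor = driver_cuda.split(".")[:2]
    return {
        "toolkit": toolkit,
        "torch_wheel": wheel,
        "driver": (int(major), int(minor)),
        "toolkit_matches_wheel": toolkit is not None and toolkit == wheel,
    }


if __name__ == "__main__":
    nvcc_text = "Cuda compilation tools, release 11.3, V11.3.109"
    print(summarize(nvcc_text, "1.13.0+cu117", "12.4"))
```

If torch is importable, `torch.version.cuda` gives the CUDA version the wheel was built against directly, without parsing the tag.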