
Failed to run resnet example #9

@Minamoto25


I built PhoenixOS in the container provided in the documentation, and then tried to test checkpoint/restore (C/R) of the ResNet program in the same container, but there seems to be a problem with API forwarding.
First, I executed the following command in one terminal; the output is as follows:

~# pos_cli --start --target daemon
 POS Log  >>>>>>>>>> PhOS Workspace <<<<<<<<<<
 _____  _                      _       ____   _____
|  __ \| |                    (_)     / __ \ / ____|
| |__) | |__   ___   ___ _ __  ___  _| |  | | (___
|  ___/| '_ \ / _ \ / _ \ '_ \| \ \/ / |  | |\___ \
| |    | | | | (_) |  __/ | | | |>  <| |__| |____) |
|_|    |_| |_|\___/ \___|_| |_|_/_/\_\\____/|_____/

 POS Log  PhoenixOS workspace created, welcome!
+00:00:00.420194 INFO:  waiting for RPC requests...
Cache Optimization: Enabled!
Async Optimization: Enabled!
Handler Optimization: Enabled!
xpu remote address: localhost
create shm buffer

Then I executed the following command in another terminal:

~/examples/resnet# env $phos python3 train.py
+00:00:00.629521 ERROR: error registering fatbin: 32621 in cpu-client.c:478
+00:00:00.629555 ERROR: error registering function: 496557680   in cpu-client.c:437
+00:00:00.681239 INFO:  api-call-cnt: 0
+00:00:00.681268 INFO:  memcpy-cnt: 0
----client_total_infos----
api 1: count 1, client_total_time 65336.000000
api 2: count 1, client_total_time 34162.000000
free(): invalid pointer
Aborted (core dumped)

The output of nvidia-smi afterwards (executed in the container):

~/examples/resnet# nvidia-smi
Sat Nov  9 14:00:19 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:B8:00.0 Off |                  Off |
| 30%   30C    P8              4W /  450W |     782MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:D8:00.0 Off |                  Off |
| 30%   32C    P8              9W /  450W |     782MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     83346      C   cricket-rpc-server                              0MiB |
|    1   N/A  N/A     83346      C   cricket-rpc-server                              0MiB |
+-----------------------------------------------------------------------------------------+

Some details about my environment (executed in the container):

~/examples/resnet# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0



~/examples/resnet# python3
Python 3.8.10 (default, Sep 11 2024, 16:02:53)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.13.0+cu117'
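For comparison, the CUDA versions reported above do not all agree: `nvcc -V` in the container reports CUDA 11.3, the PyTorch wheel is tagged `+cu117` (built against CUDA 11.7), and the driver supports up to CUDA 12.4. Whether this mismatch is related to the fatbin registration error is not established here. The sketch below uses hypothetical helper names (they are not part of PhoenixOS or PyTorch). It parses the two version strings exactly as they appear in the outputs above and flags when the toolkit and wheel versions differ. It assumes the `+cuXYZ` wheel tag encodes the minor version as a single trailing digit (e.g. `cu117` is 11.7).

```python
import re
from typing import Optional, Tuple

# (major, minor) CUDA version, e.g. (11, 3)
Version = Tuple[int, int]


def nvcc_cuda_version(nvcc_output: str) -> Optional[Version]:
    """Extract 'release X.Y' from `nvcc -V` output; None if absent."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def torch_cuda_version(torch_version: str) -> Optional[Version]:
    """Extract the CUDA version from a PyTorch '+cuXYZ' local version tag.

    Assumption: the last digit is the minor version, so 'cu117' -> (11, 7)
    and 'cu102' -> (10, 2). Returns None for wheels without a '+cu' tag.
    """
    m = re.search(r"\+cu(\d+)$", torch_version)
    if not m:
        return None
    digits = m.group(1)
    return (int(digits[:-1]), int(digits[-1]))


# Strings copied from the outputs reported above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
"""
TORCH_VERSION = "1.13.0+cu117"

if __name__ == "__main__":
    toolkit = nvcc_cuda_version(NVCC_OUTPUT)
    wheel = torch_cuda_version(TORCH_VERSION)
    print(f"nvcc toolkit: {toolkit}, torch wheel: {wheel}")
    if toolkit != wheel:
        print("warning: CUDA toolkit and PyTorch wheel target different CUDA versions")
```

With the values from this report, the script prints both parsed versions followed by the warning, because 11.3 and 11.7 differ.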
