@@ -15,7 +15,9 @@
 Introduction
 ------------
 PyTorch 1.8 includes an updated profiler API capable of
-recording the CPU side operations as well as the CUDA kernel launches on the GPU side.
+recording CPU-side operations as well as device-side kernel launches (for example CUDA or XPU),
+when supported by the platform and underlying tracing integrations.
+
 The profiler can visualize this information
 in TensorBoard Plugin and provide analysis of the performance bottlenecks.
 
@@ -76,9 +78,10 @@
 # Next, create Resnet model, loss function, and optimizer objects.
 # To run on GPU, move model and loss to GPU device.
 
-device = torch.device("cuda:0")
-model = torchvision.models.resnet18(weights='IMAGENET1K_V1').cuda(device)
-criterion = torch.nn.CrossEntropyLoss().cuda(device)
+acc = torch.accelerator.current_accelerator()
+device = torch.device(f'{acc}:0')
+model = torchvision.models.resnet18(weights='IMAGENET1K_V1').to(device)
+criterion = torch.nn.CrossEntropyLoss().to(device)
 optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
 model.train()
 
@@ -346,7 +349,7 @@ def train(data):
 # For example, "GPU0" means the following table only shows each operator's memory usage on GPU 0, not including CPU or other GPUs.
 #
 # The memory curve shows the trends of memory consumption. The "Allocated" curve shows the total memory that is actually
-# in use, e.g., tensors. In PyTorch, caching mechanism is employed in CUDA allocator and some other allocators. The
+# in use, e.g., tensors. In PyTorch, caching mechanism is employed in the device allocator and some other allocators. The
 # "Reserved" curve shows the total memory that is reserved by the allocator. You can left click and drag on the graph
 # to select events in the desired range:
 #
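
A note on the device setup changed in the second hunk: `torch.accelerator.current_accelerator()` can return `None` when PyTorch has no accelerator backend, in which case the f-string yields the invalid device string `None:0`. The sketch below shows one possible guard. It is not part of this change; `pick_device` is a hypothetical helper name, and the small `torch.nn.Linear` model is an assumed stand-in for the tutorial's ResNet so that the snippet runs without downloading weights.

```python
import torch


def pick_device() -> torch.device:
    """Return accelerator device 0 if one is usable, otherwise the CPU.

    Hypothetical helper for illustration; not part of the tutorial.
    """
    # torch.accelerator only exists in recent PyTorch releases, so look it up
    # defensively instead of assuming the attribute is present.
    accelerator = getattr(torch, "accelerator", None)
    if accelerator is not None and accelerator.is_available():
        acc = accelerator.current_accelerator()
        if acc is not None:
            # current_accelerator() returns a torch.device; reuse its type
            # ("cuda", "xpu", ...) and pin index 0 as the tutorial does.
            return torch.device(acc.type, 0)
    return torch.device("cpu")


device = pick_device()

# Stand-in model (assumption): the tutorial moves a ResNet-18 the same way.
model = torch.nn.Linear(4, 2).to(device)
criterion = torch.nn.CrossEntropyLoss().to(device)

print(f"Running on: {device}")
```

With a guard like this, the `.to(device)` calls from the diff work unchanged on CPU-only machines, although the device-side sections of the profiler output would then be empty.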