Gigahorse-1.1.8-9d66b86: GPU address limit 1TB/40bit problem: instance of 'std::runtime_error', signal 6, swiotlb buffer is full, NVRM: Failed to create a DMA mapping! #28

@bladeuserpi

Hi,

On a machine with 1.5 TB of RAM, K34 plots fail (GPU: Quadro M6000).
I believe this is caused by the GPU's 40-bit address limit.

The limit is mentioned in Microsoft's documentation:
https://learn.microsoft.com/en-us/windows-hardware/drivers/display/iommu-dma-remapping
"This page describes the IOMMU DMA remapping feature that was introduced in Windows 11 22H2 (WDDM 3.0).
...
Upcoming servers and high end workstations can be configured with over 1TB of memory which crosses the common 40-bit address space limitation of many GPUs."

So it seems that while Windows 11 22H2 can handle this, on Linux it can be a problem (kernel 4.18.0-425.10.1.el8_7).
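The arithmetic behind the suspected limit can be sketched as follows. This is only an illustration: it assumes the installed 1.5 TB means 1.5 TiB and that RAM is mapped contiguously from physical address 0 (real memory maps have holes, so the top of memory usually sits somewhat higher). The helper name `dma_reachable_bytes` is hypothetical.

```python
# Sketch: how much system RAM a device with N-bit DMA addressing can reach.
# Assumption: RAM is laid out contiguously starting at physical address 0.

TIB = 1 << 40


def dma_reachable_bytes(address_bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


installed = TIB + TIB // 2          # 1.5 TiB installed in this machine
limit = dma_reachable_bytes(40)     # suspected 40-bit GPU limit

print(f"40-bit limit : {limit / TIB:.2f} TiB")
print(f"installed    : {installed / TIB:.2f} TiB")
print(f"out of reach : {(installed - limit) / TIB:.2f} TiB")
```

Under these assumptions, roughly a third of the installed memory lies above the address range such a GPU can target directly.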

Also note that increasing swiotlb only delays the termination until the 2nd plot, and even the 1st plot
might be corrupt, as there are ten thousand (!) messages like these (and others):

    park_delta(): LP_1 < LP_0 (1875189930, 18446744073709551615) (x = 1348, y = 6770)
    park_delta(): LP_1 < LP_0 (2255150717, 18446744073709551615) (x = 1351, y = 6770)
    park_delta(): LP_1 < LP_0 (1891597597, 18446744073709551615) (x = 1353, y = 6770)
    park_delta(): LP_1 < LP_0 (1267797774, 1891597597) (x = 1354, y = 6770)
    park_delta(): LP_1 < LP_0 (2922005224, 3001459040) (x = 1356, y = 6770)
    park_delta(): LP_1 < LP_0 (30753450, 2922005224) (x = 1357, y = 6770)

Furthermore, these messages also appear for K32 after some successful plots when running with "-n -1",
so although that run did not terminate, it might also be producing corrupt plots.
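To get a feel for how many of these warnings are affected, a small script along these lines could tally them. It is a sketch only: the regular expression is derived from the six sample lines above, and the function name `summarize` is hypothetical. The value 18446744073709551615 is 2^64 - 1, which is what -1 looks like when stored as an unsigned 64-bit integer; whether the plotter uses it as an "invalid" marker is my assumption, not something the logs confirm.

```python
import re

UINT64_MAX = (1 << 64) - 1  # 18446744073709551615

# Pattern derived from the sample log lines; other message variants may exist.
PATTERN = re.compile(r"park_delta\(\): LP_1 < LP_0 \((\d+), (\d+)\)")


def summarize(lines):
    """Return (total warnings, warnings whose second value is UINT64_MAX)."""
    total = with_max = 0
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        total += 1
        if int(match.group(2)) == UINT64_MAX:
            with_max += 1
    return total, with_max


sample = [
    "park_delta(): LP_1 < LP_0 (1875189930, 18446744073709551615) (x = 1348, y = 6770)",
    "park_delta(): LP_1 < LP_0 (2255150717, 18446744073709551615) (x = 1351, y = 6770)",
    "park_delta(): LP_1 < LP_0 (1891597597, 18446744073709551615) (x = 1353, y = 6770)",
    "park_delta(): LP_1 < LP_0 (1267797774, 1891597597) (x = 1354, y = 6770)",
    "park_delta(): LP_1 < LP_0 (2922005224, 3001459040) (x = 1356, y = 6770)",
    "park_delta(): LP_1 < LP_0 (30753450, 2922005224) (x = 1357, y = 6770)",
]

print(summarize(sample))  # (6, 3)
```

For a full log, the same function can be fed the lines of the plotter output file instead of `sample`.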

This workstation has a BIOS option "1TB Memory Cap":

"If 1 TB of memory is installed, limits useable memory to 1TB-64MB for compatibility with graphics cards that
can't address 1TB or more of memory".

I will try that next.
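Why the cap should help can be checked with a quick calculation. This sketch assumes the BIOS text uses binary units (1 TB = 2^40 bytes, 64 MB = 2^26 bytes) and that RAM starts at physical address 0 with no holes; `fits_in_bits` is a hypothetical helper.

```python
MIB = 1 << 20
TIB = 1 << 40


def fits_in_bits(ram_bytes: int, bits: int) -> bool:
    """True if the highest byte address of RAM starting at 0 fits in `bits` bits."""
    return (ram_bytes - 1).bit_length() <= bits


capped = TIB - 64 * MIB      # "1TB-64MB" from the BIOS help text
uncapped = TIB + TIB // 2    # the full 1.5 TiB

print(fits_in_bits(capped, 40))    # True
print(fits_in_bits(uncapped, 40))  # False
```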

Logs

    Chia k34 next-gen CUDA plotter - 9d66b86
    Plot Format: v2.4
    Network Port: 11337 [MMX] (unique)
    No. GPUs: 1
    No. Streams: 4
    Final Destination: ./
    Shared Memory limit: unlimited
    Number of Plots: 5
    Initialization took 0.106 sec
    Crafting plot 1 out of 5 (2023/02/01 16:36:20)
    Process ID: 1993
    Pool Puzzle Hash:  xxx
    Farmer Public Key: xxx
    Working Directory:   ./
    Working Directory 2: @RAM
    Compression Level: C1 (xbits = 15, final table = 3)
    Plot Name: plot-mmx-k34-c1-2023-02-01-16-36-xxx
    [P1] Setup took 0.894 sec
    [P1] Table 1 took 77.857 sec, 17179869184 entries, 16789935 max, 17020 tmp, 0 GB/s up, 2.31198 GB/s down
    [P1] Table 2 took 143.196 sec, 17179636764 entries, 16790768 max, 17010 tmp, 1.00561 GB/s up, 1.81572 GB/s down
    [P1] Table 3 took 319.245 sec, 17178866356 entries, 16787901 max, 16960 tmp, 0.651528 GB/s up, 1.8168 GB/s down
    terminate called after throwing an instance of 'std::runtime_error'
      what():  OS call failed or operation not supported on this OS
    Command terminated by signal 6
    223.30user 287.34system 10:14.91elapsed 83%CPU (0avgtext+0avgdata 541219792maxresident)k

The corresponding kernel messages can be seen with "dmesg -T" or in /var/log/messages:

    Feb  1 17:45:35 m8 kernel: nvidia 0000:2d:00.0: swiotlb buffer is full (sz: 4194304 bytes), total 32768 (slots), used 0 (slots)
    Feb  1 17:45:35 m8 kernel: NVRM: 0000:2d:00.0: Failed to create a DMA mapping!
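The numbers in the swiotlb message can be put into perspective with a short calculation. This is a sketch under one stated assumption: each swiotlb slot is 2 KiB, as in mainline Linux kernels (I have not verified this against this exact RHEL kernel build). The function name `parse_swiotlb` is hypothetical.

```python
import re

SLOT_BYTES = 2048  # assumption: 2 KiB per swiotlb slot, as in mainline Linux

LINE = ("nvidia 0000:2d:00.0: swiotlb buffer is full (sz: 4194304 bytes), "
        "total 32768 (slots), used 0 (slots)")


def parse_swiotlb(line):
    """Extract (request size in bytes, total slots, used slots) from the message."""
    match = re.search(
        r"sz: (\d+) bytes\), total (\d+) \(slots\), used (\d+) \(slots\)", line
    )
    if match is None:
        raise ValueError("not a swiotlb 'buffer is full' line")
    size, total, used = (int(g) for g in match.groups())
    return size, total, used


size, total, used = parse_swiotlb(LINE)
pool_mib = total * SLOT_BYTES / 2**20
request_mib = size / 2**20
slots_needed = size // SLOT_BYTES

print(f"bounce pool  : {pool_mib:.0f} MiB ({total} slots, {used} in use)")
print(f"request size : {request_mib:.0f} MiB ({slots_needed} slots)")
```

Under that assumption the pool is 64 MiB and the failing request is a single 4 MiB mapping. The message itself reports 0 slots in use, so the pool was not simply exhausted by other mappings at that moment.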

While experimenting, I also hit this failure mode:

    Feb  1 22:43:28 m8 kernel: NVRM: GPU 0000:2d:00.0: RmInitAdapter failed! (0x25:0x65:1457)
    Feb  1 22:43:28 m8 kernel: NVRM: GPU 0000:2d:00.0: rm_init_adapter failed, device minor number 0

Related documentation:
https://lenovopress.lenovo.com/lp1467.pdf
An Introduction to IOMMU Infrastructure in the Linux Kernel
