Commit 1e24ed4

garloff and mbuechse authored
Feat/add gpu vram (#780)
* Add GPU table and VRAM into specification. To be done: Adjust flavor name parser, pretty printer, generator.
* Version 3.2, adjust examples.
* Fix empty line and extra space.
* Minor addition of information for AMD and intel.
* Typo.
* Note about 1/7 uncertainties. Nvidia spelling. Also mention older generations ...
* Appease markdownlint.
* More appeasement for markdownlint. One true fix (broken link). And tweak the double-space test for tolerating two spaces after a | in a table.
* One more fix against double spaces.
* add vram and vramperf to GPU (retrofitting v1 and v2)
* bugfix: use correct variable
* appease flake8
* GPU VRAM always comes with CU spec: Use "and" wording. Just for a bit better readability.
* More specific meaning of GPU h modifiers.
  h on the SMs/CUs/EUs: High frequency
  h on the VRAM: High bandwidth
  Again, this is really only to differentiate if a vendor has several otherwise similar models that have a material difference in frequency or bandwidth, such as e.g. a GDDR6 vs an HBM2e variant ... or a low-power, low-frequency variant.

Signed-off-by: Kurt Garloff <[email protected]>
Signed-off-by: Matthias Büchse <[email protected]>
Co-authored-by: Matthias Büchse <[email protected]>
1 parent ec6b4c2 commit 1e24ed4

5 files changed: +155 -28 lines changed

.markdownlint-cli2.jsonc

Lines changed: 3 additions & 2 deletions
@@ -43,9 +43,10 @@
4343
{
4444
"name": "double-spaces",
4545
"message": "Avoid double spaces",
46-
"searchPattern": "/([^\\s>])  ([^\\s|])/g",
46+
"searchPattern": "/([^\\s>|])  ([^\\s|])/g",
4747
"replace": "$1 $2",
48-
"skipCode": true
48+
"skipCode": true,
49+
"tables": false
4950
}
5051
]
5152
}
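
To see what the tightened `searchPattern` changes, the two expressions can be compared on a padded table row. This is a minimal Python sketch, assuming the pattern matches two consecutive literal spaces (as the rule name suggests) and that Python's `re` module treats these simple character classes the same way the JavaScript engine does. The sample strings and the helper `flags_double_space` are made up for illustration.

```python
import re

# Assumption: both patterns contain two literal spaces between the groups.
OLD_PATTERN = re.compile(r"([^\s>])  ([^\s|])")
NEW_PATTERN = re.compile(r"([^\s>|])  ([^\s|])")


def flags_double_space(pattern: "re.Pattern[str]", line: str) -> bool:
    """Return True if the rule would report a double space in this line."""
    return pattern.search(line) is not None


# Hypothetical sample lines, not taken from the repository.
table_row = "|  A10 | 288 |"
prose = "two  spaces in prose"

if __name__ == "__main__":
    for label, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
        print(label,
              flags_double_space(pattern, table_row),
              flags_double_space(pattern, prose))
```

Under these assumptions the old pattern reports the two spaces after the leading `|`, while the new one tolerates them and still reports double spaces in ordinary prose.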

Standards/scs-0100-v3-flavor-naming.md

Lines changed: 35 additions & 17 deletions
@@ -14,7 +14,7 @@ description: |
1414

1515
## Introduction
1616

17-
This is the standard v3.1 for SCS Release 5.
17+
This is the standard v3.2 for SCS Release 8.
1818
Note that we intend to only extend it (so it's always backwards compatible),
1919
but try to avoid changing in incompatible ways.
2020
(See at the end for the v1 to v2 transition where we have not met that
@@ -417,15 +417,17 @@ is more significant.
417417

418418
### [OPTIONAL] GPU support
419419

420-
Format: `_`\[`G/g`\]X\[N\]\[`-`M\]\[`h`\]
420+
Format: `_`\[`G/g`\]X\[N\[`-`M\[`h`\]\[`-`V\[`h`\]\]\]\]
421421

422422
This extension provides more details on the specific GPU:
423423

424424
- pass-through (`G`) vs. virtual GPU (`g`)
425425
- vendor (X)
426426
- generation (N)
427427
- number (M) of processing units that are exposed (for pass-through) or assigned; see table below for vendor-specific terminology
428-
- high-performance indicator (`h`)
428+
- high-frequency indicator (`h`) for compute units
429+
- amount of video memory (V) in GiB
430+
- an indicator for high-bandwidth memory
429431

430432
Note that the vendor letter X is mandatory, generation and processing units are optional.
431433

@@ -440,13 +442,29 @@ for AMD GCN-x=0.x, RDNA1=1, C/RDNA2=2, C/RDNA3=3, C/RDNA3.5=3.5, C/RDNA4=4, ...
440442
for Intel Gen9=0.9, Xe(12.1/DG1)=1, Xe(12.2)=2, Arc(12.7/DG2)=3 ...
441443
(Note: This may need further work to properly reflect what's out there.)
442444

443-
The optional `h` suffix to the compute unit count indicates high-performance (e.g. high freq or special
444-
high bandwidth gfx memory such as HBM);
445-
`h` can be duplicated for even higher performance.
445+
The optional `h` suffix to the compute unit count indicates high-frequency GPU compute units.
446+
It is not normally recommended to use it except if there are several variants of cards within
447+
a generation of GPUs with a similar number of SMs/CUs/EUs.
448+
In case there are even more than two variants, the letter `h` can be duplicated for even
449+
higher frequencies.
446450

447-
Example: `SCS-16V-64-500s_GNa-14h`
448-
This flavor has a pass-through GPU nVidia Ampere with 14 SMs and either high-bandwidth memory or specially high frequencies.
449-
Looking through GPU specs you could guess it's 1/4 of an A30.
451+
Please note that there are GPUs from one generation and vendor that have vastly different sizes
452+
(or different fractions are being passed to an instance with multi-instance-GPUs). The number
453+
M makes it possible to differentiate between them and provides an indicator of compute capability and
454+
parallelism. M cannot be compared between different generations, let alone different
455+
vendors.
456+
457+
The amount of video memory dedicated to the instance can be indicated by V (in binary
458+
Gigabytes). This number needs to be an integer - fractional memory sizes must be rounded
459+
down. An optional `h` can be used to indicate high bandwidth memory (such as HBM2+) with
460+
bandwidths well above 1GiB/s.
461+
462+
Example: `SCS-16V-64-500s_GNa-14-6h`
463+
This flavor has a pass-through GPU nVidia Ampere with 14 SMs and 6 GiB of high-bandwidth video
464+
memory. Looking through GPU specs you could guess it's 1/4 of an A30.
465+
466+
We have a table with common GPUs in the
467+
[implementation hints for this standard](scs-0100-w1-flavor-naming-implementation-testing.md).
450468

451469
### [OPTIONAL] Infiniband
452470

@@ -490,14 +508,14 @@ an image is considered broken by the SCS team.
490508

491509
## Proposal Examples
492510

493-
| Example | Decoding |
494-
| ------------------------- | ---------------------------------------------------------------------------------------------- |
495-
| SCS-2C-4-10n | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk |
496-
| SCS-8Ti-32-50p_i1 | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe |
497-
| SCS-1L-1u-5 | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) |
498-
| SCS-16T-64-200s_GNa-64_ib | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Infiniband, 64 Passthrough nVidia Ampere SMs |
499-
| SCS-4C-16-2x200p_a1 | 4 dedicated Arm64 cores (A76 class), 16GiB RAM, 2x200GB local NVMe drives |
500-
| SCS-1V-0.5 | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) |
511+
| Example | Decoding |
512+
| ------------------------------ | ---------------------------------------------------------------------------------------------- |
513+
| `SCS-2C-4-10n` | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk |
514+
| `SCS-8Ti-32-50p_i1` | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe |
515+
| `SCS-1L-1u-5` | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) |
516+
| `SCS-16T-64-200s_GNa-72-24_ib` | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Infiniband, 72 Passthrough nVidia Ampere SMs |
517+
| `SCS-4C-16-2x200p_a1` | 4 dedicated Arm64 cores (A76 class), 16GiB RAM, 2x200GB local NVMe drives |
518+
| `SCS-1V-0.5` | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) |
501519

502520
## Previous standard versions
503521
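
The nested format `_`\[`G/g`\]X\[N\[`-`M\[`h`\]\[`-`V\[`h`\]\]\]\] from the diff above can be sketched as a small composer. This is an illustrative helper, not code from the repository: the function name `gpu_name_piece` and its parameters are hypothetical, the vendor letters are taken from the `brand` table in `flavor_names.py`, and the nesting is read literally (M requires N, V requires M).

```python
import math
from typing import Optional


def gpu_name_piece(passthrough: bool, vendor: str, gen: str = "",
                   units: Optional[int] = None, high_freq: int = 0,
                   vram_gib: Optional[float] = None, high_bw: int = 0) -> str:
    """Build the GPU extension of a flavor name (hypothetical helper)."""
    if vendor not in ("N", "A", "I"):
        raise ValueError("vendor must be N, A or I")
    piece = "_" + ("G" if passthrough else "g") + vendor
    if not gen:
        # Read literally, the format nests M and V inside the generation N.
        if units is not None or vram_gib is not None:
            raise ValueError("units and VRAM require a generation")
        return piece
    piece += gen
    if units is None:
        if vram_gib is not None:
            raise ValueError("VRAM requires a processing unit count")
        return piece
    # One 'h' per level of "high frequency" on the compute units.
    piece += f"-{units}" + "h" * high_freq
    if vram_gib is not None:
        # Fractional memory sizes are rounded down to whole GiB.
        piece += f"-{math.floor(vram_gib)}" + "h" * high_bw
    return piece


if __name__ == "__main__":
    # Pass-through Nvidia Ampere, 14 SMs, 6 GiB high-bandwidth VRAM.
    print("SCS-16V-64-500s" + gpu_name_piece(True, "N", "a", 14, 0, 6, 1))
```

Raising an error for VRAM without a unit count mirrors the comment in the pretty printer that VRAM cannot be specified without CUs.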

Standards/scs-0100-w1-flavor-naming-implementation-testing.md

Lines changed: 103 additions & 1 deletion
@@ -32,7 +32,8 @@ See the [README](https://github.com/SovereignCloudStack/standards/tree/main/Test
3232
for more details.
3333

3434
The functionality of this script is also (partially) exposed via the web page
35-
[https://flavors.scs.community/](https://flavors.scs.community/).
35+
[https://flavors.scs.community/](https://flavors.scs.community/), which can both
36+
parse SCS flavor names and generate them.
3637

3738
With the OpenStack tooling (`python3-openstackclient`, `OS_CLOUD`) in place, you can call
3839
`cli.py -v parse v3 $(openstack flavor list -f value -c Name)` to get a report
@@ -45,6 +46,107 @@ will create a whole set of flavors in one go.
4546
To that end, it provides different options: either the standard mandatory and
4647
possibly recommended flavors can be created, or the user can set a file containing his flavors.
4748

49+
### GPU table
50+
51+
The most commonly used datacenter GPUs are listed here, showing what GPUs (or partitions
52+
of a GPU) result in what GPU part of the flavor name.
53+
54+
#### Nvidia (`N`)
55+
56+
We show the most popular recent generations here. Older ones are of course possible as well.
57+
58+
##### Ampere (`a`)
59+
60+
One Streaming Multiprocessor on Ampere has 64 (A30, A100) or 128 Cuda Cores (A10, A40).
61+
62+
GPUs without MIG (one SM has 128 Cuda Cores and 4 Tensor Cores):
63+
64+
| Nvidia GPU | Tensor C | Cuda Cores | SMs | VRAM | SCS name piece |
65+
|------------|----------|------------|-----|-----------|----------------|
66+
| A10 | 288 | 9216 | 72 | 24G GDDR6 | `GNa-72-24` |
67+
| A40 | 336 | 10752 | 84 | 48G GDDR6 | `GNa-84-48` |
68+
69+
GPUs with Multi-Instance-GPU (MIG), where GPUs can be partitioned and the partitions handed
70+
out as pass-through PCIe devices to instances. One SM corresponds to 64 Cuda Cores and
71+
4 Tensor Cores.
72+
73+
| Nvidia GPU | Fraction | Tensor C | Cuda Cores | SMs | VRAM | SCS GPU name |
74+
|------------|----------|----------|------------|-----|-----------|----------------|
75+
| A30 | 1/1 | 224 | 3584 | 56 | 24G HBM2 | `GNa-56-24` |
76+
| A30 | 1/2 | 112 | 1792 | 28 | 12G HBM2 | `GNa-28-12` |
77+
| A30 | 1/4 | 56 | 896 | 14 | 6G HBM2 | `GNa-14-6` |
78+
| A30X | 1/1 | 224 | 3584 | 56 | 24G HBM2e | `GNa-56h-24h` |
79+
| A100 | 1/1 | 432 | 6912 | 108 | 80G HBM2e | `GNa-108h-80h` |
80+
| A100 | 1/2 | 216 | 3456 | 54 | 40G HBM2e | `GNa-54h-40h` |
81+
| A100 | 1/4 | 108 | 1728 | 27 | 20G HBM2e | `GNa-27h-20h` |
82+
| A100 | 1/7 | 60+ | 960+ | 15+| 10G HBM2e | `GNa-15h-10h`+ |
83+
| A100X | 1/1 | 432 | 6912 | 108 | 80G HBM2e | `GNa-108-80h` |
84+
85+
[+] The precise numbers for the 1/7 MIG configurations are not known by the author of
86+
this document and need validation.
87+
88+
##### Ada Lovelace (`l`)
89+
90+
No MIG support, 128 Cuda Cores and 4 Tensor Cores per SM.
91+
92+
| Nvidia GPU | Tensor C | Cuda Cores | SMs | VRAM | SCS name piece |
93+
|------------|----------|------------|-----|-----------|----------------|
94+
| L4 | 232 | 7424 | 58 | 24G GDDR6 | `GNl-58-24` |
95+
| L40 | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142-48` |
96+
| L40G | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142h-48` |
97+
| L40S | 568 | 18176 | 142 | 48G GDDR6 | `GNl-142hh-48` |
98+
99+
##### Grace Hopper (`g`)
100+
101+
These have MIG support and 128 Cuda Cores and 4 Tensor Cores per SM.
102+
103+
| Nvidia GPU | Fraction | Tensor C | Cuda Cores | SMs | VRAM | SCS GPU name |
104+
|------------|----------|----------|------------|-----|------------|----------------|
105+
| H100 | 1/1 | 528 | 16896 | 132 | 80G HBM3 | `GNg-132-80h` |
106+
| H100 | 1/2 | 264 | 8448 | 66 | 40G HBM3 | `GNg-66-40h` |
107+
| H100 | 1/4 | 132 | 4224 | 33 | 20G HBM3 | `GNg-33-20h` |
108+
| H100 | 1/7 | 72+ | 2304+ | 18+| 10G HBM3 | `GNg-18-10h`+ |
109+
| H200 | 1/1 | 528 | 16896 | 132 | 141G HBM3e | `GNg-132-141h` |
110+
| H200 | 1/2 | 264 | 8448 | 66 | 70G HBM3e | `GNg-66-70h` |
111+
| ... |
112+
113+
[+] The precise numbers for the 1/7 MIG configurations are not known by the author of
114+
this document and need validation.
115+
116+
#### AMD Radeon (`A`)
117+
118+
##### CDNA 2 (`2`)
119+
120+
One CU contains 64 Stream Processors.
121+
122+
| AMD Instinct| Stream Proc | CUs | VRAM | SCS name piece |
123+
|-------------|-------------|-----|------------|----------------|
124+
| Inst MI210 | 6656 | 104 | 64G HBM2e | `GA2-104-64h` |
125+
| Inst MI250 | 13312 | 208 | 128G HBM2e | `GA2-208-128h` |
126+
| Inst MI250X | 14080 | 220 | 128G HBM2e | `GA2-220-128h` |
127+
128+
##### CDNA 3 (`3`)
129+
130+
SRIOV partitioning is possible, resulting in pass-through for
131+
up to 8 partitions, somewhat similar to Nvidia MIG. 4 Tensor
132+
Cores and 64 Stream Processors per CU.
133+
134+
| AMD GPU | Tensor C | Stream Proc | CUs | VRAM | SCS name piece |
135+
|-------------|----------|-------------|-----|------------|----------------|
136+
| Inst MI300X | 1216 | 19456 | 304 | 192G HBM3 | `GA3-304-192h` |
137+
| Inst MI325X | 1216 | 19456 | 304 | 288G HBM3 | `GA3-304-288h` |
138+
139+
#### intel Xe (`I`)
140+
141+
##### Xe-HPC (Ponte Vecchio) (`12.7`)
142+
143+
1 EU corresponds to one Tensor Core and contains 128 Shading Units.
144+
145+
| intel DC GPU | Tensor C | Shading U | EUs | VRAM | SCS name piece |
146+
|--------------|----------|-----------|-----|------------|-------------------|
147+
| Max 1100 | 56 | 7168 | 56 | 48G HBM2e | `GI12.7-56-48h` |
148+
| Max 1550 | 128 | 16384 | 128 | 128G HBM2e | `GI12.7-128-128h` |
149+
48150
## Automated tests
49151

50152
### Errors
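
The rows of the GPU table in the diff above follow simple arithmetic: a 1/2 or 1/4 MIG partition gets the corresponding share of SMs and VRAM, and the Cuda Core count is the SM count times the cores per SM. The sketch below reproduces a few rows under that assumption. The helper names are hypothetical, and the 1/7 split is deliberately left out because the table itself marks those numbers as unvalidated.

```python
from typing import Tuple


def mig_slice(sms: int, vram_gib: int, parts: int) -> Tuple[int, int]:
    """Split a full GPU evenly into `parts` slices, rounding down (sketch)."""
    if parts not in (1, 2, 4):
        raise ValueError("only 1/1, 1/2 and 1/4 splits are modelled here")
    return sms // parts, vram_gib // parts


def cuda_cores(sms: int, cores_per_sm: int) -> int:
    """Cuda Cores exposed by a given number of SMs."""
    return sms * cores_per_sm


def name_piece(gen: str, sms: int, vram_gib: int,
               cu_suffix: str = "", vram_suffix: str = "") -> str:
    """Format the pass-through Nvidia name piece as used in the table."""
    return f"GN{gen}-{sms}{cu_suffix}-{vram_gib}{vram_suffix}"


if __name__ == "__main__":
    # A30 (56 SMs, 24 GiB, 64 Cuda Cores per SM), quarter partition.
    sms, vram = mig_slice(56, 24, 4)
    print(name_piece("a", sms, vram), cuda_cores(sms, 64))
    # A100 (108 SMs, 80 GiB), quarter partition, with both h suffixes.
    print(name_piece("a", *mig_slice(108, 80, 4), "h", "h"))
```

Integer division matches the rule that fractional memory sizes are rounded down, which is how half of a 141 GiB H200 ends up as 70 in the table.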

Tests/iaas/flavor-naming/cli.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -72,7 +72,7 @@ def parse(cfg, version, name, output='none'):
7272
if flavorname is None:
7373
print(f"NOT an SCS flavor: {namestr}")
7474
elif output == 'prose':
75-
printv(name, end=': ')
75+
printv(namestr, end=': ')
7676
print(f"{prettyname(flavorname)}")
7777
elif output == 'yaml':
7878
print(yaml.dump(flavorname_to_dict(flavorname), explicit_start=True))

Tests/iaas/flavor-naming/flavor_names.py

Lines changed: 13 additions & 7 deletions
@@ -212,7 +212,7 @@ class GPU:
212212
type = "GPU"
213213
component_name = "gpu"
214214
gputype = TblAttr("Type", {"g": "vGPU", "G": "Pass-Through GPU"})
215-
brand = TblAttr("Brand", {"N": "nVidia", "A": "AMD", "I": "Intel"})
215+
brand = TblAttr("Brand", {"N": "Nvidia", "A": "AMD", "I": "Intel"})
216216
gen = DepTblAttr("Gen", brand, {
217217
"N": {'': '(unspecified)', "f": "Fermi", "k": "Kepler", "m": "Maxwell", "p": "Pascal",
218218
"v": "Volta", "t": "Turing", "a": "Ampere", "l": "AdaLovelace", "g": "GraceHopper"},
@@ -222,7 +222,9 @@ class GPU:
222222
"3": "Arc/Gen12.7/DG2"},
223223
})
224224
cu = OptIntAttr("#.N:SMs/A:CUs/I:EUs")
225-
perf = TblAttr("Performance", {"": "Std Perf", "h": "High Perf", "hh": "Very High Perf", "hhh": "Very Very High Perf"})
225+
perf = TblAttr("Frequency", {"": "Std Freq", "h": "High Freq", "hh": "Very High Freq"})
226+
vram = OptIntAttr("#.V:GiB VRAM")
227+
vramperf = TblAttr("Bandwidth", {"": "Std BW (<~1GiB/s)", "h": "High BW", "hh": "Very High BW"})
226228

227229

228230
class IB:
@@ -278,7 +280,7 @@ class Outputter:
278280
hype = "_%s"
279281
hwvirt = "_%?"
280282
cpubrand = "_%s%0%s"
281-
gpu = "_%s%s%s%-%s"
283+
gpu = "_%s%s%s%-%s%-%s"
282284
ib = "_%?"
283285

284286
def output_component(self, pattern, component, parts):
@@ -341,7 +343,7 @@ class SyntaxV1:
341343
hwvirt = re.compile(r"\-(hwv)")
342344
# cpubrand needs final lookahead assertion to exclude confusion with _ib extension
343345
cpubrand = re.compile(r"\-([izar])([0-9]*)(h*)(?=$|\-)")
344-
gpu = re.compile(r"\-([gG])([NAI])([^:h]*)(?::([0-9]+)|)(h*)")
346+
gpu = re.compile(r"\-([gG])([NAI])([^:h]*)(?::([0-9]+)|)(h*)(?::([0-9]+)|)(h*)")
345347
ib = re.compile(r"\-(ib)")
346348

347349
@staticmethod
@@ -366,7 +368,7 @@ class SyntaxV2:
366368
hwvirt = re.compile(r"_(hwv)")
367369
# cpubrand needs final lookahead assertion to exclude confusion with _ib extension
368370
cpubrand = re.compile(r"_([izar])([0-9]*)(h*)(?=$|_)")
369-
gpu = re.compile(r"_([gG])([NAI])([^\-h]*)(?:\-([0-9]+)|)(h*)")
371+
gpu = re.compile(r"_([gG])([NAI])([^\-h]*)(?:\-([0-9]+)|)(h*)(?:\-([0-9]+)|)(h*)")
370372
ib = re.compile(r"_(ib)")
371373

372374
@staticmethod
@@ -697,10 +699,14 @@ def prettyname(flavorname, prefix=""):
697699
if flavorname.gpu:
698700
stg += "and " + _tbl_out(flavorname.gpu, "gputype")
699701
stg += _tbl_out(flavorname.gpu, "brand")
700-
stg += _tbl_out(flavorname.gpu, "perf", True)
701702
stg += _tbl_out(flavorname.gpu, "gen", True)
702703
if flavorname.gpu.cu is not None:
703-
stg += f"(w/ {flavorname.gpu.cu} SMs/CUs/EUs) "
704+
stg += f"(w/ {flavorname.gpu.cu} {_tbl_out(flavorname.gpu, 'perf', True)}SMs/CUs/EUs"
705+
# Can not specify VRAM without CUs
706+
if flavorname.gpu.vram:
707+
stg += f" and {flavorname.gpu.vram} GiB {_tbl_out(flavorname.gpu, 'vramperf', True)}VRAM) "
708+
else:
709+
stg += ") "
704710
# IB
705711
if flavorname.ib:
706712
stg += "and Infiniband "
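
To illustrate how the extended `SyntaxV2.gpu` expression splits a name into the seven GPU fields, here is a standalone sketch. Only the regular expression is copied from the diff above. The function `parse_gpu`, the `FIELDS` tuple (named after the `GPU` class attributes) and the use of `re.search` on the whole flavor name are assumptions made for this example and do not reflect how the real parser walks through a name.

```python
import re
from typing import Dict, Optional

# Pattern copied from the SyntaxV2.gpu line in the diff above.
GPU_V2 = re.compile(
    r"_([gG])([NAI])([^\-h]*)(?:\-([0-9]+)|)(h*)(?:\-([0-9]+)|)(h*)")

# Field names follow the attributes of the GPU class.
FIELDS = ("gputype", "brand", "gen", "cu", "perf", "vram", "vramperf")


def parse_gpu(name: str) -> Optional[Dict[str, object]]:
    """Extract the GPU fields from a v2/v3 flavor name, or None if absent."""
    match = GPU_V2.search(name)
    if match is None:
        return None
    parsed: Dict[str, object] = dict(zip(FIELDS, match.groups()))
    for key in ("cu", "vram"):
        # Optional numeric groups are None when the part is missing.
        parsed[key] = int(parsed[key]) if parsed[key] else None
    return parsed


if __name__ == "__main__":
    print(parse_gpu("SCS-16V-64-500s_GNa-14-6h"))
    print(parse_gpu("SCS-16T-64-200s_GNa-72-24_ib"))
    print(parse_gpu("SCS-2C-4-10n"))
```

The first `h*` group binds to the compute unit count and the second to the VRAM size, which is why `-14-6h` is read as standard-frequency SMs with high-bandwidth memory.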
