Commit 7271122

Merge branch 'main' into mandatory-and-supported-IaaS-services
2 parents 30a5eb0 + ed85718 commit 7271122

File tree

21 files changed

+314
-486
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -4,4 +4,5 @@
44
.DS_Store
55
node_modules
66
Tests/kaas/results/
7+
Tests/kaas/kaas-sonobuoy-tests/results/
78
*.tar.gz

Standards/scs-0100-v3-flavor-naming.md

Lines changed: 17 additions & 12 deletions
@@ -366,13 +366,15 @@ The options for arch are as follows:
366366
The generation is vendor specific and can be left out, but it can only be specified in
367367
conjunction with a vendor. At present, these values are possible:
368368

369-
| Generation | i (Intel x86-64) | z (AMD x86-64) |  a (AArch64) | r (RISC-V) |
370-
| ---------- | ---------------- | -------------- | ------------------ | ---------- |
371-
| 0 | pre Skylake | pre Zen | pre Cortex A76 | TBD |
372-
| 1 | Skylake | Zen-1 (Naples) | A76/NeoN1 class | TBD |
373-
| 2 | Cascade Lake | Zen-2 (Rome) | A78/x1/NeoV1 class | TBD |
374-
| 3 | Ice Lake | Zen-3 (Milan) | A71x/NeoN2 (ARMv9) | TBD |
375-
| 4 | Sapphire Rapids | Zen-4 (Genoa) | | TBD |
369+
| Generation | i (Intel x86-64) | z (AMD x86-64) |  a (AArch64) | r (RISC-V) |
370+
| ---------- | ----------------- | -------------- | -------------------- | ---------- |
371+
| 0 | pre Skylake | pre Zen | pre Cortex A76 | TBD |
372+
| 1 | Skylake | Zen-1 (Naples) | A76/NeoN1 class | TBD |
373+
| 2 | Cascade Lake | Zen-2 (Rome) | A78/x1/NeoV1 class | TBD |
374+
| 3 | Ice Lake | Zen-3 (Milan) | A71x/NeoN2/V2(ARMv9) | TBD |
375+
| 4 | Sapphire Rapids | Zen-4 (Genoa) | AmpereOne (ARMv8.6) | TBD |
376+
| 5 | Sierra Forest(E) | Zen-5 (Turin) | A72x/NeoN3/V3(Av9.2) | TBD |
377+
| 6 | Granite Rapids(P) | | | TBD |
376378

377379
It is recommended to leave out the `0` when specifying the old generation; this will
378380
help the parser tool, which assumes 0 for an unspecified value and does leave it
@@ -384,8 +386,11 @@ out when generating the name for comparison. In other words: 0 has a meaning of
384386
We don't differentiate between Zen-4 (Genoa) and Zen-4c (Bergamo); L3 cache per
385387
Siena core is smaller on Bergamo and the frequency lower but the cores are otherwise
386388
identical. As we already have a qualifier `h` that allows to specify higher frequencies
387-
(which Genoa thus may use more and Bergamo less or not), we have enough distinction
388-
capabilities.
389+
(which Genoa thus may use more and Bergamo not), we have enough distinction
390+
capabilities. The same applies to Zen-5 (Turin) and Zen-5c (Turin Dense).
391+
For Intel, the server E-cores (Crestmont) received their own
392+
generation assignment, as the difference from the server P-cores (Redwood Cove)
393+
is more significant.
389394

390395
:::
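
The generation table above can be read as a two-level lookup: vendor letter first, then generation number. The following is a minimal sketch of that idea, not the parser shipped with the standard; the names `CPU_GENERATIONS` and `describe_cpu_gen` are hypothetical, and only the Intel and AMD columns are reproduced. As the text notes, an unspecified generation is treated as 0.

```python
# Hypothetical lookup table mirroring the Intel and AMD columns of the
# generation table; the names used here are illustrative assumptions.
CPU_GENERATIONS = {
    "i": {0: "pre Skylake", 1: "Skylake", 2: "Cascade Lake", 3: "Ice Lake",
          4: "Sapphire Rapids", 5: "Sierra Forest (E)", 6: "Granite Rapids (P)"},
    "z": {0: "pre Zen", 1: "Zen-1 (Naples)", 2: "Zen-2 (Rome)",
          3: "Zen-3 (Milan)", 4: "Zen-4 (Genoa)", 5: "Zen-5 (Turin)"},
}


def describe_cpu_gen(vendor, generation=None):
    """Return the table entry for a vendor letter and an optional generation."""
    gen = 0 if generation is None else generation  # unspecified means 0
    try:
        return CPU_GENERATIONS[vendor][gen]
    except KeyError:
        raise ValueError(f"unknown vendor/generation: {vendor}{gen}") from None
```

With this sketch, `describe_cpu_gen("z", 5)` yields the Zen-5 entry, while `describe_cpu_gen("i")` falls back to generation 0.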
391396

@@ -430,9 +435,9 @@ Note that the vendor letter X is mandatory, generation and processing units are
430435
| `A` | AMD | compute units (CUs) |
431436
| `I` | Intel | execution units (EUs) |
432437

433-
For nVidia, the generation N can be f=Fermi, k=Kepler, m=Maxwell, p=Pascal, v=Volta, t=turing, a=Ampere, l=Ada Lovelace, ...,
434-
for AMD GCN-x=0.x, RDNA1=1, RDNA2=2, RDNA3=3,
435-
for Intel Gen9=0.9, Xe(12.1)=1, ...
438+
For nVidia, the generation N can be f=Fermi, k=Kepler, m=Maxwell, p=Pascal, v=Volta, t=Turing, a=Ampere, l=Ada Lovelace, g=Grace Hopper, ...,
439+
for AMD GCN-x=0.x, RDNA1=1, C/RDNA2=2, C/RDNA3=3, C/RDNA3.5=3.5, C/RDNA4=4, ...
440+
for Intel Gen9=0.9, Xe(12.1/DG1)=1, Xe(12.2)=2, Arc(12.7/DG2)=3 ...
436441
(Note: This may need further work to properly reflect what's out there.)
437442

438443
The optional `h` suffix to the compute unit count indicates high-performance (e.g. high freq or special
Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
---
2+
title: Replacement of the deprecated ceph-ansible tool
3+
type: Decision Record
4+
status: Draft
5+
track: IaaS
6+
---
7+
8+
## Abstract
9+
10+
This decision record evaluates the choice for a modern, future-proof deployment tool for the networked storage solution Ceph in the SCS reference implementation, [OSISM](https://osism.tech/).
11+
The new deployment tool aims to enhance Kubernetes integration within SCS, potentially allowing providers to manage the Ceph cluster with greater ease and efficiency.
12+
13+
## Context
14+
15+
The current reference implementation relies on `ceph-ansible`, [which is now deprecated](https://github.com/ceph/ceph-ansible/commit/a9d1ec844d24fcc3ddea7c030eff4cd6c414d23d). As a result, this decision record evaluates two alternatives: [Cephadm](https://docs.ceph.com/en/latest/cephadm/) and [Rook](https://rook.io/docs/rook/latest-release/Getting-Started/intro/).
16+
17+
Both tools are designed to roll out and configure Ceph clusters, providing the capability to manage clusters throughout their lifecycle. This includes functionalities such as adding or removing OSDs, upgrading Ceph services, and managing CRUSH maps, as outlined in the [Feature-Decision-Table](#feature-decision-table).
18+
19+
This decision record considers both the current and future needs of the reference implementation. The decision is guided by a comprehensive comparison of each tool's capabilities and limitations as well as the SCS community's needs and future objectives.
20+
21+
### Comparison of Features
22+
23+
The tool selected in this decision MUST ensure:
24+
25+
* ease of migration
26+
* future-proofness
27+
* feature-completeness and feature-maturity
28+
* effective management of Ceph clusters
29+
30+
#### Feature Decision Table
31+
32+
A comparative analysis of Cephadm and Rook highlights the following:
33+
34+
| Feature | Supported in Cephadm | Supported in Rook |
35+
| ------- | -------------------- | ----------------- |
36+
| Migrate from other setups | ☑ Adoption of clusters that were built with ceph-ansible [is officially supported](https://docs.ceph.com/en/quincy/cephadm/adoption/).| ☐ Migration from other setups is not officially supported. See this [issue](https://github.com/rook/rook/discussions/12045). Consequently, SCS is developing a migration tool named [rookify](https://github.com/SovereignCloudStack/rookify). Alternatively, Rook allows using [Ceph as an external cluster](https://rook.io/docs/rook/latest-release/CRDs/Cluster/external-cluster/external-cluster/). |
37+
| Connect RGW with OpenStack Keystone || ☑ Experimental |
38+
| Deploy specific Ceph versions |||
39+
| Upgrade to specific Ceph versions | ☑ Streamlined upgrade process. | ☑ Rook, CSI and Ceph upgrades have to be aligned, there is a [guide](https://rook.io/docs/rook/latest-release/Upgrade/health-verification/) available for each Rook version. |
40+
| Deploy Ceph Monitors |||
41+
| Deploy Ceph Managers |||
42+
| Deploy Ceph OSDs |||
43+
| Deploy Ceph Object Gateway (RGW) |||
44+
| Removal of nodes |||
45+
| Purging of complete cluster |||
46+
47+
☐ not supported (yet)
48+
☑ supported
49+
☑☑ better option
50+
☒ not supported on purpose
51+
52+
#### Evaluation in the Light of SCS Community Plans and Preferences
53+
54+
**Environment**: Cephadm is better suited for traditional or standalone environments. Conversely, Rook is tailored for Kubernetes. That being said, it's important to note that the current state of resource deployment and management on Kubernetes within the IaaS reference implementation is still in its early stages. This would make Rook one of the first components to utilise Kubernetes in OSISM.
55+
56+
**Deployment**: Cephadm uses containerization for Ceph components, whereas Rook fully embraces the Kubernetes ecosystem for deployment and management. Although containerization is already a core concept in the reference implementation, there is a strong push from the SCS community to adopt more Kubernetes.
57+
58+
**Configuration and Management**: Rook offers a more straightforward experience for those already utilizing Kubernetes, leveraging Kubernetes' features for automation and scaling. In contrast, Cephadm grants finer control over Ceph components, albeit necessitating more manual intervention. In both cases, this is something that needs to be partly abstracted by the reference implementation.
59+
60+
**Integration**: Rook provides better integration with cloud-native tools and environments, whereas Cephadm offers a more Ceph-centric management experience.
61+
62+
**Migration**: Rook does not currently provide any migration support, while Cephadm does offer this capability. However, the SCS community is highly supportive of developing a migration tool (Rookify) for Rook, as this would enhance SCS's influence by offering the first migration solution specifically for Rook providers.
63+
64+
**SCS Community**: An important factor in our decision is the preferences and direction of the SCS community and its providers. There is a noticeable trend towards increased use of Kubernetes within the community. This indicates a preference for deployment tools that integrate well with Kubernetes environments.
65+
66+
**SCS Future Goals**: The SCS community is open to building tools that provide open-source, publicly available solutions beyond the scope of SCS. This openness to development efforts that address limitations of the chosen tools, such as Rook, is also a key consideration in our decision.
67+
68+
## Decision
69+
70+
As OSISM will increasingly focus on a Kubernetes-centric approach for orchestration in the near future, adopting Rook is the more suitable and standardized approach. Moreover, many service providers within the SCS community (including several who deploy OSISM) already have experience with Kubernetes. Regarding the missing OpenStack Keystone integration, we are confident that the colleagues working on this issue will provide a solution in a timely manner. We expect that deploying Ceph with Rook will simplify deployment and configuration from the outset.
71+
In order to allow for a migration from existing Ceph installations to Rook, we decided to develop a migration tool (called Rookify) for the reference implementation. If the development of Rookify goes beyond the targeted scope of the reference implementation, the tool will add value to the Ceph community as well as the Rook community.
72+
73+
## Consequences
74+
75+
Migrating an existing Ceph environment onto Kubernetes, as well as bringing together existing but independent Ceph and Kubernetes environments, will become straightforward, with little manual intervention needed.
76+
Landscapes that currently do not deploy a Kubernetes cluster have to adapt and provide a Kubernetes cluster in the future.

Tests/iaas/entropy/entropy-check.py

Lines changed: 1 addition & 1 deletion
@@ -437,7 +437,7 @@ def main(argv):
437437
all_flavors = conn.list_flavors(get_extra=True)
438438

439439
if '*' not in image_visibility:
440-
logger.debug(f"Images: filter for visibility {', '.join(image_visibility)}")
440+
logger.debug(f"Images: filter for visibility {', '.join(sorted(image_visibility))}")
441441
all_images = [img for img in all_images if img.visibility in image_visibility]
442442
all_image_names = [f"{img.name} ({img.visibility})" for img in all_images]
443443
logger.debug(f"Images: {', '.join(all_image_names) or '(NONE)'}")
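
The change in this hunk wraps the visibility set in `sorted()` before joining it. A plausible motivation, stated here as an assumption, is that Python sets of strings have no stable iteration order across runs, so the unsorted log line could differ from one invocation to the next. A minimal sketch with a hypothetical helper `format_visibility`:

```python
def format_visibility(image_visibility):
    """Join a set of visibility names in a stable, alphabetical order."""
    # sorted() returns a list, so the joined string is independent of
    # the set's internal ordering.
    return ", ".join(sorted(image_visibility))


print(format_visibility({"public", "community", "private"}))
# prints "community, private, public"
```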

Tests/iaas/flavor-naming/flavor_names.py

Lines changed: 12 additions & 8 deletions
@@ -192,9 +192,11 @@ class CPUBrand:
192192
component_name = "cpubrand"
193193
cpuvendor = TblAttr("CPU Vendor", {"i": "Intel", "z": "AMD", "a": "ARM", "r": "RISC-V"})
194194
cpugen = DepTblAttr("#.CPU Gen", cpuvendor, {
195-
"i": {None: '(unspecified)', 0: "Unspec/Pre-Skylake", 1: "Skylake", 2: "Cascade Lake", 3: "Ice Lake", 4: "Sapphire Rapids"},
196-
"z": {None: '(unspecified)', 0: "Unspec/Pre-Zen", 1: "Zen 1", 2: "Zen 2", 3: "Zen 3", 4: "Zen 4"},
197-
"a": {None: '(unspecified)', 0: "Unspec/Pre-A76", 1: "A76/NeoN1", 2: "A78/X1/NeoV1", 3: "A710/NeoN2"},
195+
"i": {None: '(unspecified)', 0: "Unspec/Pre-Skylake", 1: "Skylake", 2: "Cascade Lake", 3: "Ice Lake", 4: "Sapphire Rapids",
196+
5: 'Sierra Forest (E)', 6: 'Granite Rapids (P)'},
197+
"z": {None: '(unspecified)', 0: "Unspec/Pre-Zen", 1: "Zen 1", 2: "Zen 2", 3: "Zen 3", 4: "Zen 4/4c", 5: "Zen 5/5c"},
198+
"a": {None: '(unspecified)', 0: "Unspec/Pre-A76", 1: "A76/NeoN1", 2: "A78/X1/NeoV1", 3: "A71x/NeoN2/V2",
199+
4: "AmpereOne", 5: "A72x/NeoN3/V3"},
198200
"r": {None: '(unspecified)', 0: "Unspec"},
199201
})
200202
perf = TblAttr("Performance", {"": "Std Perf", "h": "High Perf", "hh": "Very High Perf", "hhh": "Very Very High Perf"})
@@ -213,11 +215,13 @@ class GPU:
213215
brand = TblAttr("Brand", {"N": "nVidia", "A": "AMD", "I": "Intel"})
214216
gen = DepTblAttr("Gen", brand, {
215217
"N": {'': '(unspecified)', "f": "Fermi", "k": "Kepler", "m": "Maxwell", "p": "Pascal",
216-
"v": "Volta", "t": "Turing", "a": "Ampere", "l": "AdaLovelace"},
217-
"A": {'': '(unspecified)', "0.4": "GCN4.0/Polaris", "0.5": "GCN5.0/Vega", "1": "RDNA1/Navi1x", "2": "RDNA2/Navi2x", "3": "RDNA3/Navi3x"},
218-
"I": {'': '(unspecified)', "0.9": "Gen9/Skylake", "0.95": "Gen9.5/KabyLake", "1": "Xe1/Gen12.1", "2": "Xe2"},
218+
"v": "Volta", "t": "Turing", "a": "Ampere", "l": "AdaLovelace", "g": "GraceHopper"},
219+
"A": {'': '(unspecified)', "0.4": "GCN4.0/Polaris", "0.5": "GCN5.0/Vega", "1": "RDNA1/Navi1x", "2": "C/RDNA2/Navi2x",
220+
"3": "C/RDNA3/Navi3x", "3.5": "C/RDNA3.5", "4": "C/RDNA4"},
221+
"I": {'': '(unspecified)', "0.9": "Gen9/Skylake", "0.95": "Gen9.5/KabyLake", "1": "Xe1/Gen12.1/DG1", "2": "Xe2/Gen12.2",
222+
"3": "Arc/Gen12.7/DG2"},
219223
})
220-
cu = OptIntAttr("#.CU/EU/SM")
224+
cu = OptIntAttr("#.N:SMs/A:CUs/I:EUs")
221225
perf = TblAttr("Performance", {"": "Std Perf", "h": "High Perf", "hh": "Very High Perf", "hhh": "Very Very High Perf"})
222226

223227

@@ -696,7 +700,7 @@ def prettyname(flavorname, prefix=""):
696700
stg += _tbl_out(flavorname.gpu, "perf", True)
697701
stg += _tbl_out(flavorname.gpu, "gen", True)
698702
if flavorname.gpu.cu is not None:
699-
stg += f"(w/ {flavorname.gpu.cu} CU/EU/SM) "
703+
stg += f"(w/ {flavorname.gpu.cu} SMs/CUs/EUs) "
700704
# IB
701705
if flavorname.ib:
702706
stg += "and Infiniband "
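
The new attribute label `#.N:SMs/A:CUs/I:EUs` ties each GPU brand letter to its own processing-unit name, while `prettyname` still prints the generic `SMs/CUs/EUs`. As an illustration only, and not what this commit does, a brand-aware variant could look like the sketch below; `GPU_UNITS` and `describe_gpu_units` are hypothetical names.

```python
# Hypothetical mapping from brand letter to unit name, taken from the
# attribute label "#.N:SMs/A:CUs/I:EUs".
GPU_UNITS = {"N": "SMs", "A": "CUs", "I": "EUs"}


def describe_gpu_units(brand, count):
    """Return a pretty-printed unit count, or '' if no count was given."""
    if count is None:
        return ""
    # Fall back to the generic label for brands that are not in the table.
    unit = GPU_UNITS.get(brand, "SMs/CUs/EUs")
    return f"(w/ {count} {unit}) "
```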

Tests/iaas/image-metadata/image-md-check.py

Lines changed: 32 additions & 9 deletions
@@ -11,13 +11,18 @@
1111
SPDX-License-Identifier: CC-BY-SA-4.0
1212
"""
1313

14+
import calendar
15+
from collections import Counter
16+
import getopt
17+
import logging
1418
import os
1519
import sys
1620
import time
17-
import calendar
18-
import getopt
21+
1922
import openstack
20-
from collections import Counter
23+
24+
25+
logger = logging.getLogger(__name__)
2126

2227

2328
def usage(ret):
@@ -31,8 +36,10 @@ def usage(ret):
3136
print(" -v/--verbose : Be more verbose")
3237
print(" -s/--skip-completeness: Don't check whether we have all mandatory images")
3338
print(" -h/--help : Print this usage information")
34-
print("If you pass images, only these will be validated, otherwise all (public unless")
35-
print(" -p is specified) images from the catalog will be processed.")
39+
print(" [-V/--image-visibility VIS_LIST] : filters images by visibility")
40+
print(" (default: 'public,community'; use '*' to disable)")
41+
print("If you pass images, only these will be validated, otherwise all images")
42+
print("(filtered according to -p, -V) from the catalog will be processed.")
3643
sys.exit(ret)
3744

3845

@@ -335,43 +342,59 @@ def miss_replacement_images(by_name, outd_list):
335342

336343
def main(argv):
337344
"Main entry point"
345+
# configure logging, disable verbose library logging
346+
logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)
347+
openstack.enable_logging(debug=False)
338348
# Option parsing
339349
global verbose
350+
image_visibility = set()
340351
private = False
341352
skip = False
342353
cloud = os.environ.get("OS_CLOUD")
343354
err = 0
344355
try:
345-
opts, args = getopt.gnu_getopt(argv[1:], "phvc:s",
346-
("private", "help", "os-cloud=", "verbose", "skip-completeness"))
356+
opts, args = getopt.gnu_getopt(argv[1:], "phvc:sV:",
357+
("private", "help", "os-cloud=", "verbose", "skip-completeness", "image-visibility="))
347358
except getopt.GetoptError: # as exc:
348359
print("CRITICAL: Command-line syntax error", file=sys.stderr)
349360
usage(1)
350361
for opt in opts:
351362
if opt[0] == "-h" or opt[0] == "--help":
352363
usage(0)
353364
elif opt[0] == "-p" or opt[0] == "--private":
354-
private = True
365+
private = True # only keep this for backwards compatibility (we have -V now)
355366
elif opt[0] == "-v" or opt[0] == "--verbose":
356367
verbose = True
368+
logging.getLogger().setLevel(logging.DEBUG)
357369
elif opt[0] == "-s" or opt[0] == "--skip-completeness":
358370
skip = True
359371
elif opt[0] == "-c" or opt[0] == "--os-cloud":
360372
cloud = opt[1]
373+
if opt[0] == "-V" or opt[0] == "--image-visibility":
374+
image_visibility.update([v.strip() for v in opt[1].split(',')])
361375
images = args
362376
if not cloud:
363377
print("CRITICAL: Need to specify --os-cloud or set OS_CLOUD environment.", file=sys.stderr)
364378
usage(1)
379+
if not image_visibility:
380+
image_visibility.update(("public", "community"))
381+
if private:
382+
image_visibility.add("private")
365383
try:
366384
conn = openstack.connect(cloud=cloud, timeout=24)
367385
all_images = list(conn.image.images())
386+
if '*' not in image_visibility:
387+
logger.debug(f"Images: filter for visibility {', '.join(sorted(image_visibility))}")
388+
all_images = [img for img in all_images if img.visibility in image_visibility]
389+
all_image_names = [f"{img.name} ({img.visibility})" for img in all_images]
390+
logger.debug(f"Images: {', '.join(all_image_names) or '(NONE)'}")
368391
by_name = {img.name: img for img in all_images}
369392
if len(by_name) != len(all_images):
370393
counter = Counter([img.name for img in all_images])
371394
duplicates = [name for name, count in counter.items() if count > 1]
372395
print(f'WARNING: duplicate names detected: {", ".join(duplicates)}', file=sys.stderr)
373396
if not images:
374-
images = [img.name for img in all_images if private or img.visibility == 'public']
397+
images = [img.name for img in all_images]
375398
# Analyse image metadata
376399
outdated_images = []
377400
for imgnm in images:
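
The visibility handling added to `main` can be summarised as: collect all `-V` values, fall back to `public,community` when none are given, add `private` when `-p` is set, and skip filtering entirely when `*` is present. The sketch below isolates that logic in two hypothetical functions, `resolve_visibility` and `filter_images`, and uses `SimpleNamespace` objects as stand-ins for the image objects returned by the OpenStack SDK.

```python
from types import SimpleNamespace


def resolve_visibility(values, private=False):
    """Build the visibility set from a list of -V argument strings."""
    visibility = set()
    for value in values:
        visibility.update(v.strip() for v in value.split(","))
    if not visibility:
        visibility.update(("public", "community"))  # default filter
    if private:
        visibility.add("private")  # legacy -p flag
    return visibility


def filter_images(images, visibility):
    """Keep images whose visibility is in the set; '*' disables filtering."""
    if "*" in visibility:
        return list(images)
    return [img for img in images if img.visibility in visibility]


# Stand-in image objects, for illustration only.
images = [
    SimpleNamespace(name="ubuntu", visibility="public"),
    SimpleNamespace(name="scratch", visibility="private"),
]
selected = filter_images(images, resolve_visibility([]))
print([img.name for img in selected])  # prints ['ubuntu']
```

Keeping the resolution separate from the filtering makes the default and the `*` escape hatch easy to test without a cloud connection.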
