---
sidebar_label: Hardware-Landscape
sidebar_position: 99
---

# The SCS Hardware-Landscape

## General information

The general aim of this environment is to install and operate the SCS reference implementation on hardware.
In addition to the classic tasks in the area of quality assurance, the environment is also used to evaluate
new concepts in the underlay/overlay network area, as a test environment for hardware-related developments,
as a demonstration environment for interested parties and as a publicly accessible blueprint for users.
The environment is designed for long-term use with a varying circle of users.

The environment consists of 21 server and 12 switch components. The hardware as well as the functions and
properties used were selected so that the focus is on generally available or characteristic functions and
dependencies on manufacturer-specific features are avoided. Instead of the x86 servers and SONiC switches
used here, the environment could also be realised with hardware from other manufacturers.

From 1 January 2025, the environment will be operated by [forum SCS-Standards](https://scs.community/2024/10/23/osba-forum-scs-standards/)
and the participating companies.

## Tasks and Objectives

The tasks and objectives of the environments can be summarised as follows:

* The division into several environments makes it possible to run a lab as well as to map a productive environment (near-live operation).
* Operation of the compliance monitor (automated test for conformity with the SCS standards)
* Implementation and validation of the developed standards in a reference environment
* Analysis of problems in the interaction with the standards
* Provision of proof-of-concept installations for interested parties who want to use, promote or further develop the project
* The environment can be used by members of the SCS Standards forum and by contributors to the SCS community
  as a development and test environment for open-source development in connection with the further development
  of the SCS standards, SCS reference implementation and other relevant software components ('open-lab'/'near-live laboratory').
* Continuous Integration Environment ('Zuul as a Service') - operation of non-critical Zuul worker instances

## Installation details

The available hardware was divided into two distinct application areas:

* The **lab environment** consists exclusively of switch hardware used to evaluate, test and develop
  concepts in the area of 'Software Defined Networking'. This means that various switch models can be
  used to test and implement development tasks in the area of the open [SONiC](https://sonicfoundation.dev/) NOS
  (a network operating system based on Debian Linux) or provisioning automation tasks in the SONiC environment with the
  open-source system Netbox, a solution that is used primarily for IPAM and DCIM (IP Address Management, Data Center Infrastructure Management).
* The **production environment** is an exemplary installation of the most relevant reference implementations of an
  SCS system. It follows a configuration and approach that is based on the needs and circumstances of a real and much larger environment.
  To this end, characteristic infrastructure components were automatically installed on the manager nodes used for the installation.

The setup of the entire environment is designed in such a way that it can be reproducibly restored or reset.
Therefore, the Ansible automation available via OSISM was used in many areas.
Areas that could not be usefully automated with Ansible were implemented using Python command-line tooling stored in the Git repository.

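
As an illustration of what this kind of tooling can look like, the following minimal sketch derives Ansible host_vars stubs from a hardware inventory file. The file name, field names and output layout are assumptions for this example and do not necessarily match the structure used in the repository:

```python
#!/usr/bin/env python3
"""Minimal sketch: derive Ansible host_vars stubs from hardware key data.

The input file and its structure are assumptions for illustration only.
"""
from pathlib import Path

import yaml  # PyYAML

HARDWARE_FILE = Path("documentation/hardware.yaml")  # assumed location
OUTPUT_DIR = Path("inventory/host_vars")             # assumed location


def generate_host_vars() -> None:
    """Write one host_vars file per server described in the hardware file."""
    servers = yaml.safe_load(HARDWARE_FILE.read_text())["servers"]
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    for server in servers:
        host_vars = {
            "ansible_host": server["management_ip"],
            "bmc_address": server["bmc_ip"],
            "rack": server["rack"],
            "serial_number": server["serial"],
        }
        out_file = OUTPUT_DIR / f"{server['name']}.yml"
        out_file.write_text(yaml.safe_dump(host_vars, sort_keys=False))
        print(f"wrote {out_file}")


if __name__ == "__main__":
    generate_host_vars()
```
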
## Available documentation

The primary point of information and orientation is the [*readme file*](https://github.com/SovereignCloudStack/hardware-landscape?tab=readme-ov-file#references)
which is stored at the top level of the [configuration repository](https://github.com/SovereignCloudStack/hardware-landscape).

The **References** section there links to the individual documentation areas.

## Specific installation and configuration details

* Processes for access management to the environment (2 VPN gateways, SSH logins, SSH profiles, ...) have been implemented
* The production and lab environments have been set up, automated and documented as described above
* The complete environment is managed in a [GIT repository](https://github.com/SovereignCloudStack/hardware-landscape);
  adjustments and further developments are managed via GIT merge requests
* Almost all installation steps are [documented and automated](https://github.com/SovereignCloudStack/hardware-landscape/blob/main/documentation/System_Deployment.md)
  starting from a bare rack installation (the setup is extensively documented, in particular the few manual steps)
  * The entire customized setup of the nodes is [implemented by OSISM/Ansible](https://github.com/SovereignCloudStack/hardware-landscape/tree/main/environments/custom)
  * All secrets (e.g. passwords) of the environment are stored and versioned in the encrypted Ansible Vault in
    the repository (when access is transferred, rekeying can be used to change the access or the rights to it).
  * Far-reaching automation has been created that allows the environment, or parts of it, to be set up again
    with a reasonable amount of personnel effort.
  * The setup of the basic environment was implemented with Ansible and the OSISM environment (the reference implementation)
  * Python tooling was created that covers areas specific to the use case of the environment and provides functions that simplify the operation of the infrastructure
    * Server systems
      * Backup and restore of the hardware configuration
      * Templating of the BMC configuration
      * Automatic installation of the operating system base image via Redfish Virtual Media (a minimal Redfish sketch follows after this list)
      * Control of the server status via command line (to stop and start the system for test, maintenance and energy-saving purposes)
      * Generation of base profiles for the Ansible Inventory based on the hardware key data stored in the documentation
    * Switches
      * Backup and restore of the switch configuration
      * Generation of base profiles for the Ansible Inventory based on the hardware key data stored in the documentation
* Network setup
  * The two management hosts act as redundant VPN gateways, SSH jump hosts, routers and uplink routers
  * The system is deployed with a layer 3 underlay concept
  * An "eBGP router on the host" is implemented for the node interconnectivity
    (all nodes and all switches are running FRR instances)
  * The Ceph and OpenStack nodes of the system do not have direct upstream routing
    (access is configured and provided by HTTP, NTP and DNS proxies)
  * For security reasons, the system itself can only be accessed via VPN.
    The provider network of the production environment is realized with a VXLAN which is terminated on the managers for routing
    ('a virtual provider network').
  * The basic node installation was realised in such a way that specific [node images](https://github.com/osism/node-image)
    are created for the respective rack, which make the operation or reconfiguration of network equipment for PXE bootstrap
    unnecessary (a preliminary stage for rollout via OpenStack Ironic)
  * The management of the hardware (BMC and switch management) is implemented with a VLAN
  * Routing, firewalling and NAT are managed by an nftables script which adds rules in an idempotent way to the existing rules
    of the manager nodes (a minimal sketch follows after this list).
* The [openstack workload generator](https://github.com/SovereignCloudStack/openstack-workload-generator) is used to put test workloads
  on the system (illustrated by a sketch after this list)
  * Automated creation of OpenStack domains, projects, servers, networks, users, etc.
  * Launching test workloads
  * Dismantling test workloads
* An observability stack was built
  * Prometheus for metrics
  * OpenSearch for log aggregation
  * Central syslog server for the switches on the managers (recorded via the manager nodes in OpenSearch)
* Specific documentation created for the project
  * Details of the hardware installed in the environment
  * The physical structure of the environment was documented in detail (rack installation and cabling)
  * The technical and logical structure of the environment was documented in detail
  * A FAQ for handling the open-source network operating system SONiC was created with relevant topics for the test environment
  * As part of the development, the documentation and implementation of the OSISM reference implementation was significantly improved
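
The server-related Python tooling mentioned above talks to the BMCs via Redfish. The following minimal sketch shows what power control and virtual-media attachment can look like at that level; the endpoint paths follow the DMTF Redfish convention, while the BMC address, credentials, member IDs and image URL are placeholders rather than values from the repository:

```python
"""Minimal Redfish sketch: power control and virtual-media boot for one server.

BMC address, credentials and the member IDs ("1", "Cd") are placeholders;
real systems expose the actual IDs under /redfish/v1/Systems and .../VirtualMedia.
"""
import requests

BMC = "https://10.10.10.10"   # placeholder BMC address
AUTH = ("admin", "secret")    # placeholder credentials
VERIFY_TLS = False            # lab BMCs often use self-signed certificates


def set_power(state: str) -> None:
    """Send a ComputerSystem.Reset action, e.g. "On", "ForceOff", "GracefulShutdown"."""
    url = f"{BMC}/redfish/v1/Systems/1/Actions/ComputerSystem.Reset"
    r = requests.post(url, json={"ResetType": state}, auth=AUTH, verify=VERIFY_TLS)
    r.raise_for_status()


def insert_virtual_media(image_url: str) -> None:
    """Attach an installer ISO via Redfish virtual media."""
    url = f"{BMC}/redfish/v1/Managers/1/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia"
    r = requests.post(
        url, json={"Image": image_url, "Inserted": True}, auth=AUTH, verify=VERIFY_TLS
    )
    r.raise_for_status()


if __name__ == "__main__":
    insert_virtual_media("http://imageserver.example/node-image.iso")
    set_power("On")
```
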
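The idempotent handling of the nftables rules on the manager nodes can be sketched as follows; the address family, table, chain and example rule are assumptions for illustration, and the actual script in the repository may be structured differently:

```python
"""Minimal sketch: add an nftables rule only if it is not already present.

The table, chain and example rule are assumptions for illustration; the
existing ruleset of the manager nodes is left untouched.
"""
import subprocess

FAMILY, TABLE = "inet", "filter"   # assumed address family and table name
CHAIN = "forward"                  # assumed chain name
RULE = 'iifname "wg0" oifname "br-provider" accept'  # example rule


def nft(*args: str) -> str:
    """Run the nft command line tool and return its output."""
    return subprocess.run(
        ["nft", *args], check=True, capture_output=True, text=True
    ).stdout


def ensure_rule(rule: str) -> None:
    """Append the rule to the chain unless an identical rule already exists.

    The check relies on the rule being written in the same normalized form
    that `nft list chain` prints.
    """
    current = nft("list", "chain", FAMILY, TABLE, CHAIN)
    if rule in current:
        print(f"already present: {rule}")
        return
    nft("add", "rule", FAMILY, TABLE, CHAIN, rule)
    print(f"added: {rule}")


if __name__ == "__main__":
    ensure_rule(RULE)
```
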
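The kind of workload that the openstack workload generator puts on the system can be illustrated with a few openstacksdk calls; the cloud name, image, flavor and CIDR below are placeholders, the actual tool is driven by its own configuration files, and domains, projects and users are created along the same lines via the identity API:

```python
"""Minimal openstacksdk sketch: create a small test workload.

Cloud name, image, flavor and CIDR are placeholders; the actual
openstack-workload-generator is configuration driven and more complete.
"""
import openstack

conn = openstack.connect(cloud="hardware-landscape")  # placeholder clouds.yaml entry

# Network plus subnet for the test servers
network = conn.network.create_network(name="wl-net")
conn.network.create_subnet(
    network_id=network.id, name="wl-subnet", ip_version=4, cidr="192.168.100.0/24"
)

# A small test server attached to that network
image = conn.compute.find_image("ubuntu-24.04")  # placeholder image name
flavor = conn.compute.find_flavor("SCS-2V-4")    # placeholder flavor name
server = conn.compute.create_server(
    name="wl-server-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
conn.compute.wait_for_server(server)
print(f"server {server.name} is {server.status}")
```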