Commit 04f9244

Add Artcodix Deployment Example (#188)
* add sidebar nav and first doc
  Signed-off-by: Max Wolfs <[email protected]>
* Initial draft for a deployment example
  Signed-off-by: Michael Bayr <[email protected]>
* write out HCI
  Signed-off-by: Max Wolfs <[email protected]>
* move into correct directory
  Signed-off-by: Max Wolfs <[email protected]>
* Resolved todos with BOMs
  Signed-off-by: Michael Bayr <[email protected]>
* Added network speed alternative
  Signed-off-by: Michael Bayr <[email protected]>
* Fixed formatting issues
  Signed-off-by: Michael Bayr <[email protected]>
* Fixed formatting issues
  Signed-off-by: Michael Bayr <[email protected]>
* Added missing new line
  Signed-off-by: Michael Bayr <[email protected]>
* add deployment guide url
  Signed-off-by: Max Wolfs <[email protected]>
* consistent network speeds across the document
  Signed-off-by: Michael Bayr <[email protected]>
* fix menu structure and label
  Signed-off-by: Max Wolfs <[email protected]>

---------

Signed-off-by: Max Wolfs <[email protected]>
Signed-off-by: Michael Bayr <[email protected]>
Co-authored-by: Michael Bayr <[email protected]>
1 parent 26f4910 commit 04f9244

File tree: 2 files changed, +174 -0 lines changed

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
# artcodix

## Preface

This document describes a possible environment setup for a pre-production or minimal production deployment.
Hardware requirements can vary widely from environment to environment; this guide is neither a hardware sizing guide
nor the best service placement for every setup. It is intended as a starting point for a hardware-based deployment
of the SCS IaaS reference implementation based on OSISM.

## Node type definitions

### Control Node

A control node runs all or most of the OpenStack services that provide the APIs and their corresponding runtimes.
These nodes are necessary for any user to interact with the cloud and to keep the cloud in a managed state.
However, these nodes usually do **not** run user virtual machines.
Hence, it is advisable to replicate the control nodes. Three nodes are a good starting point for a RAFT quorum,
as illustrated below.

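As a rough illustration of the quorum rule (an addition to the original text, not a sizing statement): with
majority-based consensus, a cluster of `n` members tolerates the loss of `(n - 1) // 2` of them, which is why
three control nodes is the smallest setup that survives a node failure.

```python
# Minimal sketch: how many control node failures a majority (RAFT-style) quorum survives.
# Illustrative only; the individual services (MariaDB/Galera, RabbitMQ, Ceph MONs) have their
# own quorum mechanisms, but the (n - 1) // 2 pattern is the common case.

def tolerated_failures(n_control_nodes: int) -> int:
    """Number of nodes that may fail while a majority quorum remains."""
    return (n_control_nodes - 1) // 2

for n in (1, 2, 3, 5):
    print(f"{n} control node(s) -> tolerates {tolerated_failures(n)} failure(s)")
# Two control nodes tolerate no failure, three tolerate one, hence three is the starting point.
```
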
### Compute Node (HCI/no HCI)

#### Not Hyperconverged Infrastructure (no HCI)

Non-HCI compute nodes exclusively run user virtual machines. They run no API services, no storage daemons
and no network routers, except for the network infrastructure necessary to connect virtual machines.

#### Hyperconverged Infrastructure (HCI)

HCI nodes generally run at least user virtual machines and storage daemons. It is possible to place networking services
here as well, but that is not considered good practice.

#### HCI vs. no HCI

Whether to use HCI nodes is in general not an easy question. For a getting-started environment
(pre-production or the smallest possible production), however, it is the most cost-efficient option.
Therefore we will continue with HCI nodes (compute + storage).

### Storage Node

A dedicated storage node runs only storage daemons. This can be necessary in larger deployments to protect the storage
daemons from resource starvation through user workloads.

Not used in this setup.

### Network Node

A dedicated network node runs the routing infrastructure that connects user virtual machines with provider / external
networks. In larger deployments these can be useful to enhance scaling and improve network performance.

Not used in this setup.

## Nodes in this deployment example

As mentioned before, we run three dedicated control nodes. To be able to fully test an OpenStack environment, it is
recommended to run three compute nodes (HCI) as well; technically, a setup can work with just one compute node.
See the following chapter (Use cases and validation) for more information. A sketch of the resulting node roles
follows below.

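For orientation only, the node layout described above can be summarized as follows. The hostnames are invented for
illustration; in an actual OSISM-based deployment this information lives in the Ansible inventory of the
configuration repository, not in a Python file.

```python
# Illustrative summary of the node roles in this example (hostnames are made up).
# HCI means that the same hosts carry both the compute and the storage (Ceph OSD) role.
node_roles = {
    "control": ["ctl01", "ctl02", "ctl03"],  # OpenStack APIs, database, RabbitMQ, Ceph MONs
    "compute": ["hci01", "hci02", "hci03"],  # user virtual machines
    "storage": ["hci01", "hci02", "hci03"],  # Ceph OSDs, co-located with compute (HCI)
    "network": ["ctl01", "ctl02", "ctl03"],  # no dedicated network nodes in this setup
}

# Sanity check for the HCI assumption: compute and storage share the same hosts.
assert set(node_roles["compute"]) == set(node_roles["storage"])
```
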
### Use cases and validation

The setup described allows for the following use cases / test cases:

- Highly available control plane
- Control plane failure tolerance test (database, RabbitMQ, Ceph MONs, routers)
- Highly available user virtual clusters (e.g. Kubernetes clusters)
- Compute host failure simulation
- Host aggregates / compute node grouping
- Host-based storage replication (instead of OSD-based)
- Fully replicated storage / storage high availability test

### Control Node

#### General requirements

The control nodes do not run any user workloads, so they are usually not sized as big as the compute nodes.
Relevant metrics for control nodes are:

- Fast and sufficiently large disks. At least SATA SSDs are recommended; NVMe will greatly improve the overall responsiveness.
- A rather large amount of memory to hold the caches for databases and queues.
- Average CPU performance. Aim for a good compromise between number of cores and clock speed; this is
  the least important requirement on the list.

#### Hardware recommendation

The following server specs are just a starting point and can vary greatly between environments.

Example:
3x Dell R630/R640/R650 1U server

- Dual 8-core 3.0 GHz Intel/AMD
- 128 GB RAM
- 2x 3.84 TB NVMe in (software) RAID 1
- 2x 10/25/40 Gbit 2-port SFP+/QSFP network cards

### Compute Node (HCI)

The compute nodes in this scenario run all the user virtual workloads **and** the storage infrastructure. To make sure
these nodes are not starved, they should be of decent size.

> This setup takes local storage tests into consideration. The SCS standards require certain flavors with very fast disk speed
> to host customer Kubernetes control planes (etcd). These speeds are usually not achievable with shared storage. If you don't
> intend to test this scenario, you can skip the NVMe disks.

#### Hardware recommendation

The following server specs are just a starting point and can vary greatly between environments. The sizing of the nodes
needs to fit the expected workloads (customer VMs).

Example:
3x Dell R730(xd)/R740(xd)/R750(xd)
or
3x Supermicro

- Dual 16-core 2.8 GHz Intel/AMD
- 512 GB RAM
- 2x 3.84 TB NVMe in (software) RAID 1 if you want to have local storage available (optional)

For hyperconverged Ceph OSDs (the available capacities are derived in the sketch after this list):

- 4x 10 TB HDD -> this leads to ~30 TB of available HDD storage (optional)
- 4x 7.68 TB SSD -> this leads to ~25 TB of available SSD storage (optional)
- 2x 10/25/40 Gbit 2-port SFP+/QSFP network cards

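The available capacities above follow from the raw capacity, the Ceph replication factor and a conservative fill level.
A minimal sketch of that arithmetic, assuming 3x host-based replication and a target fill of about 80 % (both are
assumptions, not stated in the bill of materials):

```python
# Rough Ceph capacity estimate for the 3-node HCI example above.
# Assumptions (not from the BOM): replica count 3, ~80 % target fill level.

def usable_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                       replicas: int = 3, fill_ratio: float = 0.8) -> float:
    raw = nodes * disks_per_node * disk_tb
    return raw / replicas * fill_ratio

print(f"HDD pool: ~{usable_capacity_tb(3, 4, 10.0):.0f} TB usable")   # ~32 TB -> "~30 TB" above
print(f"SSD pool: ~{usable_capacity_tb(3, 4, 7.68):.0f} TB usable")   # ~25 TB
```
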
## Network

The network infrastructure can vary a lot from setup to setup. This guide does not intend to define the best networking
solution for every cluster but rather gives two possible scenarios.

### Scenario A: Not recommended for production

The smallest possible setup is a single switch physically connected to one interface on each node. The switch has to be
VLAN enabled. OpenStack recommends multiple isolated networks, but at least the following should be split:

- Out of Band network
- Management networks
- Storage backend network
- Public / External network for virtual machines

If there is only one switch, these networks should all be defined as separate VLANs. One of the networks can run in the
untagged default VLAN 1. An illustrative VLAN plan is sketched below.

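Purely as an illustration of that split (the VLAN IDs are invented examples, not a recommendation), a single trunk
port per node could carry a plan like the following, with exactly one network left untagged:

```python
# Hypothetical VLAN plan for the single-switch scenario; IDs are arbitrary examples.
vlan_plan = {
    "out-of-band":     {"vlan_id": 1,   "tagged": False},  # untagged default VLAN 1
    "management":      {"vlan_id": 100, "tagged": True},
    "storage-backend": {"vlan_id": 200, "tagged": True},
    "public-external": {"vlan_id": 300, "tagged": True},
}

# Basic sanity checks: unique VLAN IDs and at most one untagged network on the trunk.
ids = [net["vlan_id"] for net in vlan_plan.values()]
assert len(ids) == len(set(ids)), "VLAN IDs must be unique"
assert sum(not net["tagged"] for net in vlan_plan.values()) <= 1, "only one untagged network"
```
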
### Scenario B: Minimum recommended setup for small production environments

The recommended setup uses two stacked switches connected in a LAG and at least three different physical network ports
on each node (a per-node sketch follows below):

- Physical network 1: VLANs for the Public / External network for virtual machines and the Management networks
- Physical network 2: Storage backend network
- Physical network 3: Out of Band network

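As a hedged per-node illustration of Scenario B (interface names are examples only, and the bonding layout is one
possible way to use the two stacked switches, not a requirement):

```python
# Hypothetical per-node port layout for Scenario B; interface names are invented.
# Physical networks 1 and 2 could each be a 2-port bond spread across the stacked switches.
node_ports = {
    "bond0 (eno1 + eno2)": ["public-external (VLAN)", "management (VLAN)"],  # physical network 1
    "bond1 (eno3 + eno4)": ["storage-backend"],                              # physical network 2
    "bmc":                 ["out-of-band"],                                  # physical network 3
}

for port, networks in node_ports.items():
    print(f"{port}: {', '.join(networks)}")
```
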
### Network adapters

The Out of Band network usually does not need much bandwidth. Most modern servers come with 1 Gbit/s adapters, which
are sufficient. For small test clusters, it might also be sufficient to use 1 Gbit/s networks for the other two
physical networks. For a minimum production cluster, the following is recommended:

- Out of Band network: 1 Gbit/s
- VLANs for the Public / External network for virtual machines and the Management networks: 10 / 25 Gbit/s
- Storage backend network: 10 / 25 / 40 Gbit/s

Whether you need higher throughput for your storage backend depends on your expected storage load. The faster the
network, the faster storage data can be replicated between nodes. This usually leads to improved performance and faster
recovery after failures. The sketch below gives a feeling for the difference.

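To give a rough feeling for what the storage backend speed means in practice, here is a back-of-the-envelope sketch of
the time needed to re-create the replicas held by one failed HCI node. All numbers are assumptions for illustration;
real recovery time also depends on disks, CPU and Ceph tuning.

```python
# Back-of-the-envelope: time to re-replicate the data of one failed HCI node over the
# storage backend network. Data volume and link efficiency are assumed values.

def rereplication_hours(data_tb: float, link_gbit: float, efficiency: float = 0.7) -> float:
    bits = data_tb * 1e12 * 8                   # data to copy, in bits
    throughput = link_gbit * 1e9 * efficiency   # usable bits per second
    return bits / throughput / 3600

for speed in (1, 10, 25, 40):
    print(f"{speed:>2} Gbit/s backend: ~{rereplication_hours(10, speed):.1f} h to restore 10 TB of replicas")
# Roughly: 1 Gbit/s -> ~32 h, 10 Gbit/s -> ~3.2 h, 25 Gbit/s -> ~1.3 h, 40 Gbit/s -> ~0.8 h
```
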
## How to continue

After setting up the hardware recommended in this deployment example, you can continue with the
[deployment guide](https://docs.scs.community/docs/iaas/guides/deploy-guide/).

sidebarsDocs.js

Lines changed: 13 additions & 0 deletions
@@ -48,6 +48,19 @@ const sidebarsDocs = {
            id: 'iaas/components/flavor-manager'
          }
        ]
+     },
+     {
+       type: 'category',
+       label: 'Deployment Examples',
+       link: {
+         type: 'generated-index'
+       },
+       items: [
+         {
+           type: 'doc',
+           id: 'artcodix/index'
+         }
+       ]
      }
    ]
  },
