Commit 04f9244

Add Artcodix Deployment Example (#188)
* add sidebar nav and first doc
  Signed-off-by: Max Wolfs <[email protected]>
* Initial draft for a deployment example
  Signed-off-by: Michael Bayr <[email protected]>
* write out HCI
  Signed-off-by: Max Wolfs <[email protected]>
* move into correct directory
  Signed-off-by: Max Wolfs <[email protected]>
* Resolved todos with BOMs
  Signed-off-by: Michael Bayr <[email protected]>
* Added network speed alternative
  Signed-off-by: Michael Bayr <[email protected]>
* Fixed formatting issues
  Signed-off-by: Michael Bayr <[email protected]>
* Fixed formatting issues
  Signed-off-by: Michael Bayr <[email protected]>
* Added missing new line
  Signed-off-by: Michael Bayr <[email protected]>
* add deployment guide url
  Signed-off-by: Max Wolfs <[email protected]>
* consistent network speeds across the document
  Signed-off-by: Michael Bayr <[email protected]>
* fix menu structure and label
  Signed-off-by: Max Wolfs <[email protected]>

---------

Signed-off-by: Max Wolfs <[email protected]>
Signed-off-by: Michael Bayr <[email protected]>
Co-authored-by: Michael Bayr <[email protected]>
1 parent 26f4910 commit 04f9244

File tree: 2 files changed, +174 -0 lines changed

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
# artcodix

## Preface

This document describes a possible environment setup for a pre-production or minimal production deployment.
Hardware requirements can vary widely from environment to environment; this guide is neither a hardware sizing guide
nor the best service placement for every setup. It is intended as a starting point for a hardware-based deployment
of the SCS IaaS reference implementation based on OSISM.

## Node type definitions

### Control Node

A control node runs all or most of the OpenStack services that provide the APIs and their corresponding runtimes.
These nodes are necessary for any user to interact with the cloud and to keep the cloud in a managed state.
However, these nodes usually do **not** run user virtual machines.
Hence, it is advisable to replicate the control nodes. Three nodes are a good starting point for a RAFT quorum,
as illustrated below.

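As a rough illustration of the quorum rule (an addition to the original text, not a sizing statement): with
majority-based consensus, a cluster of `n` members tolerates the loss of `(n - 1) // 2` of them, which is why
three control nodes is the smallest setup that survives a node failure.

```python
# Minimal sketch: how many control node failures a majority (RAFT-style) quorum survives.
# Illustrative only; the individual services (MariaDB/Galera, RabbitMQ, Ceph MONs) have their
# own quorum mechanisms, but the (n - 1) // 2 pattern is the common case.

def tolerated_failures(n_control_nodes: int) -> int:
    """Number of nodes that may fail while a majority quorum remains."""
    return (n_control_nodes - 1) // 2

for n in (1, 2, 3, 5):
    print(f"{n} control node(s) -> tolerates {tolerated_failures(n)} failure(s)")
# Two control nodes tolerate no failure, three tolerate one, hence three is the starting point.
```
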
### Compute Node (HCI/no HCI)

#### Not Hyperconverged Infrastructure (no HCI)

Non-HCI compute nodes exclusively run user virtual machines. They run no API services, no storage daemons
and no network routers, except for the network infrastructure necessary to connect virtual machines.

#### Hyperconverged Infrastructure (HCI)

HCI nodes generally run at least user virtual machines and storage daemons. It is possible to place networking services
here as well, but that is not considered good practice.

#### HCI vs. no HCI

Whether to use HCI nodes is in general not an easy question. For a getting-started environment
(pre-production or the smallest possible production), however, it is the most cost-efficient option.
Therefore we will continue with HCI nodes (compute + storage).

### Storage Node

A dedicated storage node runs only storage daemons. This can be necessary in larger deployments to protect the storage
daemons from resource starvation through user workloads.

Not used in this setup.

### Network Node

A dedicated network node runs the routing infrastructure that connects user virtual machines with provider / external
networks. In larger deployments these can be useful to enhance scaling and improve network performance.

Not used in this setup.

## Nodes in this deployment example

As mentioned before, we run three dedicated control nodes. To be able to fully test an OpenStack environment, it is
recommended to run three compute nodes (HCI) as well; technically, a setup can work with just one compute node.
See the following chapter (Use cases and validation) for more information. A sketch of the resulting node roles
follows below.

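For orientation only, the node layout described above can be summarized as follows. The hostnames are invented for
illustration; in an actual OSISM-based deployment this information lives in the Ansible inventory of the
configuration repository, not in a Python file.

```python
# Illustrative summary of the node roles in this example (hostnames are made up).
# HCI means that the same hosts carry both the compute and the storage (Ceph OSD) role.
node_roles = {
    "control": ["ctl01", "ctl02", "ctl03"],  # OpenStack APIs, database, RabbitMQ, Ceph MONs
    "compute": ["hci01", "hci02", "hci03"],  # user virtual machines
    "storage": ["hci01", "hci02", "hci03"],  # Ceph OSDs, co-located with compute (HCI)
    "network": ["ctl01", "ctl02", "ctl03"],  # no dedicated network nodes in this setup
}

# Sanity check for the HCI assumption: compute and storage share the same hosts.
assert set(node_roles["compute"]) == set(node_roles["storage"])
```
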
### Use cases and validation

The setup described allows for the following use cases / test cases:

- Highly available control plane
- Control plane failure tolerance test (database, RabbitMQ, Ceph MONs, routers)
- Highly available user virtual clusters (e.g. Kubernetes clusters)
- Compute host failure simulation
- Host aggregates / compute node grouping
- Host-based storage replication (instead of OSD-based)
- Fully replicated storage / storage high availability test

### Control Node

#### General requirements

The control nodes do not run any user workloads, so they are usually not sized as big as the compute nodes.
Relevant metrics for control nodes are:

- Fast and sufficiently large disks. At least SATA SSDs are recommended; NVMe will greatly improve the overall responsiveness.
- A rather large amount of memory to hold the caches for databases and queues.
- Average CPU performance. Aim for a good compromise between number of cores and clock speed; this is
  the least important requirement on the list.

#### Hardware recommendation

The following server specs are just a starting point and can vary greatly between environments.

Example:
3x Dell R630/R640/R650 1U server

- Dual 8-core 3.0 GHz Intel/AMD
- 128 GB RAM
- 2x 3.84 TB NVMe in (software) RAID 1
- 2x 10/25/40 Gbit 2-port SFP+/QSFP network cards

### Compute Node (HCI)

The compute nodes in this scenario run all the user virtual workloads **and** the storage infrastructure. To make sure
these nodes are not starved, they should be of decent size.

> This setup takes local storage tests into consideration. The SCS standards require certain flavors with very fast disk speed
> to host customer Kubernetes control planes (etcd). These speeds are usually not achievable with shared storage. If you don't
> intend to test this scenario, you can skip the NVMe disks.

#### Hardware recommendation

The following server specs are just a starting point and can vary greatly between environments. The sizing of the nodes
needs to fit the expected workloads (customer VMs).

Example:
3x Dell R730(xd)/R740(xd)/R750(xd)
or
3x Supermicro

- Dual 16-core 2.8 GHz Intel/AMD
- 512 GB RAM
- 2x 3.84 TB NVMe in (software) RAID 1 if you want to have local storage available (optional)

For hyperconverged Ceph OSDs (the available capacities are derived in the sketch after this list):

- 4x 10 TB HDD -> this leads to ~30 TB of available HDD storage (optional)
- 4x 7.68 TB SSD -> this leads to ~25 TB of available SSD storage (optional)
- 2x 10/25/40 Gbit 2-port SFP+/QSFP network cards

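The available capacities above follow from the raw capacity, the Ceph replication factor and a conservative fill level.
A minimal sketch of that arithmetic, assuming 3x host-based replication and a target fill of about 80 % (both are
assumptions, not stated in the bill of materials):

```python
# Rough Ceph capacity estimate for the 3-node HCI example above.
# Assumptions (not from the BOM): replica count 3, ~80 % target fill level.

def usable_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                       replicas: int = 3, fill_ratio: float = 0.8) -> float:
    raw = nodes * disks_per_node * disk_tb
    return raw / replicas * fill_ratio

print(f"HDD pool: ~{usable_capacity_tb(3, 4, 10.0):.0f} TB usable")   # ~32 TB -> "~30 TB" above
print(f"SSD pool: ~{usable_capacity_tb(3, 4, 7.68):.0f} TB usable")   # ~25 TB
```
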
## Network

The network infrastructure can vary a lot from setup to setup. This guide does not intend to define the best networking
solution for every cluster but rather gives two possible scenarios.

### Scenario A: Not recommended for production

The smallest possible setup is a single switch physically connected to one interface on each node. The switch has to be
VLAN enabled. OpenStack recommends multiple isolated networks, but at least the following should be split:

- Out of Band network
- Management networks
- Storage backend network
- Public / External network for virtual machines

If there is only one switch, these networks should all be defined as separate VLANs. One of the networks can run in the
untagged default VLAN 1. An illustrative VLAN plan is sketched below.

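Purely as an illustration of that split (the VLAN IDs are invented examples, not a recommendation), a single trunk
port per node could carry a plan like the following, with exactly one network left untagged:

```python
# Hypothetical VLAN plan for the single-switch scenario; IDs are arbitrary examples.
vlan_plan = {
    "out-of-band":     {"vlan_id": 1,   "tagged": False},  # untagged default VLAN 1
    "management":      {"vlan_id": 100, "tagged": True},
    "storage-backend": {"vlan_id": 200, "tagged": True},
    "public-external": {"vlan_id": 300, "tagged": True},
}

# Basic sanity checks: unique VLAN IDs and at most one untagged network on the trunk.
ids = [net["vlan_id"] for net in vlan_plan.values()]
assert len(ids) == len(set(ids)), "VLAN IDs must be unique"
assert sum(not net["tagged"] for net in vlan_plan.values()) <= 1, "only one untagged network"
```
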
### Scenario B: Minimum recommended setup for small production environments

The recommended setup uses two stacked switches connected in a LAG and at least three different physical network ports
on each node (a per-node sketch follows below):

- Physical network 1: VLANs for the Public / External network for virtual machines and the Management networks
- Physical network 2: Storage backend network
- Physical network 3: Out of Band network

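As a hedged per-node illustration of Scenario B (interface names are examples only, and the bonding layout is one
possible way to use the two stacked switches, not a requirement):

```python
# Hypothetical per-node port layout for Scenario B; interface names are invented.
# Physical networks 1 and 2 could each be a 2-port bond spread across the stacked switches.
node_ports = {
    "bond0 (eno1 + eno2)": ["public-external (VLAN)", "management (VLAN)"],  # physical network 1
    "bond1 (eno3 + eno4)": ["storage-backend"],                              # physical network 2
    "bmc":                 ["out-of-band"],                                  # physical network 3
}

for port, networks in node_ports.items():
    print(f"{port}: {', '.join(networks)}")
```
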
### Network adapters

The Out of Band network usually does not need much bandwidth. Most modern servers come with 1 Gbit/s adapters, which
are sufficient. For small test clusters, it might also be sufficient to use 1 Gbit/s networks for the other two
physical networks. For a minimum production cluster, the following is recommended:

- Out of Band network: 1 Gbit/s
- VLANs for the Public / External network for virtual machines and the Management networks: 10 / 25 Gbit/s
- Storage backend network: 10 / 25 / 40 Gbit/s

Whether you need higher throughput for your storage backend depends on your expected storage load. The faster the
network, the faster storage data can be replicated between nodes. This usually leads to improved performance and faster
recovery after failures. The sketch below gives a feeling for the difference.

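To give a rough feeling for what the storage backend speed means in practice, here is a back-of-the-envelope sketch of
the time needed to re-create the replicas held by one failed HCI node. All numbers are assumptions for illustration;
real recovery time also depends on disks, CPU and Ceph tuning.

```python
# Back-of-the-envelope: time to re-replicate the data of one failed HCI node over the
# storage backend network. Data volume and link efficiency are assumed values.

def rereplication_hours(data_tb: float, link_gbit: float, efficiency: float = 0.7) -> float:
    bits = data_tb * 1e12 * 8                   # data to copy, in bits
    throughput = link_gbit * 1e9 * efficiency   # usable bits per second
    return bits / throughput / 3600

for speed in (1, 10, 25, 40):
    print(f"{speed:>2} Gbit/s backend: ~{rereplication_hours(10, speed):.1f} h to restore 10 TB of replicas")
# Roughly: 1 Gbit/s -> ~32 h, 10 Gbit/s -> ~3.2 h, 25 Gbit/s -> ~1.3 h, 40 Gbit/s -> ~0.8 h
```
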
## How to continue

After setting up the hardware recommended in this deployment example, you can continue with the
[deployment guide](https://docs.scs.community/docs/iaas/guides/deploy-guide/).

sidebarsDocs.js

Lines changed: 13 additions & 0 deletions
@@ -48,6 +48,19 @@ const sidebarsDocs = {
            id: 'iaas/components/flavor-manager'
          }
        ]
+     },
+     {
+       type: 'category',
+       label: 'Deployment Examples',
+       link: {
+         type: 'generated-index'
+       },
+       items: [
+         {
+           type: 'doc',
+           id: 'artcodix/index'
+         }
+       ]
      }
    ]
  },
