|
49 | 49 | <section>
|
50 | 50 | <img src="pics/slide2_meeting.jpg" width=70% height=70%>
|
51 | 51 | <aside class="notes">
|
52 |
| -... not this ... - advance to next slide - |
| 52 | +... not this ... |
53 | 53 |
|
54 | 54 | </aside>
|
55 | 55 | </section>
|
56 | 56 | <!-- Slide3 -->
|
57 | 57 | <section>
|
58 | 58 | <img src="pics/slide3_HDS.jpg" width=70% height=70%>
|
59 | 59 | <aside class="notes">
|
60 |
| -but this ... - advance to next slide - |
| 60 | +but this ... |
61 | 61 | </aside>
|
62 | 62 | </section>
|
63 | 63 | <!-- Slide4 -->
|
64 | 64 | <section>
|
65 | 65 | <img src="pics/slide3_datacenter.jpg" width=70% height=70%>
|
66 | 66 | <aside class="notes">
|
67 |
| -or this ... - advance to next slide - |
| 67 | +or this ... |
68 | 68 | </aside>
|
69 | 69 | </section>
|
70 | 70 | <!-- Slide5 -->
|
|
108 | 108 | <br>Traces of this decoupling of network functions from proprietary hardware appliances have been around for many years now.
|
109 | 109 | <br>Around 2003, I worked at an ISP. We used Cisco routers to do BGP with customers and the upstream provider. I was in awe when GNU Zebra came out and I could run BGP on a Linux box.
|
110 | 110 | <br>Fast forward to today: as part of SDN, we use OpenDaylight with the Quagga soft router for BGP (Quagga is what followed Zebra; the quagga is actually an extinct subspecies of the African zebra).
|
111 |
| - |
112 |
| - |
113 |
| -- advance to next slide - |
114 | 111 | </aside>
|
115 | 112 | </section>
|
116 | 113 | <!-- SlideX -->
|
|
137 | 134 | <th><img src="pics/slide7_smartNIC.png" width=100% height=100%></th>
|
138 | 135 | </tr>
|
139 | 136 | <aside class="notes">
|
140 |
| -This is how a smartNIC looks like ! |
| 137 | +This is what a smartNIC looks like!
141 | 138 |
|
142 |
| -To cope with this increase in traffic, we are looking at using smartNICs in Openstack on top of which we run vEPC in VMs. |
143 |
| -What is a smartNIC? |
144 |
| -A smartNIC is a NIC capable of offloading processing from the host CPU. To be able to do this, it can have its own CPU, memory and flash storage embedded on the card. |
145 |
| -To offload processing from CPU, a smartNIC will, for instance, run the ovs control and data plane, a firewall or performance acceleration techniques, *inside* the NIC. (That is, you will run your ovs-vsctl show commands while logged in to the NIC). - advance to next slide - |
| 139 | +<br>To cope with this increase in traffic, we are looking at using smartNICs in Openstack on top of which we run vEPC in VMs. |
| 140 | +<br> |
| 141 | +<br>What is a smartNIC? |
| 142 | +<br>A smartNIC is a NIC capable of offloading processing from the host CPU. To be able to do this, it can have its own CPU, memory and flash storage embedded on the card. |
| 143 | +<br>To offload processing from the CPU, a smartNIC will, for instance, run the ovs control and data plane, a firewall, or performance acceleration, *inside* the NIC.
| 144 | +<br>This means you will run your ovs-vsctl show commands while logged in to the NIC.
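| +<br>
| +<br>As a small sketch of what that looks like in practice (the NIC hostname and the bridge/port names here are made up for illustration):
| +<pre>
| +$ ssh admin@smartnic-0      # log in to the NIC itself, not the host
| +$ ovs-vsctl show            # the ovs bridges now live on the NIC
| +    Bridge br-int
| +        Port "vhost-vm1"
| +            Interface "vhost-vm1"
| +</pre>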
146 | 145 | </aside>
|
147 | 146 | </section>
|
148 | 147 | <!-- Slide8 -->
|
|
157 | 156 | </tr>
|
158 | 157 | <aside class="notes">
|
159 | 158 | There are quite a few types of smartNICs, ranging in price from a few hundred to thousands of dollars (some you can buy on eBay).
|
160 |
| - |
161 |
| -Some smartNICs use an FPGA (Field Programmable Gate Array) mounted on the PCIe network card. Using FPGA boards provides flexibility, they can be easily programmed and updated once installed. (P4-Programmable Protocol-independent Packet Processor- can be used for programming packet forwarding planes.) |
162 |
| - |
163 |
| -Some smartNICs do not use an FPGA, and are ASIC based instead, they are less flexible but still capable of doing lots of things, at a significantly lower price. |
164 |
| - |
165 |
| -SmartNICs might use ARM CPUs in a RISC architecure or x86 CPUs, different amounts of RAM, they can consume more or less power. |
166 |
| -SmartNICs can have different form factors HHHL(half hight half lenght) if you need to fit it in a 2Us compute or FHFL full hight full lenght for wider than 2Us computes. |
167 |
| - |
168 |
| -You can opt for 2 or 4 X 25, 4X 50, 2 X 100 Gbps ports, that is two ports (cages) on a NIC wifh HHHL form factor and 4 ports on NICs with FHFL form factor. |
169 |
| -You have options around the number of PCI lanes to be used, x8 or x16. - advance to next slide - |
| 159 | +<br>Some smartNICs use an FPGA (Field Programmable Gate Array) mounted on the PCIe network card. Using FPGA boards provides flexibility: they can be easily reprogrammed and updated once installed. (P4, the Programming Protocol-independent Packet Processors language, can be used for programming the packet forwarding plane.)
| 160 | +<br> |
| 161 | +<br>Some smartNICs do not use an FPGA, and are ASIC based instead, they are less flexible but still capable of doing lots of things, at a significantly lower price. |
| 162 | +<br> |
| 163 | +<br>SmartNICs might use ARM CPUs in a RISC architecture (the kind of HW you have on your *smart*phone) or x86 CPUs, different amounts of RAM, and they can consume more or less power.
| 164 | +<br> |
| 165 | +<br>SmartNICs come in different form factors: HHHL (half height, half length) if you need to fit one in a 2U compute, or FHFL (full height, full length) for computes wider than 2U.
| 166 | +<br> |
| 167 | +<br>You can opt for 2 or 4 x 25, 4 x 50, or 2 x 100 Gbps ports, that is, two ports (cages) on a NIC with the HHHL form factor and 4 ports on NICs with the FHFL form factor.
| 168 | +<br>You have options around the number of PCIe lanes to be used, x8 or x16.
170 | 169 | </aside>
|
171 | 170 | </section>
|
172 | 171 |
|
|
180 | 179 |
|
181 | 180 | <aside class="notes">
|
182 | 181 | What should your architecture look like when using smartNICs?
|
183 |
| -<br><br> |
184 |
| -You could use one smartNIC for data per compute, we're talking 2x100 Gbps here, that's *a lot* of bandwidth. |
| 182 | +<br> |
| 183 | +<br>You could use one smartNIC for data per compute; we're talking 2x100 Gbps here, and that's *a lot* of bandwidth.
185 | 184 | In this case, high availability happens at the compute level, not at the network card level.
|
186 | 185 | <br><br>
|
187 | 186 | What if you have a dual-socket system? If you plug a smartNIC into a PCIe slot, applications might not like crossing that QPI link (called UPI since 2017) between the CPUs.
|
188 | 187 | In this case you can look into using a bifurcated smartNIC: the card is basically split into two physical pieces that you can insert into two PCIe slots, one per NUMA node.
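| +As a quick sanity check (the interface name here is illustrative), you can see which NUMA node a NIC hangs off with:
| +<pre>
| +# prints the NUMA node of the device; -1 means none reported
| +cat /sys/class/net/eth0/device/numa_node
| +</pre>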
|
189 | 188 | <br><br>
|
190 | 189 | If you use two smartNICs, do you want two separate ovs controllers in your compute? Will OpenStack even support that?
|
191 |
| - |
192 | 190 | </aside>
|
193 | 191 | </section>
|
194 | 192 |
|
195 | 193 | <!-- Slide9 -->
|
196 | 194 | <section>
|
197 | 195 | <tr>
|
198 | 196 | <p>What is this smartNIC, anyway?</p>
|
199 |
| - <p>Is it an embedded linux, is it a linux mini server?</p> |
200 |
| - |
| 197 | + <p>Is it an embedded Linux? Is it a Linux mini server?</p>
201 | 198 | </tr>
|
202 | 199 | <aside class="notes">
|
203 | 200 |
|
204 | 201 | What is this smartNIC, anyway? Is it an embedded Linux? Is it a Linux mini server?
|
205 |
| -Would you want the possibility to say, in one go, configure all your smartNICs to PXE boot, so you can load a new linux on them. Then you need something like IPMI, we're talking rather something like a linux server here than an embedded linux. |
206 |
| - |
207 |
| -What security concerns are raised with introducing a smartNIC with linux running on it. I'm looking at you, Kim! |
208 |
| - |
| 202 | +<br> |
| 203 | +<br>If you see it as an embedded Linux, then to upgrade the "FW" you would run an agent on the compute. Is that good enough if you have many such smartNICs in your deployment?
| 204 | +<br> |
| 205 | +<br>Would you want the possibility to, say, configure all your smartNICs in one go to PXE boot, so you can load a new Linux with ovs patches on them?
| 206 | +<br>Then you need something like IPMI, hence it's more like a Linux server than an embedded Linux.
| 207 | +<br> |
| 208 | +<br>What security concerns are raised by introducing a smartNIC with Linux running on it? I'm looking at you, Kim!
209 | 209 | </aside>
|
210 | 210 | </section>
|
211 | 211 | <!-- Slide9 -->
|
|
218 | 218 | <aside class="notes">
|
219 | 219 |
|
220 | 220 | When should you use a smartNIC?
|
221 |
| -(Intel info) If on your host you use more than 4CPUs for OVS, then you should switch to using smartNICs, it makes sense from a business point of view. |
222 |
| -(Also, smartNIC is a good idea if you need low latency and don't care so much about migration) |
223 |
| -- advance to next slide - |
| 221 | +<br>(Intel info) If on your host you use more than 4 CPUs for OVS, then you should switch to smartNICs; it makes sense from a business point of view.
| 222 | +<br>(Also, a smartNIC is a good idea if you need low latency and don't care so much about migration.)
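| +<br>One way to see how busy your OVS PMD CPUs actually are (assuming an ovs-dpdk setup) is:
| +<pre>
| +# per-PMD processing and idle cycle statistics
| +ovs-appctl dpif-netdev/pmd-stats-show
| +</pre>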
224 | 223 | </aside>
|
225 | 224 | </section>
|
226 | 225 | <!-- SlideX -->
|
|
229 | 228 | <th><img src="pics/cpuisol1.png" width=90% height=90%></th>
|
230 | 229 | </tr>
|
231 | 230 | <aside class="notes">
|
232 |
| -Here you can see the CPUs allocated to ovs under nohz_full ( confirmed with /etc/default PMDs) |
233 |
| - |
234 |
| -Did you have this problem? Poor dimensioned compute (customer cannot aproximate how many VMs it will run on a compute to properly dimension mem for host OS to take be sufficent for device emulation), runs out of mem , oom killer takes down ovs and with it all VMs? with ovs in the NIC this will not happen, of course you need to properly dimension your system anyway. |
235 |
| - |
236 |
| -nohz_full=cpulist |
237 |
| -The nohz_full parameter is used to treat a list of CPUs differently, with respect to timer ticks. If a CPU is listed as a nohz_full CPU and there is only one runnable task on the CPU, then the kernel will stop sending timer ticks to that CPU, so more time may be spent running the application and less time spent servicing interrupts and context switching. |
238 |
| -https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/isolating_cpus_using_tuned-profiles-realtime |
239 |
| - |
240 |
| -2 for hostOS |
241 |
| -2 + threads for ovs |
| 231 | +Here is an example where you can see the CPUs allocated to ovs under nohz_full (confirmed against the PMDs in /etc/default/ovs-dpdk). If ovs is the only runnable task on such a CPU, the kernel stops sending timer ticks to it, so more time is spent running ovs and less servicing interrupts and context switches (though from my experience this only works to a certain degree).
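| +<br>As a minimal sketch (the CPU list is made up for illustration), the kernel boot command line for that kind of isolation could look like:
| +<pre>
| +# keep CPUs 2-5 tick-free and reserved for the ovs PMD threads
| +nohz_full=2-5 isolcpus=2-5 rcu_nocbs=2-5
| +</pre>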
| 232 | +<br> |
| 233 | +<br>Here's one interesting example: on a poorly dimensioned compute (where you're running many VMs and did not take into account that you need to reserve resources for device emulation for each VM, *on the host OS side*), if you run out of memory, the OOM killer will take down your ovs and with it all your VMs. With ovs in the NIC this will not happen, but of course you still need to properly dimension your system!
| 234 | +<br> |
242 | 235 | </aside>
|
243 | 236 | </section>
|
244 | 237 |
|
|
248 | 241 | <th><img src="pics/cpuisol.png" width=99% height=99%></th>
|
249 | 242 | </tr>
|
250 | 243 | <aside class="notes">
|
251 |
| -2 for hostOS |
252 |
| -4 + threads for ovs |
| 244 | +Here is another example, for a high-performance ovs compute, with more CPUs assigned to ovs.
253 | 245 | </aside>
|
254 | 246 | </section>
|
255 | 247 |
|
|
263 | 255 | </tr>
|
264 | 256 |
|
265 | 257 | <aside class="notes">
|
266 |
| - Openstack working with SmartNICs |
267 |
| -Now the question is where are we in Openstack when it comes to integrating the wide range of smartNICs appearing on the market? |
268 |
| -Work is done in several Openstack projects, like ironic, nova and neutron of course. |
269 |
| - |
270 |
| -For instance when it comes to neutron, we need changes in the Neutron OVS driver and Neutron OVS agent in order to bind the Neutron port for the baremetal host with the smartNIC. |
271 |
| -This is needed so that neutron ovs agent can configure the OVS running on the smartNIC. |
272 |
| - |
273 |
| -We can have neutron ovs agent running locally on the smartNIC or remotely and manages the OVS bridges for all baremetal smartNICs. |
274 |
| - |
275 |
| -There are many interesting questions raised, like how do you know which smartNIC hostname belongs to which server. |
| 258 | +Now the question is: where are we in OpenStack when it comes to integrating the wide range of smartNICs popping up on the market?
| 259 | +<br> |
| 260 | +<br>Work is being done in several OpenStack projects: ironic, nova, neutron and cyborg.
| 261 | +<br> |
| 262 | +<br>For instance, when it comes to neutron, we need changes in the Neutron OVS driver and Neutron OVS agent in order to bind the Neutron port for the baremetal host with the smartNIC.
| 263 | +<br>This is needed so that the neutron ovs agent can configure the OVS running on the smartNIC.
| 264 | +<br> |
| 265 | +<br>We can have the neutron ovs agent running locally on the smartNIC, or remotely, managing the OVS bridges for all baremetal smartNICs (well, good luck with that).
| 266 | +<br> |
| 267 | +<br> |
| 268 | +There are many interesting questions raised, like how you know which smartNIC belongs to which server; in ironic we configure the hostname and ssh keys.
276 | 269 | Ovs
|
277 | 270 |
|
278 | 271 | Ironic
|
|
497 | 490 | <br>
|
498 | 491 | <br>
|
499 | 492 | You can look at XDP and BPF with a data plane/control plane approach.
|
500 |
| -XDP is the dataplane which is inside the kernel |
501 | 493 |
|
502 | 494 | <br>
|
503 |
| -BPF is the control plane in user space, userspace load eBPF program. |
| 495 | +The data plane (XDP) is in the kernel, and the control plane is in userspace, where a userspace program loads the eBPF program.
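| +<br>As a sketch of the control plane side, attaching a compiled XDP object to an interface with iproute2 (the interface and object file names are illustrative):
| +<pre>
| +# userspace loads the eBPF program; the kernel then runs it per packet
| +ip link set dev eth0 xdp obj xdp_prog.o sec xdp
| +</pre>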
504 | 496 | <br>
|
505 | 497 | <br>
|
506 | 498 | (the spanning tree daemon uses BPF to filter so that only BPDUs come in on that socket)
|
| 499 | + |
| 500 | + |
| 501 | + |
507 | 502 |
|
508 | 503 | </aside>
|
509 |
| - </section> |
| 504 | + </section> |
510 | 505 | <!-- SlideX -->
|
511 | 506 | <section>
|
512 |
| -<p>eBPF is a superpower</p> |
513 |
| - <aside class="notes"> |
514 |
| -What the heck is eBPF ? |
515 |
| -eBPF stands for "enhanced Berkeley Packet Filter" it's a linux kernel technology that is used by e.g. tcpdump and other analysis tools. |
516 |
| -eBPF is used to extract millions of metrics from the kernel and applications for troubleshooting purposes, deep monitoring or exploring running software. |
517 |
| -eBPF is basically like a superpower. |
518 |
| -BPF was initially used for tools like tcpdump but Alexei Starovoitov introduced eBPF to be used for things like to NATing, routing, doing what iptables does for example. |
| 507 | +<p>XDP_DROP</p> |
| 508 | +<p>XDP_PASS</p> |
| 509 | +<p>XDP_TX</p> |
| 510 | +<p>XDP_ABORTED</p> |
| 511 | +<p>XDP_REDIRECT</p> |
519 | 512 |
|
520 |
| - |
521 |
| - |
522 |
| -Jumping between kernel space and user space cost on performance - TO BE ADDED |
523 |
| -Mention eBPF with XDP kernel hook, DPDK, etc .. TO BE ADDED |
524 |
| -Native VirtIO driver benefits - TO BE ADDED |
525 |
| -- advance to next slide - |
| 513 | + <aside class="notes"> |
| 514 | +These are the actions that can be executed on a packet.
| 515 | +<br><br> DROP is awesome, useful for DDoS; Facebook and Cloudflare use it heavily.
| 516 | +<br><br> PASS lets the packet pass: after inspection, the packet goes back to the normal kernel stack (unlike DPDK); it is lightweight, as you program it yourself.
| 517 | +<br><br> TX sends it immediately back out on the port it was received on, for load balancer cases for example.
| 518 | +<br><br> ABORTED is basically a drop, but on top of that you get a trace you can log, useful for debugging for the sysadmin or developer.
| 519 | +<br><br> REDIRECT sends the packet to another port or to other CPUs, and you can modify the packet headers (TX and REDIRECT are similar to their DPDK equivalents).
| 520 | +<br><br>
| 521 | +Note that XDP has no support for jumbo frames.
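| +<br><br>As a minimal sketch of what such a program looks like (assuming libbpf headers; the function name is made up), an XDP program is just a C function that returns one of these actions:
| +<pre>
| +#include &lt;linux/bpf.h&gt;
| +#include &lt;bpf/bpf_helpers.h&gt;
| +
| +SEC("xdp")
| +int xdp_pass_all(struct xdp_md *ctx)
| +{
| +    /* inspect ctx-&gt;data .. ctx-&gt;data_end here, then pick a verdict */
| +    return XDP_PASS; /* or XDP_DROP, XDP_TX, XDP_ABORTED, XDP_REDIRECT */
| +}
| +
| +char _license[] SEC("license") = "GPL";
| +</pre>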
526 | 522 | </aside>
|
527 | 523 | </section>
|
| 524 | + |
528 | 525 | <!-- SlideX -->
|
529 | 526 | <section>
|
530 |
| -<p>SlideX</p> |
| 527 | +<p>eBPF is a superpower</p> |
531 | 528 | <aside class="notes">
|
532 |
| - |
| 529 | +What the heck is eBPF?
| 530 | +eBPF stands for "extended Berkeley Packet Filter"; it's a Linux kernel technology that is used by e.g. tcpdump and other analysis tools.
| 531 | +eBPF is used to extract millions of metrics from the kernel and applications for troubleshooting purposes, deep monitoring or exploring running software. |
| 532 | +eBPF is basically like a superpower. |
| 533 | +BPF was initially used for tools like tcpdump, but Alexei Starovoitov introduced eBPF so it could be used for things like NATing, routing, or doing what iptables does, for example.
533 | 534 | </aside>
|
534 | 535 | </section>
|
| 536 | + |
535 | 537 | <!-- SlideX -->
|
536 | 538 | <section>
|
537 |
| -<p>SlideX</p> |
| 539 | + <tr> |
| 540 | + <th><img src="pics/bpf2.png" width=99% height=99%></th> |
| 541 | + </tr> |
538 | 542 | <aside class="notes">
|
539 |
| - |
| 543 | +This assembly-like instruction set, this is BPF.
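| +<br>You can dump this kind of instruction listing yourself: tcpdump compiles its filter expressions down to classic BPF, e.g.:
| +<pre>
| +# print the compiled packet-matching code for a filter
| +tcpdump -d 'ip and udp'
| +</pre>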
540 | 544 | </aside>
|
541 | 545 | </section>
|
542 | 546 | <!-- SlideX -->
|
|