|
1 | 1 | Channel Access and Other Protocols
|
2 | 2 | ==================================
|
3 | 3 |
|
4 |
| -explanations of the challenges and solutions |
| 4 | +Explanations of the challenges and solutions to routing protocols to and |
| 5 | +from IOCs running under Kubernetes. |
| 6 | + |
| 7 | +Container Network Interface |
| 8 | +--------------------------- |
| 9 | + |
| 10 | +A Kubernetes cluster will have a CNI (Container Network Interface) that |
| 11 | +provides some form of virtual network within which Pods communicate. |
| 12 | + |
| 13 | +For a useful discussion of this subject see `Kubernetes CNI providers`_. |
| 14 | + |
| 15 | +At DLS the Argus cluster uses Weave. |
| 16 | + |
| 17 | +In order to connect to a Pod from outside of the cluster you must configure |
| 18 | +a Service. A Service can provide an external IP and port to external clients |
| 19 | +and will typically load balance between multiple instances of a given Pod. |
| 20 | + |
| 21 | +In the case of IOCs we only run a single instance but would still normally be |
| 22 | +required to configure a Service to proxy a connection to our IOC. |
| 23 | + |
| 24 | +The Service provides Network Address Translation and routes packets |
| 25 | +between the Pod and the external client. |
| 26 | + |
| 27 | +Typically CNIs do not support broadcast traffic within their virtual LAN. |
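As a minimal sketch of the kind of traffic this affects, the snippet below opens a UDP socket with broadcast enabled, which is how a Channel Access client sends search requests. The helper names are hypothetical, the payload is left to the caller (no real CA message is built here), and 5064 is assumed as the default Channel Access server port.

```python
import socket

# Assumed default Channel Access server port (EPICS_CA_SERVER_PORT).
CA_SERVER_PORT = 5064


def make_broadcast_socket() -> socket.socket:
    """Return a UDP socket that is permitted to send broadcast datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST the OS refuses to send to a broadcast address.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_probe(sock: socket.socket, broadcast_addr: str, payload: bytes,
               port: int = CA_SERVER_PORT) -> int:
    """Send one datagram to every host on the subnet; returns bytes sent.

    Hypothetical helper: the caller supplies the subnet broadcast address
    and the payload.
    """
    return sock.sendto(payload, (broadcast_addr, port))


sock = make_broadcast_socket()
broadcast_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST) != 0
sock.close()
```

A datagram sent this way reaches every host on a physical subnet, but a CNI virtual network that does not carry broadcasts would not deliver it to Pods.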
| 28 | + |
| 29 | + |
| 30 | +.. _Kubernetes CNI providers: https://rancher.com/blog/2019/2019-03-21-comparing-kubernetes-cni-providers-flannel-calico-canal-and-weave/ |
| 31 | + |
| 32 | +Problems with CNI |
| 33 | +----------------- |
| 34 | + |
| 35 | +The following two behaviours for network protocols are not suitable for use |
| 36 | +between an external client and a Kubernetes Pod: |
| 37 | + |
| 38 | +- use of broadcast packets |
| 39 | +- negotiating an ephemeral port in the application layer (NAT cannot route to |
| 40 | +  such a port since it looks like a new connection) |
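The second point can be illustrated with a toy model of a Service. This is a sketch only, not how Kubernetes implements NAT: the `ServiceNat` class and all IP addresses are hypothetical, and the model simply forwards ports that were declared in advance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


@dataclass
class ServiceNat:
    """Toy model of a Service: only statically declared ports are forwarded."""
    external_ip: str
    pod_ip: str
    # Maps external port -> Pod port; fixed when the Service is configured.
    forwards: Dict[int, int] = field(default_factory=dict)

    def route(self, dest: Endpoint) -> Optional[Endpoint]:
        """Return the Pod endpoint for a packet, or None if it is dropped."""
        ip, port = dest
        if ip != self.external_ip or port not in self.forwards:
            return None  # no forwarding rule matches this destination
        return (self.pod_ip, self.forwards[port])


# Hypothetical addresses; 5064 stands in for a well-known, pre-declared port.
nat = ServiceNat("172.23.0.10", "10.32.0.5", {5064: 5064})

# A connection to the declared port is translated to the Pod.
control = nat.route(("172.23.0.10", 5064))

# The server then tells the client, inside the protocol payload, to use a
# port it has just opened. The Service has no rule for it.
ephemeral = nat.route(("172.23.0.10", 49152))
```

Here `control` resolves to the Pod endpoint while `ephemeral` is `None`: the port number was agreed inside the application payload, so the translation layer has no rule for it.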
| 41 | + |
| 42 | +When prototyping IOCs in Kubernetes we found that the following protocols |
| 43 | +had issues for the above reasons: |
| 44 | + |
| 45 | +- Channel Access |
| 46 | +- Process Variable Access |
| 47 | +- GVSP (GigE Vision Streaming Protocol) |
| 48 | + |
| 49 | +Initially we looked into workarounds for these issues. For example, the |
| 50 | +diagram below shows a 'ca-forwarder' that sits on the EPICS client subnet |
| 51 | +and forwards requests to IOCs in the cluster. |
| 52 | + |
| 53 | +.. image:: ../images/caforwarder.png |
| 54 | + :width: 1500px |
| 55 | + :align: center |
| 56 | + |
| 57 | +However, the second diagram shows why this approach fails when the client |
| 58 | +is inside the cluster itself. |
| 59 | + |
| 60 | + |
| 61 | +.. image:: ../images/cabackwarder.png |
| 62 | + :width: 1500px |
| 63 | + :align: center |
| 64 | + |
| 65 | +The conclusion of this study was that workarounds were fiddly and would need |
| 66 | +to be implemented on a per-protocol basis, with no guarantee that a solution |
| 67 | +exists for every protocol we will need. |
| 68 | + |
| 69 | +Solution - hostNetwork |
| 70 | +---------------------- |
| 71 | +To get round these issues, and to avoid similar network issues in future, we: |
| 72 | + |
| 73 | +- Use remote worker nodes that sit in the beamline subnet |
| 74 | +- Use hostNetwork=true, which bypasses the CNI and gives Pods direct access |
| 75 | +  to the host node's network |
| 76 | + |
| 77 | +This means that, from a networking perspective, all IOCs have identical |
| 78 | +status to the traditional IOCs running on beamline servers. When a container |
| 79 | +listens on a port it is listening on the IP address of its host and can |
| 80 | +receive broadcasts. It can also open new ephemeral ports and a client that |
| 81 | +knows the port number can connect because no NAT is in the way. |
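As a rough sketch of what this looks like in a Pod spec, the snippet below builds a minimal manifest as a Python dictionary and serialises it to JSON, which `kubectl apply -f` accepts alongside YAML. The function name, Pod name, image and node hostname are all hypothetical, and a real IOC deployment would probably use a higher-level resource than a bare Pod.

```python
import json


def ioc_pod_manifest(name: str, image: str, node_hostname: str) -> dict:
    """Build a minimal Pod manifest that shares its host node's network."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Bypass the CNI: the container sees the node's interfaces.
            "hostNetwork": True,
            # Keeps cluster DNS resolution working for hostNetwork Pods.
            "dnsPolicy": "ClusterFirstWithHostNet",
            # Pin the Pod to a worker node in the beamline subnet
            # (hypothetical hostname supplied by the caller).
            "nodeSelector": {"kubernetes.io/hostname": node_hostname},
            "containers": [{"name": name, "image": image}],
        },
    }


# All three arguments are placeholder values for illustration.
manifest = ioc_pod_manifest(
    "example-ioc",
    "registry.example.org/example-ioc:latest",
    "beamline-worker-01",
)
manifest_json = json.dumps(manifest, indent=2)
```

Writing `manifest_json` to a file gives something that can be applied to a cluster, subject to the privilege restrictions described next.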
| 82 | + |
| 83 | +The downside of this approach is that Pods need elevated privileges in order |
| 84 | +to be allowed to use hostNetwork. At DLS the K8S team has implemented a |
| 85 | +set of restrictions that mitigate this issue. See `argus` for details |
| 86 | +of the remote worker nodes and suggestions for secure configuration. |
5 | 87 |
|
6 |
| -**TODO** |
|