# perfSONAR Deployment Options

The primary consideration for a perfSONAR deployment is test isolation, i.e. only one end-to-end test should run on a
host at a time. This ensures that test results are not impacted by other tests; otherwise it is much more difficult to
interpret the results, which may vary due to host effects rather than network effects. Taking this into account,
perfSONAR measurement tools are much more accurate when running on dedicated hardware, and while it may be useful to
run them on other hosts such as Data Transfer Nodes, the current recommendation is to have a dedicated measurement
machine. In addition, as bandwidth testing can impact latency testing, we recommend deploying two different nodes, each
focused on a specific set of tests. The following deployment options are currently available:

* **Bare metal** - preferred option, in one of two possible configurations:

    * Two bare metal servers, one for the latency node and one for the bandwidth node.

    * One bare metal server running both the latency and bandwidth nodes together, provided that there are two NICs available; please refer to the multiple NIC guidance section below for more details.

* **Virtual Machine** - if bare metal is not available, it is also possible to run perfSONAR on a VM; however, there is a set of additional requirements to fulfill:

    * Full-node VMs are strongly preferred, i.e. having only the 2 VMs (latency node and bandwidth node) on a single bare metal server. Mixing perfSONAR VM(s) with others might have an impact on the measurements and is therefore not recommended.

    * The VM needs to be configured with SR-IOV access to the NIC(s) as well as pinned CPUs, to ensure that bandwidth tests are not impacted (e.g. by the hypervisor switching CPUs during the test).

    * A successful full speed local bandwidth test is highly recommended prior to putting the VM into production.

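On a KVM/libvirt hypervisor, the CPU pinning requirement can be checked or applied at runtime along these lines; this is a minimal sketch, where the domain name `ps-bandwidth` and the core numbers are illustrative placeholders, and permanent pinning would normally be set in the domain XML instead:

```shell
# Pin each virtual CPU of the perfSONAR VM to a fixed physical core so the
# hypervisor cannot migrate it mid-test (domain name and core numbers are
# illustrative placeholders).
virsh vcpupin ps-bandwidth 0 2
virsh vcpupin ps-bandwidth 1 3

# Show the resulting vCPU-to-physical-CPU pinning
virsh vcpupin ps-bandwidth
```
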
* **Container** - perfSONAR has supported containers since version 4.1 (Q1 2018), as documented at <https://docs.perfsonar.net/install_docker.html>, but containers are not typically used in the same way as a full toolkit installation.

    * A Docker perfSONAR testpoint instance can however still be used by sites that run multiple perfSONAR instances for their internal testing, as this deployment model makes it possible to flexibly deploy a testpoint which sends its results to a local measurement archive running on the perfSONAR toolkit node.

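Following the perfSONAR Docker documentation linked above, such a testpoint can be started with something along these lines (a sketch rather than a complete recipe; the container name is a placeholder, and host networking is used so the measurement tools see the real NIC):

```shell
# Start a perfSONAR testpoint container on the host network so that the
# measurement tools use the real NIC rather than a Docker bridge
# (the container name is an illustrative placeholder).
docker pull perfsonar/testpoint
docker run -d --name ps-testpoint --net=host perfsonar/testpoint
```
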
## perfSONAR Toolkit vs Testpoint

The perfSONAR team has documented the types of installations supported at
<https://docs.perfsonar.net/install_options.html>. With the release of version 5, OSG/WLCG sites have a new option:
instead of installing the full Toolkit, sites can choose to install the Testpoint bundle.

* Pros

    * Simpler deployment when a local web interface is not needed and a central measurement archive is available.

    * Less resource intensive for both memory and I/O capacity.

* Cons

    * Measurements are not stored locally.

    * No web interface to use for configuration or adding local tests.

    * Unable to show results in MaDDash.

While sites are free to choose whatever deployment method they want, we strongly recommend the use of perfSONAR's
containerized testpoint. This method was chosen as a "best practice" recommendation because of its reduced resource
requirements, fewer components and easier management.

### perfSONAR Hardware Requirements

Two different nodes participate in the network testing, a latency node and a bandwidth node; while both run the exact
same perfSONAR toolkit, they have very different requirements. The bandwidth node measures available (or peak)
throughput at a low test frequency and will thus require a NIC with high capacity (1/10/40/100G are supported) as well
as enough memory and CPU to support high bandwidth testing. Our recommendation is to match the bandwidth node NIC speed
with the one installed on the storage nodes, as this provides the best match when there are issues to investigate. In
case you'd like to deploy a high speed (100G) bandwidth node, please consult the [ESnet tuning
guide](https://fasterdata.es.net/host-tuning/100g-tuning/) and the [100G tuning
presentation](https://www.es.net/assets/Uploads/100G-Tuning-TechEx2016.tierney.pdf). The latency node, on the other
hand, runs low bandwidth but high frequency tests, sending a continuous stream of packets to measure delay and the
corresponding packet loss, packet reordering, etc. This means that while it doesn't require a high capacity NIC (1G is
usually sufficient), it can impose significant load on disk I/O as well as CPU, as many tests run in parallel and need
to continuously store their results in the local measurement archive. The minimum hardware requirements to run the
perfSONAR toolkit are documented [here](http://docs.perfsonar.net/install_hardware_details.html). For a WLCG/OSG
deployment, and taking into account the amount of testing that we perform, we recommend at least the following for
perfSONAR 5.0+:

* A NIC for the bandwidth node matching the capacity of the site storage nodes (10/25/40/100G), and a 1G NIC for the latency node (for higher NIC capacities, 40/100G, please check the [ESnet tuning guide](https://fasterdata.es.net/host-tuning/100g-tuning/))

* A high clock speed CPU (3.0 GHz+), fewer cores are fine, with at least 32GB+ of RAM (8GB+ if using a Testpoint install)

* An NVMe or SSD disk (128GB should be sufficient) if using a full Toolkit install with OpenSearch.

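For the higher NIC speeds, host tuning along the lines of the ESnet guide is usually needed as well. A minimal sketch follows; the buffer sizes are illustrative examples in the spirit of the ESnet recommendations, and should be verified against the guide for your NIC speed before applying:

```shell
# Illustrative TCP buffer tuning for a high speed bandwidth node (values are
# examples in the spirit of the ESnet fasterdata guide; confirm the
# recommended sizes for your NIC speed before applying).
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
# Use fair queuing, which supports pacing for high speed flows
sysctl -w net.core.default_qdisc=fq
```
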
<!-- anchor removed; heading provides an automatic id -->

### Multiple NIC (Network Interface Card) Guidance

Many sites would prefer **not** to have to deploy two servers for cost, space and power reasons. Since perfSONAR 3.5+
there is a way to install both latency and bandwidth measurement services on a single node, as long as it has at least
two NICs (one per 'flavor' of measurement) and sufficient processing power and memory. There are a few additional steps
required in order to configure a node with multiple network cards:

* Please set up source routing as described in the [official documentation](http://docs.perfsonar.net/manage_dual_xface.html).

* You'll need to register two hostnames in [OIM](installation.md)/[GOCDB](installation.md) (and have two reverse DNS entries), as you would normally for two separate nodes.

* Instead of configuring just one auto-URL for the remote URL, please add both, so you'll end up having something like this:

```bash
psconfig remote add "https://psconfig.opensciencegrid.org/pub/auto/<FQDN_latency>"
psconfig remote add "https://psconfig.opensciencegrid.org/pub/auto/<FQDN_throughput>"
...
```
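
The official documentation linked above covers the source routing setup in full; as a rough illustration of the idea only (the address, gateway, interface and table number below are placeholders, not site values), policy routing ensures that replies leave via the interface that owns the source address:

```shell
# Illustrative policy ("source") routing for the second NIC; replace the
# placeholder address, gateway, interface and table number with your own.
# Give the second interface its own routing table with its own default route...
ip route add default via 192.0.2.1 dev eth1 table 200
# ...and send traffic sourced from that interface's address through that table.
ip rule add from 192.0.2.10/32 table 200
```
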