The onet library allows for multiple levels of simulations:
- Localhost:
- up to 100 nodes
- Mininet:
- up to 300 nodes on a 48-core machine, multiplied by the number of machines available
- define max. bandwidth and delay for your network
- Deterlab:
- up to 1000 nodes on a strong machine, multiplied by the number of machines available
Refer to the simulation-examples in one of the following places:
Each simulation can have one or more .toml-files that describe a number of experiments to be run on localhost or deterlab.
The .toml-files are split in two parts, separated by an empty line. The first part consists of one or more 'global' variables that describe all experiments.
The second part starts with a line of variables that have to be defined for each experiment, where each experiment makes up one line.
Simulation- what simulation to runHosts- how many hosts to instantiate - this corresponds to the nodes that will be running and available in the mainRosterServers- how many servers to use maximum - if less than this number of servers are available, a warning will be printed, but the simulation will still be run
The Servers will mostly influence how the simulation will be run.
Depending on the platform, this will be handled differently:
localhost-Serversis ignored hereDeterlab- the system will distribute theHostsnodes over the available servers, but not over more thanServers. This allows for running simulations that are smaller than your whole DETERLab experiment without having to modify and restart the experiment.Mininet- as inDeterlab, theHostsnodes will be distributed over a maximum ofServers.
The standard simulation (and the only one implemented) is the
SimulationBFTree, which will prepare the Roster and the Tree for the
simulation.
Even if you use the SimulationBFTree, you're not restricted to use only the
prepared Tree.
However, there will not be more nodes available than the ones in the prepared
Roster.
Some restrictions apply when you're using the Deterlab simulation:
- all nodes on one server (
Hosts/ min(available servers,Servers)) are run in one binary, which means- bandwidth measurements cover all the nodes
- time measurements need to make sure no other calculations are taking place
- the bandwidth- and delay-restrictions only apply between two physical servers, so
- the simulation makes sure that all connected nodes in the
Treeare always on different servers. If you use another communication than the one in theTree, this will mean that the system cannot guarantee that the communication is restricted - the bandwidth restrictions apply to the sum of all communications between
two servers, so to a number of hosts
If you want to have a bandwidth restriction that is between all nodes, and
Hosts > Servers, you have to use theMininetplatform, which doesn't have this restriction.
- the simulation makes sure that all connected nodes in the
The following variables define how the original Tree is calculated - only
one of the two should be given:
BF- branching factor: how many children each node hasDepth- the depth of the tree in levels below the root-node
If there are 13 Hosts with a BF of 3, the system will create a complete
tree with the root-node having 3 children, and each of the children having 3
more children.
The same setup can be achieved with 13 Hosts and a Depth of 3.
If the tree to be created is not complete, it will be filled breath-first and the children of the last row will be distributed as evenly as possible.
In addition, Rounds defines how many rounds the simulation will run.
Buckets of statistics can be defined using the following variable:
Buckets- indices range of the buckets
The parameter is a string where the buckets are separated with spaces and the ranges
by a dash (e.g. Buckets = "0:5 5:10-15:20" that will create a bucket with hosts 0
to 4 and another one with hosts 5 to 9 and 15 to 19). Range indices can be compared
to Go slices so that the lower index is inclusive and the higher is exclusive.
A file will be written per bucket and the global one containing the statistics of all the conodes will always be present independently from the parameter. Each file will have the bucket number as suffix.
Per default, all rounds of an individual simulation-run will be averaged and
written to the csv-file. If you set IndividualStats to a non-""-value,
every round will create a new line. This is useful if you have a simulation
with a long setup-time and you want to do multiple measurements for the same
setup.
Timeouts are parsed according to Go's time.Duration: A duration string is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
Two timeout variables are available:
RunWait- how many seconds to wait for a run (one line of .toml-file) to finish (default: 180s)ExperimentWait- how many seconds to wait for the while experiment to finish (default: RunWait * #Runs)
If you need to run a script before the simulation is started (like installing a missing library), you can define
PreScript- a shell-script that is run before the simulation is started on each machine. It receives a single argument: the platform this simulation runs: [localhost,mininet,deterlab]
Mininet has support for setting up delays and bandwidth for each simulation. You can use the following two variables:
Delay[ms] - the delay between two hosts - the round-trip delay will be the double of thisBandwidth[Mbps] - the bandwidth in both sending and receiving direction for each host, measured in mega bits per second
You can put these variables either globally at the top of the .toml file or set them up for each line in the experiment (see the exapmles below).
SingleHost- which will reduce the tree to use only one host per server, and thus speeding up connections againTags- build-tags that will be called when building the binaries for the simulation
Simulation = "ExampleHandlers"
Servers = 16
BF = 2
Rounds = 10
#SingleHost = true
Hosts
3
7
15
31
This will run the ExampleHandlers-simulation on 16 servers with a branching
factor of 2 and 10 rounds. The SingleHost-argument is commented out, so it
will use as many hosts as described.
In the second part, 4 experiments are defined, each only changing the number
of Hosts. First 3, then 7, 15, and finally 31 hosts are run on the 16
servers. For each experiment 10 rounds are started.
Assuming the simulation runs on MiniNet, the network delay can be set globally as follows:
Simulation = "ExampleHandlers"
Delay = 100
Servers = 16
BF = 2
Rounds = 10
#SingleHost = true
Hosts
3
7
15
31
Alternatively, it can be set for each individual experiment:
Simulation = "ExampleHandlers"
Servers = 16
BF = 2
Rounds = 10
#SingleHost = true
Hosts,Delay
3,50
7,100
15,200
31,400
Every simulation will be written to the test_data directory with the name
of the simulation file as base and a .csv applied.
The configuration of the simulation file is written to the tables in the
following columns, which are copied as-is from the simulation file:
- hosts, bf, delay, depth, other, prescript, ratio, rounds, servers, suite
For all the other measurements, the following statistics are available:
_avg- the average_std- standard-deviation_min- minimum_max- maximum_sum- sum of all calls
The following measurements will be taken for measure.NewTimeMeasure:
_user- user-space time, crypto and other calculations_system- system-space time - disk i/o network i/o_wall- wall-clock, as described above
The measurements are given in seconds.
There is an important difference in the _wall and the _user/_system
measurements: the _wall measurements indicate how much time an external
observer would have measured.
So if the system waits for a reply of the network, this waiting time is
included in the measurement.
Contrary to this, the _user/_system measures how much work has been done
by the CPU during the measurement.
When measuring parallel execution of code, it is possible that the
_user/_system measurements are bigger than the _wall measurements
, because more than one CPU participated in the calculation.
The difference in _user/_system is explained for example here:
https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
The _wall corresponds to the real in this comment.
There are some standard time measurements done by the simulation:
ChildrenWait- how long the system had to wait for all children to be available - might show problems in setting up the serversSimulSyncWait- how long the system had to wait at the end of the simulation - might indicate problems in the wrap-up of the simulation
If you want to measure bandwidth, you can use measure.NewCounterIOMeasure.
But you have to be careful to make sure that the system will not include
traffic that is outside of your scope by putting the .Record() as close as
possible to the NewCounterIOMeasure.
Every CounterIOMeasure has the following statistics:
_tx- transmission-bytes_rx- bytes received_msg_tx- packets transmitted_msg_rx- packets received
Plus the standard modifiers (_avg, _std, ...).
There are two standard measurements done by every simulation:
bandwidth(empty) - all node bandwidthbandwidth_root- bandwidth of the first node of the roster