11# Monitoring
2+ [![travis-ci](https://api.travis-ci.org/AliceO2Group/Monitoring.svg?branch=master)](https://travis-ci.org/AliceO2Group/Monitoring)
3+ [![aliBuild](https://img.shields.io/badge/aliBuild-dashboard-lightgrey.svg)](https://alisw.cern.ch/dashboard/d/000000001/main-dashboard?orgId=1&var-storagename=All&var-reponame=All&var-checkname=build%2FMonitoring%2Fo2-dataflow%2F0&var-upthreshold=30m&var-minuptime=30)
4+ [![JIRA](https://img.shields.io/badge/JIRA-issues-blue.svg)](https://alice.its.cern.ch/jira/projects/OMON)
5+
26The Monitoring module allows injecting user-defined metrics and monitoring the process itself. It supports multiple backends, protocols and data formats.
37
48## Table of contents
591. [Installation](#installation)
6102. [Getting started](#getting-started)
7113. [Features and additional information](#features-and-additional-information)
8- 3. [Code snippets](#code-snippets)
9- 4. [System monitoring and server-side backends installation and configuration](#system-monitoring-server-side-backends-installation-and-configuration)
12+ 4. [Code snippets](#code-snippets)
13+ 5. [System monitoring and server-side backends installation and configuration](#system-monitoring-server-side-backends-installation-and-configuration)
1014
1115## Installation
1216### RPM (CentOS 7 only)
13- <details >
14- <summary ><strong >Click here if you don't have <i >alisw</i > repo configured</strong ></summary >
15- <br >
16-
17- + Install `CERN-CA-certs` package (required by `alisw` repo) **(as root)**
17+ + Install CERN certificates
1818~~~
1919yum -y install CERN-CA-certs
2020~~~
@@ -29,12 +29,10 @@ enabled=1
2929gpgcheck=0
3030EOF
3131~~~
32- </details >
33- <br >
3432
3533+ Install Monitoring RPM package **(as root)**
3634~~~
37- yum -y install alisw-Monitoring+v1.5.0-1.x86_64
35+ yum -y install alisw-Monitoring+v1.5.4-1.x86_64
3836~~~
3937
4038+ Configure Modules
@@ -44,9 +42,9 @@ export MODULEPATH=/opt/alisw/el7/modulefiles:$MODULEPATH
4442
4543+ Load environment
4644~~~
47- eval `modulecmd bash load Monitoring/v1.5.0-1`
45+ eval `modulecmd bash load Monitoring/v1.5.4-1`
4846~~~
49- The installation directory is: `/opt/alisw/el7/Monitoring/v1.5.0-1`
47+ The installation directory is: `/opt/alisw/el7/Monitoring/v1.5.4-1`
5048
5149### aliBuild
5250<strong>[Click here if you don't have aliBuild installed](https://alice-doc.github.io/alice-analysis-tutorial/building/)</strong>
@@ -77,6 +75,7 @@ Manual installation of the O<sup>2</sup> Monitoring module.
7775+ [ApMon](http://monalisa.caltech.edu/monalisa__Download__ApMon.html) (optional)
7876
7977#### Monitoring module compilation
78+
8079~~~
8180git clone https://github.com/AliceO2Group/Monitoring.git
8281cd Monitoring; mkdir build; cd build
@@ -105,6 +104,7 @@ See table below to find out how to create `URI` for each backend:
105104| Flume | UDP | `flume` | - |
106105
107106### Sending metric
107+
108108```cpp
109109send(Metric&& metric)
110110```
@@ -129,6 +129,8 @@ monitoring->send(Metric{10, "myMetric"}.addTags({{"tag1", "value1"}, {"tag2", "v
129129monitoring->send(Metric{10, "myCrazyMetric"}.setTimestamp(timestamp));
130130```
131131
132+ ## Features and additional information
133+
132134### Grouped values
133135It's also possible to send multiple, grouped values in a single metric (`Flume` and `InfluxDB` backends are supported; others fall back to sending the values in separate metrics).
134136``` cpp
@@ -140,7 +142,7 @@ For example:
140142monitoring->sendGroupped("measurementName", {{20, "myMetricIntMultiple"}, {20.30, "myMetricFloatMultple"}});
141143```
142144
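The fallback described above (one combined record where the backend supports grouping, one record per value otherwise) can be pictured with a small, self-contained C++17 sketch. The function `formatGrouped`, its `backendSupportsGroups` flag and the text format it produces are hypothetical and chosen only for illustration; they are not part of the Monitoring library.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only (not the library's code):
// formats grouped values as a single combined record, or falls back to
// one record per value when the backend cannot handle groups.
std::vector<std::string> formatGrouped(
    const std::string& measurement,
    const std::vector<std::pair<std::string, double>>& values,
    bool backendSupportsGroups)
{
  std::vector<std::string> records;
  if (backendSupportsGroups) {
    // One record: "<measurement> name1=value1,name2=value2"
    std::string record = measurement;
    char separator = ' ';
    for (const auto& [name, value] : values) {
      record += separator + name + "=" + std::to_string(value);
      separator = ',';
    }
    records.push_back(record);
  } else {
    // Fallback: every value becomes its own record.
    for (const auto& [name, value] : values) {
      records.push_back(name + " " + std::to_string(value));
    }
  }
  return records;
}
```

Calling `formatGrouped("measurementName", {{"a", 20.0}, {"b", 20.3}}, false)` yields two records, while passing `true` yields one.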
143- ## Buffering metrics
145+ ### Buffering metrics
144146To avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment.
145147This feature is controlled by the following two methods:
146148``` cpp
@@ -161,7 +163,6 @@ monitoring->send({20, "myMetricInt2"});
161163monitoring->flushBuffer();
162164```
163165
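To make the buffering idea concrete, here is a minimal, self-contained C++ sketch of a buffer that collects metrics and hands them over in one batch. `SketchBuffer`, `BufferedMetric`, the `capacity` parameter and the `enableBuffering` name are assumptions made for this illustration; only `flushBuffer` mirrors a call shown in the snippet above, and none of this is the library's actual implementation.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types, for illustration only (not the library's code).
struct BufferedMetric {
  double value;
  std::string name;
};

class SketchBuffer {
 public:
  using Sink = std::function<void(const std::vector<BufferedMetric>&)>;

  // `sink` receives each flushed batch; reaching `capacity` forces a flush.
  SketchBuffer(Sink sink, std::size_t capacity)
      : sink_(std::move(sink)), capacity_(capacity) {}

  void enableBuffering() { buffering_ = true; }

  void send(BufferedMetric metric) {
    pending_.push_back(std::move(metric));
    // Without buffering, every metric is passed on immediately.
    if (!buffering_ || pending_.size() >= capacity_) {
      flushBuffer();
    }
  }

  void flushBuffer() {
    if (pending_.empty()) {
      return;
    }
    sink_(pending_);
    pending_.clear();
  }

 private:
  Sink sink_;
  std::size_t capacity_;
  bool buffering_ = false;
  std::vector<BufferedMetric> pending_;
};
```

With buffering enabled, two `send` calls followed by one `flushBuffer` call reach the sink as a single batch of two metrics.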
164- ## Features and additional information
165166### Metrics
166167Metrics consist of 4 parameters: name, value, timestamp and tags.
167168
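As a rough mental model of those four parameters, the following self-contained C++17 sketch groups them into one type. `SketchMetric` and its members are hypothetical stand-ins for illustration; the library's own `Metric` class may differ in types and method names.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in, for illustration only (not the library's Metric).
struct SketchMetric {
  using Value = std::variant<int, double, std::string>;
  using Clock = std::chrono::system_clock;

  std::string name;                           // metric name
  Value value;                                // metric value
  Clock::time_point timestamp = Clock::now(); // defaults to creation time
  std::map<std::string, std::string> tags;    // key-value tags

  // Value first, name second, matching the call style used in this README.
  SketchMetric(Value v, std::string n)
      : name(std::move(n)), value(std::move(v)) {}

  SketchMetric& addTag(const std::string& key, const std::string& val) {
    tags[key] = val;
    return *this;
  }

  SketchMetric& setTimestamp(Clock::time_point t) {
    timestamp = t;
    return *this;
  }
};
```

A metric built as `SketchMetric m{10, "myMetric"};` then carries the name, an integer value, a default timestamp and an initially empty tag set.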
@@ -208,190 +209,8 @@ Code snippets are available in [examples](examples/) directory.
208209## System monitoring, server-side backends installation and configuration
209210This guide explains manual installation. For `ansible` deployment see [AliceO2Group/system-configuration](https://gitlab.cern.ch/AliceO2Group/system-configuration/tree/master/ansible) gitlab repo.
210211
211- ### collectD
212- + Install collectd package **(as root)**
213- ~~~
214- yum -y install collectd
215- ~~~
216-
217- + Edit configuration file: `/etc/collectd.conf`**(as root)**
218- ~~~
219- Interval 10
220- Include "/etc/collectd.d"
221- ~~~
222-
223- + Configure `network` write plugin: `/etc/collectd.d/network.conf` in order to push metrics to InfluxDB instance. Replace `<influxdb-host>` with InfluxDB hostname. **(as root)**
224- ~~~
225- LoadPlugin network
226- <Plugin network>
227- Server "<influxdb-host>" "25826"
228- </Plugin>
229- ~~~
230-
231- + Configure `cpu` module: `/etc/collectd.d/cpu.conf` **(as root)**
232- ~~~
233- LoadPlugin cpu
234- <Plugin cpu>
235- ReportByCpu true
236- ReportByState true
237- ValuesPercentage true
238- </Plugin>
239- ~~~
240-
241- + Configure `disk` plugin: `/etc/collectd.d/disk.conf` **(as root)**
242- ~~~
243- LoadPlugin disk
244- <Plugin disk>
245- Disk "/[hs]d[a-f][0-9]?$/"
246- IgnoreSelected false
247- UseBSDName false
248- UdevNameAttr "DEVNAME"
249- </Plugin>
250- ~~~
251-
252- + Configure `interface` plugin: `/etc/collectd.d/interface.conf` **(as root)**
253- ~~~
254- LoadPlugin interface
255- ~~~
256-
257- + Configure `load` plugin: `/etc/collectd.d/load.conf` **(as root)**
258- ~~~
259- LoadPlugin load
260- ~~~
261-
262- + Configure `memory` plugin: `/etc/collectd.d/memory.conf` **(as root)**
263- ~~~
264- LoadPlugin memory
265- ~~~
266-
267- + Configure `uptime` plugin: `/etc/collectd.d/uptime.conf` **(as root)**
268- ~~~
269- LoadPlugin uptime
270- ~~~
271-
272- + Start collectd **(as root)**
273- ~~~
274- systemctl start collectd.service
275- systemctl enable collectd.service
276- ~~~
277-
278- ### InfluxDB
279- + Add `influxdb` repo **(as root)**
280- ~~~
281- cat > /etc/yum.repos.d/influxdb.repo <<EOF
282- [influxdb]
283- name = InfluxDB Repository - RHEL \$releasever
284- baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
285- enabled = 1
286- gpgcheck = 1
287- gpgkey = https://repos.influxdata.com/influxdb.key
288- EOF
289- ~~~
290-
291- + Install InfluxDB package **(as root)**
292- ~~~
293- yum -y install influxdb collectd
294- ~~~
295-
296- + Add UDP endpoint for application related metrics by editing configuration file `/etc/influxdb/influxdb.conf` with database name `test` and UDP port number `8088`. **(as root)**
297- ~~~
298- [[udp]]
299- enabled = true
300- bind-address = ":8088"
301- database = "test"
302- batch-size = 5000
303- batch-timeout = "1s"
304- batch-pending = 100
305- read-buffer = 8388608
306- ~~~
307-
308- + Add an endpoint for `collectd` **(as root)**
309- ~~~
310- [[collectd]]
311- enabled = true
312- bind-address = ":25826"
313- database = "system-monitoring"
314- typesdb = "/usr/share/collectd/types.db"
315- ~~~
316-
317- + Open UDP port `25826` and `8088` **(as root)**
318- ~~~
319- firewall-cmd --zone=public --permanent --add-port=8088/udp
320- firewall-cmd --zone=public --permanent --add-port=25826/udp
321- firewall-cmd --reload
322- ~~~
323-
324- + Start InfluxDB **(as root)**
325- ~~~
326- systemctl start influxdb
327- ~~~
328-
329- + Create database `test` and `system-monitoring`
330- ~~~
331- influx
332- influx> create database test
333- influx> create database system-monitoring
334- ~~~
335- More details available at [InfluxDB page](https://docs.influxdata.com/influxdb/v1.2/introduction/installation/).
336-
337- ### Flume
338- + Install Java **(as root)**
339- ~~~
340- yum -y install java
341- ~~~
342-
343- + Download [latest release](http://www-eu.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz) of Apache Flume
344-
345- + Unpack file
346- ~~~
347- tar -xvzf apache-flume-1.7.0-bin.tar.gz
348- ~~~
349-
350- + Install custom source and/or sink from [MonitoringCustomComponents repo]( https://github.com/AliceO2Group/MonitoringCustomComponents).
351- Adjust configuration file according to source/sink instructions. The sample configuration file is available in `conf/flume-conf.properties.template`.
352-
353- + Launch Flume using following command:
354- ~~~
355- $ bin/flume-ng agent -n <agent-name> -c conf -f conf/<flume-config>
356- ~~~
357- Set correct `<agent-name>` and `<flume-config>` name.
358-
359- See [Flume User Guide](https://flume.apache.org/FlumeUserGuide.html) documentation for more details.
360-
361- ### Grafana
362-
363- + Add Grafana repo **(as root)**
364- ~~~
365- curl -s https://packagecloud.io/install/repositories/grafana/stable/script.rpm.sh | bash
366- ~~~
367-
368- + Install Grafana package **(as root)**
369- ~~~
370- yum -y install grafana
371- ~~~
372-
373- + Open port 3000 **(as root)**
374- ~~~
375- firewall-cmd --zone=public --add-port 3000/tcp --permanent
376- firewall-cmd --reload
377- ~~~
378-
379- + Change default `admin_user` and `admin_password`: `/etc/grafana/grafana.ini`. **(as root)**
380-
381- See more regarding configuration file in the official documentation: http://docs.grafana.org/installation/configuration/
382-
383- + (Enable SSL)
384- + Set protocol to `https`, `ssl_mode` to `skip-verify` in configuration file
385- + Generate private key and certificate via [CERN Certification Authority](https://ca.cern.ch/ca/host/HostCertificates.aspx)
386- + Set `cert_file` and `cert_key` value in configuration file
387-
388- + (Configure LDAP-based login: `/etc/grafana/ldap.toml`)
389- See official documentation at the Grafana webpage: http://docs.grafana.org/installation/ldap/
390-
391- + Start Grafana **(as root)**
392- ~~~
393- systemctl start grafana-server
394- ~~~
395-
396- ### MonALISA Service
397- Follow official [MonALISA Service Installation Guide](http://monalisa.caltech.edu/monalisa__Documentation__Service_Installation_Guide.html).
212+ + [Collectd](doc/collectd.md)
213+ + [Flume](doc/flume.md)
214+ + [InfluxDB](doc/influxdb.md)
215+ + [Grafana](doc/grafana.md)
216+ + [MonALISA](http://monalisa.caltech.edu/monalisa__Documentation__Service_Installation_Guide.html) (external link)