Commit 52d09ea

Clickhouse - introduce docs
1 parent 6f7f8df commit 52d09ea

File tree

3 files changed

+337
-0
lines changed

README.rst

Lines changed: 2 additions & 0 deletions
@@ -64,6 +64,8 @@ network interface and a port. Multiple instances of these plugins can run concur
  format for long-term preservation
- `UniRec <extra_plugins/output/unirec>`_ (*) - send flow records in UniRec format
  via TRAP communication interface (into Nemea modules)
- `ClickHouse <extra_plugins/output/clickhouse>`_ (*) - insert flow records
  into a ClickHouse database

\* Must be installed individually due to extra dependencies

Lines changed: 315 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,315 @@
clickhouse (output plugin)
===========================

The plugin converts and stores IPFIX flow records into a ClickHouse database.
It is designed for high-performance environments, ensuring efficient data
handling and storage.

Key Features
------------

- **High-Speed Export**: The conversion and export of data to the database are
  performed at the binary level (i.e., without conversion to text SQL
  commands). This enables export speeds of hundreds of thousands or even a
  million flow records per second (*performance depends on machine
  configuration*).

- **Customizable Data Mapping**: Users can configure the plugin to send any
  IPFIX/NetFlow items to the database by specifying the item name and mapping
  it to the corresponding database column.

- **High Availability (HA) Support**: The plugin supports sending data to
  multiple ClickHouse endpoints. In case of a failure at one endpoint, data is
  automatically redirected to the next available endpoint. This failover
  mechanism ensures reliability in HA deployments. Note: This is not a
  round-robin distribution but a failover strategy.

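The difference between failover and round-robin can be illustrated with a minimal Python sketch. It is not the plugin's implementation: the ``pick_endpoint`` helper, the ``can_connect`` callback and the host names are hypothetical, and it assumes that endpoints are always tried in their configured order.

```python
from typing import Callable, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (host, port); hypothetical representation


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    can_connect: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first reachable endpoint in configured order, or None.

    Failover: a later endpoint is used only when every earlier one is
    unreachable. Round-robin would instead rotate between all endpoints.
    """
    for endpoint in endpoints:
        if can_connect(endpoint):
            return endpoint
    return None


# Hypothetical replicas of one shard.
endpoints = [("ch-replica-1.example.com", 9000), ("ch-replica-2.example.com", 9000)]
down = {("ch-replica-1.example.com", 9000)}  # pretend the first replica is unreachable
chosen = pick_endpoint(endpoints, lambda ep: ep not in down)
```

With the first replica marked as down, ``chosen`` is the second replica. As long as the first replica is reachable, the second one receives no data.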
How to build
------------

By default, the plugin is not distributed with IPFIXcol2 due to extra dependencies.
To build the plugin, IPFIXcol2 (and its header files) must be installed on your system.

Then compile and install the plugin:

.. code-block:: sh

    $ mkdir build && cd build && cmake ..
    $ make
    # make install

Usage
------

The plugin expects the ClickHouse database to already contain a table whose
schema corresponds to the plugin configuration. The existence and schema of
the table are checked after the connection to the database is established,
and an error is reported if there is a mismatch. The table is not
created automatically.

To run the example configuration below, you can create the ClickHouse table
using the following SQL query:

.. code-block:: sql

    CREATE TABLE ipfixcol2.flows (
        odid UInt32,
        srcip IPv6,
        dstip IPv6,
        flowstart DateTime64(9),
        flowend DateTime64(9),
        srcport UInt16,
        dstport UInt16,
        protocolIdentifier UInt8,
        octetDeltaCount UInt64,
        packetDeltaCount UInt64,
        INDEX srcipindex srcip TYPE bloom_filter GRANULARITY 16,
        INDEX dstipindex dstip TYPE bloom_filter GRANULARITY 16
    )
    ENGINE = MergeTree
    PARTITION BY toStartOfInterval(flowstart, INTERVAL 1 HOUR)
    ORDER BY flowstart

The following ClickHouse column types are expected to store the following IPFIX element types:

.. list-table:: Mapping of IPFIX types to ClickHouse column types

    * - **IPFIX Abstract Data Type**
      - **ClickHouse Column Type**
    * - unsigned8
      - UInt8
    * - unsigned16
      - UInt16
    * - unsigned32
      - UInt32
    * - unsigned64
      - UInt64
    * - signed8
      - Int8
    * - signed16
      - Int16
    * - signed32
      - Int32
    * - signed64
      - Int64
    * - ipv4Address
      - IPv4
    * - ipv6Address
      - IPv6
    * - macAddress
      - UInt64
    * - dateTimeNanoseconds
      - DateTime64(9)
    * - dateTimeMicroseconds
      - DateTime64(6)
    * - dateTimeMilliseconds
      - DateTime64(3)
    * - dateTimeSeconds
      - DateTime
    * - string
      - String

If a field is an alias that maps to multiple IPFIX elements of compatible
types, the resulting type is unified to the type with the higher precision, i.e.
the type that can hold both values without data loss. To unify IPv4 and
IPv6 addresses as one type, an IPv4 address is stored as an IPv4-mapped IPv6
address.

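The IPv4-mapped representation can be reproduced with Python's standard ``ipaddress`` module. This is only an illustration of the address format. The helper names ``to_ipv6`` and ``to_display`` are hypothetical and are not part of the plugin.

```python
import ipaddress


def to_ipv6(addr: str) -> ipaddress.IPv6Address:
    """Return the address as IPv6, mapping IPv4 into ::ffff:0:0/96."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv4Address):
        # IPv4-mapped IPv6: 80 zero bits, 16 one bits, then the 32-bit IPv4 address
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    return ip


def to_display(ip: ipaddress.IPv6Address) -> str:
    """Show IPv4-mapped addresses as dotted IPv4, everything else as IPv6."""
    mapped = ip.ipv4_mapped  # None unless the address is IPv4-mapped
    return str(mapped) if mapped is not None else str(ip)


stored = to_ipv6("192.0.2.1")   # what a unified IPv6 column would hold
shown = to_display(stored)      # back to the human-friendly IPv4 form
```

A native IPv6 address such as ``2001:db8::1`` passes through both helpers unchanged.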
To be able to retrieve the stored values in a human-friendly format, the
following ClickHouse functions can be defined and used:

.. code-block:: sql

    CREATE FUNCTION ipToString AS (ip) ->
        if(isIPAddressInRange(toString(ip), '::ffff:0.0.0.0/96'), toString(toIPv4(ip)), toString(ip));

    CREATE FUNCTION macToString AS (mac) ->
        concat(
            lpad(hex(bitAnd(bitShiftRight(mac, 40), 0xFF)), 2, '0'), ':',
            lpad(hex(bitAnd(bitShiftRight(mac, 32), 0xFF)), 2, '0'), ':',
            lpad(hex(bitAnd(bitShiftRight(mac, 24), 0xFF)), 2, '0'), ':',
            lpad(hex(bitAnd(bitShiftRight(mac, 16), 0xFF)), 2, '0'), ':',
            lpad(hex(bitAnd(bitShiftRight(mac, 8), 0xFF)), 2, '0'), ':',
            lpad(hex(bitAnd(mac, 0xFF)), 2, '0')
        );


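The bit arithmetic in ``macToString`` can be checked outside the database with a small Python sketch. It assumes, as the shifts in the SQL function imply, that the six octets occupy the low 48 bits of the ``UInt64`` value with the first octet in the most significant position. The helper names are hypothetical.

```python
def mac_to_uint64(mac: str) -> int:
    """Pack a colon-separated MAC address into the low 48 bits of an integer."""
    return int(mac.replace(":", ""), 16)


def mac_to_string(value: int) -> str:
    """Mirror of the SQL macToString: extract six octets, most significant first."""
    octets = [(value >> shift) & 0xFF for shift in (40, 32, 24, 16, 8, 0)]
    # Two uppercase hex digits per octet, like lpad(hex(...), 2, '0')
    return ":".join(f"{octet:02X}" for octet in octets)


packed = mac_to_uint64("00:1b:44:11:3a:b7")
text = mac_to_string(packed)
```

Here ``text`` is ``00:1B:44:11:3A:B7``, the uppercase form of the input address.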
Example configuration
---------------------

.. code-block:: xml

    <output>
        <name>ClickHouse output</name>
        <plugin>clickhouse</plugin>
        <params>
            <connection>
                <endpoints>
                    <!-- One or more ClickHouse databases (endpoints) -->
                    <endpoint>
                        <host>clickhouse.example.com</host>
                        <port>9000</port>
                    </endpoint>
                </endpoints>
                <user>ipfixcol2</user>
                <password>ipfixcol2</password>
                <database>ipfixcol2</database>
                <table>flows</table>
            </connection>
            <inserterThreads>8</inserterThreads>
            <blocks>64</blocks>
            <blockInsertThreshold>100000</blockInsertThreshold>
            <splitBiflow>true</splitBiflow>
            <nonblocking>true</nonblocking>
            <columns>
                <column>
                    <!-- Special field representing the ODID the flow originated from. -->
                    <name>odid</name>
                </column>
                <column>
                    <!-- IPFIX field(s) identified by an alias. Maps to sourceIPv4Address or sourceIPv6Address, whichever exists. -->
                    <name>srcip</name>
                </column>
                <column>
                    <name>dstip</name>
                </column>
                <column>
                    <name>flowstart</name>
                </column>
                <column>
                    <name>flowend</name>
                </column>
                <column>
                    <!-- IPFIX field identified by its IANA name stored to a column named "srcport" -->
                    <name>srcport</name>
                    <source>sourceTransportPort</source>
                </column>
                <column>
                    <name>dstport</name>
                    <source>destinationTransportPort</source>
                </column>
                <column>
                    <!-- IPFIX field identified by its IANA name stored to a column with the same name -->
                    <name>protocolIdentifier</name>
                </column>
                <column>
                    <name>octetDeltaCount</name>
                </column>
                <column>
                    <name>packetDeltaCount</name>
                </column>
            </columns>
        </params>
    </output>

**Warning**: The database and the table with the appropriate schema must already exist.
It will not be created automatically.

Parameters
----------

:``connection``:
    The database connection parameters.

    :``endpoints``:
        The possible endpoints data can be sent to, i.e. all the replicas of a
        particular shard. In case one endpoint is unreachable, another one is used.

        :``endpoint``:
            Connection parameters of one endpoint.

            :``host``:
                The ClickHouse database host as a domain name or an IP address.

            :``port``:
                The port of the ClickHouse database. [default: 9000]

    :``user``:
        The database username.

    :``password``:
        The database password.

    :``database``:
        The name of the database that contains the specified table.

    :``table``:
        The name of the table to insert the data into.

:``splitBiflow``:
    When true, biflow records are split into two uniflow records. [default: true]

:``biflowEmptyAutoignore``:
    When true and ``splitBiflow`` is active, the uniflow records resulting from
    the split are also checked for emptiness and are omitted if empty. A flow
    is considered empty when ``octetDeltaCount = 0`` or ``packetDeltaCount = 0``.
    This exists because some IPFIX probes may export uniflow records as biflow
    with the reverse direction always empty, resulting in a large number of
    empty flow records.
    [default: true]

:``blocks``:
    Number of data blocks in circulation. Each block is de facto a memory
    buffer that the rows are written to before being sent out to the ClickHouse
    database. [default: 64]

:``inserterThreads``:
    Number of threads used for data insertion to ClickHouse. In other words,
    the number of ClickHouse connections that are used concurrently. [default: 8]

:``blockInsertThreshold``:
    Number of rows to be buffered into a block before the block is sent out to
    be inserted into the database. [default: 100000]

:``blockInsertMaxDelaySecs``:
    Maximum number of seconds to wait before a block gets sent out to be
    inserted into the database even if the threshold has not been reached yet.
    [default: 10]

:``nonblocking``:
    This option dictates what happens when all the blocks (buffers) are full.
    If true, the processing thread is not blocked, and some data is dropped to
    maintain the flow of data.
    If false, the processing thread is blocked, waiting until a block becomes
    available. [default: true]

:``columns``:
    The fields that each row will consist of.

    :``column``:

        :``name``:
            Name of the column in the database. It is also used as the source
            field if ``source`` is not explicitly defined.

        :``nullable``:
            Whether missing values are stored as a special null value. If
            false, the zero value of the corresponding data type is used
            instead. Turning this option on might negatively affect
            performance. [default: false]

        :``source``:
            An IPFIX element name or an alias. If not present, ``name`` is used.
            Aliases and IPFIX elements can be found
            `here <https://github.com/CESNET/libfds/tree/master/config/system>`__.
            A list of standard IPFIX element names can also be found
            `here <https://www.iana.org/assignments/ipfix/ipfix.xhtml>`__.
            [default: same as name]

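How ``blockInsertThreshold`` and ``blockInsertMaxDelaySecs`` interact can be modelled with a minimal Python sketch. It is an illustration under stated assumptions, not the plugin's code: the ``Block`` class is hypothetical, and the sketch assumes the delay is measured from the first row written to a block, which the parameter descriptions do not specify.

```python
import time
from typing import Optional


class Block:
    """A row buffer that is flushed on size or age (hypothetical model)."""

    def __init__(self, insert_threshold: int = 100_000, max_delay_secs: float = 10.0):
        self.insert_threshold = insert_threshold  # models blockInsertThreshold
        self.max_delay_secs = max_delay_secs      # models blockInsertMaxDelaySecs
        self.rows = []
        self.first_row_time: Optional[float] = None

    def add(self, row, now: Optional[float] = None) -> None:
        """Buffer one row and remember when the block received its first row."""
        now = time.monotonic() if now is None else now
        if not self.rows:
            self.first_row_time = now  # assumption: the delay starts at the first row
        self.rows.append(row)

    def should_flush(self, now: Optional[float] = None) -> bool:
        """True when the block is full or its oldest row has waited too long."""
        if not self.rows:
            return False
        now = time.monotonic() if now is None else now
        full = len(self.rows) >= self.insert_threshold
        expired = now - self.first_row_time >= self.max_delay_secs
        return full or expired
```

A block with a threshold of 3 that holds a single row is not flushed after 5 seconds but is flushed once 10 seconds have passed. A block that reaches its threshold is flushed immediately, regardless of its age.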
Performance tuning
------------------

If you experience performance issues with the default values, try
increasing the ``blockInsertThreshold``, ``blocks`` and ``inserterThreads``
configuration parameters.

For example, based on our testing, the following values should result in better
performance at the cost of higher memory usage:

.. code-block:: xml

    <inserterThreads>16</inserterThreads>
    <blocks>128</blocks>
    <blockInsertThreshold>500000</blockInsertThreshold>

You can further experiment with the values based on your input characteristics
and your machine specifications.
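
To get a feel for the memory cost, the following Python sketch computes a rough upper bound on buffered row data for the example table above. It assumes fixed-width columns only and that every block could be filled up to ``blockInsertThreshold`` at the same time. The plugin's real memory use depends on its implementation and may differ. The ``buffer_bytes`` helper is hypothetical.

```python
# Bytes per row for the example table: odid (UInt32), srcip and dstip (IPv6),
# flowstart and flowend (DateTime64), two UInt16 ports, protocolIdentifier
# (UInt8), octetDeltaCount and packetDeltaCount (UInt64).
ROW_BYTES = 4 + 16 + 16 + 8 + 8 + 2 + 2 + 1 + 8 + 8


def buffer_bytes(blocks: int, block_insert_threshold: int, row_bytes: int = ROW_BYTES) -> int:
    """Upper bound on raw row data held if all blocks were full at once."""
    return blocks * block_insert_threshold * row_bytes


default_bytes = buffer_bytes(64, 100_000)   # default settings
tuned_bytes = buffer_bytes(128, 500_000)    # tuned values shown above
```

Under these assumptions, the default settings bound the buffered data at roughly 0.47 GB and the tuned values at roughly 4.7 GB, a tenfold increase.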
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
=============================
ipfixcol2-clickhouse-output
=============================

--------------------------
ClickHouse (output plugin)
--------------------------

:Author: Michal Sedlak ([email protected])
:Date: 2024-11-04
:Copyright: Copyright © 2024 CESNET, z.s.p.o.
:Version: 1.0
:Manual section: 7
:Manual group: IPFIXcol collector

Description
-----------

.. include:: ../README.rst
   :start-line: 3
