|
| 1 | +clickhouse (output plugin) |
| 2 | +=========================== |
| 3 | + |
| 4 | +The plugin converts and stores IPFIX flow records into a ClickHouse database. |
| 5 | +It is designed for high-performance environments, ensuring efficient data |
| 6 | +handling and storage. |
| 7 | + |
| 8 | +Key Features |
| 9 | +------------ |
| 10 | + |
| 11 | +- **High-Speed Export**: The conversion and export of data to the database are |
| 12 | + performed at the binary level (i.e., without conversion to text SQL |
| 13 | + commands). This enables export speeds of hundreds of thousands or even a |
| 14 | + million flow records per second (*performance depends on machine |
| 15 | + configuration*). |
| 16 | + |
| 17 | +- **Customizable Data Mapping**: Users can configure the plugin to send any |
| 18 | + IPFIX/NetFlow items to the database by specifying the item name and mapping |
| 19 | + it to the corresponding database column. |
| 20 | + |
| 21 | +- **High Availability (HA) Support**: The plugin supports sending data to |
| 22 | + multiple ClickHouse endpoints. In case of a failure at one endpoint, data is |
| 23 | + automatically redirected to the next available endpoint. This failover |
| 24 | + mechanism ensures reliability in HA deployments. Note: This is not a |
| 25 | + round-robin distribution but a failover strategy. |
| 26 | + |
| 27 | +How to build |
| 28 | +------------ |
| 29 | + |
| 30 | +By default, the plugin is not distributed with IPFIXcol2 due to extra dependencies. |
| 31 | +To build the plugin, IPFIXcol2 (and its header files) must be installed on your system. |
| 32 | + |
| 33 | +Then compile and install the plugin: |
| 34 | + |
| 35 | +.. code-block:: sh |
| 36 | +
|
| 37 | + $ mkdir build && cd build && cmake .. |
| 38 | + $ make |
| 39 | + # make install |
| 40 | +
|
| 41 | +Usage |
| 42 | +------ |
| 43 | + |
| 44 | +The plugin expects the ClickHouse database to already contain a table whose |
| 45 | +schema matches the configured columns. After connecting to the database, the |
| 46 | +plugin checks that the table exists and that its schema matches, and reports |
| 47 | +an error if there is a mismatch. The table is not created |
| 48 | +automatically. |
| 49 | + |
| 50 | +To run the example configuration below, you can create the ClickHouse table |
| 51 | +using the following SQL query: |
| 52 | + |
| 53 | +.. code-block:: sql |
| 54 | +
|
| 55 | + CREATE TABLE ipfixcol2.flows ( |
| 56 | + odid UInt32, |
| 57 | + srcip IPv6, |
| 58 | + dstip IPv6, |
| 59 | + flowstart DateTime64(9), |
| 60 | + flowend DateTime64(9), |
| 61 | + srcport UInt16, |
| 62 | + dstport UInt16, |
| 63 | + protocolIdentifier UInt8, |
| 64 | + octetDeltaCount UInt64, |
| 65 | + packetDeltaCount UInt64, |
| 66 | + INDEX srcipindex srcip TYPE bloom_filter GRANULARITY 16, |
| 67 | + INDEX dstipindex dstip TYPE bloom_filter GRANULARITY 16 |
| 68 | + ) |
| 69 | + ENGINE = MergeTree |
| 70 | + PARTITION BY toStartOfInterval(flowstart, INTERVAL 1 HOUR) |
| 71 | + ORDER BY flowstart |
| 72 | +
|
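
Before starting the collector, it can be useful to confirm by hand that the
table exists and to compare its columns against the ``<columns>`` section of
the configuration. The queries below are a minimal sketch, assuming the
``ipfixcol2.flows`` table from the example above. They use only the built-in
``EXISTS`` statement and the ``system.columns`` table and are not part of the
plugin itself.

.. code-block:: sql

    -- Returns 1 if the table exists, 0 otherwise.
    EXISTS TABLE ipfixcol2.flows;

    -- Lists the column names and types in table order.
    SELECT name, type
    FROM system.columns
    WHERE database = 'ipfixcol2' AND table = 'flows'
    ORDER BY position;
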
| 73 | +Each IPFIX abstract data type is expected to be stored in the following ClickHouse column type: |
| 74 | + |
| 75 | +.. list-table:: Mapping of IPFIX types to ClickHouse column types |
| 76 | + |
| 77 | + * - **IPFIX Abstract Data Type** |
| 78 | + - **ClickHouse Column Type** |
| 79 | + * - unsigned8 |
| 80 | + - UInt8 |
| 81 | + * - unsigned16 |
| 82 | + - UInt16 |
| 83 | + * - unsigned32 |
| 84 | + - UInt32 |
| 85 | + * - unsigned64 |
| 86 | + - UInt64 |
| 87 | + * - signed8 |
| 88 | + - Int8 |
| 89 | + * - signed16 |
| 90 | + - Int16 |
| 91 | + * - signed32 |
| 92 | + - Int32 |
| 93 | + * - signed64 |
| 94 | + - Int64 |
| 95 | + * - ipv4Address |
| 96 | + - IPv4 |
| 97 | + * - ipv6Address |
| 98 | + - IPv6 |
| 99 | + * - macAddress |
| 100 | + - UInt64 |
| 101 | + * - dateTimeNanoseconds |
| 102 | + - DateTime64(9) |
| 103 | + * - dateTimeMicroseconds |
| 104 | + - DateTime64(6) |
| 105 | + * - dateTimeMilliseconds |
| 106 | + - DateTime64(3) |
| 107 | + * - dateTimeSeconds |
| 108 | + - DateTime |
| 109 | + * - string |
| 110 | + - String |
| 111 | + |
| 112 | +If a field is an alias that maps to multiple IPFIX elements of compatible |
| 113 | +types, the resulting type is unified to the type with the highest precision, |
| 114 | +i.e. the type that can hold all of the values without data loss. To unify |
| 115 | +IPv4 and IPv6 addresses as one type, an IPv4 address is stored as an |
| 116 | +IPv4-mapped IPv6 address. |
| 117 | + |
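
As an illustration of the IPv4-mapped form, the following sketch uses built-in
ClickHouse functions to show how an IPv4 address looks once widened to IPv6,
and how stored IPv4 flows could be counted. The address ``192.0.2.1`` is only
a documentation example, and the second query assumes the ``ipfixcol2.flows``
table from above.

.. code-block:: sql

    -- Widen an IPv4 address to its IPv4-mapped IPv6 form (::ffff:a.b.c.d).
    SELECT IPv6NumToString(IPv4ToIPv6(IPv4StringToNum('192.0.2.1'))) AS mapped;

    -- Count flows whose source address is an IPv4-mapped IPv6 address.
    SELECT count() AS ipv4_flows
    FROM ipfixcol2.flows
    WHERE isIPAddressInRange(toString(srcip), '::ffff:0.0.0.0/96');
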
| 118 | +To be able to retrieve the stored values in a human-friendly format, the |
| 119 | +following ClickHouse functions can be defined and used: |
| 120 | + |
| 121 | +.. code-block:: sql |
| 122 | +
|
| 123 | + CREATE FUNCTION ipToString AS (ip) -> |
| 124 | + if(isIPAddressInRange(toString(ip), '::ffff:0.0.0.0/96'), toString(toIPv4(ip)), toString(ip)); |
| 125 | +
|
| 126 | + CREATE FUNCTION macToString AS (mac) -> |
| 127 | + concat( |
| 128 | + lpad(hex(bitAnd(bitShiftRight(mac, 40), 0xFF)), 2, '0'), ':', |
| 129 | + lpad(hex(bitAnd(bitShiftRight(mac, 32), 0xFF)), 2, '0'), ':', |
| 130 | + lpad(hex(bitAnd(bitShiftRight(mac, 24), 0xFF)), 2, '0'), ':', |
| 131 | + lpad(hex(bitAnd(bitShiftRight(mac, 16), 0xFF)), 2, '0'), ':', |
| 132 | + lpad(hex(bitAnd(bitShiftRight(mac, 8), 0xFF)), 2, '0'), ':', |
| 133 | + lpad(hex(bitAnd(mac, 0xFF)), 2, '0') |
| 134 | + ); |
| 135 | +
|
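
Once defined, the helper functions can be called like any other function in a
query. The following is a minimal sketch, assuming the example
``ipfixcol2.flows`` table. The MAC value is a made-up literal, because the
example table has no MAC address column.

.. code-block:: sql

    -- Show the ten most recent flows with human-readable addresses.
    SELECT
        ipToString(srcip) AS src,
        ipToString(dstip) AS dst,
        protocolIdentifier,
        octetDeltaCount
    FROM ipfixcol2.flows
    ORDER BY flowstart DESC
    LIMIT 10;

    -- Format a MAC address stored as UInt64 (hypothetical value).
    SELECT macToString(toUInt64(0x001122334455)) AS mac;
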
| 136 | +
|
| 137 | +Example configuration |
| 138 | +--------------------- |
| 139 | + |
| 140 | +.. code-block:: xml |
| 141 | +
|
| 142 | + <output> |
| 143 | + <name>ClickHouse output</name> |
| 144 | + <plugin>clickhouse</plugin> |
| 145 | + <params> |
| 146 | + <connection> |
| 147 | + <endpoints> |
| 148 | + <!-- One or more ClickHouse databases (endpoints) --> |
| 149 | + <endpoint> |
| 150 | + <host>clickhouse.example.com</host> |
| 151 | + <port>9000</port> |
| 152 | + </endpoint> |
| 153 | + </endpoints> |
| 154 | + <user>ipfixcol2</user> |
| 155 | + <password>ipfixcol2</password> |
| 156 | + <database>ipfixcol2</database> |
| 157 | + <table>flows</table> |
| 158 | + </connection> |
| 159 | + <inserterThreads>8</inserterThreads> |
| 160 | + <blocks>64</blocks> |
| 161 | + <blockInsertThreshold>100000</blockInsertThreshold> |
| 162 | + <splitBiflow>true</splitBiflow> |
| 163 | + <nonblocking>true</nonblocking> |
| 164 | + <columns> |
| 165 | + <column> |
| 166 | + <!-- Special field representing the ODID the flow originated from. --> |
| 167 | + <name>odid</name> |
| 168 | + </column> |
| 169 | + <column> |
| 170 | + <!-- IPFIX field(s) identified by an alias. Maps to sourceIPv4Address or sourceIPv6Address, whichever exists. --> |
| 171 | + <name>srcip</name> |
| 172 | + </column> |
| 173 | + <column> |
| 174 | + <name>dstip</name> |
| 175 | + </column> |
| 176 | + <column> |
| 177 | + <name>flowstart</name> |
| 178 | + </column> |
| 179 | + <column> |
| 180 | + <name>flowend</name> |
| 181 | + </column> |
| 182 | + <column> |
| 183 | + <!-- IPFIX field identified by its IANA name stored to a column named "srcport" --> |
| 184 | + <name>srcport</name> |
| 185 | + <source>sourceTransportPort</source> |
| 186 | + </column> |
| 187 | + <column> |
| 188 | + <name>dstport</name> |
| 189 | + <source>destinationTransportPort</source> |
| 190 | + </column> |
| 191 | + <column> |
| 192 | + <!-- IPFIX field identified by its IANA name stored to a column with the same name --> |
| 193 | + <name>protocolIdentifier</name> |
| 194 | + </column> |
| 195 | + <column> |
| 196 | + <name>octetDeltaCount</name> |
| 197 | + </column> |
| 198 | + <column> |
| 199 | + <name>packetDeltaCount</name> |
| 200 | + </column> |
| 201 | + </columns> |
| 202 | + </params> |
| 203 | + </output> |
| 204 | +
|
| 205 | +**Warning**: The database and the table with the appropriate schema must already exist. |
| 206 | +It will not be created automatically. |
| 207 | + |
| 208 | +Parameters |
| 209 | +---------- |
| 210 | + |
| 211 | +:``connection``: |
| 212 | + The database connection parameters. |
| 213 | + |
| 214 | + :``endpoints``: |
| 215 | + The possible endpoints data can be sent to, i.e. all the replicas of a |
| 216 | + particular shard. In case one endpoint is unreachable, another one is used. |
| 217 | + |
| 218 | + :``endpoint``: |
| 219 | + Connection parameters of one endpoint. |
| 220 | + |
| 221 | + :``host``: |
| 222 | + The ClickHouse database host as a domain name or an IP address. |
| 223 | + |
| 224 | + :``port``: |
| 225 | + The port of the ClickHouse database. [default: 9000] |
| 226 | + |
| 227 | + :``user``: |
| 228 | + The database username. |
| 229 | + |
| 230 | + :``password``: |
| 231 | + The database password. |
| 232 | + |
| 233 | + :``database``: |
| 234 | + The database name where the specified table is present. |
| 235 | + |
| 236 | + :``table``: |
| 237 | + The name of the table to insert the data into. |
| 238 | + |
| 239 | +:``splitBiflow``: |
| 240 | + When true, biflow records are split into two uniflow records. [default: true] |
| 241 | + |
| 242 | +:``biflowEmptyAutoignore``: |
| 243 | + When true and ``splitBiflow`` is active, the uniflow records resulting from |
| 244 | + the split are also checked for emptiness and are omitted if empty. A flow |
| 245 | + is considered empty when ``octetDeltaCount = 0`` or ``packetDeltaCount = 0``. |
| 246 | + This exists because some IPFIX probes may export uniflow records as biflow |
| 247 | + with the reverse direction always empty, resulting in a large amount of |
| 248 | + empty flow records. |
| 249 | + [default: true] |
| 250 | + |
| 251 | +:``blocks``: |
| 252 | + Number of data blocks in circulation. Each block is effectively a memory |
| 253 | + buffer that rows are written to before being sent to the ClickHouse |
| 254 | + database. [default: 64] |
| 255 | + |
| 256 | +:``inserterThreads``: |
| 257 | + Number of threads used for data insertion to ClickHouse. In other words, |
| 258 | + the number of ClickHouse connections that are concurrently used. [default: 8] |
| 259 | + |
| 260 | +:``blockInsertThreshold``: |
| 261 | + Number of rows to be buffered into a block before the block is sent out to |
| 262 | + be inserted into the database. [default: 100000] |
| 263 | + |
| 264 | +:``blockInsertMaxDelaySecs``: |
| 265 | + Maximum number of seconds to wait before a block gets sent out to be |
| 266 | + inserted into the database even if the threshold has not been reached yet. |
| 267 | + [default: 10] |
| 268 | + |
| 269 | +:``nonblocking``: |
| 270 | + This option dictates what happens when all the blocks (buffers) are full. |
| 271 | + If true, the processing thread is not blocked, and some data is dropped to |
| 272 | + maintain flow of data. |
| 273 | + If false, the processing thread is blocked, waiting until a block becomes |
| 274 | + available. [default: true] |
| 275 | + |
| 276 | +:``columns``: |
| 277 | + The fields that each row will consist of. |
| 278 | + |
| 279 | + :``column``: |
| 280 | + |
| 281 | + :``name``: |
| 282 | + Name of the column in the database. Also the source field if source |
| 283 | + is not explicitly defined. |
| 284 | + |
| 285 | + :``nullable``: |
| 286 | + Whether a missing value is stored as a special null value. If false, the |
| 287 | + zero value of the corresponding data type is stored instead. Turning this |
| 288 | + option on might negatively affect performance. [default: false] |
| 289 | + |
| 290 | + :``source``: |
| 291 | + An IPFIX element name or an alias. If not present, ``name`` is used. |
| 292 | + Aliases and IPFIX elements are defined in the |
| 293 | + `libfds system configuration <https://github.com/CESNET/libfds/tree/master/config/system>`_. |
| 294 | + A list of standard IPFIX element names is also available in the |
| 295 | + `IANA IPFIX registry <https://www.iana.org/assignments/ipfix/ipfix.xhtml>`_. |
| 296 | + [default: same as name] |
| 297 | + |
| 298 | +Performance tuning |
| 299 | +------------------ |
| 300 | + |
| 301 | +If you experience performance issues with the default values, try |
| 302 | +increasing the ``blockInsertThreshold``, ``blocks`` and ``inserterThreads`` |
| 303 | +configuration parameters. |
| 304 | + |
| 305 | +For example, based on our testing, the following values should result in better |
| 306 | +performance at the cost of higher memory usage: |
| 307 | + |
| 308 | +.. code-block:: xml |
| 309 | +
|
| 310 | + <inserterThreads>16</inserterThreads> |
| 311 | + <blocks>128</blocks> |
| 312 | + <blockInsertThreshold>500000</blockInsertThreshold> |
| 313 | +
|
| 314 | +You can further experiment with the values based on your input characteristics |
| 315 | +and your machine specifications. |
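
One way to observe the effect of such changes on the database side is to look
at how many parts and rows accumulate per partition. The query below is a
sketch that assumes the example ``ipfixcol2.flows`` table and relies only on
the built-in ``system.parts`` table. It is not something the plugin provides.

.. code-block:: sql

    -- Active parts, row counts and on-disk size per partition.
    SELECT
        partition,
        count() AS part_count,
        sum(rows) AS total_rows,
        formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
    FROM system.parts
    WHERE database = 'ipfixcol2' AND table = 'flows' AND active
    GROUP BY partition
    ORDER BY partition;

A large number of small active parts in recent partitions may indicate that
blocks are being sent with few rows, in which case a higher
``blockInsertThreshold`` could be worth trying.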