This RFC proposes a new API for the lua transform.
Currently, the lua transform has some limitations in its API. In particular, the following features are missing:
- Nested Fields
Currently accessing nested fields is possible using the field path notation:
event["nested.field"] = 5
However, users expect nested fields to be accessible as native Lua structures, for example like this:
event["nested"]["field"] = 5
- Setup Code
Some scripts require expensive setup steps, for example, loading of modules or invoking shell commands. These steps should not be part of the main transform code.
For example, this code, which adds a custom hostname,

if event["host"] == nil then
  local f = io.popen("/bin/hostname")
  local hostname = f:read("*a") or ""
  f:close()
  hostname = string.gsub(hostname, "\n$", "")
  event["host"] = hostname
end

should be split into two parts. The first part is executed just once, at initialization:

local f = io.popen("/bin/hostname")
local hostname = f:read("*a") or ""
f:close()
hostname = string.gsub(hostname, "\n$", "")

and the second part is executed for each incoming event:

if event["host"] == nil then
  event["host"] = hostname
end

See #1864.
- Control Flow
It should be possible to define channels for output events, similarly to how it is done in the `swimlanes` transform. See #1942.
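
To make the difference between the two notations for nested fields concrete, here is a minimal plain-Lua sketch. It uses an ordinary table as a stand-in for an event (events inside the transform are not plain tables), and the helper `set_path` is hypothetical: it only illustrates how a dotted field path could map onto nested Lua tables.

```lua
-- Hypothetical helper: write `value` at a dotted path such as "nested.field",
-- creating intermediate tables when they are missing.
local function set_path(tbl, path, value)
  local keys = {}
  for key in string.gmatch(path, "[^.]+") do
    keys[#keys + 1] = key
  end
  local node = tbl
  for i = 1, #keys - 1 do
    if type(node[keys[i]]) ~= "table" then
      node[keys[i]] = {}
    end
    node = node[keys[i]]
  end
  node[keys[#keys]] = value
  return tbl
end

-- A plain table standing in for an event.
local event = {}
set_path(event, "nested.field", 5)

-- The value is reachable as a native Lua structure.
print(event["nested"]["field"])
print(event.nested.field)
```

Note that on a plain table `event["nested.field"]` stays `nil` after this call: the dotted string is just another key unless something interprets it as a path.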
The following example illustrates field manipulations with the new approach.
[transforms.lua]
type = "lua"
inputs = []
version = "2"
hooks.process = """
function (event, emit)
-- add new field (simple)
event.new_field = "example"
-- add new field (nested, overwriting the content of "nested" map)
event.nested = {
field = "example value"
}
-- add new field (nested, to already existing map)
event.nested.another_field = "example value"
-- add new field (nested, without assumptions about presence of the parent map)
if event.possibly_existing == nil then
event.possibly_existing = {}
end
event.possibly_existing.example_field = "example value"
-- remove field (simple)
event.removed_field = nil
-- remove field (nested, keep parent maps)
event.nested.field = nil
-- remove field (nested, if the parent map is empty, the parent map is removed too)
event.another_nested.field = nil
if next(event.another_nested) == nil then
event.another_nested = nil
end
-- rename field from "original_field" to "another_field"
event.original_field, event.another_field = nil, event.original_field
emit(event)
end
"""

This example is a log-to-metric transform which produces metric events from incoming log events using the following algorithm:
- There is an internal counter which is increased on each incoming log event.
- The log events are discarded.
- Every 10 seconds the transform produces a metric event with the count of received log events.
- Edge cases are handled in the following way:
  - If there are no incoming events, a metric event with the counter equal to 0 still has to be produced.
  - On Vector's shutdown the transform has to produce a final metric event with the count of events received since the last flush.
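
The algorithm can be tried outside Vector with a small plain-Lua simulation. This is only a sketch under stated assumptions: the `emit` function is a stub that collects events into a table, the hypothetical `flush` function plays the role of both the timer handler and the shutdown step, and the timer ticks are called by hand instead of every 10 seconds.

```lua
-- Stub emitting function: collects emitted events instead of sending them on.
local emitted = {}
local function emit(event, lane)
  emitted[#emitted + 1] = { event = event, lane = lane }
end

local event_counter = 0

-- Called for each incoming log event: count it and discard it.
local function process(event, emit)
  event_counter = event_counter + 1
end

-- Emits the metric event and resets the counter for the next interval.
local function flush(emit)
  emit({
    metric = {
      name = "counter_10s",
      counter = { value = event_counter }
    }
  })
  event_counter = 0
end

-- Simulated run: 3 events, a tick, an empty interval, 2 events, shutdown.
for _ = 1, 3 do process({ log = { message = "x" } }, emit) end
flush(emit)  -- timer tick after the first interval
flush(emit)  -- timer tick with no incoming events: a 0 is still produced
for _ = 1, 2 do process({ log = { message = "y" } }, emit) end
flush(emit)  -- final flush on shutdown

for i, item in ipairs(emitted) do
  print(i, item.event.metric.counter.value)
end
```

The three emitted counters are 3, 0 and 2, which matches the two edge cases listed in the algorithm.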
Two versions of a config running the same Lua code are listed below, both of them implement the transform described above.
This config uses Lua functions defined as inline strings, which makes it easier to get started with runtime transforms.
[transforms.lua]
type = "lua"
inputs = []
version = "2"
hooks.init = """
function (emit)
event_counter = 0
emit({
log = {
message = "starting up"
}
}, "auxiliary")
end
"""
hooks.process = """
function (event, emit)
event_counter = event_counter + 1
end
"""
hooks.shutdown = """
function (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
emit({
log = {
message = "shutting down"
}
}, "auxiliary")
end
"""
[[transforms.lua.timers]]
interval_seconds = 10
handler = """
function (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
event_counter = 0
end
"""

This version of the config uses the same Lua code as the config with inline Lua functions above, but all of the functions are defined in a single `source` option:
[transforms.lua]
type = "lua"
inputs = []
version = "2"
source = """
function init (emit)
event_counter = 0
emit({
log = {
message = "starting up"
}
}, "auxiliary")
end
function process (event, emit)
event_counter = event_counter + 1
end
function shutdown (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
emit({
log = {
message = "shutting down"
}
}, "auxiliary")
end
function timer_handler (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
event_counter = 0
end
"""
hooks.init = "init"
hooks.process = "process"
hooks.shutdown = "shutdown"
timers = [{interval_seconds = 10, handler = "timer_handler"}]

In this example the code from the `source` option of the example above is put into a separate file:
example_transform.lua
function init (emit)
event_counter = 0
emit({
log = {
message = "starting up"
}
}, "auxiliary")
end
function process (event, emit)
event_counter = event_counter + 1
end
function shutdown (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
emit({
log = {
message = "shutting down"
}
}, "auxiliary")
end
function timer_handler (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
event_counter = 0
end

It reduces the size of the transform configuration:
[transforms.lua]
type = "lua"
inputs = []
version = "2"
search_dirs = ["/example/search/dir"]
source = "require 'example_transform'"
hooks.init = "init"
hooks.process = "process"
hooks.shutdown = "shutdown"
timers = [{interval_seconds = 10, handler = "timer_handler"}]

The way modules are created in the previous example is simple, but it might cause name collisions if there are multiple modules to be loaded.
It is recommended to create tables for modules and put functions inside them:
example_transform.lua
local example_transform = {}
local event_counter = 0
function example_transform.init (emit)
emit({
log = {
message = "starting up"
}
}, "auxiliary")
end
function example_transform.process (event, emit)
event_counter = event_counter + 1
end
function example_transform.shutdown (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
emit({
log = {
message = "shutting down"
}
}, "auxiliary")
end
function example_transform.timer_handler (emit)
emit {
metric = {
name = "counter_10s",
counter = {
value = event_counter
}
}
}
event_counter = 0
end
return example_transform

Then the transform configuration is the following:
[transforms.lua]
type = "lua"
inputs = []
version = "2"
search_dirs = ["/example/search/dir"]
source = "example_transform = require 'example_transform'"
hooks.init = "example_transform.init"
hooks.process = "example_transform.process"
hooks.shutdown = "example_transform.shutdown"
timers = [{interval_seconds = 10, handler = "example_transform.timer_handler"}]

Lua transform configurations have to be versioned in order to distinguish between the old and the new APIs.
The old API is identified by version 1 and the new one, which is proposed in the present RFC, is identified by version 2. The version can be set using a `version` option in the configuration file. During the transitional period, omitting the version should result in using version 1. After all changes proposed here are implemented and sufficiently tested, version 1 could be deprecated and version 2 used as the default version.
In order to enable writing complex transforms, such as the one from the motivating example, a few new concepts have to be introduced.
Hooks are user-defined functions which are called on certain events.
- The `init` hook is a function with the signature `function (emit) ... end` which is called when the transform is created. It takes a single argument, the `emit` function, which can be used to produce new events from the hook.
- The `shutdown` hook is a function with the signature `function (emit) ... end` which is called when the transform is destroyed, for example on Vector's shutdown. After the `shutdown` hook is called, no other code from the transform is called.
- The `process` hook is a function with the signature `function (event, emit) ... end` which takes two arguments, an incoming event and the `emit` function. It is called immediately when a new event comes to the transform.
Timers are user-defined functions called at a predefined time interval. The specified interval is the minimum time between subsequent invocations of the same timer function.
The timer functions have the following signature:
function (emit)
-- ...
end

The `emit` argument is an emitting function which allows the timer to produce new events.
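
The "minimum interval" rule can be illustrated with a plain-Lua sketch driven by a simulated clock. The names `new_timer` and `tick` are hypothetical and are not part of the proposed API; how Vector actually schedules timers is not specified here, so treat this only as one possible reading of the rule.

```lua
-- Hypothetical timer record; `started_at` is the simulated creation time in seconds.
local function new_timer(interval_seconds, handler, started_at)
  return { interval = interval_seconds, handler = handler, last_run = started_at }
end

-- Runs the handler only if at least `interval` seconds have passed since the
-- previous run (or since creation). Returns true when the handler was run.
local function tick(timer, now, emit)
  if now - timer.last_run >= timer.interval then
    timer.last_run = now
    timer.handler(emit)
    return true
  end
  return false
end

-- Stub emitting function collecting events.
local emitted = {}
local function emit(event)
  emitted[#emitted + 1] = event
end

local timer = new_timer(10, function (emit)
  emit({ log = { message = "tick" } })
end, 0)

-- The scheduler wakes up at these simulated times.
local fired_at = {}
for _, now in ipairs({ 5, 10, 12, 25 }) do
  if tick(timer, now, emit) then
    fired_at[#fired_at + 1] = now
  end
end

print(table.concat(fired_at, ", "))  -- 10, 25
```

The wake-up at 12 seconds is skipped because only 2 seconds have passed since the previous invocation.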
An emitting function is a function that can be passed to a hook or timer. It has the following signature:
function (event, lane)
-- ...
end

Here `event` is an encoded event to be produced by the transform, and `lane` is an optional parameter specifying the output lane. In order to read events produced by the transform on a certain lane, the downstream components have to use the name of the transform suffixed by the `.` character and the name of the lane.
Suppose an emitting function is called from a transform component named `example_transform` with the `lane` parameter set to `example_lane`. Then the downstream `console` sink has to be defined as follows to be able to read the emitted event:

[sinks.example_console]
type = "console"
inputs = ["example_transform.example_lane"] # would output the event from `example_lane`
encoding.codec = "text"

Other components connected to the same transform, but with different lane names or without lane names at all, would not receive any event.
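
The naming rule for lanes can be sketched in plain Lua. The `new_router` function below is hypothetical and only mimics the routing: it keeps one queue per input name, using `<transform>.<lane>` for events emitted with a lane and the bare transform name for events emitted without one.

```lua
-- Hypothetical router: returns an emitting function and the table of queues
-- it fills, keyed by the name a downstream component would list in `inputs`.
local function new_router(transform_name)
  local outputs = {}
  local function emit(event, lane)
    local key = transform_name
    if lane ~= nil then
      key = transform_name .. "." .. lane
    end
    outputs[key] = outputs[key] or {}
    table.insert(outputs[key], event)
  end
  return emit, outputs
end

local emit, outputs = new_router("example_transform")
emit({ log = { message = "to a lane" } }, "example_lane")
emit({ log = { message = "no lane" } })

print(#outputs["example_transform.example_lane"])  -- 1
print(#outputs["example_transform"])               -- 1
print(outputs["example_transform.other_lane"])     -- nil
```

A consumer reading `example_transform.other_lane` sees nothing, which corresponds to the statement that components with different lane names do not receive the event.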
Events passed to the transforms have `userdata` type with a custom implementation of the `__index` metamethod. This data type is used instead of `table` because it avoids copying data which is not used.
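
The idea of copying only the fields that are read can be sketched in plain Lua. Real events would be `userdata` backed by Vector's own data; since `userdata` cannot be created from pure Lua, this sketch uses a table with an `__index` metamethod as a stand-in. The `new_event_view` function, the `backing` table and the caching behaviour are all assumptions made for illustration.

```lua
-- `backing` stands in for event data owned by Vector. The returned counter
-- function reports how many fields were actually copied into the view.
local function new_event_view(backing)
  local copies = 0
  local view = setmetatable({}, {
    __index = function (self, key)
      local value = backing[key]
      if value ~= nil then
        copies = copies + 1
        rawset(self, key, value)  -- cache, so the field is copied only once
      end
      return value
    end
  })
  return view, function () return copies end
end

local view, copy_count = new_event_view({
  message = "hello",
  host = "example-host",
  payload = "large field that is never read",
})

print(copy_count())   -- 0, nothing has been read yet
print(view.message)   -- hello
print(view.message)   -- hello, served from the cache
print(copy_count())   -- 1
```

Only `message` is ever copied; `host` and `payload` stay on the backing side because the script never touches them.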
Events produced by the transforms through calling an emitting function can either have the same `userdata` type as the events passed to the transform, or be newly created Lua tables following the schema outlined below.
Both log and metric events are encoded using external tagging.
- Log events could be seen as tables created using

      {
        log = {
          -- ...
        }
      }

  The content of the `log` field corresponds to the usual log event structure, with possible nesting of the fields.

  If a log event is created by the user inside the transform as a table, and default fields named according to the global schema are not present in that table, then they are automatically added to the event. This rule does not apply to events having `userdata` type.

  Example 1

  The global schema is configured so that `message_key` is `"message"`, `timestamp_key` is `"timestamp"`, and `host_key` is `"instance_id"`.

  If a new event is created inside the user-defined Lua code as a table

      event = {
        log = {
          message = "example message",
          nested = {
            field = "example nested field value"
          },
          array = {1, 2, 3},
        }
      }

  and then emitted through an emitting function, Vector would examine its fields and add a `timestamp` field containing the current timestamp and an `instance_id` field with the current hostname.

  Example 2

  The global schema has default settings.

  A log event created by the `stdin` source is passed to the `process` hook inside the transform, where it appears to have `userdata` type. The Lua code inside the transform deletes the `timestamp` field by setting it to `nil`:

      event.log.timestamp = nil

  and then emits the event. In that case Vector would not automatically insert the
  `timestamp` field.

- Metric events could be seen as tables created using

      {
        metric = {
          -- ...
        }
      }

  The content of the `metric` field matches the metric data model. The values use external tagging with respect to the metric type, see the examples.

  When metric events are created as tables in user-defined code, the following default values are assumed if they are not provided:

  | Field Name | Default Value |
  |---|---|
  | `timestamp` | Current time |
  | `kind` | `absolute` |
  | `tags` | empty map |

  Furthermore, for `aggregated_histogram` the `count` field inside the `value` map can be omitted.

  Example: `counter`

  The minimal Lua code required to create a counter metric is the following:

      {
        metric = {
          name = "example_counter",
          counter = {
            value = 10
          }
        }
      }

  Example: `gauge`

  The minimal Lua code required to create a gauge metric is the following:

      {
        metric = {
          name = "example_gauge",
          gauge = {
            value = 10
          }
        }
      }

  Example: `set`

  The minimal Lua code required to create a set metric is the following:

      {
        metric = {
          name = "example_set",
          set = {
            values = {"a", "b", "c"}
          }
        }
      }

  Example: `distribution`

  The minimal Lua code required to create a distribution metric is the following:

      {
        metric = {
          name = "example_distribution",
          distribution = {
            values = {"a", "b", "c"}
          }
        }
      }

  Example: `aggregated_histogram`

  The minimal Lua code required to create an aggregated histogram metric is the following:

      {
        metric = {
          name = "example_histogram",
          aggregated_histogram = {
            buckets = {1.0, 2.0, 3.0},
            counts = {30, 20, 10},
            sum = 1000 -- total sum of all measured values, cannot be inferred from `counts` and `buckets`
          }
        }
      }

  Note that the field [`count`](https://vector.dev/docs/architecture/data-model/metric/#count) is not required because it can be inferred by Vector automatically by summing up the values from `counts`.

  Example: `aggregated_summary`

  The minimal Lua code required to create an aggregated summary metric is the following:

      {
        metric = {
          name = "example_summary",
          aggregated_summary = {
            quantiles = {0.25, 0.5, 0.75},
            values = {1.0, 2.0, 3.0},
            sum = 200,
            count = 100
          }
        }
      }

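
The default values for table-created metric events can be illustrated with a plain-Lua sketch. The helper `with_metric_defaults` is hypothetical and is not something the transform would expose. Two assumptions are made: the timestamp is passed in as an `os.date("*t")` table standing in for the timestamp type, and the inferred `count` is stored directly inside the `aggregated_histogram` table.

```lua
-- Hypothetical helper: fill in the documented defaults for a metric event
-- that was created as a plain table.
local function with_metric_defaults(event, now)
  local metric = event.metric
  if metric.timestamp == nil then metric.timestamp = now end
  if metric.kind == nil then metric.kind = "absolute" end
  if metric.tags == nil then metric.tags = {} end

  -- For aggregated histograms, `count` can be inferred by summing `counts`.
  local histogram = metric.aggregated_histogram
  if histogram ~= nil and histogram.count == nil then
    local total = 0
    for _, bucket_count in ipairs(histogram.counts) do
      total = total + bucket_count
    end
    histogram.count = total
  end
  return event
end

local event = with_metric_defaults({
  metric = {
    name = "example_histogram",
    aggregated_histogram = {
      buckets = {1.0, 2.0, 3.0},
      counts = {30, 20, 10},
      sum = 1000
    }
  }
}, os.date("*t"))

print(event.metric.kind)                        -- absolute
print(event.metric.aggregated_histogram.count)  -- 60
```

Fields that the user did provide are left untouched, so an explicitly set `kind` or `tags` value is never overwritten by this helper.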
The mapping between Vector data types and Lua data types is the following:
| Vector Type | Lua Type | Comment |
|---|---|---|
| `String` | `string` | |
| `Integer` | `integer` | |
| `Float` | `number` | |
| `Boolean` | `boolean` | |
| `Timestamp` | `userdata` | There is no dedicated timestamp type in Lua. However, there is a standard library function `os.date` which returns a table with fields `year`, `month`, `day`, `hour`, `min`, `sec`, and some others. Other standard library functions, such as `os.time`, support tables with these fields as arguments. Because of that, Vector timestamps passed to the transform are represented as `userdata` with the same set of accessible fields. In order to have one-to-one correspondence between Vector timestamps and Lua timestamps, the `os.date` function from the standard library is patched to return not a table, but `userdata` with the same set of fields as it would usually return. This approach makes it possible to have both compatibility with the standard library functions and a dedicated data type for timestamps. |
| `Null` | empty string | In Lua, setting a table field to `nil` means deletion of this field. Furthermore, setting an array element to `nil` leads to deletion of this element. In order to avoid inconsistencies, already present `Null` values are represented as empty strings in Lua code, and it is impossible to create a new `Null` value in the user-defined code. |
| `Map` | `userdata` or `table` | Maps which are parts of events passed to the transform from Vector have `userdata` type. User-created maps have `table` type. Both types are converted to Vector's `Map` type when they are emitted from the transform. |
| `Array` | `sequence` | Sequences in Lua are a special case of tables. Because of that, the indexes can in principle start from any number. However, the convention in Lua is to start indexes from 1 instead of 0, so Vector should adhere to it. |
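
The Lua behaviours that motivate this mapping can be checked in a stock Lua interpreter. The sketch below uses only the unpatched standard library, so `os.date("*t", ...)` returns an ordinary table here rather than the `userdata` described in the table.

```lua
-- 1. Assigning nil deletes a field, so a Null value cannot be stored in a table.
local map = { a = 1, b = 2 }
map.a = nil
local remaining = 0
for _ in pairs(map) do
  remaining = remaining + 1
end
print(remaining)  -- 1

-- 2. Sequences are 1-based by convention; index 0 is not part of the sequence.
local array = { "x", "y", "z" }
print(array[1], array[0], #array)  -- x   nil   3

-- 3. os.date("*t") produces a table of date fields that os.time accepts back.
local t = os.time({ year = 2020, month = 3, day = 6, hour = 12, min = 0, sec = 0 })
local fields = os.date("*t", t)
print(fields.year, fields.month, fields.day)  -- 2020   3   6
print(os.time(fields) == t)                   -- true
```

The round trip in the third step is the compatibility property the patched `os.date` is meant to preserve.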
The new configuration options are the following:
| Option Name | Required | Example | Description |
|---|---|---|---|
| `version` | yes | `2` | In order to use the proposed API, the config has to contain the `version` option set to `2`. If it is not provided, Vector assumes that API version 1 is used. |
| `search_dirs` | no | `["/etc/vector/lua"]` | A list of directories which the `require` function searches when called from any part of the Lua code. |
| `source` | no | `example_module = require("example_module")` | Lua source evaluated when the transform is created. It can call the `require` function or define variables and handler functions inline. It is not called for each event like the `source` parameter in version 1 of the transform. |
| `hooks.init` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to the `init` hook function. |
| `hooks.shutdown` | no | `example_function` or `function (emit) ... end` | Contains a Lua expression evaluating to the `shutdown` hook function. |
| `hooks.process` | yes | `example_function` or `function (event, emit) ... end` | Contains a Lua expression evaluating to the `process` hook function. |
| `timers` | no | `[{interval_seconds = 10, handler = "example_function"}]` or `[{interval_seconds = 10, handler = "function (emit) ... end"}]` | Contains an array of tables. Each table in the array has two fields: `interval_seconds`, which takes an integer number of seconds, and `handler`, which is a Lua expression evaluating to a handler function for the timer. |
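
Because the hook and handler options hold Lua expressions, both a function name and an inline function literal are valid values. The following plain-Lua sketch shows one way such an option could be resolved. The `resolve_hook` helper is hypothetical, and evaluating `"return " .. expression` is an assumption about the mechanism, not a description of Vector's implementation.

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; load accepts strings in 5.2 and later.
local compile = loadstring or load

-- Hypothetical helper: evaluate a hook option as a Lua expression.
local function resolve_hook(expression)
  local chunk, err = compile("return " .. expression)
  if chunk == nil then
    error("invalid hook expression: " .. tostring(err))
  end
  local hook = chunk()
  assert(type(hook) == "function", "hook expression must evaluate to a function")
  return hook
end

-- A global function, as the `source` option might define it.
function example_function(emit)
  emit({ log = { message = "from a named function" } })
end

local by_name = resolve_hook("example_function")
local inline = resolve_hook("function (emit) emit({ log = { message = 'inline' } }) end")

-- Stub emitting function collecting messages.
local messages = {}
local function emit(event)
  messages[#messages + 1] = event.log.message
end

by_name(emit)
inline(emit)
print(table.concat(messages, "; "))  -- from a named function; inline
```

A named function statement such as `function init (emit) ... end` is not an expression, so under this reading it belongs in `source` and is then referenced by name from the hook option.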
The existing implementation of the `lua` transform (version 1) supports only log events. Processing of log events has the following design:
- There is a `source` parameter which takes a string of code.
- When a new event comes in, the global variable `event` is set inside the Lua context and the code from `source` is evaluated.
- After that, Vector reads the global variable `event` as the processed event.
- If the global variable `event` is set to `nil`, then the event is dropped.
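
The processing model described in this list can be mimicked in plain Lua. The sketch is an approximation under assumptions: `new_v1_transform` is a hypothetical name, events are ordinary tables, and the source is compiled once and then run per event, whereas the real transform may evaluate it differently.

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; load accepts strings in 5.2 and later.
local compile = loadstring or load

-- Hypothetical stand-in for the version 1 behaviour.
local function new_v1_transform(source)
  local chunk = assert(compile(source))
  return function (incoming)
    event = incoming      -- global variable visible to the user code
    chunk()               -- run the user code for this event
    local result = event  -- read the (possibly modified) global back
    event = nil
    return result         -- nil means the event was dropped
  end
end

local transform = new_v1_transform([[
  if event["level"] == "debug" then
    event = nil
  else
    event["processed"] = true
  end
]])

local kept = transform({ level = "info" })
local dropped = transform({ level = "debug" })
print(kept.processed, dropped)  -- true   nil
```

Everything in the user code runs for every event in this model, which is why setup steps cannot be separated from per-event logic.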
Events have type `userdata` with custom metamethods, so they are views into Vector's events. Passing an event to Lua therefore does not copy it; data is copied to Lua only when fields are actually accessed.
The fields are accessed through string indexes using Vector's field path notation.
The proposal
- gives users more power to create custom transforms;
- supports both logs and metrics;
- makes it possible to add complexity to the configuration of the transform gradually when needed.
- Implement support for the `version` config option and split implementations for versions 1 and 2.
- Add support for `userdata` type for timestamps.
- Implement access to the nested structure of log events.
- Implement metrics support.
- Support creation of events as tables inside the transform.
- Support emitting functions.
- Implement hooks invocation.
- Implement timers invocation.
- Add behavior tests and examples to the documentation.