A dataset processor that evaluates a Python expression on each data entry and either stores
the result in a new field or uses it as a filtering condition.

This processor is useful for dynamic field computation or conditional filtering of entries based
on configurable expressions. It leverages ``evaluate_expression``, which safely evaluates expressions
using the abstract syntax tree (AST).

Filtering behavior:

- If ``filter=True``, the expression is evaluated for each entry. Only entries for which the expression evaluates to ``True`` are kept; all others are removed from the output.
- If ``filter=False``, the result of the expression is stored in the field specified by ``new_field`` for each entry, and no filtering occurs.
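
The two modes described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the processor's actual implementation: ``toy_evaluate`` and ``apply_expression`` are hypothetical names, the whitelist of operations is invented for the sketch, and treating any truthy result as "keep" is an assumption. The real ``evaluate_expression`` may accept a different subset of Python.

```python
import ast
import operator

# Hypothetical whitelists for this sketch; the real evaluate_expression may differ.
_BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
               ast.Mult: operator.mul, ast.Div: operator.truediv}
_COMPARE_OPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
                ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_SAFE_FUNCTIONS = {"len": len}
_SAFE_STR_METHODS = {"split", "strip", "lower", "upper"}


def toy_evaluate(expression, variables):
    """Evaluate a small whitelisted subset of Python by walking its AST (Python 3.9+)."""

    def visit(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.Subscript):
            return visit(node.value)[visit(node.slice)]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.Compare):
            left = visit(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE_OPS:
                    raise ValueError(f"unsupported comparison: {type(op).__name__}")
                right = visit(comparator)
                if not _COMPARE_OPS[type(op)](left, right):
                    return False
                left = right  # supports chained comparisons such as 1 < x < 5
            return True
        if isinstance(node, ast.Call) and not node.keywords:
            args = [visit(arg) for arg in node.args]
            func = node.func
            if isinstance(func, ast.Name) and func.id in _SAFE_FUNCTIONS:
                return _SAFE_FUNCTIONS[func.id](*args)
            if isinstance(func, ast.Attribute) and func.attr in _SAFE_STR_METHODS:
                target = visit(func.value)
                if isinstance(target, str):
                    return getattr(target, func.attr)(*args)
        # Anything not explicitly whitelisted above is rejected.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval").body)


def apply_expression(entries, expression, lambda_param_name="entry",
                     new_field=None, filter=False):
    """Mimic the documented semantics: filter entries, or store the result in new_field."""
    output = []
    for entry in entries:
        result = toy_evaluate(expression, {lambda_param_name: entry})
        if filter:
            if result:  # assumption: any truthy result keeps the entry
                output.append(entry)
        else:
            output.append({**entry, new_field: result})
    return output


manifest = [
    {"text": "hello world", "duration": 6.2},
    {"text": "hi", "duration": 1.5},
]
kept = apply_expression(manifest, "entry['duration'] > 5.0", filter=True)
counted = apply_expression(manifest, "len(entry['text'].split())", new_field="num_words")
```

In this sketch ``kept`` contains only the first entry, while ``counted`` contains both entries with an extra ``num_words`` field. An expression outside the whitelist, such as a call to ``__import__``, raises ``ValueError`` instead of being executed.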

Examples::

    # Example 1: Filtering entries where the duration is greater than 5.0 seconds
    LambdaExpression(
        new_field="keep",  # This field is ignored when filter=True
        expression="entry['duration'] > 5.0",
        lambda_param_name="entry",
        filter=True
    )
    # Only entries with duration > 5.0 will be kept in the output manifest.

    # Example 2: Adding a new field with the number of words in the text
    LambdaExpression(
        new_field="num_words",
        expression="len(entry['text'].split())",
        lambda_param_name="entry",
        filter=False
    )
    # Each entry will have a new field 'num_words' with the word count of the 'text' field.

Supported operations:

The expression supports a safe subset of Python operations, including: