@charan2628 charan2628 commented Mar 20, 2025

Description

Support extracting subLists from a list

Issues Resolved

Resolves #5529

Check List

  • [x] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

List sourceList = (List)value;
int startIndex = (Integer)args.get(1), endIndex = (Integer)args.get(2);

if (startIndex < 0 || startIndex >= sourceList.size()) {
Member

I think this should be

if (startIndex < 0 || startIndex > sourceList.size())

Author

We are using 0-based indexing and startIndex is inclusive, so a startIndex equal to sourceList.size() causes an index-out-of-bounds error in the subList function; that's why I used >=.
For endIndex I used > since it's exclusive.
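
For reference, a minimal sketch of how java.util.List.subList itself treats boundary indexes. The class and helper names are illustrative and not part of the PR. As far as the JDK behaviour goes, a start index equal to the list size is accepted when the end index is also the list size, and the result is an empty list; only an index beyond the size, or a start greater than the end, is rejected.

```java
import java.util.List;

final class SubListBoundsDemo {
    // Returns true when List.subList accepts the range, false when it throws.
    static boolean accepts(List<Integer> list, int from, int to) {
        try {
            list.subList(from, to);
            return true;
        } catch (IndexOutOfBoundsException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3, 4, 5);
        System.out.println(list.subList(1, 3));  // start inclusive, end exclusive
        System.out.println(accepts(list, 5, 5)); // start == size, end == size
        System.out.println(accepts(list, 6, 6)); // start beyond size
        System.out.println(accepts(list, 3, 2)); // start greater than end
    }
}
```

Whether the function should reject a start index equal to the list size is then a design choice for the PR's own validation, not something List.subList forces.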

Member

@graytaylor0 graytaylor0 left a comment

Thanks for making this contribution. I left a couple comments.

@graytaylor0
Member

graytaylor0 commented Mar 20, 2025

Also, the build is failing due to lack of test coverage. You also need to sign your commit: you can run git commit --amend and modify the message to include the signature. For example, this is my signature:

Signed-off-by: Taylor Gray <[email protected]>

[ant:jacocoReport] Rule violated for bundle data-prepper-expression: instructions covered ratio is 0.9, but expected minimum is 1.0

> Task :data-prepper-expression:jacocoTestCoverageVerification FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':data-prepper-expression:jacocoTestCoverageVerification'.
> Rule violated for bundle data-prepper-expression: instructions covered ratio is 0.9, but expected minimum is 1.0

@charan2628 charan2628 force-pushed the sublist-function-5529 branch 3 times, most recently from cdded93 to 9aca2dc Compare March 22, 2025 09:14
@charan2628
Author

Added test cases for 100% coverage.

Collaborator

Please add one test case where the key is nested in the event object. For example, if the event object looks like

{
    "level1": {
        "level2": [1, 2, 3, 4, 5]
    }
}

and

assertThat(subListExpressionFunction.evaluate(List.of("/level1/level2", 1, 3), testEvent, testFunction), equalTo(List.of(2, 3)));

Author

Added; please check #testWithValidArgumentsCase3.

throw new RuntimeException("subList() end index should be between 0 and list length or -1 for list length (exclusive)");
}
if (startIndex > endIndex) {
throw new RuntimeException("subList() start index should be less or equal to end index");
Member

Nit: Less than or equal to

Author

Changed

if (!(args.get(0) instanceof String)) {
throw new IllegalArgumentException("subList() takes 1st argument as string type");
}
if (!(args.get(1) instanceof Integer) || !(args.get(2) instanceof Integer)) {
Member

Did you run data prepper end to end to test this function? I pulled your PR and it looks like Integers can't be passed as arguments to the functions due to the grammar. I think we will need these to be strings for now and we can throw illegal argument exception if we can't parse the string to int.

  - add_entries:
        entries:
          - key: "result"
            value_expression: 'subList(/my_list, "0", "1")'

Author

Accepting integers as strings now. Please check whether this message is fine: "subList() takes 2nd and 3rd arguments as integers"

@graytaylor0
Member

Thanks! You still need to amend the commit message with the DCO though

@charan2628 charan2628 force-pushed the sublist-function-5529 branch from 9aca2dc to 0f5d36a Compare March 26, 2025 17:10
@charan2628
Author

Signed now

san81
san81 previously approved these changes Mar 26, 2025
graytaylor0
graytaylor0 previously approved these changes Mar 27, 2025
Member

@graytaylor0 graytaylor0 left a comment

Thanks, this looks good! Will you be creating a PR to the documentation website for this (https://github.com/opensearch-project/documentation-website/blob/main/_data-prepper/pipelines/functions.md)?

@graytaylor0
Member

Thanks, this looks good! Will you be creating a PR to the documentation website for this (https://github.com/opensearch-project/documentation-website/blob/main/_data-prepper/pipelines/functions.md)?

I will @graytaylor0

Thanks!

@graytaylor0
Member

I tested again with the changes and am still getting exceptions when passing the 2nd and 3rd argument. Looking into what needs to change for that to work

@graytaylor0
Member

graytaylor0 commented Mar 27, 2025

@charan2628 I have gotten it working with this format subList(/tags, 0, 1), but it requires some changes

  1. Modify the grammar expression as follows to support an Integer as a FunctionArg, and modify Integer to allow negative values:
fragment
FunctionArg
    : JsonPointer
    | String
    | Integer
    ;

Integer
    : ZERO
    | '-'? NonZeroDigit Digit*
    ;
  2. Modify the ExpressionEvaluator to support integers as arguments. There may be a better way, but this is the code I changed it to:
else {
    try {
        argList.add(Integer.parseInt(trimmedArg));
        continue;
    } catch (final Exception e) {
    }

    throw new RuntimeException("Unsupported type passed as function argument");
}
  3. Lastly, modify the subList expression to take in the ints.
try {
    startIndex = (Integer) args.get(1);
    endIndex = (Integer) args.get(2);
} catch (NumberFormatException | ClassCastException e) {
    throw new IllegalArgumentException("subList() takes 2nd and 3rd arguments as integers");
}

@dlvenable Please weigh in on these changes, especially the best way to modify for number 2 if there is an alternative
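
One possible alternative for the argument handling in the second change, sketched as a standalone helper. This is not Data Prepper's ExpressionEvaluator: the class name, the parseArgs method, and the naive comma split (which would break on quoted strings containing commas) are assumptions made for illustration. It catches only NumberFormatException instead of swallowing every Exception, and keeps the offending token in the error message.

```java
import java.util.ArrayList;
import java.util.List;

final class FunctionArgSketch {
    // Converts raw comma-separated tokens into typed values: JSON pointers and
    // quoted strings stay Strings, integer literals become Integers.
    static List<Object> parseArgs(String rawArgs) {
        List<Object> argList = new ArrayList<>();
        for (String arg : rawArgs.split(",")) {
            String trimmedArg = arg.trim();
            boolean isPointer = trimmedArg.startsWith("/");
            boolean isQuoted = trimmedArg.length() >= 2
                    && trimmedArg.startsWith("\"")
                    && trimmedArg.endsWith("\"");
            if (isPointer || isQuoted) {
                argList.add(trimmedArg);
                continue;
            }
            try {
                // Integer.parseInt accepts a leading '-' for negative values.
                argList.add(Integer.parseInt(trimmedArg));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "Unsupported type passed as function argument: " + trimmedArg, e);
            }
        }
        return argList;
    }

    public static void main(String[] args) {
        System.out.println(parseArgs("/tags, 0, -1"));
    }
}
```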

@graytaylor0 graytaylor0 dismissed their stale review March 28, 2025 16:11

Does not work yet after testing, recommendations to fix added in comment

@graytaylor0 graytaylor0 self-requested a review March 28, 2025 16:11
@charan2628
Author

@graytaylor0 Should subList support both integers and strings for the 2nd and 3rd arguments?

 - add_entries:
        entries:
          - key: "result"
            value_expression: 'subList(/my_list, "0", "1")'
 - add_entries:
        entries:
          - key: "result"
            value_expression: 'subList(/my_list, 0, 1)'

int startIndex, endIndex;
try {
startIndex = args.get(1) instanceof Integer ? (Integer) args.get(1) : Integer.parseInt((String)args.get(1));
endIndex = args.get(2) instanceof Integer ? (Integer) args.get(2) : Integer.parseInt((String)args.get(2));
Author

@graytaylor0 Made the changes to accept both integers and strings.

@graytaylor0
Member

We could just require Integer here with the change I had shared previously. But if you were able to test this end to end with both string and integers then that works too

graytaylor0
graytaylor0 previously approved these changes Apr 15, 2025
Member

@graytaylor0 graytaylor0 left a comment

Thanks for addressing all the changes. Were you able to run some tests end to end before we merge?

@charan2628 charan2628 force-pushed the sublist-function-5529 branch 2 times, most recently from 8ddce87 to 0fde339 Compare April 16, 2025 20:16
@charan2628
Author

Thanks for addressing all the changes. Were you able to run some tests end to end before we merge?

image

simple-sample-pipeline:
  workers: 2
  delay: "5000"
  source:
    file:
      path: /home/schar/workspace/opensource/data-prepper-sublist/release/archives/linux/build/install/opensearch-data-prepper-2.11.0-SNAPSHOT-linux-x64/pipelines/input.json
      format: json
      record_type: event
      codec:
        newline:
  processor:
    - parse_json:
        source: message
    - add_entries:
        entries:
          - key: "sublist1"
            value_expression: "subList(/my_list, 0, 2)"
          - key: "sublist2"
            value_expression: 'subList(/my_list, "0", "2")'
          - key: "sublist3"
            value_expression: 'subList(/my_list, "3", "-1")'
    - delete_entries:
        with_keys:
           - message
  sink:
    - stdout:

{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}

@graytaylor0 Made changes to remove the double quotes when the arguments are received as strings.
Tested using the above pipeline.yaml with both integer and string arguments and the input.json; it works as expected.

I named the new function subList; in the original issue it was sublist. I will change it if needed.
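
The behaviour exercised by the pipeline above can be sketched in plain Java. This is a minimal illustration, not the PR's implementation: the class and method names are hypothetical, and range checking is delegated to List.subList rather than to the PR's own validation and messages. It assumes, as the PR's error message states, that an end index of -1 stands for the list length.

```java
import java.util.List;

final class SubListSketch {
    // Accepts an Integer, or a String such as "3" or "\"3\"", and returns the int value.
    static int toIndex(Object arg) {
        if (arg instanceof Integer) {
            return (Integer) arg;
        }
        if (arg instanceof String) {
            String s = ((String) arg).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
                s = s.substring(1, s.length() - 1);
            }
            try {
                return Integer.parseInt(s);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "subList() takes 2nd and 3rd arguments as integers", e);
            }
        }
        throw new IllegalArgumentException("subList() takes 2nd and 3rd arguments as integers");
    }

    static <T> List<T> subList(List<T> source, Object start, Object end) {
        int startIndex = toIndex(start);
        int endIndex = toIndex(end);
        if (endIndex == -1) {
            endIndex = source.size(); // -1 is shorthand for "up to the end of the list"
        }
        return source.subList(startIndex, endIndex); // remaining range checks happen here
    }

    public static void main(String[] args) {
        List<Integer> myList = List.of(0, 1, 2, 3, 4, 5, 6);
        System.out.println(subList(myList, 0, 2));             // [0, 1]
        System.out.println(subList(myList, "\"3\"", "\"-1\"")); // [3, 4, 5, 6]
    }
}
```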

@charan2628
Author

@graytaylor0 Tried the example mentioned in the issue, replacing the original list with its sublist.
pipeline.yaml

simple-sample-pipeline:
  workers: 2
  delay: "5000"
  source:
    file:
      path: /home/schar/workspace/opensource/data-prepper-sublist/release/archives/linux/build/install/opensearch-data-prepper-2.11.0-SNAPSHOT-linux-x64/pipelines/input.json
      format: json
      record_type: event
      codec:
        newline:
  processor:
    - parse_json:
        source: message
    - delete_entries:
        with_keys:
           - message
    - add_entries:
        entries:
          - key: "my_list"
            value_expression: 'subList(/my_list, "4", "-1")'
            overwrite_if_key_exists: true
  sink:
    - stdout:

input.json

{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}
{"my_list": [ 0, 1, 2, 3, 4, 5, 6]}

OUTPUT:
image

Submitted PR for documentation, please review opensearch-project/documentation-website#9718

graytaylor0
graytaylor0 previously approved these changes Apr 24, 2025
Member

@graytaylor0 graytaylor0 left a comment

Thanks for all of this!

san81
san81 previously approved these changes Apr 24, 2025
Collaborator

@san81 san81 left a comment

Thank you for this contribution!

@dlvenable
Member

@charan2628, thank you for this great contribution!

The unit tests are failing.

You can more easily run all the relevant tests locally using this command:

./gradlew -p data-prepper-expression clean build

This will allow you to quickly test these changes without building everything.

Below are the failing tests. Some appear to be based on restrictions for other functions which are not allowed to take in numbers.

GenericExpressionEvaluator_ConditionalIT > testGenericExpressionEvaluatorThrows(String, Event) > [8] hasTags(10), org.opensearch.dataprepper.model.event.JacksonEvent@8206075 FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

GenericExpressionEvaluator_ConditionalIT > testGenericExpressionEvaluatorThrows(String, Event) > [15] contains(1234, /strField), org.opensearch.dataprepper.model.event.JacksonEvent@300f43a3 FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

GenericExpressionEvaluator_ConditionalIT > testGenericExpressionEvaluatorThrows(String, Event) > [17] contains(/strField, 1234), org.opensearch.dataprepper.model.event.JacksonEvent@5c1e1d0f FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

GenericExpressionEvaluator_ConditionalIT > testGenericExpressionEvaluatorThrows(String, Event) > [29] getMetadata(10), org.opensearch.dataprepper.model.event.JacksonEvent@8206075 FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

GenericExpressionEvaluator_ConditionalIT > testGenericExpressionEvaluatorThrows(String, Event) > [32] cidrContains(/sourceIp,123), org.opensearch.dataprepper.model.event.JacksonEvent@54d9de9c FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

ParseTreeCoercionServiceTest > testCoerceTerminalNodeLengthFunctionWithInvalidArgument() FAILED
    org.opentest4j.AssertionFailedError: Expected java.lang.RuntimeException to be thrown, but nothing was thrown.

ParseTreeTest > testSubtractOperator() FAILED
    java.lang.AssertionError: 
    Expected: is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
         but: Expected context "/status_code==-200" | ConditionalExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		Expected context "/status_code==-200" | EqualityOperatorExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		Expected context "-200" | RegexOperatorExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		Expected context "-200" | RelationalOperatorExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		Expected context "-200" | SetOperatorExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		Expected context "-200" | UnaryOperatorExpressionContext to match is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		is an instance of org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$ArithmeticUnaryExpressionContext
    		finally <[185 161 155 142 124 60]> is a org.opensearch.dataprepper.expression.antlr.DataPrepperExpressionParser$UnaryOperatorExpressionContext

1067 tests completed, 7 failed

Member

@dlvenable dlvenable left a comment

Need tests to pass.

@charan2628
Author

Need tests to pass.

Working on it

@charan2628
Author

@dlvenable Functions now accept Integer arguments. Can I remove the test cases that check for invalid arguments of type integer?

* SubList function opensearch-project#5529

Signed-off-by: Sai charan raj Gudala <[email protected]>
@charan2628 charan2628 dismissed stale reviews from san81 and graytaylor0 via 320840a May 28, 2025 18:21
@charan2628 charan2628 force-pushed the sublist-function-5529 branch from 0fde339 to 320840a Compare May 28, 2025 18:21
@charan2628
Author

@dlvenable Since functions now accept Integer arguments, I removed those scenarios from the invalid-argument cases in GenericExpressionEvaluator_ConditionalIT and added a new valid case where the subList function accepts an integer argument. All tests are now passing.

Development

Successfully merging this pull request may close these issues.

Support extracting subLists from a list
4 participants