added yaml rest integration test & document-indexing test #20691

Open
ThyTran1402 wants to merge 3 commits into opensearch-project:main from ThyTran1402:fix/term_vector_for_search_as_you_type

@ThyTran1402 ThyTran1402 commented Feb 20, 2026

Description

  • search_as_you_type creates three subfields. The ._2gram and ._3gram subfields (ShingleFieldMapper) should inherit term_vector. The ._index_prefix subfield (PrefixFieldMapper) must NOT get term_vector, because the prefix index structure is incompatible with term vector storage.
  • When term_vector offsets/positions flags were set on the root field, the _index_prefix subfield's Lucene FieldType could end up with offsets/positions flags set but storeTermVectors = false, an illegal combination that Lucene rejects at indexing time with an exception.
  • This PR adds a YAML REST integration test and a document-indexing test to guard against regression. It also explicitly calls prefixft.setStoreTermVectors(false) when constructing the prefix field's FieldType in SearchAsYouTypeFieldMapper.Builder.build(). Although new FieldType() defaults to false, the explicit call guards against any future refactoring that might copy or propagate the root field's FieldType settings.
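
The illegal flag combination described above can be modeled in a few lines of plain Java. This is only a sketch under stated assumptions: `TermVectorFlags`, `forShingle`, and `forPrefix` are hypothetical names invented for illustration, not Lucene or OpenSearch classes, and the exception type is an assumption rather than a statement of what Lucene throws.

```java
/**
 * Hypothetical stand-in for the term-vector flags on a field type.
 * This is NOT Lucene's FieldType; names and exception type are illustrative.
 */
final class TermVectorFlags {
    boolean storeTermVectors;
    boolean storeOffsets;
    boolean storePositions;

    /** Rejects offsets/positions flags that are set while term vectors are disabled. */
    void validate(String fieldName) {
        if (!storeTermVectors && (storeOffsets || storePositions)) {
            throw new IllegalArgumentException(
                "term vector offsets/positions set without term vectors (field=" + fieldName + ")");
        }
    }

    /** Flags for a shingle subfield: inherit everything from the root field. */
    static TermVectorFlags forShingle(TermVectorFlags root) {
        TermVectorFlags f = new TermVectorFlags();
        f.storeTermVectors = root.storeTermVectors;
        f.storeOffsets = root.storeOffsets;
        f.storePositions = root.storePositions;
        return f;
    }

    /** Flags for the prefix subfield: never inherit, and reset explicitly. */
    static TermVectorFlags forPrefix(TermVectorFlags root) {
        TermVectorFlags f = new TermVectorFlags();
        // Redundant with the defaults today, but it survives a refactor that
        // starts from a copy of the root flags instead of a fresh object.
        f.storeTermVectors = false;
        f.storeOffsets = false;
        f.storePositions = false;
        return f;
    }
}
```

In this model, a root field with all three flags enabled yields shingle flags that pass `validate`, and prefix flags that are all false and therefore also pass. Only a partially copied state (offsets set, term vectors unset) is rejected.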

Related Issues

Resolves #1901

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@ThyTran1402 ThyTran1402 requested a review from a team as a code owner February 20, 2026 13:51
@github-actions github-actions bot added the bug, good first issue, hacktoberfest, and Search:Relevance labels Feb 20, 2026

coderabbitai bot commented Feb 20, 2026

Walkthrough

Test additions validating that documents can be indexed when a search_as_you_type field has term_vector settings configured. The tests ensure term_vector flags apply only to shingle subfields and not to the _index_prefix subfield.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Unit Test Enhancement: modules/mapper-extras/src/test/java/org/opensearch/index/mapper/SearchAsYouTypeFieldMapperTests.java | Added runtime document indexing check to testTermVectors method to verify no exception occurs when indexing documents with term_vector settings and that the _index_prefix subfield does not inherit term-vector flags. |
| YAML REST API Test: modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml | Added new test scenario validating successful document indexing with search_as_you_type field configured with term_vector: with_positions_offsets. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main changes: adding YAML REST integration test and document-indexing test for the search_as_you_type term_vector bug. |
| Linked Issues check | ✅ Passed | The PR adds tests (YAML REST and document-indexing) that address issue #1901's objective of preventing term_vector propagation to _index_prefix subfield while preserving it for shingle subfields. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on test additions (one Java test modification and one YAML REST test) directly addressing the term_vector bug in search_as_you_type fields, with no unrelated modifications. |
| Description check | ✅ Passed | The PR description includes a description section explaining the bug and fix, related issue #1901, and confirms the Apache 2.0 license agreement, though the description is somewhat informal and could be clearer. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml (1)

1244-1266: No teardown for test-term-vector-index — adds retry/rerun fragility.

test-term-vector-index is created inside a named test section, not in the global setup, so there is no corresponding automatic cleanup. The setup section runs before each test section in order to set up the same environment for each test section. Without a teardown, a second execution within the same cluster (e.g., retry on failure, parallel suites) will hit a resource_already_exists_exception on indices.create.

A teardown section can clean up resources created during the setup or test actions. Consider one of:

  1. Adding an indices.delete at the top of this test section with ignore: 404 (defensive delete before create).
  2. Promoting index creation to the global setup and deletion to a file-level teardown.
💡 Minimal fix — guard at the top of the test section
 "index document with term_vector on search_as_you_type field":

+  - do:
+      indices.delete:
+        index: test-term-vector-index
+        ignore_unavailable: true
+
   - do:
       indices.create:
         index: test-term-vector-index
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml`
around lines 1244 - 1266, The test creates an index named test-term-vector-index
inside the test section but never deletes it, which causes flaky
retries/resource_already_exists exceptions; fix by adding a defensive
indices.delete action for test-term-vector-index with ignore: 404 at the start
of this test section (or alternatively move the indices.create into the global
setup and add a corresponding file-level teardown indices.delete), ensuring the
index is removed before creation so subsequent runs don't fail.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc98ec8 and fe7d0e4.

📒 Files selected for processing (2)
  • modules/mapper-extras/src/test/java/org/opensearch/index/mapper/SearchAsYouTypeFieldMapperTests.java
  • modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (1)
modules/mapper-extras/src/test/java/org/opensearch/index/mapper/SearchAsYouTypeFieldMapperTests.java (1)

385-388: LGTM — correctly exercises the Lucene FieldType.validate() path.

mapper.parse() is exactly the right call site: Lucene Field objects (which internally invoke FieldType.validate()) are instantiated during document parsing. Without the fix, the illegal combination of storeTermVectorOffsets=true / storeTermVectors=false on _index_prefix would throw IllegalStateException at this point. The discarded return value is idiomatic for a "must not throw" assertion. Running this check inside the existing loop ensures all six term-vector variants are covered.
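
The "must not throw" pattern mentioned above can be sketched without any test framework. This is a simplified model, not the actual test: `TermVectorVariantCheck` and `indexSubfield` are hypothetical names, the six option strings are assumed to be the non-"no" values of the term_vector mapping parameter, and the check inside `indexSubfield` only imitates the kind of validation that happens while a document is parsed and indexed.

```java
final class TermVectorVariantCheck {
    /** The six non-"no" values assumed for the term_vector mapping parameter. */
    static final String[] OPTIONS = {
        "yes", "with_positions", "with_offsets",
        "with_positions_offsets", "with_positions_payloads",
        "with_positions_offsets_payloads"
    };

    /** Hypothetical model of indexing one subfield; throws on an illegal flag mix. */
    static void indexSubfield(boolean vectors, boolean positions, boolean offsets) {
        if (!vectors && (positions || offsets)) {
            throw new IllegalArgumentException("positions/offsets set without term vectors");
        }
    }

    /** Indexes shingle and prefix subfields for every option; returns how many options passed. */
    static int checkAllVariants() {
        int passed = 0;
        for (String option : OPTIONS) {
            boolean positions = option.contains("positions");
            boolean offsets = option.contains("offsets");
            // Shingle subfields inherit the root setting.
            indexSubfield(true, positions, offsets);
            // The prefix subfield gets no term-vector flags at all.
            indexSubfield(false, false, false);
            passed++; // reaching here means neither call threw
        }
        return passed;
    }
}
```

The return values of the individual calls are irrelevant; the loop counts an option as passed simply because control reached the next statement. A flags-only assertion would not catch a regression that surfaces only when a document is actually indexed.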

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml`:
- Around line 1244-1266: The test "index document with term_vector on
search_as_you_type field" creates a persistent index named
test-term-vector-index and lacks cleanup; add a teardown step that calls
indices.delete for test-term-vector-index (e.g., append a do: indices.delete
block targeting index: test-term-vector-index) either inside this test section
or as a file-level teardown to ensure the index is removed after the test.


Comment on lines +1244 to +1266
---
"index document with term_vector on search_as_you_type field":

  - do:
      indices.create:
        index: test-term-vector-index
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              my_field:
                type: search_as_you_type
                term_vector: with_positions_offsets

  - do:
      index:
        index: test-term-vector-index
        id: 1
        body:
          my_field: "hello world"

  - match: { result: created }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for duplicate occurrences of the test scenario name across YAML test files
rg -rn "index document with term_vector on search_as_you_type field" --type yaml

Repository: opensearch-project/OpenSearch

Length of output: 177


🏁 Script executed:

#!/bin/bash
# Find the actual file and examine its structure
find . -name "10_basic.yml" -path "*search-as-you-type*" -type f

Repository: opensearch-project/OpenSearch

Length of output: 174


🏁 Script executed:

#!/bin/bash
# Get file size and line count
wc -l "modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml" 2>/dev/null || echo "File not found"

Repository: opensearch-project/OpenSearch

Length of output: 177


🏁 Script executed:

#!/bin/bash
# Read the end of the file to see the test in question and any teardown
tail -50 "modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml"

Repository: opensearch-project/OpenSearch

Length of output: 1403


🏁 Script executed:

#!/bin/bash
# Search for all occurrences of "term_vector" and "search_as_you_type" together in the file
rg -n "term_vector|teardown|---" "modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml" | head -100

Repository: opensearch-project/OpenSearch

Length of output: 793


🏁 Script executed:

#!/bin/bash
# Simple grep to find if the test scenario name appears more than once
grep -n "index document with term_vector" "modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml"

Repository: opensearch-project/OpenSearch

Length of output: 140


Add a teardown section to clean up the test-term-vector-index created in this test.

This test creates a persistent index (test-term-vector-index) at line 1251 but does not delete it. Since this is the final test in the file and there is no file-level teardown, the index remains in the cluster. Add either a teardown block within this test section or a file-level teardown to delete the index after tests complete:

  - do:
      indices.delete:
        index: test-term-vector-index

This prevents state pollution and ensures the test can be re-run without encountering a resource_already_exists_exception on indices.create.

@github-actions

❌ Gradle check result for fe7d0e4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>
@github-actions

❌ Gradle check result for 73dcab6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


- match: { hits.total: 1 }
- match: { hits.hits.0._source.a_field: "quick brown fox jump lazy dog" }

Was the issue resolved by #3119? So this PR aims to add more tests?

Yup. The issue was fixed by PR #3119. This PR only adds tests that verify actual document indexing doesn't throw, not just that the FieldType flags are set correctly. They close a gap in the existing test coverage: without them, the production fix could regress without the existing tests catching the actual exception.

Let me know your thoughts.

Thank you!

@ThyTran1402

Hi @gaobinlong

I think the PR is ready for review.

Thank you!

@github-actions

❌ Gradle check result for 73dcab6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Thy Tran <58045538+ThyTran1402@users.noreply.github.com>

ThyTran1402 commented Mar 18, 2026

Hi @gaobinlong

I just checked again and realized that the bug was not fully fixed yet. The _index_prefix subfield does not call setTermVectorParams, but there may be a code path, possibly through Lucene's segment merging or the test framework's document writer, where term vector metadata from the sibling shingle fields bleeds into the prefix field's FieldInfo, causing a conflict. I have pushed a fix for this; let me know if that works.

Can you review it again please?

Thank you!

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix term vector propagation to _index_prefix and add unit test

Relevant files:

  • modules/mapper-extras/src/main/java/org/opensearch/index/mapper/SearchAsYouTypeFieldMapper.java
  • modules/mapper-extras/src/test/java/org/opensearch/index/mapper/SearchAsYouTypeFieldMapperTests.java

Sub-PR theme: Add YAML REST integration test for term_vector on search_as_you_type

Relevant files:

  • modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml

⚡ Recommended focus areas for review

Missing Cleanup

The new YAML test creates an index test-term-vector-index but only deletes it at the start (with ignore_unavailable: true). There is no teardown section to clean up the index after the test completes, which could leave stale state affecting other test runs.

---
"index document with term_vector on search_as_you_type field":

  - do:
      indices.delete:
        index: test-term-vector-index
        ignore_unavailable: true

  - do:
      indices.create:
        index: test-term-vector-index
        body:
          settings:
            number_of_replicas: 0
          mappings:
            properties:
              my_field:
                type: search_as_you_type
                term_vector: with_positions_offsets

  - do:
      index:
        index: test-term-vector-index
        id: 1
        body:
          my_field: "hello world"

  - match: { result: created }
Incomplete Assertion

The YAML test only asserts result: created after indexing the document, but does not verify that the document is actually retrievable or that term vectors are stored correctly on the shingle subfields. A more thorough test would also validate that term vectors can be fetched via the term vectors API for the _2gram and _3gram subfields.

- do:
    index:
      index: test-term-vector-index
      id: 1
      body:
        my_field: "hello world"

- match: { result: created }

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Verify term vectors are actually stored

The test only verifies that the document was created but does not validate that the
term vectors are actually stored and retrievable for the my_field field. Adding a
termvectors API call to verify the term vectors are correctly indexed would make the
test more meaningful and ensure the fix works end-to-end.

modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml [1264-1271]

 - do:
     index:
       index: test-term-vector-index
       id: 1
       body:
         my_field: "hello world"
 
 - match: { result: created }
 
+- do:
+    termvectors:
+      index: test-term-vector-index
+      id: 1
+      fields: [ "my_field" ]
+
+- match: { found: true }
+- is_true: term_vectors.my_field
+
Suggestion importance[1-10]: 5

Why: The suggestion to add a termvectors API call would make the test more comprehensive by verifying the actual fix works end-to-end, not just that indexing doesn't throw an exception. This would strengthen the test coverage for the bug fix.

Low
Add cleanup after test execution

The new test section is missing a setup or teardown to clean up the
test-term-vector-index after the test runs. While the delete at the start handles
pre-existing state, the index will remain after the test completes, potentially
affecting other test runs. Consider adding a teardown block or ensuring the index is
deleted at the end of the test.

modules/mapper-extras/src/yamlRestTest/resources/rest-api-spec/test/search-as-you-type/10_basic.yml [1245-1250]

 "index document with term_vector on search_as_you_type field":
 
   - do:
       indices.delete:
         index: test-term-vector-index
         ignore_unavailable: true
 
+  # ... (rest of test steps) ...
+
+  - do:
+      indices.delete:
+        index: test-term-vector-index
+        ignore_unavailable: true
+
Suggestion importance[1-10]: 3

Why: While adding a teardown is a good practice, the test already deletes the index at the start with ignore_unavailable: true, which handles pre-existing state. The leftover index is unlikely to affect other tests since it uses a unique name test-term-vector-index. The suggestion is valid but has low impact.

Low

@github-actions

❌ Gradle check result for cb6eff4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Labels

bug, good first issue, hacktoberfest, lucene, Search:Relevance, skip-changelog

Development

Successfully merging this pull request may close these issues.

[BUG] term_vector does not work on search_as_you_type fields

2 participants