Tag recognizer pagination #25537

edg956 · 2026-01-26T18:52:26Z

Describe your changes:

This PR adds an endpoint that returns a tag's recognizers with pagination.

Type of change:

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

I have added tests around the new logic.
For connector/ingestion changes: I updated the documentation.

Summary by Gitar

New API endpoints:
- GET /v1/tags/{id}/recognizers and GET /v1/tags/name/{fqn}/recognizers enable paginated retrieval of tag recognizers
Cursor-based pagination:
- TagRepository.getRecognizersOfTag() implements bidirectional cursor navigation with before/after parameters
New exception handling:
- BadCursorException validates and handles malformed pagination cursors returning HTTP 400
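
The cursor format used by this PR is not shown in the thread. As a minimal sketch, assuming a cursor is just the Base64-encoded UUID of a recognizer, validation of a malformed cursor could look like the following. The class name `CursorCodec` is hypothetical, and `IllegalArgumentException` stands in for the PR's `BadCursorException`, whose constructor is not visible here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper; the real cursor format in the PR may differ.
final class CursorCodec {
  private CursorCodec() {}

  // Encode a recognizer id as an opaque, URL-safe cursor string.
  static String encode(UUID id) {
    return Base64.getUrlEncoder()
        .withoutPadding()
        .encodeToString(id.toString().getBytes(StandardCharsets.UTF_8));
  }

  // Decode a cursor. Any malformed input is reported through a single
  // exception type that a REST layer could map to HTTP 400.
  static UUID decode(String cursor) {
    try {
      byte[] raw = Base64.getUrlDecoder().decode(cursor);
      return UUID.fromString(new String(raw, StandardCharsets.UTF_8));
    } catch (IllegalArgumentException | NullPointerException e) {
      throw new IllegalArgumentException("Malformed pagination cursor", e);
    }
  }
}
```

Decoding once at the boundary and passing the parsed value onward avoids re-parsing the raw string deeper in the repository code.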


gitar-bot · 2026-01-26T18:58:17Z

🔍 CI failure analysis for dce5dac: TagResourceIT fails (PR bug). NEW: Maven SonarCloud has same AWS config errors as Maven PostgreSQL + database memory exhaustion after 3h run. Total: 12 failures, 83% unrelated to PR.

Issue

Twelve CI jobs have failed across multiple workflow runs:

1-2. integration-tests (2 Java jobs) - TagResourceIT failures
3. py-run-tests (3.11) - 11 Python lineage failures
4-8. py-run-tests (3.10) - 5 S3 failures
9-10. playwright-ci-postgresql - 2 E2E jobs (shards 4,6 and 5,6)
11. maven-postgresql-ci - 1 Apps + 3 AWS test errors
12. maven-sonarcloud-ci (job 61511518768) - Same AWS test errors + database issues (NEW)

Java Integration Tests: TagResourceIT.test_recognizerPaginationEndpoint

Status: Fails in MySQL+Elasticsearch AND PostgreSQL+OpenSearch

Test Failure:

Expected: Recognizer_19
Actual: Recognizer_29

Root Cause

Backward pagination bug in TagRepository.listRecognizersBefore() at line 1143

Fix: Pass null as the after parameter

PR Related: ✓ Yes

Maven SonarCloud CI (NEW FAILURE)

Job: maven-sonarcloud-ci - job 61511518768

Status: Build failed after 3 hours 20 minutes

Failure: Test failures in openmetadata-service

Errors Identified

1. AwsCredentialsUtilTest (3 errors)

Same as maven-postgresql-ci: IllegalArgumentException: AWS credentials not configured
Tests checking AWS credentials configuration/validation

2. Database Resource Exhaustion

SearchIndexException: Failed to read batch for entityType: tag. 
Error: java.sql.SQLException: Out of sort memory, consider increasing server sort buffer size

SQL query: SELECT json FROM tag WHERE TRUE ORDER BY name, id LIMIT 15
Database ran out of sort memory during tag entity indexing

3. RdfIndexApp Installation Failures

Repeated warnings: Application [RdfIndexApp] Exception [install] due to [null]
25+ occurrences throughout the 3+ hour run
Application installation/configuration issues

Analysis

Why This is NOT PR-Related:

This PR only modifies tag recognizer pagination code:

TagRepository.java - pagination logic
TagResource.java - REST endpoints
BadCursorException.java - exception class
Integration tests

Zero changes to:

AWS credentials handling
Database configuration or buffer sizes
RdfIndexApp application
Search indexing logic beyond pagination
SonarCloud configuration

Root Cause Analysis:

AwsCredentialsUtilTest (Same as maven-postgresql-ci):

AWS credentials/IAM configuration missing in CI environment
Not a code issue; an environment configuration problem

Database "Out of sort memory":

SQL query is basic tag selection with sorting
This is a CI environment resource issue, not a code bug
Database server doesn't have enough sort buffer allocated
Query itself is not introduced by this PR - it's existing indexing logic
After 3+ hours of testing, database resources likely exhausted

RdfIndexApp failures:

Application installation/configuration issue
Unrelated to tag recognizer pagination
Pre-existing or environment-specific problem

Long Build Time (3h 20min):

Extremely long compared to typical builds
Indicates resource contention, slow environment, or configuration issues
Maven + SonarCloud analysis + testing is resource-intensive
Similar to maven-postgresql-ci (2h 42min)

Recommendation

This maven-sonarcloud-ci failure is an infrastructure/environment/configuration issue unrelated to the PR:

Not caused by PR code changes (PR only touches tag pagination)
AWS credentials: Environment configuration issue (same as maven-postgresql-ci)
Database memory: CI environment resource exhaustion after 3+ hour run
RdfIndexApp: Application configuration issue
Build time: 3h 20min indicates severe environment/resource problems

Pattern: This is essentially the same failure pattern as maven-postgresql-ci (job 11) plus additional resource exhaustion from the longer SonarCloud analysis run.

PR Related: ✗ No - Infrastructure/environment/configuration issues

Summary

| Job Category | Tests | Failure Type | PR Related | Count | Pass Rate |
|---|---|---|---|---|---|
| Java integration | TagResourceIT | Code bug | ✓ Yes | 2 | N/A |
| Maven CI | Apps/AWS/DB | Timing/Config/Resource | ✗ No | 2 | ~99.95% |
| Playwright E2E | UI tests | Flaky (2/6 shards) | ✗ No | 2 | 67% shards |
| Python S3 | test_s3_ingestion | Infrastructure | ✗ No | 5 | 0% |
| Other | Various | Test isolation/infra | ✗ No | 2 | N/A |

Critical Statistics:

12 total failed CI jobs
Only 2 jobs (17%) are PR-related
10 jobs (83%) fail due to unrelated issues
Maven jobs: 2 with similar issues (AWS config, resource exhaustion)
S3 test: Still 0% pass rate (critical infrastructure issue)
Playwright: 2/6 shards failed (flaky/environment)

Required for PR: Fix backward pagination bug in TagRepository.java:1143

Infrastructure/Environment Issues:

URGENT: S3/STS API incompatibility (5 failures)
URGENT: Maven CI environment resources (3h+ builds, memory exhaustion)
High: AWS credentials configuration (2 Maven jobs)
Monitor: RdfIndexApp configuration
Monitor: Playwright E2E stability (2/6 shards)
Monitor: Python lineage test isolation

Strong Recommendation: This PR should NOT be blocked by maven-sonarcloud-ci failures or any of the other infrastructure/environment issues (83% of failures). The maven-sonarcloud-ci failure is a duplicate of the maven-postgresql-ci pattern with additional resource exhaustion from the long analysis run.

Code Review ⚠️ Changes requested 0 resolved / 3 findings

New paginated recognizers endpoint with two issues remaining: duplicate OpenAPI operationId causing spec conflicts, and backward pagination bug where the wrong cursor parameter is passed.

🚨 Bug: Backward pagination passes wrong cursor parameter

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java:561

The listRecognizersBefore method has a critical bug where it validates the before cursor but then passes the same raw before string to listRecognizersAfter as the after parameter (line 561). This causes two problems:

The beforeId that was parsed and validated is never used
listRecognizersAfter receives before as its second param, which it will then try to parse again as an after cursor

The backward pagination logic is fundamentally broken because the cursor string is passed through unchanged.

Suggested fix: Pass null as the after cursor and rely on the reversed list to handle pagination naturally, OR refactor to encode the cursor properly for the reversed list:

return listRecognizersAfter(recognizers.reversed(), null, limit);

Then handle the cursor-based starting position differently in the calling code.
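
To illustrate what using the parsed cursor could look like, here is a minimal in-memory sketch of bidirectional paging. It is not the PR's implementation: `ListPager`, `pageAfter`, and `pageBefore` are hypothetical names, and the sketch assumes the recognizers are already held in a stable order and that the cursor has been decoded to a key before these methods are called.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory pager; names and signatures are illustrative only.
final class ListPager {
  private ListPager() {}

  // Up to `limit` items strictly after the item whose key equals afterKey,
  // or from the start of the list when afterKey is null.
  static <T, K> List<T> pageAfter(List<T> items, Function<T, K> key, K afterKey, int limit) {
    int start = afterKey == null ? 0 : indexOf(items, key, afterKey) + 1;
    int end = Math.min(items.size(), start + limit);
    return List.copyOf(items.subList(start, end));
  }

  // Up to `limit` items strictly before the item whose key equals beforeKey,
  // returned in the original list order.
  static <T, K> List<T> pageBefore(List<T> items, Function<T, K> key, K beforeKey, int limit) {
    int end = indexOf(items, key, beforeKey);
    int start = Math.max(0, end - limit);
    return List.copyOf(items.subList(start, end));
  }

  // Position of the item matching the decoded cursor key.
  private static <T, K> int indexOf(List<T> items, Function<T, K> key, K wanted) {
    for (int i = 0; i < items.size(); i++) {
      if (key.apply(items.get(i)).equals(wanted)) {
        return i;
      }
    }
    throw new IllegalArgumentException("Cursor does not match any item");
  }
}
```

In this sketch the backward page is computed directly from the cursor's position, so no reversed list or second round of cursor parsing is needed.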

⚠️ Bug: Unused `beforeId` variable - parsed cursor value is discarded

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java:548

In the listRecognizersBefore method, the beforeId variable is parsed from the cursor at line 548, but is never used. The method then calls listRecognizersAfter(recognizers.reversed(), before, limit) at line 558, passing the original raw before cursor string instead of the validated beforeId.

This means:

The cursor parsing at line 548 is effectively dead code
beforeId is extracted but never referenced again
The validation is performed twice (once here, once in listRecognizersAfter)

The method needs restructuring to properly use the parsed cursor value.

⚠️ Quality: Duplicate OpenAPI operationId will cause spec conflicts

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/tags/TagResource.java:811-814 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/tags/TagResource.java:857-860

Both REST endpoints (listRecognizersByTagId and listRecognizersByTagFQN) use the same operationId = "listATagsRecognizers". OpenAPI specification requires unique operation IDs across all endpoints. This will cause issues with:

OpenAPI spec validation
SDK code generation (client generators use operationId to generate method names)
API documentation tools

Suggested fix: Use unique operation IDs for each endpoint:

For the ID-based endpoint:

operationId = "listTagRecognizersById"

For the FQN-based endpoint:

operationId = "listTagRecognizersByFQN"
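
A duplicate like this could be caught by a small check in a unit test. The sketch below is only an illustration: `OperationIdCheck` is a hypothetical class, and it assumes the operation IDs have already been collected into a map keyed by handler method name (how they are collected from the annotations or the generated spec is left out).

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical check; a real project would read the IDs from the generated OpenAPI spec.
final class OperationIdCheck {
  private OperationIdCheck() {}

  // Takes handler method name -> operationId and returns every
  // operationId that is declared more than once.
  static Set<String> duplicates(Map<String, String> operationIdsByMethod) {
    Set<String> seen = new LinkedHashSet<>();
    Set<String> dupes = new LinkedHashSet<>();
    for (String id : operationIdsByMethod.values()) {
      if (!seen.add(id)) {
        dupes.add(id);
      }
    }
    return dupes;
  }
}
```

With both endpoints mapped to "listATagsRecognizers" the returned set contains that ID; with the two suggested names it is empty.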


gitar-bot · 2026-01-26T18:59:13Z

openmetadata-service/src/main/java/org/openmetadata/service/resources/tags/TagResource.java

  }
+
+  @GET
+  @Path("{id}/recognizers")


⚠️ Quality: Duplicate OpenAPI operationId will cause spec conflicts

Details

Both REST endpoints (listRecognizersByTagId and listRecognizersByTagFQN) use the same operationId = "listATagsRecognizers". OpenAPI specification requires unique operation IDs across all endpoints. This will cause issues with:

OpenAPI spec validation

SDK code generation (client generators use operationId to generate method names)

API documentation tools

Suggested fix: Use unique operation IDs for each endpoint:

For the ID-based endpoint:

operationId = "listTagRecognizersById"

For the FQN-based endpoint:

operationId = "listTagRecognizersByFQN"


Add pagination for tag's recognizers

9d8861c

edg956 self-assigned this Jan 26, 2026

edg956 requested a review from a team as a code owner January 26, 2026 18:52

edg956 added the safe to test and governance labels Jan 26, 2026

edg956 temporarily deployed to test January 26, 2026 18:52 — with GitHub Actions Inactive

github-actions bot added the Ingestion label Jan 26, 2026

Add tests

dce5dac

edg956 force-pushed the feat/tag-recognizer-pagination branch from e2dca3b to dce5dac January 26, 2026 18:56

harshach approved these changes Jan 26, 2026


gitar-bot bot reviewed Jan 26, 2026


edg956 deployed to test January 26, 2026 20:21 — with GitHub Actions Active

edg956 temporarily deployed to test January 26, 2026 20:21 — with GitHub Actions Inactive

edg956 had a problem deploying to test January 26, 2026 20:21 — with GitHub Actions Failure

edg956 requested a deployment to test January 26, 2026 20:21 — with GitHub Actions In progress

edg956 temporarily deployed to test January 26, 2026 20:21 — with GitHub Actions Inactive

edg956 had a problem deploying to test January 26, 2026 20:21 — with GitHub Actions Failure
