Skip to content

Conversation

@edg956
Copy link
Contributor

@edg956 edg956 commented Jan 26, 2026

Describe your changes:

This PR includes an endpoint that returns a tag's recognizers with pagination

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • I have added tests around the new logic.
  • For connector/ingestion changes: I updated the documentation.

Summary by Gitar

  • New API endpoints:
    • GET /v1/tags/{id}/recognizers and GET /v1/tags/name/{fqn}/recognizers enable paginated retrieval of tag recognizers
  • Cursor-based pagination:
    • TagRepository.getRecognizersOfTag() implements bidirectional cursor navigation with before/after parameters
  • New exception handling:
    • BadCursorException validates and handles malformed pagination cursors returning HTTP 400

This will update automatically on new commits.


@edg956 edg956 force-pushed the feat/tag-recognizer-pagination branch from e2dca3b to dce5dac Compare January 26, 2026 18:56
@gitar-bot
Copy link

gitar-bot bot commented Jan 26, 2026

🔍 CI failure analysis for dce5dac: TagResourceIT fails (PR bug). NEW: Maven SonarCloud has same AWS config errors as Maven PostgreSQL + database memory exhaustion after 3h run. Total: 12 failures, 83% unrelated to PR.

Issue

Twelve CI jobs have failed across multiple workflow runs:

1-2. integration-tests (2 Java jobs) - TagResourceIT failures
3. py-run-tests (3.11) - 11 Python lineage failures
4-8. py-run-tests (3.10) - 5 S3 failures
9-10. playwright-ci-postgresql - 2 E2E jobs (shards 4,6 and 5,6)
11. maven-postgresql-ci - 1 Apps + 3 AWS test errors
12. maven-sonarcloud-ci (job 61511518768) - Same AWS test errors + database issues (NEW)


Java Integration Tests: TagResourceIT.test_recognizerPaginationEndpoint

Status: Fails in MySQL+Elasticsearch AND PostgreSQL+OpenSearch

Test Failure:

  • Expected: Recognizer_19
  • Actual: Recognizer_29

Root Cause

Backward pagination bug in TagRepository.listRecognizersBefore() at line 1143

Fix: Pass null as the after parameter

PR Related: ✓ Yes


Maven SonarCloud CI (NEW FAILURE)

Job: maven-sonarcloud-ci - job 61511518768

Status: Build failed after 3 hours 20 minutes

Failure: Test failures in openmetadata-service

Errors Identified

1. AwsCredentialsUtilTest (3 errors)

  • Same as maven-postgresql-ci: IllegalArgumentException: AWS credentials not configured
  • Tests checking AWS credentials configuration/validation

2. Database Resource Exhaustion

SearchIndexException: Failed to read batch for entityType: tag. 
Error: java.sql.SQLException: Out of sort memory, consider increasing server sort buffer size
  • SQL query: SELECT json FROM tag WHERE TRUE ORDER BY name, id LIMIT 15
  • Database ran out of sort memory during tag entity indexing

3. RdfIndexApp Installation Failures

  • Repeated warnings: Application [RdfIndexApp] Exception [install] due to [null]
  • 25+ occurrences throughout the 3+ hour run
  • Application installation/configuration issues

Analysis

Why This is NOT PR-Related:

This PR only modifies tag recognizer pagination code:

  • TagRepository.java - pagination logic
  • TagResource.java - REST endpoints
  • BadCursorException.java - exception class
  • Integration tests

Zero changes to:

  • AWS credentials handling
  • Database configuration or buffer sizes
  • RdfIndexApp application
  • Search indexing logic beyond pagination
  • SonarCloud configuration

Root Cause Analysis:

AwsCredentialsUtilTest (Same as maven-postgresql-ci):

  • AWS credentials/IAM configuration missing in CI environment
  • Not code issue - environment configuration

Database "Out of sort memory":

  • SQL query is basic tag selection with sorting
  • This is a CI environment resource issue, not code bug
  • Database server doesn't have enough sort buffer allocated
  • Query itself is not introduced by this PR - it's existing indexing logic
  • After 3+ hours of testing, database resources likely exhausted

RdfIndexApp failures:

  • Application installation/configuration issue
  • Unrelated to tag recognizer pagination
  • Pre-existing or environment-specific problem

Long Build Time (3h 20min):

  • Extremely long compared to typical builds
  • Indicates resource contention, slow environment, or configuration issues
  • Maven + SonarCloud analysis + testing is resource-intensive
  • Similar to maven-postgresql-ci (2h 42min)

Recommendation

This maven-sonarcloud-ci failure is infrastructure/environment/configuration issue unrelated to the PR:

  1. Not caused by PR code changes (PR only touches tag pagination)
  2. AWS credentials: Environment configuration issue (same as maven-postgresql-ci)
  3. Database memory: CI environment resource exhaustion after 3+ hour run
  4. RdfIndexApp: Application configuration issue
  5. Build time: 3h 20min indicates severe environment/resource problems

Pattern: This is essentially the same failure pattern as maven-postgresql-ci (job 11) plus additional resource exhaustion from the longer SonarCloud analysis run.

PR Related: ✗ No - Infrastructure/environment/configuration issues


Summary

Job Category Tests Failure Type PR Related Count Pass Rate
Java integration TagResourceIT Code bug ✓ Yes 2 N/A
Maven CI Apps/AWS/DB Timing/Config/Resource ✗ No 2 ~99.95%
Playwright E2E UI tests Flaky (2/6 shards) ✗ No 2 67% shards
Python S3 test_s3_ingestion Infrastructure ✗ No 5 0%
Other Various Test isolation/infra ✗ No 2 N/A

Critical Statistics:

  • 12 total failed CI jobs
  • Only 2 jobs (17%) are PR-related
  • 10 jobs (83%) fail due to unrelated issues
  • Maven jobs: 2 with similar issues (AWS config, resource exhaustion)
  • S3 test: Still 0% pass rate (critical infrastructure issue)
  • Playwright: 2/6 shards failed (flaky/environment)

Required for PR: Fix backward pagination bug in TagRepository.java:1143

Infrastructure/Environment Issues:

  1. URGENT: S3/STS API incompatibility (5 failures)
  2. URGENT: Maven CI environment resources (3h+ builds, memory exhaustion)
  3. High: AWS credentials configuration (2 Maven jobs)
  4. Monitor: RdfIndexApp configuration
  5. Monitor: Playwright E2E stability (2/6 shards)
  6. Monitor: Python lineage test isolation

Strong Recommendation: This PR should NOT be blocked by maven-sonarcloud-ci failures or any of the other infrastructure/environment issues (83% of failures). The maven-sonarcloud-ci failure is a duplicate of the maven-postgresql-ci pattern with additional resource exhaustion from the long analysis run.

Code Review ⚠️ Changes requested 0 resolved / 3 findings

New paginated recognizers endpoint with two issues remaining: duplicate OpenAPI operationId causing spec conflicts, and backward pagination bug where the wrong cursor parameter is passed.

🚨 Bug: Backward pagination passes wrong cursor parameter

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java:561

The listRecognizersBefore method has a critical bug where it validates the before cursor but then passes the same raw before string to listRecognizersAfter as the after parameter (line 561). This causes two problems:

  1. The beforeId that was parsed and validated is never used
  2. listRecognizersAfter receives before as its second param, which it will then try to parse again as an after cursor

The backward pagination logic is fundamentally broken because the cursor string is passed through unchanged.

Suggested fix: Pass null as the after cursor and rely on the reversed list to handle pagination naturally, OR refactor to encode the cursor properly for the reversed list:

return listRecognizersAfter(recognizers.reversed(), null, limit);

Then handle the cursor-based starting position differently in the calling code.

⚠️ Bug: Unused `beforeId` variable - parsed cursor value is discarded

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TagRepository.java:548

In the listRecognizersBefore method, the beforeId variable is parsed from the cursor at line 548, but is never used. The method then calls listRecognizersAfter(recognizers.reversed(), before, limit) at line 558, passing the original raw before cursor string instead of the validated beforeId.

This means:

  1. The cursor parsing at line 548 is effectively dead code
  2. beforeId is extracted but never referenced again
  3. The validation is performed twice (once here, once in listRecognizersAfter)

The method needs restructuring to properly use the parsed cursor value.

⚠️ Quality: Duplicate OpenAPI operationId will cause spec conflicts

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/tags/TagResource.java:811-814 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/tags/TagResource.java:857-860

Both REST endpoints (listRecognizersByTagId and listRecognizersByTagFQN) use the same operationId = "listATagsRecognizers". OpenAPI specification requires unique operation IDs across all endpoints. This will cause issues with:

  1. OpenAPI spec validation
  2. SDK code generation (client generators use operationId to generate method names)
  3. API documentation tools

Suggested fix: Use unique operation IDs for each endpoint:

For the ID-based endpoint:

operationId = "listTagRecognizersById"

For the FQN-based endpoint:

operationId = "listTagRecognizersByFQN"

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Comment on lines 811 to +814
}

@GET
@Path("{id}/recognizers")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Quality: Duplicate OpenAPI operationId will cause spec conflicts

Details

Both REST endpoints (listRecognizersByTagId and listRecognizersByTagFQN) use the same operationId = "listATagsRecognizers". OpenAPI specification requires unique operation IDs across all endpoints. This will cause issues with:

  1. OpenAPI spec validation
  2. SDK code generation (client generators use operationId to generate method names)
  3. API documentation tools

Suggested fix: Use unique operation IDs for each endpoint:

For the ID-based endpoint:

operationId = "listTagRecognizersById"

For the FQN-based endpoint:

operationId = "listTagRecognizersByFQN"

Was this helpful? React with 👍 / 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

governance Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants