@ardatan
Member

@ardatan ardatan commented Dec 22, 2025

Ref ROUTER-235

A user reported that they have a schema like the following:

interface IContent {
  id: ID!
}

interface IElementContent {
  id: ID!
}

type ContentAChild {
  title: String!
}

type ContentA implements IContent & IElementContent {
  id: ID!
  contentChildren: [ContentAChild!]!
}

type ContentBChild {
  title: String!
}

type ContentB implements IContent & IElementContent {
  id: ID!
  contentChildren: [ContentBChild!]!
}

type Query {
  contentPage: ContentPage!
}

type ContentPage {
  contentBody: [ContentContainer!]!
}

type ContentContainer {
  id: ID!
  section: IContent
}

When they send the following query:

query {
  contentPage {
    contentBody {
      section {
        ...ContentAData
        ...ContentBData
      }
    }
  }
}

fragment ContentAData on ContentA {
  contentChildren {
    title
  }
}

fragment ContentBData on ContentB {
  contentChildren {
    title
  }
}

The subgraph returns the following data:

{
  "contentPage": {
    "contentBody": [
      {
        "id": "container1",
        "section": {
          "__typename": "ContentA",
          "id": "contentA1",
          "contentChildren": []
        }
      },
      {
        "id": "container2",
        "section": {
          "__typename": "ContentB",
          "id": "contentB1",
          "contentChildren": [
            {
              "title": "contentBChild1"
            }
          ]
        }
      }
    ]
  }
}

But the router responds as follows:

{
  "contentPage": {
    "contentBody": [
      {
        "id": "container1",
        "section": {
          "__typename": "ContentA",
          "id": "contentA1",
          "contentChildren": []
        }
      },
      {
        "id": "container2",
        "section": {
          "__typename": "ContentB",
          "id": "contentB1",
          "contentChildren": null
        }
      }
    ]
  }
}

Initially I assumed that the subgraph returns the data with __typename fields, as follows:

{
  "__typename": "Query",
  "contentPage": [
    {
      "__typename": "ContentPage",
      "contentBody": [
        {
          "__typename": "ContentContainer",
          "id": "container1",
          "section": {
            "__typename": "ContentA",
            "contentChildren": []
          }
        },
        {
          "__typename": "ContentContainer",
          "id": "container2",
          "section": {
            "__typename": "ContentB",
            "contentChildren": [
              {
                "__typename": "ContentBChild",
                "title": "contentBChild1"
              }
            ]
          }
        }
      ]
    }
  ]
}

I then saw that projection writes null to contentChildren on the second item because it expects ContentBChild but gets ContentAChild. I realized that the plan merging logic was wrong for nested selections: it was not merging identical fields when they were nested, so a single field could end up with multiple plans. The fields were duplicated in the plan, and for each possible type the field was serialized again, like "id": "value", "id": "value".
So I refactored the code to apply plan merging at every nesting level. To do that I had to convert the Vec usage to IndexMap, which the merging logic needs, and this ended up changing all Vec<Plan> types to IndexMaps.
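
The nested merging described above can be sketched roughly as follows. This is a simplified illustration, not the router's code: the `Plan` struct, its fields, and the `plan` and `merge_plans` helpers are hypothetical names, and an ordered `Vec` with a key lookup stands in for `IndexMap` so the sketch needs only the standard library.

```rust
// Hypothetical, simplified stand-in for a projection plan (assumption, not the router's type).
#[derive(Debug, Clone, PartialEq)]
pub struct Plan {
    pub response_key: String,
    // Parent types for which this field should be projected.
    pub parent_types: Vec<String>,
    pub selections: Vec<Plan>,
}

/// Convenience constructor used only for this illustration.
pub fn plan(key: &str, parents: &[&str], selections: Vec<Plan>) -> Plan {
    Plan {
        response_key: key.to_string(),
        parent_types: parents.iter().map(|p| p.to_string()).collect(),
        selections,
    }
}

/// Merges plans that share a response key, keeps first-seen order,
/// and then repeats the merge inside every nested selection set.
pub fn merge_plans(plans: Vec<Plan>) -> Vec<Plan> {
    let mut merged: Vec<Plan> = Vec::new();
    for incoming in plans {
        let pos = merged
            .iter()
            .position(|p| p.response_key == incoming.response_key);
        match pos {
            Some(i) => {
                let existing = &mut merged[i];
                for ty in incoming.parent_types {
                    if !existing.parent_types.contains(&ty) {
                        existing.parent_types.push(ty);
                    }
                }
                // Children are collected first and deduplicated in the pass below.
                existing.selections.extend(incoming.selections);
            }
            None => merged.push(incoming),
        }
    }
    for p in &mut merged {
        let children = std::mem::take(&mut p.selections);
        p.selections = merge_plans(children);
    }
    merged
}
```

Merging the two `contentChildren` plans from the fragments then yields a single plan whose nested `title` plan is also a single entry, instead of one serialized copy per possible type.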

The merging was then fine, but projection still resolved ContentBChild's typename as ContentAChild, because it could not read __typename from the array and incorrectly used plan.field_type as a fallback. So I did another refactor to check conditions for arrays item by item instead of using the fallback value directly. The fallback was always the first possible type of the abstract field type (ContentAChild in our case), while the actual item can be ContentBChild.
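
A rough sketch of the item-by-item check, under stated assumptions: `Value`, `Guard`, `child`, `typename_of`, `project_item`, and `project_list` are all hypothetical names for this illustration, and the real projection code is more involved. The point is only that the type condition is evaluated against each array item's own __typename rather than against one type chosen for the whole list.

```rust
/// Minimal JSON-like value, a stand-in for the router's real value type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Object(Vec<(String, Value)>),
}

/// Hypothetical guard: `fields` are projected only when the item is `on_type`.
pub struct Guard {
    pub on_type: &'static str,
    pub fields: Vec<&'static str>,
}

/// Builds a child object with a `__typename` and a `title` field.
pub fn child(type_name: &str, title: &str) -> Value {
    Value::Object(vec![
        ("__typename".to_string(), Value::Str(type_name.to_string())),
        ("title".to_string(), Value::Str(title.to_string())),
    ])
}

/// Reads `__typename` from an object, if it is present.
pub fn typename_of(item: &Value) -> Option<&str> {
    let Value::Object(entries) = item else { return None };
    entries.iter().find_map(|(key, value)| match value {
        Value::Str(name) if key.as_str() == "__typename" => Some(name.as_str()),
        _ => None,
    })
}

/// Keeps only the fields whose guard matches this item's own type.
pub fn project_item(item: &Value, guards: &[Guard]) -> Value {
    let Value::Object(entries) = item else { return item.clone() };
    let type_name = typename_of(item);
    let kept: Vec<(String, Value)> = entries
        .iter()
        .filter(|(key, _)| {
            guards.iter().any(|g| {
                Some(g.on_type) == type_name && g.fields.iter().any(|f| *f == key.as_str())
            })
        })
        .cloned()
        .collect();
    Value::Object(kept)
}

/// Evaluates the guards once per item, never once per list.
pub fn project_list(items: &[Value], guards: &[Guard]) -> Vec<Value> {
    items.iter().map(|item| project_item(item, guards)).collect()
}
```

With guards for both ContentAChild and ContentBChild, a mixed list keeps `title` on every item. With a guard for ContentAChild only, the ContentBChild item projects to an empty object, which mirrors what happened when a single fallback type was applied to the whole array.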

After that the first test started passing as expected, but the user's issue was still not solved. Why? Because __typename was never sent for non-abstract values, since the planner did not need it. So I dropped the fallback logic for __typename and now check __typename only when it is present. That made the fallback unnecessary, so I dropped field_type from the projection plan entirely.
With that, the second test case, which is the user's actual issue, started passing.
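
The "check __typename only when it is there" rule can be illustrated with a small sketch. It assumes that a missing __typename means the type check is skipped and the value is projected; `typename_of` and `should_project` are hypothetical names, and objects are modelled as plain key/value string pairs to keep the example short.

```rust
/// Returns the object's `__typename`, if the subgraph sent one.
pub fn typename_of<'a>(object: &'a [(&'a str, &'a str)]) -> Option<&'a str> {
    object
        .iter()
        .find(|(key, _)| *key == "__typename")
        .map(|(_, value)| *value)
}

/// Type check without a fallback type (a sketch, not the router's implementation):
/// - `__typename` present: it must be one of the possible types;
/// - `__typename` absent: the check is skipped, because for a concrete
///   type the planner never asked the subgraph to send it.
pub fn should_project(object: &[(&str, &str)], possible_types: &[&str]) -> bool {
    match typename_of(object) {
        Some(name) => possible_types.iter().any(|t| *t == name),
        None => true,
    }
}
```

For example, an object such as `{"title": "contentBChild1"}` carries no __typename, so it passes the check for ContentBChild without any guessed type being involved.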

While working on this, I dropped unnecessary Arc usage and added a helper function, get_value_by_key, to reduce duplicated logic. I also improved the tracing messages so we get more information about projection condition checks.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ardatan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue, ROUTER-235, by refactoring how projection plans are managed within the router's authorization, normalization, and execution phases. The change involves migrating from a simple vector to an ordered map (IndexMap) for storing FieldProjectionPlan instances. This ensures that the order of fields is preserved and allows for more robust handling of projections, particularly when dealing with GraphQL fragments and authorization modifications. A dedicated end-to-end test has been introduced to validate the correctness of these changes.

Highlights

  • Projection Plan Data Structure Update: The core data structure for FieldProjectionPlan collections has been changed from a Vec (vector) to an IndexMap<String, FieldProjectionPlan> across the router's pipeline. This ensures ordered and key-accessible projection plans.
  • New Dependency Introduced: The indexmap crate, version 2.10.0, has been added as a new dependency to the router.
  • Reproduction Test for ROUTER-235: A new end-to-end test (reprod_router_235) has been added to reproduce and verify the fix for an issue identified as ROUTER-235. This test specifically targets GraphQL queries involving fragments and complex data structures.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a fix for ROUTER-235 by changing the data structure for projection plans from Vec<FieldProjectionPlan> to IndexMap<String, FieldProjectionPlan>. This is a crucial change to correctly handle the merging of fields from different fragments, thus preventing field duplication in the final response. The change is consistently applied across the authorization, normalization, and execution pipelines. A new end-to-end test has been added to reproduce the original issue and verify the fix. The changes are sound and address the problem effectively. I have one suggestion to refactor a loop in the authorization rebuilding logic to improve its readability, in line with the repository's style guide.

@github-actions

github-actions bot commented Dec 22, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 210018      ✗ 0    
     data_received..................: 6.1 GB  204 MB/s
     data_sent......................: 82 MB   2.7 MB/s
     http_req_blocked...............: avg=4.01µs   min=722ns   med=1.81µs  max=11.28ms  p(90)=2.56µs  p(95)=2.91µs  
     http_req_connecting............: avg=1.31µs   min=0s      med=0s      max=3.92ms   p(90)=0s      p(95)=0s      
     http_req_duration..............: avg=20.96ms  min=2.43ms  med=20.05ms max=163.85ms p(90)=28.24ms p(95)=31.46ms 
       { expected_response:true }...: avg=20.96ms  min=2.43ms  med=20.05ms max=163.85ms p(90)=28.24ms p(95)=31.46ms 
     http_req_failed................: 0.00%   ✓ 0           ✗ 70026
     http_req_receiving.............: avg=142.39µs min=26.51µs med=40.39µs max=110.73ms p(90)=87.69µs p(95)=388.65µs
     http_req_sending...............: avg=27.01µs  min=5.66µs  med=10.78µs max=44.55ms  p(90)=15.66µs p(95)=26.05µs 
     http_req_tls_handshaking.......: avg=0s       min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=20.79ms  min=2.38ms  med=19.92ms max=65.87ms  p(90)=28ms    p(95)=31.15ms 
     http_reqs......................: 70026   2327.932343/s
     iteration_duration.............: avg=21.42ms  min=5.73ms  med=20.4ms  max=204.08ms p(90)=28.71ms p(95)=31.99ms 
     iterations.....................: 70006   2327.267467/s
     vus............................: 50      min=50        max=50 
     vus_max........................: 50      min=50        max=50 

@github-actions

github-actions bot commented Dec 22, 2025

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-629 ghcr.io/graphql-hive/router:sha-86ccd6b

Docker metadata
{
"buildx.build.ref": "builder-90c01579-36cc-49db-b362-66463dfec095/builder-90c01579-36cc-49db-b362-66463dfec0950/qz59t1vq2ifkijtof90fho41e",
"containerimage.descriptor": {
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "digest": "sha256:624c8ffaf29c6931a85afdf4a9bc031b7a56959168ecbf95db898f5fdba94261",
  "size": 1609
},
"containerimage.digest": "sha256:624c8ffaf29c6931a85afdf4a9bc031b7a56959168ecbf95db898f5fdba94261",
"image.name": "ghcr.io/graphql-hive/router:pr-629,ghcr.io/graphql-hive/router:sha-86ccd6b"
}

@ardatan ardatan force-pushed the reprod-235 branch 2 times, most recently from 5d63d28 to 704979d December 22, 2025 23:26
@ardatan ardatan changed the title from "Reproduction for ROUTER-235" to "fix: handle projection conditions per item in an array, and merge plans nestedly" Dec 22, 2025
@ardatan ardatan marked this pull request as ready for review December 23, 2025 13:06
@ardatan ardatan requested a review from kamilkisiela December 23, 2025 13:11
@ardatan ardatan enabled auto-merge (squash) December 23, 2025 13:12
@kamilkisiela
Contributor

This is how I understand the problem.

The projection plan expects ContentAChild when ContentA is the parent, and ContentBChild when ContentB is the parent. But because there's no __typename in the response (it's not needed, since it's not an abstract type), the response projection takes the type from the field projection plan struct, which is the product of merging two field plans (one from ContentB and one from ContentA; they come from the inline fragments).

Since the field projection plan can only reference a single type, the merged state is broken: it represents only one of the types (ContentA in this case), whichever comes first.

That's why projecting an object of type ContentB fails: none of the conditions are met. The state says ContentB -> ContentAChild, but the conditions are: ContentB -> ContentBChild OR ContentA -> ContentAChild.
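
The failure mode described above can be reproduced in a few lines. This is a simplified model, not the router's code: `Condition`, `merged_field_type`, and `any_condition_met` are hypothetical names, and "whichever comes first" is modelled as taking the first condition's child type.

```rust
/// A condition of the form "parent type -> child type" (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Condition {
    pub parent: &'static str,
    pub child: &'static str,
}

/// Simplified pre-fix behaviour: the merged plan keeps a single child type,
/// whichever came first among the merged field plans.
pub fn merged_field_type(conditions: &[Condition]) -> Option<&'static str> {
    conditions.first().map(|c| c.child)
}

/// True when at least one condition matches the (parent, child) pair.
pub fn any_condition_met(conditions: &[Condition], parent: &str, child: &str) -> bool {
    conditions
        .iter()
        .any(|c| c.parent == parent && c.child == child)
}
```

With the conditions ContentA -> ContentAChild and ContentB -> ContentBChild, the single remembered type is ContentAChild. Checking the pair (ContentA, ContentAChild) succeeds, while (ContentB, ContentAChild) matches neither condition, so the field is projected as null.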

@kamilkisiela kamilkisiela changed the base branch from main to kamil-projection-dev January 2, 2026 10:23
@kamilkisiela kamilkisiela changed the base branch from kamil-projection-dev to main January 2, 2026 10:23
@dotansimha
Member

Replaced by #633

@dotansimha dotansimha closed this Jan 11, 2026
auto-merge was automatically disabled January 11, 2026 16:04

Pull request was closed

@ardatan ardatan deleted the reprod-235 branch January 21, 2026 09:58

3 participants