Skip to content

Support http refs in Json Schema#273

Merged
lewisjkl merged 40 commits intodisneystreaming:mainfrom
Jacoby6000:support-http-refs
Sep 5, 2025
Merged

Support http refs in Json Schema#273
lewisjkl merged 40 commits intodisneystreaming:mainfrom
Jacoby6000:support-http-refs

Conversation

@Jacoby6000
Copy link
Contributor

@Jacoby6000 Jacoby6000 commented Aug 15, 2025

This PR adds support for json-schema refs that point to an http endpoint.

The majority of the changes are:

  • Updating the RefParser to output a ParsedRef which may be either a Remote or Local ref
  • Introducing the CompilationUnitResolver which acts as a pre-compilation unfold. It accepts the compiler input outputs all related compilation units ($defs, definitions, and remote references). We can move this in to the main JsonSchemaToIModel refold if necessary, but I think it makes sense to remain a pre-compilation step, because we can do things like remapping namespaces and removing duplication before entering the compilation phase. We now have a list of all compilation units (top level definitions) before entering the main JsonSchemaToIModel refold.
  • Updating the JsonSchemaToIModel class to accept CompilationUnit as input
  • Updating the TestUtils class to support expected generated smithy specs with no associated json schema (because they come from a remote server), and adding support for running tests with custom compiler options
  • Updated the CLI/json schema pipeline to support namesapace remapping.
    • eg: --remap-namespace foo.bar:baz.woozle will remap foo.bar.* to baz.woozle.*
  • Updated the CLI/json schema pipeline to accept a set of base paths whitelisting external refs.
    • eg: --allow-remote-base-url https://example.com/my/specs would allow remote references to any url starting with the specified url. The evaluated root namespace of this would be com.example.my.specs, which can be remapped using the aforementioned CLI arg.

The test suite works by spinning up a SimpleFileServer, available from JDK 18 onward (let me know if that's a deal breaker). The SimpleFileServer serves files from a specified directory. For the tests, this is a temporary directory which has the sample json-schemas written to it.

The majority of the diff comes from new test cases.

@Jacoby6000
Copy link
Contributor Author

Jacoby6000 commented Aug 15, 2025

Just realized I should add a test where the referenced schema is exclusively on the fileserver.

My usecase involved a schema that mixes local and remote refs, despite the remote refs having local definitions, so I hadn't caught this. I suspect this will significantly increase the complexity.

Resolving the reference works fine with the current implementation, but the model is not actually assembled from the remote server.

@Jacoby6000
Copy link
Contributor Author

Jacoby6000 commented Aug 18, 2025

Upon further inspection, this fix doesn't really fix anything... It makes the json-schema loader happy, but then when the smithy model is being assembled, any remotely fetched json schemas get dropped and they're silently ignored in the smithy model. I'll look in to working them in to the model Fixed and updated the OP

@Jacoby6000
Copy link
Contributor Author

Jacoby6000 commented Aug 18, 2025

Just about finished with the changes to get this to pull in the remotely referenced schemas. Ready for review

@lewisjkl
Copy link
Collaborator

Hey, just catching up here.

This might prove to be a bit tricky seeing as how we are currently not creating compilers within the context of IO to capture effects like making external http requests. It might be a pretty big refactor to change it to support that, but I'm certainly open to seeing a PR that went this direction

@Jacoby6000 Jacoby6000 changed the title Support http refs Support http refs in Json Schema Aug 20, 2025
@Jacoby6000
Copy link
Contributor Author

This is ready to review now. I still havent implemented any usage of IO, though it should be fairly simple to add to the RemoteRefResolver when that time comes.

@Jacoby6000
Copy link
Contributor Author

Wound up running to a few more issues along the way and had to implement namespace remapping as a part of this in order to support our usecase. Namespaces had to be remapped before getting in to the smithy model. If remapping was done after the fact, remote sources that conflict with local sources could be pulled and cause a lot of confusion. The way it is now, local sources will always be used when a remote ref is referencing a locally available namespace.

@lewisjkl
Copy link
Collaborator

Updated the CLI/json schema pipeline to support namesapace remapping.

Question on this, was this functionality necessary for the remote schemas to work, or is this more of a separate thing? I just want to make sure I am following here

@Jacoby6000
Copy link
Contributor Author

Jacoby6000 commented Aug 29, 2025

It wound up being necessary to move forward. The OpenLineage spec is served in a bit of an awkward fashion. The folder structure is a bit strange, where the version is always at the end of the namespace

eg:

  • https://openlineage.io/spec/2-0-2/*
  • https://openlineage.io/spec/facets/2-0-2/*

This produces namespaces io.openlineage.spec.2_0_2.* and io.openlineage.spec.facets.2_0_2.*

This presented two problems:

  1. Smithy doesn't like namespace segments with numbers at the start
  2. I don't actually want the version in the namespace at all (or spec for that matter)

I could move it to a separate PR if you'd like.

@lewisjkl
Copy link
Collaborator

It wound up being necessary to move forward. The OpenLineage spec is served in a bit of an awkward fashion. The folder structure is a bit strange, where the version is always at the end of the namespace

eg:

  • https://openlineage.io/spec/2-0-2/*
  • https://openlineage.io/spec/facets/2-0-2/*

This produces namespaces io.openlineage.spec.2_0_2.* and io.openlineage.spec.facets.2_0_2.*

This presented two problems:

  1. Smithy doesn't like namespace segments with numbers at the start
  2. I don't actually want the version in the namespace at all (or spec for that matter)

I could move it to a separate PR if you'd like.

That makes sense, just wanted to clarify whether it was directly related or just related to your use case. I think it's fine to leave in this PR as it isn't that much extra going on.

Copy link
Collaborator

@lewisjkl lewisjkl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff @Jacoby6000, thanks again for the contribution! A few minor comments throughout and then if we could also add some docs / update docs in https://github.com/disneystreaming/smithy-translate/blob/main/modules/docs/json_schema.md , that'd be great as well.

Probably should have a note in the docs or at least in the ScalaDoc comment for JsonSchemaCompiler that if remote references are looked up, it will be using untracked IO. Just so there are no surprises there.

Anyway, thanks again and let me know if any of my feedback doesn't make sense and I will clarify. Cheers!

@Jacoby6000
Copy link
Contributor Author

@lewisjkl This is ready for re-evaluation

val dir = os.temp.dir()
val serverAddress = new InetSocketAddress("localhost", 0)

// N.B. : Requires at least JDK 18.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead and remove this comment now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@lewisjkl lewisjkl merged commit ce37713 into disneystreaming:main Sep 5, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants