Skip to content

Correctly URI-encode HTTP Label values#1668

Merged
kubukoz merged 12 commits intodisneystreaming:series/0.18from
Dwolla:uri-encoding
Jul 23, 2025
Merged

Correctly URI-encode HTTP Label values#1668
kubukoz merged 12 commits intodisneystreaming:series/0.18from
Dwolla:uri-encoding

Conversation

@bpholt
Copy link
Copy Markdown
Contributor

@bpholt bpholt commented Mar 26, 2025

I was a little surprised to find that I didn't need to modify toSmithy4sHttpUri to make this test pass, so I wanted to get some feedback before I get too much further into modifying AwsClient and SimpleRestJson as laid out in #1663.

Needs disneystreaming/alloy#210

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 26, 2025

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@bpholt bpholt marked this pull request as ready for review April 2, 2025 21:31
else {
// Other services double encode and normalize

if (preparedRequest.uri.path.segments.exists(s => dotSegments.contains(s.encoded))) {
Copy link
Copy Markdown
Contributor Author

@bpholt bpholt Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I think it would also be valid to simply use java.net.URI#normalize regardless of whether the path contains any dot segments. (The tests pass if I make that change.) I can switch to that if you'd prefer.

(That would also remove the reference to org.http4s.internal.CharPredicate.AlphaNum on line 163, which may be problematic since it appears to be an internal http4s API?)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's move 👍, thank you for raising the concern

@bpholt
Copy link
Copy Markdown
Contributor Author

bpholt commented Apr 2, 2025

@kubukoz @Baccata I think this is ready for review (CLA aside), at least for the portions that are specific to AWS. I think we agreed in Discord that the signing changes don't need to be configurable from Smithy traits, so I didn't make any changes in that direction.

I wasn't sure if the changes to fromSmithy4sHttpUri and fromSmithy4sHttpRequest should be propagated out or not. Right now, I think @simpleRestJson would not use the new encoding rules (although it probably should), because SimpleRestJsonCodecs#makeClientCodecs doesn't set encodeReservedCharacters to true, but requests going to AWS will use the new rules since AwsClient does set that flag.

val genPathSegment: Gen[String] =
Gen
.frequency(
1 -> Gen.oneOf("..", "."),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path normalization should remove these when it's enabled, so these dot segments are sometimes added to the path to make sure that happens.

),
1 -> Gen.oneOf(
// this list from https://github.com/http4s/http4s/blob/ac5c5cc40f732df238ef38c8ef8f571b4e66bfe3/core/shared/src/main/scala/org/http4s/Uri.scala#L1069
"!$&'()*+,;=:/?@".toIndexedSeq
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are characters that should get encoded if they're present in a path segment and encoding is enabled.

httpMethod <- Gen.oneOf(SdkHttpMethod.values().toList)
host <- Gen.identifier
path <- Gen.listOfN(3, Gen.identifier).map(_.mkString("/"))
path <- Gen
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was working on this and trying to address some edge cases, I commented out this generator and used a variety of generators specific to the edge case I was working on. For example, things like Gen.const("foo/../."), Gen.const("this:might:be:an:aws:arn"), etc. Unfortunately I didn't save the specifics, but they should be encoded in this generator.

@bpholt
Copy link
Copy Markdown
Contributor Author

bpholt commented Apr 4, 2025

Unfortunately I think this implementation escapes too much. After running the protocol compliance suite from Alloy and Smithy itself, I noticed that the spec actually says only values inserted into the URI via the httpLabel trait should actually be escaped in this way. Notably, literal values that were already in the URI should not be escaped.

(Or at least, that's how I'm now reading the spec: "Patterns with no labels will match only requests containing the exact literal characters declared in the pattern", and then "Every label … MUST have the httpLabel trait". Then, the serialization rules that describe escaping only apply to the httpLabel trait.)

I don't think we have enough information in fromSmithy4sHttpUri to make that distinction, so I'm taking a step back. My current thought is to try to change smithy4s.http.internals.SchemaVisitorPathEncoder and maybe smithy4s.internals.SchemaVisitorPatternEncoder to escape labelled values, using the escaping algorithm used by Smithy.

@bpholt bpholt marked this pull request as draft April 4, 2025 23:07
@bpholt bpholt force-pushed the uri-encoding branch 2 times, most recently from cc8b59e to 37607ab Compare April 8, 2025 21:16
@bpholt bpholt changed the title Add option to URI-encode all reserved characters Correctly URI-encode HTTP Label values Apr 9, 2025
@bpholt bpholt marked this pull request as ready for review April 10, 2025 01:48
@bpholt
Copy link
Copy Markdown
Contributor Author

bpholt commented Apr 10, 2025

FWIW, I think we need to merge disneystreaming/alloy#210 before this will be fully tested by the Alloy compliance tests, but when I ran that locally before rebasing, it passed. (For some reason I can't get it to run now after pulling the latest updates; I get errors like "Unable to resolve trait aws.protocols#ec2QueryName" when executing http4s / Test / managedResources and http4s / Test / envVars. Maybe I'm not publishing Alloy correctly locally? I'm not very familiar with Mill.)

@bpholt
Copy link
Copy Markdown
Contributor Author

bpholt commented Apr 10, 2025

case AwsProtocol.AWS_EC2_QUERY(_) =>
AwsEcsQueryCodecs.make[F](version = service.version)
case AwsProtocol.AWS_JSON_1_0(_) =>
AwsJsonCodecs.make[F]("application/x-amz-json-1.0")
case AwsProtocol.AWS_JSON_1_1(_) =>
AwsJsonCodecs.make[F]("application/x-amz-json-1.1")
case AwsProtocol.AWS_QUERY(_) =>
// TODO retain user encoding when applying compression
AwsQueryCodecs.make[F](version = service.version)
case AwsProtocol.AWS_REST_JSON_1(_) =>
AwsRestJsonCodecs.make[F]("application/json")
case AwsProtocol.AWS_REST_XML(_) =>
AwsRestXmlCodecs.make[F]()

Other than AWS_REST_JSON_1, should these codec constructors use the HTTP label encoding changes I introduced, or should I leave the old behavior in place?

Comment on lines +22 to +23
def encode(a: A, urlEncode: Boolean): List[String]
def encodeGreedy(a: A, urlEncode: Boolean): List[String]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it makes sense to do it at this level to be honest. I'd rather it was a constructor parameter (in the SchemaVisitorPathEncode construct, for instance).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to implement it at the instance level and also maintain the distinction between the old path and the new encodedPath methods in the implementation of HttpEndpoint[I].

I guess I could change SchemaVisitorPathEncoder from an object to a class, and then construct two PathEncoder[I]s in HttpEndpoint.cast, where HttpEndpoint[I]s are constructed, in case downstream code calls the old path method? Is that what you meant?

In addition to not disturbing the public API (most of this work has been to internal APIs, but HttpEndpoint is not marked as internal), we also need to maintain the distinction in behavior in order to continue the existing behavior in the AWS codecs other than AWS_REST_JSON_1, as requested.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in fd284bd.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, that is indeed what I meant 👍

@Baccata
Copy link
Copy Markdown
Contributor

Baccata commented Apr 11, 2025

Other than AWS_REST_JSON_1, should these codec constructors use the HTTP label encoding changes I introduced, or should I leave the old behavior in place?

Let's leave it unchanged for now and revisit on a case-by-case basis (unless we find some evidence by analysis the spec that some protocols are inherently and unilateraly abiding by one rule)

@Baccata
Copy link
Copy Markdown
Contributor

Baccata commented Apr 11, 2025

A few comments, but great work overall !

@bpholt
Copy link
Copy Markdown
Contributor Author

bpholt commented Apr 16, 2025

Other than AWS_REST_JSON_1, should these codec constructors use the HTTP label encoding changes I introduced, or should I leave the old behavior in place?

Let's leave it unchanged for now and revisit on a case-by-case basis (unless we find some evidence by analysis the spec that some protocols are inherently and unilateraly abiding by one rule)

FYI, it looks like AWS_REST_XML also requires encoding; without it, some of the compliance tests fail.

Copy link
Copy Markdown
Member

@kubukoz kubukoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an entry in the changelog file, it'll be 0.18.40 at this point

@bpholt bpholt force-pushed the uri-encoding branch 2 times, most recently from 7f913d6 to 6faed70 Compare July 18, 2025 17:49
)
def path(input: I): List[String]

def encodedPath(input: I): List[String]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised this new method didn't trigger a MiMa error, but I didn't see anything that would have suppressed it, so I'm hoping it's ok.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it either, tbh. At least the hope is that nobody creates HttpEndpoint manually (as in, not via its unapply and cast)

btw. maybe we should seal this trait.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a similar case and it emits ReversedMissingMethod. We actually have a filter for all of these (ProblemFilters.exclude[ReversedMissingMethodProblem]("smithy4s.*") in project/MimaVersionPlugin.scala) because it's a forward-compatibility problem.

personally I don't buy it, it's a binary risk like any other. What saves us is that nobody else should be implementing HttpEndpoint, as per the comment above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, makes sense. I missed MimaVersionPlugin.

personally I don't buy it, it's a binary risk like any other. What saves us is that nobody else should be implementing HttpEndpoint, as per the comment above.

Are you suggesting I change something so that it wouldn't emit a ReversedMissingMethod error if they weren't being suppressed? I'm not sure what the best way to do that is; none of the options I can think of are very good. (e.g. use ???, or delegate to path)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting any particular action in this case, just a more general wondering...ment for the other maintainers (@Baccata I'm looking at you) :)

in this case I think we're pretty safe because in practice this is "kinda sealed"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I made the changes necessary to seal HttpEndpoint, but I'm happy to revert them if that's not the path we want to take.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I wasn't suggesting to actually seal it yet.

I would probably keep it open - the reasoning is, if someone has a custom implementation, their experience will be:

  1. runtime linkage error
  2. they add the missing implementation, whatever it is
  3. no more error

whereas if we seal it now, they may be unable to build with the new smithy4s at all.

Not super realistic, but who knows. I'd leave this kind of change for 0.19

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, works for me. I unsealed it.

@bpholt bpholt force-pushed the uri-encoding branch 2 times, most recently from 4f9bffc to 42e38ff Compare July 22, 2025 20:38
Copy link
Copy Markdown
Member

@kubukoz kubukoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just waiting for https://github.com/disneystreaming/alloy/actions/runs/16455931297 now so that we can bump alloy here

@kubukoz kubukoz merged commit 53ce7d4 into disneystreaming:series/0.18 Jul 23, 2025
11 checks passed
@bpholt bpholt deleted the uri-encoding branch July 23, 2025 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants