web-server/dispatch should facilitate handling trailing "/"s on URLs #59

@LiberalArtist

I recently discovered that dispatch-rules and related forms make it hard to handle trailing /s on URLs sensibly.

The URL https://example.com/foo/ (with a trailing /) is treated differently than https://example.com/foo (without a trailing /). Strictly speaking, this is correct: those are two distinct URLs, and in principle one could serve completely unrelated content at each. In practice, though, users don't expect such URLs to be distinct. Serving different non-error content at one than the other is a terrible idea, and even returning a successful response from one variant but, e.g., a 404 error from the other (as my site was doing) is likely to cause confusion. I expect most programs will want to either redirect the less-preferred variant to the canonical variant or simply treat both variants as equivalent.

Unfortunately, web-server/dispatch doesn't provide a great way to implement such behavior. Here's an example in code of the current state of affairs:

#lang racket

(require web-server/dispatch
         web-server/http
         web-server/servlet-env
         net/url
         rackunit)

(define ((handler str) req)
  (response/output
   (λ (out) (write-string str out))))

(define start
  (dispatch-case
   [() (handler "a")]
   [("") (handler "b")]
   [("foo") (handler "c")]
   [("foo" "") (handler "d")]
   [("foo" "" "") (handler "e")]
   [("foo" "" "bar") (handler "f")]))

(define (do-request str)
  (port->string (get-pure-port (string->url str))))

(check-equal?
 (let ([th (thread
            (λ ()
              (serve/servlet start
                             #:servlet-regexp #rx""
                             #:banner? #f
                             #:launch-browser? #f)))])
   (sleep 5)
   (begin0 (list (do-request "http://localhost:8000")
                 (do-request "http://localhost:8000/")
                 (do-request "http://localhost:8000/foo")
                 (do-request "http://localhost:8000/foo/")
                 (do-request "http://localhost:8000/foo//")
                 (do-request "http://localhost:8000/foo//bar"))
           (kill-thread th)))
 '("b" "b" "c" "d" "e" "f"))

This illustrates a few things:

  • Because of how HTTP requests work, the root URL is always equivalent with or without a trailing / and has a single, empty path element ("").
  • () is allowed as a pattern, but (due to the above) will never match anything. I think it might be better to make this a syntax error (or a warning that will become an error in a future release).
  • A trailing / adds an empty path element ("") to the end of the URL.
  • There is no cleanse-path-like case for multiple adjacent / separators.
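
The path-element observations above can also be seen without a server by parsing URLs with net/url directly. This is only a sketch: path-elements is a hypothetical helper name, not part of any library, and the root case is left out because it depends on the HTTP request line rather than on parsing alone.

```racket
#lang racket
(require net/url)

;; Hypothetical helper: the path elements of a URL string.
(define (path-elements str)
  (map path/param-path (url-path (string->url str))))

(path-elements "http://localhost:8000/foo")      ; '("foo")
(path-elements "http://localhost:8000/foo/")     ; '("foo" "")
(path-elements "http://localhost:8000/foo//bar") ; '("foo" "" "bar")
```

These lists line up with the ("foo"), ("foo" "") and ("foo" "" "bar") patterns in the dispatch-case example.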

I have changed my application to handle trailing / separators by using dispatch-rules+applies instead of dispatch-rules (which involved removing my else clause). If the original request does not satisfy the predicate from dispatch-rules+applies, but a version with a normalized path would, the application responds with a redirect to the normalized path.
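
A minimal sketch of that workaround, not my actual application code, might look like the following. The names normalize-url and start, the two example rules, and the choice to drop every empty path element are assumptions made for illustration; the redirect here is always a 301.

```racket
#lang racket
(require net/url
         web-server/dispatch
         web-server/http)

(define ((handler str) req)
  (response/output
   (λ (out) (write-string str out))))

;; As described above, there is no else clause here.
(define-values (dispatch to-url applies?)
  (dispatch-rules+applies
   [("") (handler "root")]
   [("foo") (handler "foo")]))

;; Hypothetical helper: drop empty path elements, keeping a single ""
;; for the root so the result can still match the [("") ...] rule.
(define (normalize-url u)
  (define elems
    (filter (λ (pp) (not (equal? "" (path/param-path pp))))
            (url-path u)))
  (struct-copy url u
               [path (if (null? elems)
                         (list (path/param "" '()))
                         elems)]))

(define (start req)
  (define canonical (normalize-url (request-uri req)))
  (cond
    ;; The request matches a rule as-is.
    [(applies? req) (dispatch req)]
    ;; Only the normalized variant matches: redirect to it.
    [(applies? (struct-copy request req [uri canonical]))
     (redirect-to (url->string canonical) permanently)]
    [else
     (response/output #:code 404
                      (λ (out) (write-string "not found" out)))]))
```

With this arrangement, /foo is served directly, /foo/ and /foo// are redirected to /foo, and anything that matches no rule even after normalization gets a 404.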

I think it would be better to extend web-server/dispatch to support this sort of thing, but I'm not sure yet what the best way would be to do so. A few things I've been thinking about so far:

  • The current language for dispatch-rules patterns doesn't have a notion of "splicing" patterns: each string literal or bidi-match-expander use applies to a single path element.
  • Simply giving the same response to all variants seems relatively straightforward to add as a keyword option. However, that is (at least arguably) less optimal than redirecting to the canonical URL. Redirects open another can of worms, though, since 301 Moved Permanently has issues for methods other than GET and HEAD and 308 Permanent Redirect doesn't have a straightforward fallback for older browsers. I would want to do a 301 Moved Permanently for GET and HEAD requests and 307 Temporary Redirect for other methods, which RFC 7231 suggests is reasonable. The higher-level question is the trade-off between a simple API that "does the right thing" and configurability.
  • Building on that theme, there are some cases other than trailing / support where a similar ability to handle variants on canonical URLs would be nice: for example, case-insensitivity. I can imagine possible extensions to the dispatch-rules API to add a general concept of non-canonical URLs, and it is appealing from one perspective to avoid making trailing / support a baked-in special feature. On the other hand, though, there seems a risk of getting beyond the scope of this library.
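
To illustrate the redirect policy mentioned in the second point (301 for GET and HEAD, 307 for everything else), here is a rough sketch. Both redirect-code and redirect-to-canonical are hypothetical names, and the response is built by hand with response/full rather than through any existing redirect helper.

```racket
#lang racket
(require web-server/http)

;; Hypothetical helper: choose a redirect status from the request method.
;; 301 for GET and HEAD; 307 otherwise, so that the client repeats the
;; request with the same method and body.
(define (redirect-code method)
  (if (member method '(#"GET" #"HEAD")) 301 307))

;; Hypothetical helper: build the redirect response for a request,
;; given the canonical location as a string.
(define (redirect-to-canonical req location)
  (define code (redirect-code (request-method req)))
  (response/full code
                 (if (= code 301)
                     #"Moved Permanently"
                     #"Temporary Redirect")
                 (current-seconds)
                 TEXT/HTML-MIME-TYPE
                 (list (make-header #"Location"
                                    (string->bytes/utf-8 location)))
                 '()))
```

Whether a policy like this should be fixed by the library or left configurable is exactly the trade-off raised above.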
