urljoin works incorrectly for two path-relative URLs involving . and .. #96015

@andersk

Bug report

urllib.parse.urljoin is usually used to join a normalized absolute URL with a relative URL, and it generally works for that purpose. But when it’s used to join two path-relative URLs, it often produces incorrect results if a . or .. segment is involved.

>>> from urllib.parse import urljoin
>>> urljoin('a', 'b')  # ok
'b'
>>> urljoin('a/', 'b')  # ok
'a/b'
>>> urljoin('a', '.')  # expected . or ./
'/'
>>> urljoin('a', '..')  # expected .. or ../
'/'
>>> urljoin('..', 'b')  # expected ../b
'b'
>>> urljoin('../a', 'b')  # expected ../b
'b'
>>> urljoin('a', '../b')  # expected ../b
'b'
>>> urljoin('../a', '../b')  # expected ../../b
'b'
>>> urljoin('a/..', 'b')  # expected b
'a/b'
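
For reference, the expected values above can be reproduced with a small path-only resolver. This is a minimal sketch, not urljoin's implementation or a proposed patch: join_relative is a hypothetical name, and it assumes both arguments are path-relative references (no scheme, authority, leading slash, query, or fragment). It also collapses empty segments, which a full RFC 3986 implementation would not necessarily do.

```python
def join_relative(base: str, ref: str) -> str:
    """Hypothetical helper: resolve a path-relative ref against a path-relative base."""
    if not ref:
        # An empty reference resolves to the base itself.
        return base
    base_segments = base.split("/")
    # A base ending in '.' or '..' names a directory, so keep that segment.
    if base_segments[-1] in (".", ".."):
        base_segments.append("")
    # Merge: drop the last base segment, then append the reference.
    merged = base_segments[:-1] + ref.split("/")
    # The result names a directory if the merged path ends in '', '.' or '..'.
    trailing_slash = merged[-1] in ("", ".", "..")
    out = []
    for seg in merged:
        if seg in ("", "."):
            continue
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()          # cancel the previous ordinary segment
            else:
                out.append("..")   # nothing to cancel: keep the leading '..'
        else:
            out.append(seg)
    result = "/".join(out)
    if trailing_slash:
        result = result + "/" if result else "./"
    return result


print(join_relative("a", "."))         # ./
print(join_relative("..", "b"))        # ../b
print(join_relative("../a", "../b"))   # ../../b
print(join_relative("a/..", "b"))      # b
```

The key difference from the behavior shown above is that a .. with nothing left to cancel is kept in the output instead of being discarded.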

There are also some problems when the base is a non-normalized absolute URL:

>>> urljoin('http://host/a/..', 'b')  # expected http://host/b
'http://host/a/b'
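
A possible workaround for the absolute case, sketched under the assumption that the only problem is a base path whose last segment is . or ..: append a slash to the base so that urljoin keeps that segment as a directory instead of dropping it. The name urljoin_dot_base is hypothetical and not part of urllib.

```python
from urllib.parse import urljoin, urlsplit


def urljoin_dot_base(base: str, ref: str) -> str:
    """Hypothetical wrapper: treat a base ending in '.' or '..' as a directory."""
    parts = urlsplit(base)
    last_segment = parts.path.rsplit("/", 1)[-1]
    if last_segment in (".", ".."):
        # 'http://host/a/..' becomes 'http://host/a/../' before joining.
        base = parts._replace(path=parts.path + "/").geturl()
    return urljoin(base, ref)


print(urljoin_dot_base("http://host/a/..", "b"))  # http://host/b
```

Bases that do not end in a dot segment are passed to urljoin unchanged.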

Your environment

  • CPython versions tested on: 3.10.5, 3.11.0rc1
  • Operating system and architecture: NixOS 21.11 amd64

Metadata

Labels

  • stdlib: Standard Library Python modules in the Lib/ directory
  • type-feature: A feature request or enhancement