-
Notifications
You must be signed in to change notification settings - Fork 0
feat: ✨ implement resolve uri #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 15 commits
5f6fae2
6f0ef5d
f943218
a89633d
8072033
7763097
c375a30
26ad174
662c3d6
fd70520
b7ad9a0
df338f3
0044b9c
05fbaa8
5aae406
ae6c6eb
f88fb14
4b632fd
95b9c69
4005455
78da642
c1bcb3f
2b5aeea
43a974e
39a8c76
3c091bf
030789e
fd8c748
0b5d5d6
6a12807
7d1b019
a5cc597
d2fb845
f5a206c
115b19e
779f627
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,7 @@ | ||
| """Functions for the exposed CLI.""" | ||
|
|
||
| from pathlib import Path | ||
| from typing import Any, Optional | ||
| from typing import Optional | ||
|
|
||
| from cyclopts import App, Parameter, config | ||
|
|
||
|
|
@@ -47,8 +47,8 @@ def build( | |
| output_dir: The directory to save the generated files in. | ||
| verbose: If True, prints additional information to the console. | ||
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| resolved_uri = _resolve_uri(uri) | ||
| properties = _read_properties(resolved_uri) # type: ignore | ||
|
||
|
|
||
| # One item per section, rendered from template. | ||
| # Internally uses Jinja2 to render templates with metadata, which | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -3,7 +3,27 @@ | |||||||||
| import json | ||||||||||
| from enum import Enum | ||||||||||
| from pathlib import Path | ||||||||||
| from typing import Any | ||||||||||
| from typing import Annotated, Any | ||||||||||
| from urllib import parse | ||||||||||
|
|
||||||||||
| from pydantic import AnyUrl, FileUrl, TypeAdapter, UrlConstraints | ||||||||||
|
|
||||||||||
| _AnnotatedHttps = Annotated[AnyUrl, UrlConstraints(allowed_schemes=["https"])] | ||||||||||
| _adapter = TypeAdapter(_AnnotatedHttps) | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class HttpsUrl(str): | ||||||||||
| """Type and class with validation for https URLs.""" | ||||||||||
|
|
||||||||||
| @classmethod | ||||||||||
| def __get_pydantic_core_schema__(cls, source, handler): # type: ignore[no-untyped-def] | ||||||||||
| """Initialize adapter core schema.""" | ||||||||||
| return _adapter.core_schema | ||||||||||
|
|
||||||||||
| def __new__(cls, value: str): # type: ignore[no-untyped-def] | ||||||||||
| """Setup validation.""" | ||||||||||
| validated = _adapter.validate_python(value) | ||||||||||
| return str.__new__(cls, validated) | ||||||||||
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| class BuildStyle(Enum): | ||||||||||
|
|
@@ -14,11 +34,53 @@ class BuildStyle(Enum): | |||||||||
| quarto_resource_tables = "quarto_resource_tables" | ||||||||||
|
|
||||||||||
|
|
||||||||||
| # Output maybe str? Path? | ||||||||||
| # Use `match` inside for strictness on URI types? Or use a library for URI parsing? | ||||||||||
| # TODO Extend to parse strings and return either URL or Path | ||||||||||
| def _resolve_uri(uri: str) -> Path: | ||||||||||
| return Path(uri) | ||||||||||
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | ||||||||||
|
||||||||||
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | |
| class URI(Enum): | |
| https = HttpsUrl | |
| file = FileUrl |
e.g. might look like above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding what _resolve_uri returns, but if it returns the address of the datapackage.json, then it cannot really return an enum. Enums are for fixed, finite sets of values. If _resolve_uri only determines whether the address is local or remote, then yeah it can return an enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I couldn't get this to work as an enum; it seems like it hides the type information from mypy. Take the following example that contains the same function, either with an enum, a type alias, or a union:
# supertypes.py
from enum import Enum
class Number(Enum):
"""Doc."""
INT = int
FLOAT = float
def fx(x: Number) -> int | float:
"""Doc."""
x / 2
return x
type int_or_float = int | float
def fy(y: int_or_float) -> int | float:
"""Doc."""
y / 2
return y
def fz(z: int | float) -> int | float:
"""Doc."""
z / 2
return zWhen running uv run mypy --pretty supertypes.py, we can see that mypy doesn't understand that a Number is a supertype of int and float, whereas this is understood when creating a new type (and of course also with the union):
supertypes.py:13: error: Unsupported operand types for / ("Number" and "int") [operator]
x / 2
^
supertypes.py:14: error: Incompatible return value type (got "Number", expected "int | float")
[return-value]
return x
^
Found 2 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That still gives me the same error for the division:
supertypes.py:36: error: Unsupported operand types for / ("Number" and "int")
[operator]
return x / 2
^
And even if it would work, it doesn't seem that useful for combining types in general if we always need additional logic like that. Doesn't it seem that creating a new type/type alias with type/TypeAlias is more appropriate here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading into it more, it seems enums in Python are more basic and with limited features than I expected. So my suggestion here doesn't actually work the way I was envisioning. I was imagining you use the enum to restrict the type of output and assign the URI path to the relevant enum entry. We could do a similar behaviour with a dataclass and treat it like an enum 😛 e.g.
@dataclass(frozen = True)
class URI:
https: HttpsUrl
file: FileUrl
def ... -> URI
match ...
case "file"
URI.file = ...
case ""
URI.file = _check_path(...)
...
return URI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, do you have an MRE for what that would look like? When I try the following, I get the same errors as before:
from dataclasses import dataclass
@dataclass(frozen=True)
class Number:
"""Doc."""
INT = int
FLOAT = float
def fx(x: Number) -> int | float:
"""Doc."""
x / 2
return x
def fw(x: Number) -> int | float:
"""Doc."""
match Number:
case INT:
return x / 2
case FLOAT:
return x / 2supertypes.py:14: error: Unsupported operand types for / ("Number" and "int")
[operator]
x / 2
^
supertypes.py:15: error: Incompatible return value type (got "Number", expected
"int | float") [return-value]
return x
^
supertypes.py:22: error: Unsupported operand types for / ("Number" and "int")
[operator]
return x / 2
^
Found 3 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to butt in, but a URI needs to know 2 things: whether it's local or remote and what its address is.
That corresponds more to:
@dataclass
class Uri:
value: str
local: boolA single URI shouldn't have properties for both https and file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but I thought the idea of this superclass was to take advantage of the validation and type specification that already exists in pydantic's FileUrl and our modified HttpsURL. If we don't want that, we could use something like AnyUrl directly and implement our own validation? Or could we somehow use the validation from the pydandtic classes inside our own Uri dataclass?
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl: | |
| def _resolve_uri(uri: str) -> URI: |
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a better word would be _parse_uri()...? 🤔
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So picking up from my earlier comment, how about looking forward to what we will do with the addresses and doing sg like:
def _resolve_uri(uri: str) -> dict[str, Any]:
split_uri = parse.urlsplit(uri)
match split_uri.scheme:
case "":
return _load_properties_from_path(uri)
case "file":
return _load_properties_from_path(split_uri.path)
case "https":
return _fetch_properties_from_url(split_uri.geturl())
case "gh" | "github":
return _fetch_properties_from_url(split_uri._replace(...).geturl())
case _:
raise ValueError(...)Both _load_properties_from_path and _fetch_properties_from_url could take a string and try to load the JSON without checking that the string is a good address. Instead, we could anticipate common failure cases (file not found, not JSON, various network errors, etc.) and show nicely worded errors -- probably easier than typing and validating the input addresses.
(Maybe the one thing worth checking is that the gh-flavoured address contains just the organisation and the repo?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like 👍 just rename the whole function to_read_properties instead. And for the gh and github one, convert then onto https and fetch like the other https
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what you wrote in your comment above regarding the split between _resolve_uri and _read_properties. It seemed somewhat forced when I was implementing it too, exactly because of what you said with the uniting and splitting again. I'm totally fine with this alternative approach (or even ducktyping for error handling as long as the messages we get are clear), but my (mis?)understanding of Luke and/or functional programming in general is that it might clash on two accounts:
- Each parameter should be assigned a type that as closely as possible matches what it represents. Preferably errors are handled by this type instead of e.g. pattern matching a string (I understand that we do pattern matching to assign the type, so it seems like the approaches are equally fine to me from this perspective, but not sure if they are equal from a functional programming perspective).
- Each function should just do one thing. With this new approach, we would have a (parent) function that both resolves the URI and then loads the properties. Again, this would be totally fine with me personally, but I thought that in functional programming, we want a function to do one thing (e.g. convert from one type to another) and then pass the output type to another function so that they are more standalone and composable rather than nesting two functions within a parent function.
Again, I might very well have misunderstood both of these "principles" of functional programming and I'm happy with either solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see what you're saying. What if instead we resolve thr URI so a pure path is given the "file" prefix and the gh/github is converted into a full URL. And the https is output as is. Then the downstream read properties function only needs to do two switch statements rather than X number to actually convert them into correct URIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's what the initial implementation suggestion is doing, isn't it? Paths and file URIs return FileUrl whereas https URLs and github URIs return HttpsURL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea! Well, almost, aside from how it is being output, that's the only tension point 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I personally think that some problems lend themselves very nicely to one type of solution and some problems to another type. To me, the solution with nested functions feels cleaner and simpler in this case. But I do get the one function -- one responsibility hangup, though of course I could be cheeky and argue that loading the properties from a URI is one responsibility.
Or taking another angle, we could say that loading the properties from a local path and fetching them from a remote address are two distinct tasks. This is true because they expect different inputs (a path vs a string) and are implemented differently. If this is the case, then what we need before loading the properties is to determine whether we have a local or a remote address and to standardise each type of address.
Something like:
@dataclass
class Uri:
value: str
local: bool
def _parse_uri(uri: str) -> Uri:
split_uri = parse.urlsplit(uri)
match split_uri.scheme:
case "":
return Uri(value=uri, local=True)
case "file":
return Uri(value=split_uri.path, local=True)
case "https":
return Uri(value=split_uri.geturl(), local=False)
case "gh" | "github":
return Uri(
value=split_uri._replace(...).geturl(),
local=False,
)
case _:
raise ValueError(...)
parsed_uri = _parse_uri(uri)
if parsed_uri.local:
properties = _load_properties_from_path(parsed_uri.value)
else:
properties = _fetch_properties_from_url(parsed_uri.value)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that's a good point, the distinction between local and remote. Since the functions to read the properties for either source will very likely be very differently handled..!
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, append a "file" and check as if it is a path URI.
lwjohnst86 marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this error, but raised #159 for specific gh uri errors
lwjohnst86 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of _check, name it parse or resolve or as. Since we aren't checking here (no bool is output).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, aren't all these function mainly for checking that the input is a valid FileUrl or HttpsUrl and throw an error if it isn't? Do you see it more as a parsing function that also includes checking? If so, should I change it somehow so that there is a separate function for checking and a separate one for parsing (that seems rather redundant though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, but you aren't just checking here, you're also converting things into FileUrl. So the word "check" isn't accurate to the action being applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then no single word would be correct right? The function would need a name like parse_and_check or similar.
In these four _check* functions, it's only _check_path and _check_github_uri that contains any parsing. The main parsing step already happened in parse.urlsplit(uri_or_path) which is outside these functions.
I'm thinking of parsing as breaking something down into parts, and what happens here is more of a conversion between data types (from str to a url type) which includes a check that it is convertable. So a possible name could be _convert_to_fileurl maybe (_transform_to_fileurl could also work but it's longer? I think _as_path as you mentioned could work, but I generally prefer when function starts with a verb that describes what the action they carry out. Now I wonder if we need it to be even more specific: _convert_path_str_to_fileurl vs convert_fileurl_str_to_file_url (becomes messy, I rather have a single convert_to_fileurl and move some logic around I think...)
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should
resolved_uribe typed here or is it enough that the function return is typed already?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to explicitly write it out, at least within a bigger function like this one.