Conversation
A bit ugly/verbose, but I couldn't find a faster way to have it both work as a type and be able to validate potential URLs the same way as HttpUrl, ie without calling other methods or functions.
joelostblom
left a comment
There was a problem hiding this comment.
This is ready for a first review! Mostly looking for feedback on the approach so I know if it makes sense. I'm figuring out how to add tests in the meanwhile.
src/seedcase_flower/cli.py
Outdated
| properties = _resolve_uri(uri) | ||
| # properties = _read_properties(path) |
There was a problem hiding this comment.
I commented this out for now since I don't think it makes sense to do all the changes required for _read_properties to pass type checking in this PR; that will be in the next one together with expanding that function fully
src/seedcase_flower/internals.py
Outdated
| _AnnotatedHttps = Annotated[AnyUrl, UrlConstraints(allowed_schemes=["https"])] | ||
| _adapter = TypeAdapter(_AnnotatedHttps) | ||
|
|
||
|
|
||
| class HttpsUrl(str): | ||
| """Type and class with validation for https URLs.""" | ||
|
|
||
| @classmethod | ||
| def __get_pydantic_core_schema__(cls, source, handler): # type: ignore[no-untyped-def] | ||
| """Initialize adapter core schema.""" | ||
| return _adapter.core_schema | ||
|
|
||
| def __new__(cls, value: str): # type: ignore[no-untyped-def] | ||
| """Setup validation.""" | ||
| validated = _adapter.validate_python(value) | ||
| return str.__new__(cls, validated) |
There was a problem hiding this comment.
This genAI generated snippet seems somewhat verbose, but there does not appear to exist another way of creating an HttpsUrl class that both works as a type and can be called (ie works the same as the existing HttpUrl class in pytdantic). See the following links for discussion and some alternatives that all seem less desirable to me:
Very open to suggestions
There was a problem hiding this comment.
I have no idea what's going on here, nor what's the purpose. Is this to check that the URL is an actual URL?
There was a problem hiding this comment.
So this is just to check the URL, right? As we'll fetch the JSON from the URL immediately, there's an argument for not trying to pre-check the URL at all. If it's bad, the request will fail anyway. (Of course we could add some nice error handling around that.)
There was a problem hiding this comment.
Yes to both check that strings adhere to valid https URL specification and to use it as a type in a function signature. I'm fine with changing and passing around strings, but see my reply below of why I thought we would prefer to create these specific types when we try to follow functional programming.
There was a problem hiding this comment.
Ah, I think I understand it a bit more after reading your comment. As we only need to know whether an address is local or remote, a middle ground would be to have a class that captures that plus a standardised form of the URI (like so). That way we keep function signatures tidy but don't have to care about validating URIs beyond the match statement.
src/seedcase_flower/internals.py
Outdated
| # TODO Extend to parse strings and return either URL or Path | ||
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl |
There was a problem hiding this comment.
@lwjohnst86 As per our suggestion today, I defined a single type here instead of using | or Union in the function signature. I initially tried to achieve this with enum but it doesn't seem possible / I couldn't figure out how, so I went with type instead.
I'm unsure if the name Https_or_FileUrl is actually that helpful, but it also does not seem correct to give a name such as just URL or URI since it is really a subset of these. Even something like URL_subset does not seem that helpful.
src/seedcase_flower/cli.py
Outdated
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| resolved_uri = _resolve_uri(uri) |
There was a problem hiding this comment.
Should resolved_uri be typed here or is it enough that the function return is typed already?
There was a problem hiding this comment.
I'd prefer to explicitly write it out, at least within a bigger function like this one.
src/seedcase_flower/cli.py
Outdated
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| resolved_uri = _resolve_uri(uri) | ||
| properties = _read_properties(resolved_uri) # type: ignore |
There was a problem hiding this comment.
Ignoring type checking for now since this function will be implemented in a separate PR
lwjohnst86
left a comment
There was a problem hiding this comment.
Some small comments to start. Please include tests too!
src/seedcase_flower/internals.py
Outdated
| # TODO Extend to parse strings and return either URL or Path | ||
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl |
There was a problem hiding this comment.
This can be an enum instead. And should be called a URI (where the default assumes a path URI).
There was a problem hiding this comment.
I'm really surprised there aren't packages or something for handling URIs... 🤔
There was a problem hiding this comment.
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | |
| class URI(Enum): | |
| https = HttpsUrl | |
| file = FileUrl |
e.g. might look like above
There was a problem hiding this comment.
Maybe I'm misunderstanding what _resolve_uri returns, but if it returns the address of the datapackage.json, then it cannot really return an enum. Enums are for fixed, finite sets of values. If _resolve_uri only determines whether the address is local or remote, then yeah it can return an enum.
There was a problem hiding this comment.
Yes, I couldn't get this to work as an enum; it seems like it hides the type information from mypy. Take the following example that contains the same function, either with an enum, a type alias, or a union:
# supertypes.py
from enum import Enum
class Number(Enum):
"""Doc."""
INT = int
FLOAT = float
def fx(x: Number) -> int | float:
"""Doc."""
x / 2
return x
type int_or_float = int | float
def fy(y: int_or_float) -> int | float:
"""Doc."""
y / 2
return y
def fz(z: int | float) -> int | float:
"""Doc."""
z / 2
return zWhen running uv run mypy --pretty supertypes.py, we can see that mypy doesn't understand that a Number is a supertype of int and float, whereas this is understood when creating a new type (and of course also with the union):
supertypes.py:13: error: Unsupported operand types for / ("Number" and "int") [operator]
x / 2
^
supertypes.py:14: error: Incompatible return value type (got "Number", expected "int | float")
[return-value]
return x
^
Found 2 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
That still gives me the same error for the division:
supertypes.py:36: error: Unsupported operand types for / ("Number" and "int")
[operator]
return x / 2
^
And even if it would work, it doesn't seem that useful for combining types in general if we always need additional logic like that. Doesn't it seem that creating a new type/type alias with type/TypeAlias is more appropriate here?
There was a problem hiding this comment.
Reading into it more, it seems enums in Python are more basic and with limited features than I expected. So my suggestion here doesn't actually work the way I was envisioning. I was imagining you use the enum to restrict the type of output and assign the URI path to the relevant enum entry. We could do a similar behaviour with a dataclass and treat it like an enum 😛 e.g.
@dataclass(frozen = True)
class URI:
https: HttpsUrl
file: FileUrl
def ... -> URI
match ...
case "file"
URI.file = ...
case ""
URI.file = _check_path(...)
...
return URI
There was a problem hiding this comment.
Hmm, do you have an MRE for what that would look like? When I try the following, I get the same errors as before:
from dataclasses import dataclass
@dataclass(frozen=True)
class Number:
"""Doc."""
INT = int
FLOAT = float
def fx(x: Number) -> int | float:
"""Doc."""
x / 2
return x
def fw(x: Number) -> int | float:
"""Doc."""
match Number:
case INT:
return x / 2
case FLOAT:
return x / 2supertypes.py:14: error: Unsupported operand types for / ("Number" and "int")
[operator]
x / 2
^
supertypes.py:15: error: Incompatible return value type (got "Number", expected
"int | float") [return-value]
return x
^
supertypes.py:22: error: Unsupported operand types for / ("Number" and "int")
[operator]
return x / 2
^
Found 3 errors in 1 file (checked 1 source file)
There was a problem hiding this comment.
Sorry to butt in, but a URI needs to know 2 things: whether it's local or remote and what its address is.
That corresponds more to:
@dataclass
class Uri:
value: str
local: boolA single URI shouldn't have properties for both https and file?
There was a problem hiding this comment.
Agree, but I thought the idea of this superclass was to take advantage of the validation and type specification that already exists in pydantic's FileUrl and our modified HttpsURL. If we don't want that, we could use something like AnyUrl directly and implement our own validation? Or could we somehow use the validation from the pydandtic classes inside our own Uri dataclass?
src/seedcase_flower/internals.py
Outdated
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | ||
|
|
||
|
|
||
| def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl: |
There was a problem hiding this comment.
| def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl: | |
| def _resolve_uri(uri: str) -> URI: |
See comment above.
There was a problem hiding this comment.
Maybe a better word would be _parse_uri()...? 🤔
src/seedcase_flower/internals.py
Outdated
| split_uri = parse.urlsplit(uri_or_path) | ||
| match split_uri.scheme: | ||
| case "": | ||
| return _check_path(uri_or_path) |
There was a problem hiding this comment.
Instead, append a "file" and check as if it is a path URI.
lwjohnst86
left a comment
There was a problem hiding this comment.
Just forgot to comment on some stuff
src/seedcase_flower/cli.py
Outdated
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| resolved_uri = _resolve_uri(uri) |
There was a problem hiding this comment.
I'd prefer to explicitly write it out, at least within a bigger function like this one.
src/seedcase_flower/internals.py
Outdated
| _AnnotatedHttps = Annotated[AnyUrl, UrlConstraints(allowed_schemes=["https"])] | ||
| _adapter = TypeAdapter(_AnnotatedHttps) | ||
|
|
||
|
|
||
| class HttpsUrl(str): | ||
| """Type and class with validation for https URLs.""" | ||
|
|
||
| @classmethod | ||
| def __get_pydantic_core_schema__(cls, source, handler): # type: ignore[no-untyped-def] | ||
| """Initialize adapter core schema.""" | ||
| return _adapter.core_schema | ||
|
|
||
| def __new__(cls, value: str): # type: ignore[no-untyped-def] | ||
| """Setup validation.""" | ||
| validated = _adapter.validate_python(value) | ||
| return str.__new__(cls, validated) |
There was a problem hiding this comment.
I have no idea what's going on here, nor what's the purpose. Is this to check that the URL is an actual URL?
src/seedcase_flower/internals.py
Outdated
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | ||
|
|
||
|
|
||
| def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl: |
There was a problem hiding this comment.
Maybe a better word would be _parse_uri()...? 🤔
martonvago
left a comment
There was a problem hiding this comment.
I like this in general, just added some ideas about how we could simplify things a bit
src/seedcase_flower/cli.py
Outdated
| resolved_uri = _resolve_uri(uri) | ||
| properties = _read_properties(resolved_uri) # type: ignore |
There was a problem hiding this comment.
I don't know if this split between _resolve_uri and _read_properties is the simplest solution.
_resolve_uri can return either a local or a remote address, and these are loaded/fetched differently.
The way it is now, we first split the cases in the match statement, unite them in the output of _resolve_uri, and then presumably split them again in _read_properties. We could cut out the middle man and handle each case right when we split them. (I'll explain more there.)
There was a problem hiding this comment.
That could work! And just drop the _resolve_urifunction completely and just keep the read properties one instead 👍👍
src/seedcase_flower/internals.py
Outdated
| _AnnotatedHttps = Annotated[AnyUrl, UrlConstraints(allowed_schemes=["https"])] | ||
| _adapter = TypeAdapter(_AnnotatedHttps) | ||
|
|
||
|
|
||
| class HttpsUrl(str): | ||
| """Type and class with validation for https URLs.""" | ||
|
|
||
| @classmethod | ||
| def __get_pydantic_core_schema__(cls, source, handler): # type: ignore[no-untyped-def] | ||
| """Initialize adapter core schema.""" | ||
| return _adapter.core_schema | ||
|
|
||
| def __new__(cls, value: str): # type: ignore[no-untyped-def] | ||
| """Setup validation.""" | ||
| validated = _adapter.validate_python(value) | ||
| return str.__new__(cls, validated) |
There was a problem hiding this comment.
So this is just to check the URL, right? As we'll fetch the JSON from the URL immediately, there's an argument for not trying to pre-check the URL at all. If it's bad, the request will fail anyway. (Of course we could add some nice error handling around that.)
src/seedcase_flower/internals.py
Outdated
| # TODO Extend to parse strings and return either URL or Path | ||
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl |
There was a problem hiding this comment.
Maybe I'm misunderstanding what _resolve_uri returns, but if it returns the address of the datapackage.json, then it cannot really return an enum. Enums are for fixed, finite sets of values. If _resolve_uri only determines whether the address is local or remote, then yeah it can return an enum.
src/seedcase_flower/internals.py
Outdated
| type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl | ||
|
|
||
|
|
||
| def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl: |
There was a problem hiding this comment.
So picking up from my earlier comment, how about looking forward to what we will do with the addresses and doing sg like:
def _resolve_uri(uri: str) -> dict[str, Any]:
split_uri = parse.urlsplit(uri)
match split_uri.scheme:
case "":
return _load_properties_from_path(uri)
case "file":
return _load_properties_from_path(split_uri.path)
case "https":
return _fetch_properties_from_url(split_uri.geturl())
case "gh" | "github":
return _fetch_properties_from_url(split_uri._replace(...).geturl())
case _:
raise ValueError(...)Both _load_properties_from_path and _fetch_properties_from_url could take a string and try to load the JSON without checking that the string is a good address. Instead, we could anticipate common failure cases (file not found, not JSON, various network errors, etc.) and show nicely worded errors -- probably easier than typing and validating the input addresses.
(Maybe the one thing worth checking is that the gh-flavoured address contains just the organisation and the repo?)
There was a problem hiding this comment.
I like 👍 just rename the whole function to_read_properties instead. And for the gh and github one, convert then onto https and fetch like the other https
There was a problem hiding this comment.
I agree with what you wrote in your comment above regarding the split between _resolve_uri and _read_properties. It seemed somewhat forced when I was implementing it too, exactly because of what you said with the uniting and splitting again. I'm totally fine with this alternative approach (or even ducktyping for error handling as long as the messages we get are clear), but my (mis?)understanding of Luke and/or functional programming in general is that it might clash on two accounts:
- Each parameter should be assigned a type that as closely as possible matches what it represents. Preferably errors are handled by this type instead of e.g. pattern matching a string (I understand that we do pattern matching to assign the type, so it seems like the approaches are equally fine to me from this perspective, but not sure if they are equal from a functional programming perspective).
- Each function should just do one thing. With this new approach, we would have a (parent) function that both resolves the URI and then loads the properties. Again, this would be totally fine with me personally, but I thought that in functional programming, we want a function to do one thing (e.g. convert from one type to another) and then pass the output type to another function so that they are more standalone and composable rather than nesting two functions within a parent function.
Again, I might very well have misunderstood both of these "principles" of functional programming and I'm happy with either solution here.
There was a problem hiding this comment.
Hmm, I see what you're saying. What if instead we resolve thr URI so a pure path is given the "file" prefix and the gh/github is converted into a full URL. And the https is output as is. Then the downstream read properties function only needs to do two switch statements rather than X number to actually convert them into correct URIs?
There was a problem hiding this comment.
I think that's what the initial implementation suggestion is doing, isn't it? Paths and file URIs return FileUrl whereas https URLs and github URIs return HttpsURL.
There was a problem hiding this comment.
Yea! Well, almost, aside from how it is being output, that's the only tension point 😛
There was a problem hiding this comment.
I mean I personally think that some problems lend themselves very nicely to one type of solution and some problems to another type. To me, the solution with nested functions feels cleaner and simpler in this case. But I do get the one function -- one responsibility hangup, though of course I could be cheeky and argue that loading the properties from a URI is one responsibility.
Or taking another angle, we could say that loading the properties from a local path and fetching them from a remote address are two distinct tasks. This is true because they expect different inputs (a path vs a string) and are implemented differently. If this is the case, then what we need before loading the properties is to determine whether we have a local or a remote address and to standardise each type of address.
Something like:
@dataclass
class Uri:
value: str
local: bool
def _parse_uri(uri: str) -> Uri:
split_uri = parse.urlsplit(uri)
match split_uri.scheme:
case "":
return Uri(value=uri, local=True)
case "file":
return Uri(value=split_uri.path, local=True)
case "https":
return Uri(value=split_uri.geturl(), local=False)
case "gh" | "github":
return Uri(
value=split_uri._replace(...).geturl(),
local=False,
)
case _:
raise ValueError(...)
parsed_uri = _parse_uri(uri)
if parsed_uri.local:
properties = _load_properties_from_path(parsed_uri.value)
else:
properties = _fetch_properties_from_url(parsed_uri.value)There was a problem hiding this comment.
Yea, that's a good point, the distinction between local and remote. Since the functions to read the properties for either source will very likely be very differently handled..!
This reverts commit c1bcb3f.
joelostblom
left a comment
There was a problem hiding this comment.
@martonvago @lwjohnst86 Ready for another look! I've tried to update based on our discussion. If it looks largely good to you, I will add tests next on top of #155
src/seedcase_flower/internals.py
Outdated
| if split_source.scheme == "": | ||
| split_source = split_source._replace(scheme="file") |
There was a problem hiding this comment.
I moved this outside the match logic above to make the cases clearer an not have to repeat the same functions for paths and file uris
src/seedcase_flower/cli.py
Outdated
| @app.command() | ||
| def build( | ||
| uri: str = "datapackage.json", | ||
| source: str = "datapackage.json", |
There was a problem hiding this comment.
Another controversial change... but hear me out!
I think it was a bit confusing that we were calling the CLI arg a uri although it could also be a file or folder path. We then ended up having functions named _convert_to_*uri() that already took a uri variable as the input argument which semantically doesn't make much sense and is confusing to reason about, e.g. uri = convert_to_https_uri(uri)...
We could move this to its own issue if it warrants a lot of discussion but since it is directly tied to the naming of the functions introduced here, I thought it made sense to bring it up here.
There was a problem hiding this comment.
Yes, move this to another issue, as this impacts the design
There was a problem hiding this comment.
Renaming functions is easy, but we don't want this PR to be tied down more than it has been already.
There was a problem hiding this comment.
So for now, keep as URI, not source (I personally am not sure even source is a good name, but thats something to discuss).
There was a problem hiding this comment.
Thanks for considering it! I opened #167 and reverted the changes here
src/seedcase_flower/cli.py
Outdated
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| uri: Uri = _parse_source(source) |
There was a problem hiding this comment.
Same here as above, we're not just resolving a URI in this function, we're parsing an input string into components, editing as needed, then converting to our URI type.
There was a problem hiding this comment.
You can still parse a URI to confirm it is a URI. Like parsing a JSON file. e.g. parse_json(path) or like read_properties(path). Both content JSON/properties, but we still parse it to get into the right format for us.
There was a problem hiding this comment.
Mostly agree... I think it would probably be more suitable to say you parse a URI candidate to check whether it is a URI. But that's a bit aside of my main point here, which is that you cannot parse a path and say you are parsing a URI.
| "The source must be either a path to an existing file/folder " | ||
| "or a URI with one of the following URI prefixes: " | ||
| "`file:`, `https:`, `gh:`, `github:`" | ||
| ) |
There was a problem hiding this comment.
I kept this error, but raised #159 for specific gh uri errors
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| @dataclass(frozen=True) | ||
| class Uri: |
There was a problem hiding this comment.
We used the URI name on the call so I kept it for now, but we could consider calling it URL instead since by the time we are using this class, we have already converted the gh URIs and paths to https URLs and File URLs, respectively. OF course, URI is still valid since it is a superset including URLs, but URL would be more precise to at this point since non-URL URIs (gh:, file:, ...) will never be represented by this class.
There was a problem hiding this comment.
URL is only for HTTP, URI is the umbrella term that includes file:. https://en.wikipedia.org/wiki/Uniform_Resource_Identifier. And this class would contain the value which contains paths and URLs. We could call it something else, but URL won't be accurate.
There was a problem hiding this comment.
Hmmm, I don't think that is correct. My understanding is that a URI can be either a URN or a URL. The difference is whether it uniquely Names a resource, or Locates the resource with an actual address. HTTP is just the scheme. There are many other schemes that specifies valid URLs, e.g. ftp, irc, and file.
The link that you posted actually corroborates this understanding as exemplified by this excerpt from the second paragraph:
URIs which provide a means of locating and retrieving information resources on a network (either on the Internet or on another private network, such as a computer file system or an Intranet) are Uniform Resource Locators (URLs). ... Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it; these are Uniform Resource Names (URNs).
There was a problem hiding this comment.
Ok, we must be reading these entries in completely different ways because the way I read the wiki entry on URI is very different from how you are describing it. Like, a file is not a URL, as per wiki "web address". And a URN seems to be it's own URI (urn: vs https:), and urn: is not file:.
Here it's called a file URI: https://en.wikipedia.org/wiki/File_URI_scheme
src/seedcase_flower/internals.py
Outdated
| else: | ||
| # TODO read from remote file | ||
| pass | ||
| return {"placeholder": uri.value} |
There was a problem hiding this comment.
I put up a draft of what I think this function will look like in #161 that I want to mention since it is so connected to the changes in this PR
lwjohnst86
left a comment
There was a problem hiding this comment.
Nice refactor! Some comments _parse_uri()? We can treat the "higher-level" functions within the build() and view() functions almost as the Python interface, so we can do unit testing of them to make sure they work fine. And the tests for build() and view() we treat as integration tests. Unfortunately, Python doesn't have a formal way of classifying/structuring unit tests from integration tests, so this is the best we have.
src/seedcase_flower/cli.py
Outdated
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| uri: Uri = _parse_source(source) |
There was a problem hiding this comment.
You can still parse a URI to confirm it is a URI. Like parsing a JSON file. e.g. parse_json(path) or like read_properties(path). Both content JSON/properties, but we still parse it to get into the right format for us.
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| @dataclass(frozen=True) | ||
| class Uri: |
There was a problem hiding this comment.
URL is only for HTTP, URI is the umbrella term that includes file:. https://en.wikipedia.org/wiki/Uniform_Resource_Identifier. And this class would contain the value which contains paths and URLs. We could call it something else, but URL won't be accurate.
This is a first implementation of the tests we can use to check that the CLI is behaving correctly. I think this covers most of what we currently have in main, at least we reach 💯 % coverage! Merging this before some of the existing PRs e.g. #140 and #152 will make it easier to add new tests specifically for the functionality added there instead of convoluting it with testing the general functionality of the app which is tested here instead. I tried to follow the cyclopts test docs as closely as possible, but let me know if you think something is missing. I also experimented with using the Copilot CLI for the first time to help me get started and understanding how to write the tests. It was quite useful, particularly to get started, but I did have to delete about half of its suggestions because there was a lot of redundancy. Thought I would mention it in case someone catches something odd in here so I can blame it and not me =p Needs a thorough review. ## Checklist - [x] Ran `just run-all` --------- Co-authored-by: Luke W. Johnston <lwjohnst86@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…g sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Thanks for the comments, read for another look!
The added tests were created by copilot and then reviewed and edited by yours truly. Let me know if the style should be different, e.g. no dosctring or testing more than one thing in a single test (they are very granular now, but in general it also seems good to test one thing per test and the small amount of code duplication does not seem to warrant the use of fixtures to me, but if you follow another style I can change it to match!).
It was also fun to see the automatically triggered github security review where one AI is correcting another xD
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| @dataclass(frozen=True) | ||
| class Uri: |
There was a problem hiding this comment.
Hmmm, I don't think that is correct. My understanding is that a URI can be either a URN or a URL. The difference is whether it uniquely Names a resource, or Locates the resource with an actual address. HTTP is just the scheme. There are many other schemes that specifies valid URLs, e.g. ftp, irc, and file.
The link that you posted actually corroborates this understanding as exemplified by this excerpt from the second paragraph:
URIs which provide a means of locating and retrieving information resources on a network (either on the Internet or on another private network, such as a computer file system or an Intranet) are Uniform Resource Locators (URLs). ... Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it; these are Uniform Resource Names (URNs).
src/seedcase_flower/cli.py
Outdated
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| uri: Uri = _parse_source(source) |
There was a problem hiding this comment.
Mostly agree... I think it would probably be more suitable to say you parse a URI candidate to check whether it is a URI. But that's a bit aside of my main point here, which is that you cannot parse a path and say you are parsing a URI.
src/seedcase_flower/cli.py
Outdated
| @app.command() | ||
| def build( | ||
| uri: str = "datapackage.json", | ||
| source: str = "datapackage.json", |
There was a problem hiding this comment.
Thanks for considering it! I opened #167 and reverted the changes here
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| uri: Uri = _parse_uri(uri) # type: ignore # TODO fix in read_prop PR |
There was a problem hiding this comment.
I think a good example of why the current naming is wonky is the fact that we are reassigning the uri variable here. uri is indeed the most appropriate name at this point because it is now an instance of our Uri class. But before this line uri doesn't actually contain a uri, which is confusing to me at least.
mypy also complains about this:
src/seedcase_flower/cli.py:54: error: Name "uri" already defined on line 33 [no-redef]
uri: Uri = _parse_uri(uri)
^~~
Found 1 error in 1 file (checked 13 source files)
There was a problem hiding this comment.
One solution to this mypy thing is to use parse_uri within read_properties() e.g. read_properties(parse_uri(uri)). (maybe this is also why I liked resolve as a word better, but I don't have a strong preference). (this is where I really miss the ability to pipe...).
lwjohnst86
left a comment
There was a problem hiding this comment.
Just some initial comments, will look at the tests later :)
| """ | ||
| path: Path = _resolve_uri(uri) | ||
| properties: dict[str, Any] = _read_properties(path) | ||
| uri: Uri = _parse_uri(uri) # type: ignore # TODO fix in read_prop PR |
There was a problem hiding this comment.
One solution to this mypy thing is to use parse_uri within read_properties() e.g. read_properties(parse_uri(uri)). (maybe this is also why I liked resolve as a word better, but I don't have a strong preference). (this is where I really miss the ability to pipe...).
| uri: The path to a local `datapackage.json` file or its parent folder. | ||
| Can also be an `https:` URL to a remote `datapackage.json` or a | ||
| `github:` / `gh:` URI pointing to a repo with a `datapackage.json` | ||
| in the repo root (in the format `gh:org/repo`). |
There was a problem hiding this comment.
| in the repo root (in the format `gh:org/repo`). | |
| in the repo root (in the format `gh:org/repo`). Can also take a reference to a tag or branch for `gh:` or `github:` URIs (e.g. `gh:org/repo@main` or `gh:org/repo@1.0.1). |
Just to highlight that we also accept that.
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| @dataclass(frozen=True) | ||
| class Uri: |
There was a problem hiding this comment.
Ok, we must be reading these entries in completely different ways because the way I read the wiki entry on URI is very different from how you are describing it. Like, a file is not a URL, as per wiki "web address". And a URN seems to be it's own URI (urn: vs https:), and urn: is not file:.
Here it's called a file URI: https://en.wikipedia.org/wiki/File_URI_scheme
Description
Closes #93 #96
Needs a thorough review.
Checklist
just run-all