-
Notifications
You must be signed in to change notification settings - Fork 0
feat: ✨ implement resolve uri #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 27 commits
5f6fae2
6f0ef5d
f943218
a89633d
8072033
7763097
c375a30
26ad174
662c3d6
fd70520
b7ad9a0
df338f3
0044b9c
05fbaa8
5aae406
ae6c6eb
f88fb14
4b632fd
95b9c69
4005455
78da642
c1bcb3f
2b5aeea
43a974e
39a8c76
3c091bf
030789e
fd8c748
0b5d5d6
6a12807
7d1b019
a5cc597
d2fb845
f5a206c
115b19e
779f627
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -6,7 +6,7 @@ | |||||
| from cyclopts import App, Parameter, config | ||||||
|
|
||||||
| # from seedcase_flower.config import Config as FlowerConfig | ||||||
| from seedcase_flower.internals import BuildStyle, _read_properties, _resolve_uri | ||||||
| from seedcase_flower.internals import BuildStyle, Uri, _parse_source, _read_properties | ||||||
|
|
||||||
| app = App( | ||||||
| help="Flower generates human-readable documentation from Data Packages.", | ||||||
|
|
@@ -29,7 +29,7 @@ | |||||
|
|
||||||
| @app.command() | ||||||
| def build( | ||||||
| uri: str = "datapackage.json", | ||||||
| source: str = "datapackage.json", | ||||||
| style: BuildStyle = BuildStyle.quarto_one_page, | ||||||
| template_dir: Optional[Path] = None, | ||||||
| output_dir: Path = Path("docs"), | ||||||
|
|
@@ -38,7 +38,10 @@ def build( | |||||
| """Build human-readable documentation from a `datapackage.json` file. | ||||||
|
|
||||||
| Args: | ||||||
| uri: The URI to a datapackage.json file. | ||||||
| source: The path to a local `datapackage.json` file or its parent folder. | ||||||
| Can also be an `https:` URL to a remote `datapackage.json` or a | ||||||
| `github:` / `gh:` URI pointing to a repo with a `datapackage.json` | ||||||
| in the repo root (in the format `gh:org/repo`). | ||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Just to highlight that we also accept that. |
||||||
| style: The style used to structure the output. If a template directory | ||||||
| is given, this parameter will be ignored. | ||||||
| template_dir: The directory that contains the Jinja template | ||||||
|
|
@@ -47,8 +50,8 @@ def build( | |||||
| output_dir: The directory to save the generated files in. | ||||||
| verbose: If True, prints additional information to the console. | ||||||
| """ | ||||||
| path: Path = _resolve_uri(uri) | ||||||
| properties: dict[str, Any] = _read_properties(path) | ||||||
| uri: Uri = _parse_source(source) | ||||||
|
||||||
| properties: dict[str, Any] = _read_properties(uri) | ||||||
|
|
||||||
| # One item per section, rendered from template. | ||||||
| # Internally uses Jinja2 to render templates with metadata, which | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,11 @@ | ||
| """Helper functions for private use.""" | ||
|
|
||
| import json | ||
| from dataclasses import dataclass | ||
| from enum import Enum | ||
| from pathlib import Path | ||
| from typing import Any | ||
| from urllib import parse | ||
|
|
||
|
|
||
| class BuildStyle(Enum): | ||
|
|
@@ -14,15 +16,62 @@ class BuildStyle(Enum): | |
| quarto_resource_tables = "quarto_resource_tables" | ||
|
|
||
|
|
||
| # Output maybe str? Path? | ||
| # Use `match` inside for strictness on URI types? Or use a library for URI parsing? | ||
| # TODO Extend to parse strings and return either URL or Path | ||
| def _resolve_uri(uri: str) -> Path: | ||
| return Path(uri) | ||
| @dataclass(frozen=True) | ||
| class Uri: | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We used the URI name on the call so I kept it for now, but we could consider calling it URL instead since by the time we are using this class, we have already converted the gh URIs and paths to https URLs and File URLs, respectively. OF course, URI is still valid since it is a superset including URLs, but URL would be more precise to at this point since non-URL URIs (gh:, file:, ...) will never be represented by this class.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. URL is only for HTTP, URI is the umbrella term that includes
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmmm, I don't think that is correct. My understanding is that a URI can be either a URN or a URL. The difference is whether it uniquely Names a resource, or Locates the resource with an actual address. HTTP is just the scheme. There are many other schemes that specifies valid URLs, e.g. ftp, irc, and file. The link that you posted actually corroborates this understanding as exemplified by this excerpt from the second paragraph:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, we must be reading these entries in completely different ways because the way I read the wiki entry on URI is very different from how you are describing it. Like, a file is not a URL, as per wiki "web address". And a URN seems to be it's own URI ( Here it's called a file URI: https://en.wikipedia.org/wiki/File_URI_scheme
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There seems to be a lot confusion about these two terms that a whole formal document has been written about it: https://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/#uri-partitioning so we aren't alone in this multiple interpretation problem! 😛 |
||
| """A parsed URI with its normalised value and locality flag.""" | ||
|
|
||
| value: str | ||
| local: bool | ||
|
|
||
|
|
||
| def _parse_source(source: str) -> Uri: | ||
| split_source = parse.urlsplit(source) | ||
| if split_source.scheme == "": | ||
| split_source = split_source._replace(scheme="file") | ||
|
||
| match split_source.scheme: | ||
| case "file": | ||
| return _convert_to_file_uri(split_source) | ||
| case "https": | ||
| return _convert_to_https_uri(split_source) | ||
| case "gh" | "github": | ||
| return _convert_to_github_uri(split_source) | ||
| case _: | ||
| raise ValueError( | ||
| "The source must be either a path to an existing file/folder " | ||
| "or a URI with one of the following URI prefixes: " | ||
| "`file:`, `https:`, `gh:`, `github:`" | ||
| ) | ||
lwjohnst86 marked this conversation as resolved.
Show resolved
Hide resolved
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I kept this error, but raised #159 for specific gh uri errors |
||
|
|
||
|
|
||
| def _convert_to_file_uri(split_file_source: parse.SplitResult) -> Uri: | ||
| path = Path(split_file_source.path).resolve() | ||
| if path.is_dir(): | ||
| path /= "datapackage.json" | ||
| split_file_source = split_file_source._replace(path=path.as_posix()) | ||
| return Uri(value=split_file_source.geturl(), local=True) | ||
|
|
||
|
|
||
| def _convert_to_https_uri(split_https_source: parse.SplitResult) -> Uri: | ||
| return Uri(value=split_https_source.geturl(), local=False) | ||
|
|
||
|
|
||
| def _convert_to_github_uri(split_gh_source: parse.SplitResult) -> Uri: | ||
| return Uri( | ||
| value=split_gh_source._replace( | ||
| scheme="https", | ||
| netloc="raw.githubusercontent.com", | ||
| path=f"/{split_gh_source.path}/refs/heads/main/datapackage.json", | ||
| ).geturl(), | ||
| local=False, | ||
| ) | ||
|
|
||
|
|
||
| # TODO Extend to also read properties from URLs | ||
| def _read_properties(path: Path) -> dict[str, Any]: | ||
| with open(path) as properties_file: | ||
| datapackage: dict[str, Any] = json.load(properties_file) | ||
| return datapackage | ||
| def _read_properties(uri: Uri) -> dict[str, Any]: | ||
| if uri.local: | ||
| # TODO read from local file | ||
| pass | ||
| else: | ||
| # TODO read from remote file | ||
| pass | ||
| return {"placeholder": uri.value} | ||
|
||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another controversial change... but hear me out!
I think it was a bit confusing that we were calling the CLI arg a
urialthough it could also be a file or folder path. We then ended up having functions named_convert_to_*uri()that already took aurivariable as the input argument which semantically doesn't make much sense and is confusing to reason about, e.g.uri = convert_to_https_uri(uri)...We could move this to its own issue if it warrants a lot of discussion but since it is directly tied to the naming of the functions introduced here, I thought it made sense to bring it up here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, move this to another issue, as this impacts the design☺️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming functions is easy, but we don't want this PR to be tied down more than it has been already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for now, keep as URI, not source (I personally am not sure even source is a good name, but thats something to discuss).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for considering it! I opened #167 and reverted the changes here