Skip to content
Open
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
5f6fae2
feat: :sparkles: Move over relevant code from sprout and cdp
joelostblom Feb 23, 2026
6f0ef5d
feat: :sparkles: Add Https type and class validator
joelostblom Feb 24, 2026
f943218
feat: :sparkles: Implement resolve_uri
joelostblom Feb 24, 2026
a89633d
refactor: ♻️ Use case/match for readabilty
joelostblom Feb 24, 2026
8072033
refactor: ♻️ Make variable names more precise
joelostblom Feb 24, 2026
7763097
fix: 🐛 Remove unnused import
joelostblom Feb 24, 2026
c375a30
fix: 🐛 Fix mypy errors
joelostblom Feb 24, 2026
26ad174
chore: 🔧 Fix typing
joelostblom Feb 24, 2026
662c3d6
fix: 🐛 Remove old tmp code
joelostblom Feb 24, 2026
fd70520
fix: 🐛 Remove unused import
joelostblom Feb 24, 2026
b7ad9a0
fix: 🐛 Pass vulture checks
joelostblom Feb 24, 2026
df338f3
test: ✅ Comment code to be updated in separate PR to pass tests
joelostblom Feb 24, 2026
0044b9c
fix: 🐛 Remove check datapacakge from this PR
joelostblom Feb 24, 2026
05fbaa8
fix: 🐛 Create type alias
joelostblom Feb 25, 2026
5aae406
fix: 🐛 Ignore line in mypy instead of changing logic
joelostblom Feb 25, 2026
ae6c6eb
refactor: ♻️ Change pydantic classes to custom dataclass
joelostblom Feb 26, 2026
f88fb14
fix: 🐛 Name parameters more precisely
joelostblom Feb 26, 2026
4b632fd
fix: 🐛 Remove explicit file exist check
joelostblom Feb 26, 2026
95b9c69
refactor: ♻️ Reduce cases
joelostblom Feb 27, 2026
4005455
refactor: ♻️ Name variables more precisely
joelostblom Feb 27, 2026
78da642
docs: 📝 Clarify what source can be
joelostblom Feb 27, 2026
c1bcb3f
fix: 🐛 Avoid changes to read_properties in this PR
joelostblom Feb 27, 2026
2b5aeea
Revert "fix: 🐛 Avoid changes to read_properties in this PR"
joelostblom Feb 27, 2026
43a974e
feat: ✨ Check types
joelostblom Feb 27, 2026
39a8c76
fix: 🐛 Remove redundant parsin of gh uri
joelostblom Feb 27, 2026
3c091bf
feat: ✨ Improve naming
joelostblom Feb 27, 2026
030789e
feat: ✨ Add read_prop skeleton
joelostblom Feb 27, 2026
fd8c748
refactor: ♻️ Revert to less precise name until we have discussed more
joelostblom Mar 2, 2026
0b5d5d6
Merge branch 'main' into feat/resolve-uri
joelostblom Mar 2, 2026
6a12807
test: ✅ Update test to match new docstring
joelostblom Mar 2, 2026
7d1b019
fix: 🐛 Restore local file reading behavior to not break former test
joelostblom Mar 2, 2026
a5cc597
refactor: ♻️ Rename test function to match refactor
joelostblom Mar 2, 2026
d2fb845
chore: 🔧 Ignore mypy on lines to be fixed in read_prop PR
joelostblom Mar 2, 2026
f5a206c
test: ✅ Add internal tests for _parse_uri
joelostblom Mar 2, 2026
115b19e
build: 🔨 uv update
joelostblom Mar 2, 2026
779f627
Potential fix for code scanning alert no. 14: Incomplete URL substrin…
joelostblom Mar 2, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions src/seedcase_flower/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from cyclopts import App, Parameter, config

# from seedcase_flower.config import Config as FlowerConfig
from seedcase_flower.internals import BuildStyle, _read_properties, _resolve_uri
from seedcase_flower.internals import BuildStyle, Uri, _parse_uri, _read_properties

app = App(
name="seedcase-flower",
Expand Down Expand Up @@ -39,7 +39,10 @@ def build(
"""Build human-readable documentation from a `datapackage.json` file.

Args:
uri: The URI to a datapackage.json file.
uri: The path to a local `datapackage.json` file or its parent folder.
Can also be an `https:` URL to a remote `datapackage.json` or a
`github:` / `gh:` URI pointing to a repo with a `datapackage.json`
in the repo root (in the format `gh:org/repo`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in the repo root (in the format `gh:org/repo`).
in the repo root (in the format `gh:org/repo`). Can also take a reference to a tag or branch for `gh:` or `github:` URIs (e.g. `gh:org/repo@main` or `gh:org/repo@1.0.1).

Just to highlight that we also accept that.

style: The style used to structure the output. If a template directory
is given, this parameter will be ignored.
template_dir: The directory that contains the Jinja template
Expand All @@ -48,8 +51,8 @@ def build(
output_dir: The directory to save the generated files in.
verbose: If True, prints additional information to the console.
"""
path: Path = _resolve_uri(uri)
properties: dict[str, Any] = _read_properties(path)
uri: Uri = _parse_uri(uri) # type: ignore # TODO fix in read_prop PR
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a good example of why the current naming is wonky is the fact that we are reassigning the uri variable here. uri is indeed the most appropriate name at this point because it is now an instance of our Uri class. But before this line uri doesn't actually contain a uri, which is confusing to me at least.

mypy also complains about this:

src/seedcase_flower/cli.py:54: error: Name "uri" already defined on line 33  [no-redef]
        uri: Uri = _parse_uri(uri)
        ^~~
Found 1 error in 1 file (checked 13 source files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution to this mypy thing is to use parse_uri within read_properties() e.g. read_properties(parse_uri(uri)). (maybe this is also why I liked resolve as a word better, but I don't have a strong preference). (this is where I really miss the ability to pipe...).

properties: dict[str, Any] = _read_properties(uri) # type: ignore # TODO fix in read_prop PR

# One item per section, rendered from template.
# Internally uses Jinja2 to render templates with metadata, which
Expand Down
67 changes: 58 additions & 9 deletions src/seedcase_flower/internals.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
"""Helper functions for private use."""

import json
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Any
from urllib import parse


class BuildStyle(Enum):
Expand All @@ -14,15 +16,62 @@ class BuildStyle(Enum):
quarto_resource_tables = "quarto_resource_tables"


# Output maybe str? Path?
# Use `match` inside for strictness on URI types? Or use a library for URI parsing?
# TODO Extend to parse strings and return either URL or Path
def _resolve_uri(uri: str) -> Path:
return Path(uri)
@dataclass(frozen=True)
class Uri:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used the URI name on the call so I kept it for now, but we could consider calling it URL instead since by the time we are using this class, we have already converted the gh URIs and paths to https URLs and File URLs, respectively. OF course, URI is still valid since it is a superset including URLs, but URL would be more precise to at this point since non-URL URIs (gh:, file:, ...) will never be represented by this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL is only for HTTP, URI is the umbrella term that includes file:. https://en.wikipedia.org/wiki/Uniform_Resource_Identifier. And this class would contain the value which contains paths and URLs. We could call it something else, but URL won't be accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I don't think that is correct. My understanding is that a URI can be either a URN or a URL. The difference is whether it uniquely Names a resource, or Locates the resource with an actual address. HTTP is just the scheme. There are many other schemes that specifies valid URLs, e.g. ftp, irc, and file.

The link that you posted actually corroborates this understanding as exemplified by this excerpt from the second paragraph:

URIs which provide a means of locating and retrieving information resources on a network (either on the Internet or on another private network, such as a computer file system or an Intranet) are Uniform Resource Locators (URLs). ... Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it; these are Uniform Resource Names (URNs).

Some more examples on this link.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we must be reading these entries in completely different ways because the way I read the wiki entry on URI is very different from how you are describing it. Like, a file is not a URL, as per wiki "web address". And a URN seems to be it's own URI (urn: vs https:), and urn: is not file:.

Here it's called a file URI: https://en.wikipedia.org/wiki/File_URI_scheme

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a lot confusion about these two terms that a whole formal document has been written about it: https://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/#uri-partitioning so we aren't alone in this multiple interpretation problem! 😛

"""A parsed URI with its normalised value and locality flag."""

value: str
local: bool


def _parse_uri(uri: str) -> Uri:
split_uri = parse.urlsplit(uri)
if split_uri.scheme == "":
split_uri = split_uri._replace(scheme="file")
match split_uri.scheme:
case "file":
return _convert_to_file_uri(split_uri)
case "https":
return _convert_to_https_uri(split_uri)
case "gh" | "github":
return _convert_to_github_uri(split_uri)
case _:
raise ValueError(
"The uri must be either a path to an existing file/folder "
"or a URI with one of the following URI prefixes: "
"`file:`, `https:`, `gh:`, `github:`"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this error, but raised #159 for specific gh uri errors



def _convert_to_file_uri(split_file_uri: parse.SplitResult) -> Uri:
path = Path(split_file_uri.path).resolve()
if path.is_dir():
path /= "datapackage.json"
split_file_uri = split_file_uri._replace(path=path.as_posix())
return Uri(value=split_file_uri.geturl(), local=True)


def _convert_to_https_uri(split_https_uri: parse.SplitResult) -> Uri:
return Uri(value=split_https_uri.geturl(), local=False)


def _convert_to_github_uri(split_gh_uri: parse.SplitResult) -> Uri:
return Uri(
value=split_gh_uri._replace(
scheme="https",
netloc="raw.githubusercontent.com",
path=f"/{split_gh_uri.path}/refs/heads/main/datapackage.json",
).geturl(),
local=False,
)


# TODO Extend to also read properties from URLs
def _read_properties(path: Path) -> dict[str, Any]:
with open(path) as properties_file:
datapackage: dict[str, Any] = json.load(properties_file)
return datapackage
def _read_properties(uri: Uri) -> dict[str, Any]:
if uri.local:
path = Path(parse.urlsplit(uri.value).path)
with open(path) as properties_file:
return json.load(properties_file) # type: ignore # TODO fix in read_prop PR
else:
# TODO read from remote file
return {"placeholder": uri.value}
25 changes: 14 additions & 11 deletions tests/test_cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
import pytest

from seedcase_flower.cli import app, view
from seedcase_flower.internals import BuildStyle
from seedcase_flower.internals import BuildStyle, Uri

_DATAPACKAGE_DATA = {
"name": "placeholder",
Expand All @@ -29,9 +29,9 @@ def datapackage_path(tmp_path):


@pytest.fixture
def mock_resolve_uri(mocker):
"""Mock _resolve_uri to isolate CLI tests from filesystem resolution."""
return mocker.patch("seedcase_flower.cli._resolve_uri")
def mock_parse_uri(mocker):
"""Mock _parse_uri to isolate CLI tests from filesystem resolution."""
return mocker.patch("seedcase_flower.cli._parse_uri")


@pytest.fixture
Expand All @@ -43,16 +43,16 @@ def mock_read_properties(mocker):
# Testing CLI invocation ====


def test_build_with_mocked_internals(mock_resolve_uri, mock_read_properties):
def test_build_with_mocked_internals(mock_parse_uri, mock_read_properties):
"""Isolate CLI behaviour by mocking internal helpers."""
fake_path = Path("datapackage.json")
mock_resolve_uri.return_value = fake_path
fake_uri = Uri(value="file:///datapackage.json", local=True)
mock_parse_uri.return_value = fake_uri
# Simulate running the app from the command line (but without calling sys.exit())
app(["build", "datapackage.json"], result_action="return_value")

# Checking that the correct values were passed to the internal functions
mock_resolve_uri.assert_called_once_with("datapackage.json")
mock_read_properties.assert_called_once_with(fake_path)
mock_parse_uri.assert_called_once_with("datapackage.json")
mock_read_properties.assert_called_once_with(fake_uri)


# Checking stdout ====
Expand Down Expand Up @@ -122,8 +122,11 @@ def test_build_reads_uri_from_flower_toml(tmp_path, monkeypatch):
Build human-readable documentation from a datapackage.json file.

╭─ Parameters ───────────────────────────────────────────────────────────────────────────╮
│ URI --uri The URI to a datapackage.json file. [default: │
│ datapackage.json] │
│ URI --uri The path to a local datapackage.json file or its parent │
│ folder. Can also be an https: URL to a remote │
│ datapackage.json or a github: / gh: URI pointing to a │
│ repo with a datapackage.json in the repo root (in the │
│ format gh:org/repo). [default: datapackage.json] │
│ STYLE --style The style used to structure the output. If a template │
│ directory is given, this parameter will be ignored. │
│ [choices: quarto-one-page, quarto-resource-listing, │
Expand Down
117 changes: 117 additions & 0 deletions tests/test_internals.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
"""Tests for internal helper functions."""

import pytest

from seedcase_flower.internals import Uri, _parse_uri

# _parse_uri: plain path (no scheme) ====


def test_parse_uri_plain_file_path_is_local(tmp_path):
"""A plain file path with no scheme should return a local Uri."""
result = _parse_uri(str(tmp_path / "datapackage.json"))
assert result.local is True


def test_parse_uri_plain_file_path_has_file_scheme(tmp_path):
"""A plain file path should be normalised to a file:// URI."""
result = _parse_uri(str(tmp_path / "datapackage.json"))
assert result.value.startswith("file://")


def test_parse_uri_directory_path_appends_datapackage_json(tmp_path):
"""Passing a directory path should append datapackage.json to the URI."""
result = _parse_uri(str(tmp_path))
assert result.value.endswith("datapackage.json")


def test_parse_uri_directory_path_is_local(tmp_path):
"""Passing a directory path should return a local Uri."""
result = _parse_uri(str(tmp_path))
assert result.local is True


# _parse_uri: file:// scheme ====


def test_parse_uri_file_scheme_is_local(tmp_path):
"""A file:// URI should return a local Uri."""
result = _parse_uri(f"file://{tmp_path / 'datapackage.json'}")
assert result.local is True


def test_parse_uri_file_scheme_preserves_path(tmp_path):
"""A file:// URI pointing to a file should preserve the path."""
file = tmp_path / "datapackage.json"
result = _parse_uri(f"file://{file}")
assert str(file) in result.value


# _parse_uri: https:// scheme ====


def test_parse_uri_https_is_not_local():
"""An https:// URI should return a non-local Uri."""
result = _parse_uri("https://example.com/datapackage.json")
assert result.local is False


def test_parse_uri_https_preserves_url():
"""An https:// URI should be returned unchanged."""
url = "https://example.com/datapackage.json"
result = _parse_uri(url)
assert result.value == url


# _parse_uri: gh:// / github:// scheme ====


def test_parse_uri_gh_scheme_converts_to_raw_githubusercontent():
"""A gh:// URI should be converted to a raw.githubusercontent.com URL."""
result = _parse_uri("gh://owner/repo")
assert "raw.githubusercontent.com" in result.value


def test_parse_uri_gh_scheme_is_not_local():
"""A gh:// URI should return a non-local Uri."""
result = _parse_uri("gh://owner/repo")
assert result.local is False


def test_parse_uri_gh_scheme_appends_datapackage_json():
"""A gh:// URI should point to the datapackage.json on the main branch."""
result = _parse_uri("gh://owner/repo")
assert result.value.endswith("datapackage.json")


def test_parse_uri_github_scheme_converts_to_raw_githubusercontent():
"""A github:// URI should be converted to a raw.githubusercontent.com URL."""
result = _parse_uri("github://owner/repo")
assert "raw.githubusercontent.com" in result.value


def test_parse_uri_github_scheme_is_not_local():
"""A github:// URI should return a non-local Uri."""
result = _parse_uri("github://owner/repo")
assert result.local is False


def test_parse_uri_github_scheme_appends_datapackage_json():
"""A github:// URI should point to the datapackage.json on the main branch."""
result = _parse_uri("github://owner/repo")
assert result.value.endswith("datapackage.json")


# _parse_uri: unsupported scheme ====


def test_parse_uri_unsupported_scheme_raises_value_error():
"""An unsupported URI scheme should raise a ValueError."""
with pytest.raises(ValueError, match="uri must be either"):
_parse_uri("ftp://example.com/datapackage.json")


def test_parse_uri_returns_uri_instance(tmp_path):
"""_parse_uri should always return a Uri instance."""
result = _parse_uri(str(tmp_path / "datapackage.json"))
assert isinstance(result, Uri)
3 changes: 3 additions & 0 deletions tools/vulture-allowlist.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,6 @@
_check_jsonpath # unused method (src/seedcase_flower/section.py:83)
quarto_resource_tables # unused variable (src/seedcase_flower/internals.py:14)
cls # unused variable (src/seedcase_flower/section.py:85)
target # unused variable (src/seedcase_flower/.venv/lib/python3.12/site-packages/_virtualenv.py:50)
handler # unused variable (src/seedcase_flower/internals.py:20)
source # unused variable (src/seedcase_flower/internals.py:20)