Skip to content
Open
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
5f6fae2
feat: :sparkles: Move over relevant code from sprout and cdp
joelostblom Feb 23, 2026
6f0ef5d
feat: :sparkles: Add Https type and class validator
joelostblom Feb 24, 2026
f943218
feat: :sparkles: Implement resolve_uri
joelostblom Feb 24, 2026
a89633d
refactor: ♻️ Use case/match for readabilty
joelostblom Feb 24, 2026
8072033
refactor: ♻️ Make variable names more precise
joelostblom Feb 24, 2026
7763097
fix: 🐛 Remove unnused import
joelostblom Feb 24, 2026
c375a30
fix: 🐛 Fix mypy errors
joelostblom Feb 24, 2026
26ad174
chore: 🔧 Fix typing
joelostblom Feb 24, 2026
662c3d6
fix: 🐛 Remove old tmp code
joelostblom Feb 24, 2026
fd70520
fix: 🐛 Remove unused import
joelostblom Feb 24, 2026
b7ad9a0
fix: 🐛 Pass vulture checks
joelostblom Feb 24, 2026
df338f3
test: ✅ Comment code to be updated in separate PR to pass tests
joelostblom Feb 24, 2026
0044b9c
fix: 🐛 Remove check datapacakge from this PR
joelostblom Feb 24, 2026
05fbaa8
fix: 🐛 Create type alias
joelostblom Feb 25, 2026
5aae406
fix: 🐛 Ignore line in mypy instead of changing logic
joelostblom Feb 25, 2026
ae6c6eb
refactor: ♻️ Change pydantic classes to custom dataclass
joelostblom Feb 26, 2026
f88fb14
fix: 🐛 Name parameters more precisely
joelostblom Feb 26, 2026
4b632fd
fix: 🐛 Remove explicit file exist check
joelostblom Feb 26, 2026
95b9c69
refactor: ♻️ Reduce cases
joelostblom Feb 27, 2026
4005455
refactor: ♻️ Name variables more precisely
joelostblom Feb 27, 2026
78da642
docs: 📝 Clarify what source can be
joelostblom Feb 27, 2026
c1bcb3f
fix: 🐛 Avoid changes to read_properties in this PR
joelostblom Feb 27, 2026
2b5aeea
Revert "fix: 🐛 Avoid changes to read_properties in this PR"
joelostblom Feb 27, 2026
43a974e
feat: ✨ Check types
joelostblom Feb 27, 2026
39a8c76
fix: 🐛 Remove redundant parsin of gh uri
joelostblom Feb 27, 2026
3c091bf
feat: ✨ Improve naming
joelostblom Feb 27, 2026
030789e
feat: ✨ Add read_prop skeleton
joelostblom Feb 27, 2026
fd8c748
refactor: ♻️ Revert to less precise name until we have discussed more
joelostblom Mar 2, 2026
0b5d5d6
Merge branch 'main' into feat/resolve-uri
joelostblom Mar 2, 2026
6a12807
test: ✅ Update test to match new docstring
joelostblom Mar 2, 2026
7d1b019
fix: 🐛 Restore local file reading behavior to not break former test
joelostblom Mar 2, 2026
a5cc597
refactor: ♻️ Rename test function to match refactor
joelostblom Mar 2, 2026
d2fb845
chore: 🔧 Ignore mypy on lines to be fixed in read_prop PR
joelostblom Mar 2, 2026
f5a206c
test: ✅ Add internal tests for _parse_uri
joelostblom Mar 2, 2026
115b19e
build: 🔨 uv update
joelostblom Mar 2, 2026
779f627
Potential fix for code scanning alert no. 14: Incomplete URL substrin…
joelostblom Mar 2, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions src/seedcase_flower/cli.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
"""Functions for the exposed CLI."""

from pathlib import Path
from typing import Any, Optional
from typing import Optional

from cyclopts import App, Parameter, config

Expand Down Expand Up @@ -47,8 +47,8 @@ def build(
output_dir: The directory to save the generated files in.
verbose: If True, prints additional information to the console.
"""
path: Path = _resolve_uri(uri)
properties: dict[str, Any] = _read_properties(path)
resolved_uri = _resolve_uri(uri)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should resolved_uri be typed here or is it enough that the function return is typed already?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to explicitly write it out, at least within a bigger function like this one.

properties = _read_properties(resolved_uri) # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring type checking for now since this function will be implemented in a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this split between _resolve_uri and _read_properties is the simplest solution.
_resolve_uri can return either a local or a remote address, and these are loaded/fetched differently.
The way it is now, we first split the cases in the match statement, unite them in the output of _resolve_uri, and then presumably split them again in _read_properties. We could cut out the middle man and handle each case right when we split them. (I'll explain more there.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work! And just drop the _resolve_urifunction completely and just keep the read properties one instead 👍👍


# One item per section, rendered from template.
# Internally uses Jinja2 to render templates with metadata, which
Expand Down
74 changes: 68 additions & 6 deletions src/seedcase_flower/internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,27 @@
import json
from enum import Enum
from pathlib import Path
from typing import Any
from typing import Annotated, Any
from urllib import parse

from pydantic import AnyUrl, FileUrl, TypeAdapter, UrlConstraints

_AnnotatedHttps = Annotated[AnyUrl, UrlConstraints(allowed_schemes=["https"])]
_adapter = TypeAdapter(_AnnotatedHttps)


class HttpsUrl(str):
"""Type and class with validation for https URLs."""

@classmethod
def __get_pydantic_core_schema__(cls, source, handler): # type: ignore[no-untyped-def]
"""Initialize adapter core schema."""
return _adapter.core_schema

def __new__(cls, value: str): # type: ignore[no-untyped-def]
"""Setup validation."""
validated = _adapter.validate_python(value)
return str.__new__(cls, validated)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This genAI generated snippet seems somewhat verbose, but there does not appear to exist another way of creating an HttpsUrl class that both works as a type and can be called (ie works the same as the existing HttpUrl class in pytdantic). See the following links for discussion and some alternatives that all seem less desirable to me:

Very open to suggestions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what's going on here, nor what's the purpose. Is this to check that the URL is an actual URL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is just to check the URL, right? As we'll fetch the JSON from the URL immediately, there's an argument for not trying to pre-check the URL at all. If it's bad, the request will fail anyway. (Of course we could add some nice error handling around that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to both check that strings adhere to valid https URL specification and to use it as a type in a function signature. I'm fine with changing and passing around strings, but see my reply below of why I thought we would prefer to create these specific types when we try to follow functional programming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I understand it a bit more after reading your comment. As we only need to know whether an address is local or remote, a middle ground would be to have a class that captures that plus a standardised form of the URI (like so). That way we keep function signatures tidy but don't have to care about validating URIs beyond the match statement.



class BuildStyle(Enum):
Expand All @@ -14,11 +34,53 @@ class BuildStyle(Enum):
quarto_resource_tables = "quarto_resource_tables"


# Output maybe str? Path?
# Use `match` inside for strictness on URI types? Or use a library for URI parsing?
# TODO Extend to parse strings and return either URL or Path
def _resolve_uri(uri: str) -> Path:
return Path(uri)
type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lwjohnst86 As per our suggestion today, I defined a single type here instead of using | or Union in the function signature. I initially tried to achieve this with enum but it doesn't seem possible / I couldn't figure out how, so I went with type instead.

I'm unsure if the name Https_or_FileUrl is actually that helpful, but it also does not seem correct to give a name such as just URL or URI since it is really a subset of these. Even something like URL_subset does not seem that helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be an enum instead. And should be called a URI (where the default assumes a path URI).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really surprised there aren't packages or something for handling URIs... 🤔

Copy link
Member

@lwjohnst86 lwjohnst86 Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type HttpsUrl_or_FileUrl = HttpsUrl | FileUrl
class URI(Enum):
https = HttpsUrl
file = FileUrl

e.g. might look like above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding what _resolve_uri returns, but if it returns the address of the datapackage.json, then it cannot really return an enum. Enums are for fixed, finite sets of values. If _resolve_uri only determines whether the address is local or remote, then yeah it can return an enum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I couldn't get this to work as an enum; it seems like it hides the type information from mypy. Take the following example that contains the same function, either with an enum, a type alias, or a union:

# supertypes.py

from enum import Enum


class Number(Enum):
    """Doc."""

    INT = int
    FLOAT = float


def fx(x: Number) -> int | float:
    """Doc."""
    x / 2
    return x


type int_or_float = int | float


def fy(y: int_or_float) -> int | float:
    """Doc."""
    y / 2
    return y


def fz(z: int | float) -> int | float:
    """Doc."""
    z / 2
    return z

When running uv run mypy --pretty supertypes.py, we can see that mypy doesn't understand that a Number is a supertype of int and float, whereas this is understood when creating a new type (and of course also with the union):

supertypes.py:13: error: Unsupported operand types for / ("Number" and "int")  [operator]
        x / 2
        ^
supertypes.py:14: error: Incompatible return value type (got "Number", expected "int | float") 
[return-value]
        return x
               ^
Found 2 errors in 1 file (checked 1 source file)

Copy link
Contributor Author

@joelostblom joelostblom Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That still gives me the same error for the division:

supertypes.py:36: error: Unsupported operand types for / ("Number" and "int") 
[operator]
                return x / 2
                       ^

And even if it would work, it doesn't seem that useful for combining types in general if we always need additional logic like that. Doesn't it seem that creating a new type/type alias with type/TypeAlias is more appropriate here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading into it more, it seems enums in Python are more basic and with limited features than I expected. So my suggestion here doesn't actually work the way I was envisioning. I was imagining you use the enum to restrict the type of output and assign the URI path to the relevant enum entry. We could do a similar behaviour with a dataclass and treat it like an enum 😛 e.g.

@dataclass(frozen = True)
class URI:
	https: HttpsUrl
	file: FileUrl

def ... -> URI
		match ...
			case "file"
					URI.file = ...
			case ""
					URI.file = _check_path(...)
			...
		return URI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, do you have an MRE for what that would look like? When I try the following, I get the same errors as before:

from dataclasses import dataclass


@dataclass(frozen=True)
class Number:
    """Doc."""

    INT = int
    FLOAT = float


def fx(x: Number) -> int | float:
    """Doc."""
    x / 2
    return x


def fw(x: Number) -> int | float:
    """Doc."""
    match Number:
        case INT:
            return x / 2
        case FLOAT:
            return x / 2
supertypes.py:14: error: Unsupported operand types for / ("Number" and "int") 
[operator]
        x / 2
        ^
supertypes.py:15: error: Incompatible return value type (got "Number", expected
"int | float")  [return-value]
        return x
               ^
supertypes.py:22: error: Unsupported operand types for / ("Number" and "int") 
[operator]
                return x / 2
                       ^
Found 3 errors in 1 file (checked 1 source file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to butt in, but a URI needs to know 2 things: whether it's local or remote and what its address is.
That corresponds more to:

@dataclass
class Uri:
    value: str
    local: bool

A single URI shouldn't have properties for both https and file?

Copy link
Contributor Author

@joelostblom joelostblom Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but I thought the idea of this superclass was to take advantage of the validation and type specification that already exists in pydantic's FileUrl and our modified HttpsURL. If we don't want that, we could use something like AnyUrl directly and implement our own validation? Or could we somehow use the validation from the pydandtic classes inside our own Uri dataclass?



def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _resolve_uri(uri_or_path: str) -> HttpsUrl_or_FileUrl:
def _resolve_uri(uri: str) -> URI:

See comment above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better word would be _parse_uri()...? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So picking up from my earlier comment, how about looking forward to what we will do with the addresses and doing sg like:

def _resolve_uri(uri: str) -> dict[str, Any]:
    split_uri = parse.urlsplit(uri)
    match split_uri.scheme:
        case "":
            return _load_properties_from_path(uri)
        case "file":
            return _load_properties_from_path(split_uri.path)
        case "https":
            return _fetch_properties_from_url(split_uri.geturl())
        case "gh" | "github":
            return _fetch_properties_from_url(split_uri._replace(...).geturl())
        case _:
            raise ValueError(...)

Both _load_properties_from_path and _fetch_properties_from_url could take a string and try to load the JSON without checking that the string is a good address. Instead, we could anticipate common failure cases (file not found, not JSON, various network errors, etc.) and show nicely worded errors -- probably easier than typing and validating the input addresses.

(Maybe the one thing worth checking is that the gh-flavoured address contains just the organisation and the repo?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like 👍 just rename the whole function to_read_properties instead. And for the gh and github one, convert then onto https and fetch like the other https

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what you wrote in your comment above regarding the split between _resolve_uri and _read_properties. It seemed somewhat forced when I was implementing it too, exactly because of what you said with the uniting and splitting again. I'm totally fine with this alternative approach (or even ducktyping for error handling as long as the messages we get are clear), but my (mis?)understanding of Luke and/or functional programming in general is that it might clash on two accounts:

  • Each parameter should be assigned a type that as closely as possible matches what it represents. Preferably errors are handled by this type instead of e.g. pattern matching a string (I understand that we do pattern matching to assign the type, so it seems like the approaches are equally fine to me from this perspective, but not sure if they are equal from a functional programming perspective).
  • Each function should just do one thing. With this new approach, we would have a (parent) function that both resolves the URI and then loads the properties. Again, this would be totally fine with me personally, but I thought that in functional programming, we want a function to do one thing (e.g. convert from one type to another) and then pass the output type to another function so that they are more standalone and composable rather than nesting two functions within a parent function.

Again, I might very well have misunderstood both of these "principles" of functional programming and I'm happy with either solution here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see what you're saying. What if instead we resolve thr URI so a pure path is given the "file" prefix and the gh/github is converted into a full URL. And the https is output as is. Then the downstream read properties function only needs to do two switch statements rather than X number to actually convert them into correct URIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what the initial implementation suggestion is doing, isn't it? Paths and file URIs return FileUrl whereas https URLs and github URIs return HttpsURL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea! Well, almost, aside from how it is being output, that's the only tension point 😛

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I personally think that some problems lend themselves very nicely to one type of solution and some problems to another type. To me, the solution with nested functions feels cleaner and simpler in this case. But I do get the one function -- one responsibility hangup, though of course I could be cheeky and argue that loading the properties from a URI is one responsibility.

Or taking another angle, we could say that loading the properties from a local path and fetching them from a remote address are two distinct tasks. This is true because they expect different inputs (a path vs a string) and are implemented differently. If this is the case, then what we need before loading the properties is to determine whether we have a local or a remote address and to standardise each type of address.
Something like:

@dataclass
class Uri:
    value: str
    local: bool

def _parse_uri(uri: str) -> Uri:
    split_uri = parse.urlsplit(uri)
    match split_uri.scheme:
        case "":
            return Uri(value=uri, local=True)
        case "file":
            return Uri(value=split_uri.path, local=True)
        case "https":
            return Uri(value=split_uri.geturl(), local=False)
        case "gh" | "github":
            return Uri(
                value=split_uri._replace(...).geturl(),
                local=False,
            )
        case _:
            raise ValueError(...)

parsed_uri = _parse_uri(uri)
if parsed_uri.local:
    properties = _load_properties_from_path(parsed_uri.value)
else:
    properties = _fetch_properties_from_url(parsed_uri.value)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's a good point, the distinction between local and remote. Since the functions to read the properties for either source will very likely be very differently handled..!

split_uri = parse.urlsplit(uri_or_path)
match split_uri.scheme:
case "":
return _check_path(uri_or_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, append a "file" and check as if it is a path URI.

case "file":
return _check_file_uri(split_uri)
case "https":
return _check_https_uri(split_uri)
case "gh" | "github":
return _check_github_uri(split_uri)
case _:
raise ValueError(
"The URI must be either a path to an existing file/folder "
"or have one of the following URI prefixes: "
"`file:`, `https:`, `gh:`, `github:`"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this error, but raised #159 for specific gh uri errors



def _check_path(uri_or_path: str) -> FileUrl:
Copy link
Member

@lwjohnst86 lwjohnst86 Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of _check, name it parse or resolve or as. Since we aren't checking here (no bool is output).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, aren't all these function mainly for checking that the input is a valid FileUrl or HttpsUrl and throw an error if it isn't? Do you see it more as a parsing function that also includes checking? If so, should I change it somehow so that there is a separate function for checking and a separate one for parsing (that seems rather redundant though)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, but you aren't just checking here, you're also converting things into FileUrl. So the word "check" isn't accurate to the action being applied.

Copy link
Contributor Author

@joelostblom joelostblom Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then no single word would be correct right? The function would need a name like parse_and_check or similar.

In these four _check* functions, it's only _check_path and _check_github_uri that contains any parsing. The main parsing step already happened in parse.urlsplit(uri_or_path) which is outside these functions.

I'm thinking of parsing as breaking something down into parts, and what happens here is more of a conversion between data types (from str to a url type) which includes a check that it is convertable. So a possible name could be _convert_to_fileurl maybe (_transform_to_fileurl could also work but it's longer? I think _as_path as you mentioned could work, but I generally prefer when function starts with a verb that describes what the action they carry out. Now I wonder if we need it to be even more specific: _convert_path_str_to_fileurl vs convert_fileurl_str_to_file_url (becomes messy, I rather have a single convert_to_fileurl and move some logic around I think...)

path = Path(uri_or_path).resolve()
if path.is_dir():
path = path / "datapackage.json"
if not path.exists():
raise OSError(f"{path} does not exist.")
return FileUrl(path.as_uri())


def _check_file_uri(split_uri: parse.SplitResult) -> FileUrl:
return FileUrl(split_uri.geturl())


def _check_https_uri(split_uri: parse.SplitResult) -> HttpsUrl:
return HttpsUrl(split_uri.geturl())


def _check_github_uri(split_uri: parse.SplitResult) -> HttpsUrl:
return HttpsUrl(
split_uri._replace(
scheme="https",
netloc="raw.githubusercontent.com",
path=f"/{split_uri.path}/refs/heads/main/datapackage.json",
).geturl()
)


# TODO Extend to also read properties from URLs
Expand Down
3 changes: 3 additions & 0 deletions tools/vulture-allowlist.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,6 @@
_check_jsonpath # unused method (src/seedcase_flower/section.py:83)
quarto_resource_tables # unused variable (src/seedcase_flower/internals.py:14)
cls # unused variable (src/seedcase_flower/section.py:85)
target # unused variable (src/seedcase_flower/.venv/lib/python3.12/site-packages/_virtualenv.py:50)
handler # unused variable (src/seedcase_flower/internals.py:20)
source # unused variable (src/seedcase_flower/internals.py:20)
24 changes: 12 additions & 12 deletions uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading