
Conversation

@dybucc (Contributor) commented Dec 7, 2025

The extractor needed some fine-tuning to only pick up top-level docstrings instead of (possibly wrong) locally scoped docstrings that are out of reach for library users.

The function signature parser should now be a bit more efficient. It performs no operations on the whole array it gets passed beyond those strictly required for the elements (lines) that actually belong to the function signature. Prior to this, there were a number of enumerate and join calls over the full array, which wouldn't exactly be efficient for, say, source files spanning thousands of lines.
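
To illustrate the idea with a minimal, hypothetical Python sketch (the real parser is a different piece of code; the helper name and the assumption that the source arrives as a list of lines are mine), the scan only touches lines from the function header until the argument list's parentheses balance out, instead of enumerating or joining the whole file:

```python
def signature_lines(lines, start):
    # Hypothetical helper, for illustration only: walk forward from the
    # line containing the function header and stop as soon as the
    # argument list's parentheses are balanced, so the rest of the file
    # is never touched. (Parentheses inside string literals are ignored
    # here for brevity.)
    depth = 0
    seen_open = False
    collected = []
    for line in lines[start:]:
        collected.append(line)
        for ch in line:
            if ch == "(":
                depth += 1
                seen_open = True
            elif ch == ")":
                depth -= 1
        if seen_open and depth == 0:
            break
    return collected
```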

Further work will continue on the function parameter parsing, possibly by changing the parameters of the parser itself, so that, without altering the resulting typst query output, we avoid performing two passes over the argument list: one during the initial function signature parsing and another during the parameter list parsing.

Once work on the function signature is done, the next step will be to fix the actual docstring parser so that it picks up on newlines in function parameter documentation. An example of a docstring that I expect the parser to handle nicely is given in #986. Only after this is done will I move on to seeing what can be done about the type syntax incompatibilities between the manual and the web documentation.
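
For reference, this is roughly what picking up on newlines would amount to, sketched in Python and assuming a tidy-style `/// - name (type): description` docstring format; the actual docstring in #986 and the real parser may look different:

```python
import re

# Hypothetical docstring with a parameter description spanning several lines.
DOC = """\
/// Draws a thing.
/// - radius (number): Radius of the thing,
///   continued on a second line that the parser should fold
///   into the same description instead of dropping it.
/// - fill (color): Fill color.
"""

PARAM = re.compile(r"^/// - (?P<name>[\w.-]+) \((?P<type>[^)]+)\):\s*(?P<desc>.*)$")
CONT = re.compile(r"^///\s+(?P<desc>\S.*)$")

def parse_params(doc):
    params, current = {}, None
    for line in doc.splitlines():
        if (m := PARAM.match(line)):
            current = m["name"]
            params[current] = {"type": m["type"], "desc": m["desc"]}
        elif current and (m := CONT.match(line)):
            # Continuation line: append to the previous parameter's
            # description rather than discarding it at the newline.
            params[current]["desc"] += " " + m["desc"]
    return params

print(parse_params(DOC)["radius"]["desc"])
```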

I also wanted to ask whether it's a good idea to be running the Python script for HTML generation directly rather than through an isolated environment, possibly using the nice integration Just has with uv for script recipes [1].

@dybucc changed the title from "refactor: Rework extractor and fn signature parser" to "refactor: Rework docstring parser" on Dec 7, 2025
@dybucc marked this pull request as draft on December 7, 2025 08:15
@dybucc force-pushed the docstring-parser-rework branch 2 times, most recently from ffddcbe to 05dd1fe on December 7, 2025 08:21
@dybucc force-pushed the docstring-parser-rework branch from 05dd1fe to 540ec7a on December 7, 2025 08:27
@johannes-wolf self-requested a review on December 7, 2025 14:32
@johannes-wolf (Member)

> I also wanted to ask whether it's a good idea to be running the Python script for HTML generation directly rather than through an isolated environment, possibly using the nice integration Just has with uv for script recipes [1].

What exactly do you mean?

@dybucc (Contributor, Author) commented Dec 7, 2025

> > I also wanted to ask whether it's a good idea to be running the Python script for HTML generation directly rather than through an isolated environment, possibly using the nice integration Just has with uv for script recipes [1].
>
> What exactly do you mean?

I mean the genhtml.py script could be part of the justfile as a script recipe instead of being run as a separate command. There's also the fact that the current recipe for documentation generation just tries to run whatever python executable is on the system.

The genhtml.py script could be made into a Python recipe directly inside the justfile, or imported through a Just module. The output of the typst query could then be stored in a Just variable.

For both of these, Just provides nice built-ins to (1) require a program to exist in the user's PATH and report an error if it doesn't, (2) isolate Python script execution with uv while also requiring uv to be installed, and (3) further isolate, in the structural sense, those aspects of the codebase that are mostly related to, or used by, our command runner of choice.

Granted, transitioning from an independent Python script to a script recipe in Just would be a bit of work, but it should reduce the cognitive load of the script by no longer expecting it to work independently of the justfile's string replacement facilities before running.

An example of this that I can quickly get my hands on is a justfile I made for a simple C++ project, which I attach below. See the `_compile` recipe for an example of what I mean by using Just's string facilities in conjunction with Python scripts, and see variables like `lldb` for an example of having Just require a program to exist in the user's PATH before attempting an invocation.

```
set unstable := true
set shell := ["fish", "-c"]
set script-interpreter := ["uv", "run", "--script"]
set quiet := true

alias c := clean
alias d := doc
alias cc := compile
alias r := run
alias dbg := debug

src_dir := if path_exists(justfile_directory() / "src") == "true" { justfile_directory() / "src" } else { error("src directory not found") }
build_dir := justfile_directory() / "build"
target_out := build_dir / "final_program"
lsd := require("lsd")
src_files := prepend(src_dir / "", replace(shell(lsd + " --icon=never -1 " + src_dir), "\n", ' '))
obj_files := replace(replace_regex(src_files, '([[:alpha:]]+)\.cc', '${1}.o'), src_dir, build_dir)
clangd_flags := if path_exists(justfile_directory() / "compile_flags.txt") == "true" { justfile_directory() / "compile_flags.txt" } else { error("compile_flags not found") }
cxx := require("clang++")
cxxflags := trim(replace(replace_regex(read(clangd_flags), '(?m)^-I(.*)?\n', ''), "\n", ' '))
ldflags := trim(env("LDFLAGS", "") + " -pie")
cppflags := env("CPPFLAGS", "") + " " + trim(replace_regex(read(clangd_flags), '(?m)^-[^I](.*)', ''))
doxygen := require("doxygen")
doc_dir := if path_exists(justfile_directory() / "doc") == "true" { justfile_directory() / "doc" } else { error("doc directory not found") }
doxyfile := if path_exists(doc_dir / "configDoxygen.cfg") == "true" { doc_dir / "configDoxygen.cfg" } else { error("doxyfile not found") }
lldb := require("lldb")

[private]
default:
    just --list --unsorted --justfile {{ justfile() }}

# generates doxygen documentation
[macos]
doc:
    {{ doxygen }} {{ doxyfile }}
    {{ doc_dir / "html" }} && pwd | pbcopy

# cleans up build artifacts and older docs
clean:
    rm -rf {{ build_dir }}
    rm -rf {{ doc_dir / "html" }}

# build current project (non-incrementally)
[macos]
compile: _compile
    {{ cxx }} \
    {{ obj_files }} \
    -o {{ target_out }} \
    {{ ldflags }}

[script]
_compile: clean
    # /// script
    # dependencies = ["sh"]
    # ///
    import sh

    cxx = sh.Command({{ quote(cxx) }})
    cppflags = [{{ replace(quote(cppflags), " ", "', '") }}]
    cxxflags = [{{ replace(quote(cxxflags), " ", "', '") }}]

    input = [{{ replace(quote(src_files), " ", "', '") }}]
    output = [{{ replace(quote(obj_files), " ", "', '") }}]

    sh.mkdir("-p", {{ quote(build_dir) }})

    for i, file in enumerate(input):
        cxx(*cppflags, *cxxflags, c=file, o=output[i])

# run the thing
[no-quiet]
run *args: compile
    {{ target_out }} {{ args }}

# debug the thing
[no-quiet]
debug: compile
    {{ lldb }} {{ target_out }}
```

@johannes-wolf (Member)

Hm, I have no opinion on this. But I would like to keep tools separate – the script should work without Just. I guess expecting a working Python executable on the host machine is fine, tbh.

@dybucc (Contributor, Author) commented Dec 7, 2025

> Hm, I have no opinion on this. But I would like to keep tools separate – the script should work without Just. I guess expecting a working Python executable on the host machine is fine, tbh.

Either way, there's still pending work on this PR before moving on to anything related to the web documentation. I'll see then whether I can make some changes to the justfile without making the genhtml.py script rely on it.

Following the plan in the PR this branch is part of, the function
signature parser rework is done. The result is that the signature is
now parsed in a single pass, instead of separately determining the
function span and the parameter span and then parsing the parameter
span.

The efficiency gains from the prior commit are kept, so no operations
are performed on the whole array; only those elements of the array
holding the source lines that correspond to the function are parsed.

There is, though, a small performance cost: a custom regex is built at
runtime so that the whitespace indentation of some parameters' default
values is represented accurately in the final output. Because the
project doesn't use an autoformatter, some lines are indented by more
than a single multiple of 2 (the width set in the .editorconfig file);
in that case the parser correctly recognizes that the contents of a
named argument, provided the argument is not a string or Typst content
value, should be deindented by as much whitespace as was detected at
the start of the parameter name.
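
As a rough illustration of that deindentation step (the Python below
is purely illustrative; the helper name and exact mechanics are made
up and are not the real implementation), the regex is built once per
parameter, at runtime, from the indentation detected before the
parameter name:

```python
import re

def deindent_default(default_lines, param_indent):
    # param_indent is the whitespace detected before the parameter name,
    # e.g. "    ". Build the regex at runtime and strip exactly that
    # prefix from every line of the default value.
    prefix = re.compile("^" + re.escape(param_indent))
    return [prefix.sub("", line) for line in default_lines]

# Lines indented two extra spaces get that prefix removed:
print(deindent_default(["  (a: 1,", "   b: 2)"], "  "))  # ['(a: 1,', ' b: 2)']
```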

The parameter parser now also correctly handles function default
arguments.
@johannes-wolf (Member)

I cherry-picked your commit onto master.

@dybucc (Contributor, Author) commented Jan 3, 2026

I just pushed the changes to the parser that I've been making over the last few weeks.

These include merging the function signature parser and the argument parser to extract information on the arguments within the string. I did this after realizing that the additional bits of information that would have been passed to the argument parser wouldn't really make much of a difference in reducing the number of passes over that specific string and its substrings.

I also introduced support for more Typst values, in case some future function uses an anonymous function as the default value of a named parameter. This should now work; I've tested it and it works just fine.
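
Just for illustration, this is the kind of default value that should now be recognized; the signature below is invented rather than taken from the CeTZ sources, and the classification helper is hypothetical:

```python
# Invented Typst-like signature with an anonymous function as a named
# parameter's default value; not from the CeTZ sources.
SIG = "#let draw-thing(pos, transform: (v) => v * 2, fill: none) = {"

def classify_default(value):
    # Very rough sketch of classifying a default value's Typst type,
    # only to show where an anonymous function value would fit in.
    value = value.strip()
    if "=>" in value:
        return "function"
    if value in ("none", "auto"):
        return value
    if value in ("true", "false"):
        return "boolean"
    if value.startswith('"'):
        return "string"
    return "unknown"

print(classify_default("(v) => v * 2"))  # function
print(classify_default("none"))          # none
```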

The docstring parser proper is mostly done. Unlike the above, it's not yet working seamlessly with the existing docstrings, but the only things left are to better support some edge cases such as badly documented parameters and Typst-native syntax lists.

I've also experimented with some diagnostics in the docstring parsing process, but I ended up keeping only one, as that seems like the only one that makes real sense without making too many assumptions. Once I get the unified type syntax working in both the PDF manual and the web docs, I'll look into adding some trace information to the docstring for panics that may get thrown.

The highlight of the refactored docstring parser is that it now also supports multiline parameter types, by way of a slightly larger grammar. What I've not yet added is support for multiline default parameter values, which would pair up nicely with the support for multiline default values already implemented in the function signature parser.
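
As a sketch of what the slightly larger grammar amounts to (again Python and an invented docstring; the real grammar may differ), a parameter's type list may now close on a later line, so lines are folded until the opening parenthesis of the type is balanced:

```python
import re

# Invented example of a multiline parameter type in a docstring.
LINES = [
    "/// - mark (none, string, dictionary,",
    "///   array): Mark style, or none to disable marks.",
]

def fold_param(lines):
    # Join docstring lines until the "(" opening the type annotation is
    # balanced again, then split the folded line into its parts.
    folded = lines[0]
    i = 1
    while folded.count("(") > folded.count(")") and i < len(lines):
        folded += " " + lines[i].lstrip("/").strip()
        i += 1
    m = re.match(r"/// - ([\w.-]+) \(([^)]*)\):\s*(.*)", folded)
    return {"name": m[1], "type": m[2], "desc": m[3]} if m else None

print(fold_param(LINES))
# {'name': 'mark', 'type': 'none, string, dictionary, array',
#  'desc': 'Mark style, or none to disable marks.'}
```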

To recap:

1. The docstring parser is almost done.
2. The unified type syntax is not yet done. I've not thought much about it, so ideas are very much appreciated.
3. The docstring parser is going to get two kinds of documentation:
   - Documentation for present and future contributors, so they can document CeTZ well.
   - Documentation for the parser itself, so the overhead for future maintainers of the parser is lower.

The parser is mostly done.

This commit will be amended/fixed up once the docstring parser is
completely done, so more details can be found in the accompanying PR.
@dybucc force-pushed the docstring-parser-rework branch from 454c568 to 8a2e9fb on January 4, 2026 18:31