Skip to content

[herd-www] Support every Arm ARM release for all catalogues#1686

Merged
relokin merged 2 commits intoherd:masterfrom
TiberiuBucur:herd-www
Mar 30, 2026
Merged

[herd-www] Support every Arm ARM release for all catalogues#1686
relokin merged 2 commits intoherd:masterfrom
TiberiuBucur:herd-www

Conversation

@TiberiuBucur
Copy link
Copy Markdown
Contributor

@TiberiuBucur TiberiuBucur commented Jan 27, 2026

This change adds the aarch64.cat file in every Arm ARM release (so far L.b and M.a) as options in the dropdown menu for all the relevant books in the herd-www UI. In turn, upon selecting this file, jerd.ml includes the appropriate directory of dependencies to use for the selected cat model (in the case of ArmARM-L.b/aarch64.cat, it will be herd/libdir/aarch64/ArmARM-L.b).

In order to make this work, we also needed to add the mechanism to disable caching of cat file contents inside the ParseModel module. This is because, in the browser, herd is invoked from the Ocaml compiled to JavaScript through the use of Js_of_ocaml, rather than from a CLI (which obviously does not exist in a browser). Therefore, upon reading different versions of the same file (with the same name), herd looks up the file name in an in-memory cache, and loads that version, if it has seen it already. This becomes an issue when switching between cat files that require different dependencies.

@TiberiuBucur TiberiuBucur marked this pull request as ready for review January 28, 2026 16:19
Copy link
Copy Markdown
Collaborator

@fsestini fsestini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general comments down below, plus some minor comments in the code itself.

Models caching

I see the issue with the caching behaviour, and I think adding a way to
configure this would make sense. However, I have reservations about it being
configured via a globally stateful API (namely use_memoization).

Global mutable state makes the caching configuration order‑dependent and
non‑local, therefore any call to use_memoization affects all uses of any
ParseModel.Make instance everywhere. This means in particular that if some
lib or herd code down the call stack decides to set use_memoization to
true, it will end up invalidating your assumptions about jerd's parser
caching behaviour.

I would rather much prefer if use_memoization were instead implemented as a
functor parameter of ParseModel.Make, i.e. as a field of ParseModel.Config.
This way, each parser instance's caching configuration is immutable, local, and
there's no risk of forgetting to set that flag in the correct order, because
the type system will tell you.

For jerd itsel, the key issue is that some internal lib/herd modules
(e.g. lib/interpreter and herd/getModel) instantiate their own parsers via
ParseModel.Make, so we need a way to control or disable their caching. If
caching becomes a functor parameter, we can thread that flag through their
config types (e.g. add use_memoization to Interpreter.Config) and pass it along
when they build their ParseModel.Make instance. This way, jerd can disable
caching for herd invocations by setting the appropriate value in Config
here.

PS: nitpick, but personally I would say use_cache rather than
use_memoization, as in my experience "memoization" usually implies the
function being memoized is deterministic.

PS2: not in scope for this PR, but in the future, if more fine-grained control
over caching is needed, it could also be worth exploring some refactoring
around ParseModel so that the cache reference is created per-instance
(i.e., every invocation of ParseModel.Make creates its own cache reference
that is specific to that parser), rather than once globally.

OCaml version

It seem that on opam switches using an older versions of the OCaml compiler, there ends up being a conflict between zarith 1.13 (needed for lib/SVEScalar.ml) and zarith_stubs_js (needed by herd-www). I think we should at least add a note to the README that the project will only build on newer compilers (I think it has to be at least >=4.14.0 judging from here).

Cat links in the web UI

Maybe I'm making some mistakes in the build process, but I couldn't quite get the links to the cat files to work.

I'm building and running with:

cd herd-www
make
cd www
python3 -m http.server

On localhost:8000, I can use the dropdown menu to select things like ArmARM-M.a/aarch64.cat, however when I click on the "Show me" button I get a link to http://localhost:8000/weblib/aarch64.cat.html no matter what version/snapshot of the model I select. I would assume the html page to be different based on the selected snapshot.

Perhaps related, but if I list the contents of weblib, I get a flat list of files where names like aarch64.cat.html or aarch64hwreqs.cat.html appear only once. I was instead expecting to see different subdirectories, one per model version, each with its own copy of aarch64hwreqs, aarch64deps, etc..


let _ =
let mr = ref StringMap.empty in
let k0,do_rec =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this is just a script, but I find that this argument parsing block could
be a bit easier to follow. If we are ok with setting environment directories
with something like a repeated -env name=dir option, then that block could be
replaced by a, IMO, simpler and more declarative Arg-based list:

let opts =
  [
    ("-rec", Arg.Unit (fun () -> do_rec := true), "recurse");
    ("-norec", Arg.Unit (fun () -> do_rec := false), "no recurse");
    ("-env", Arg.String (fun s ->
      match String.split_on_char '=' s with
      | [name; dir] -> envs := (name, dir) :: !envs
      | _ -> Printf.eprintf), "...")
  ]

let () = Arg.parse opts (fun dir -> dirs := dir :: !dirs) "..."

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a tradeoff between clarity in the script option parsing, or in the Makefile that invokes the script. Since the cat environments are stored as a list in a Makefile variable, it would have been a bit more indirect to pass them using repeated -env name=dir options. What do you think the best tradeoff is?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a question of consistency with other tools herd7, diy*7 etc. Those tools use the style @fsestini mentioned. If you can align up with those tool, it will be helpful.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TiberiuBucur We can still let ENV_PAIRS be a list, in the Makefile. We'd be creating a list of -env arguments instead. The change would amount to changing the lines:

ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),$(notdir $(d)) $(d))
ocaml ./generate_includes.ml -norec -envs $(ENV_PAIRS) -- $(CATINCLUDES) > $@

to be instead something like:

ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),-env $(notdir $(d))=$(d))
ocaml ./generate_includes.ml -norec $(ENV_PAIRS) $(CATINCLUDES) > $@

Personally I think the latter version is equally clear, but YMMV.

In light of our latest meeting, however, I'd say we can probably park this for now until the PR stabilises. The reason being that since we discussed potentially revisiting the webapp and build process to only use files from the weblib, we might end up getting rid of some of these scripts anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the changes as you suggested. We can revisit this in the future if we decide to only use /weblib for loading any sub-model.

j = json.dumps({
'record' : m.record,
'cats' : m.cats,
'cats' : base_cats + extra_cats,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please help me understand what these extra_cats are about? I was under the impression that the web UI would just select the whole cat library from one of the snapshot dirs in herd/libdir/aarch64, so I don't see why we would need to fetch cat files from other places.

Copy link
Copy Markdown
Contributor Author

@TiberiuBucur TiberiuBucur Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about the available top-level cats that are present in the dropdown for each catalogue.
If you look inside herd-www/catalogue/aarch64 (or any other book), you will see a shelf.py script which defines a bunch of cat files:

cats = [
    "cats/aarch64.cat",
    "cats/aarch64-v08.cat",
    "cats/aarch64-v07.cat",
    "cats/aarch64-v06.cat",
    "cats/aarch64-v05.cat",
    "cats/aarch64-v04.cat",
    "cats/aarch64-v03.cat",
    "cats/aarch64-v02.cat",
    "cats/aarch64-v01.cat",
    "cats/aarch64-v00.cat",
    "cats/sc.cat",
    ]

These were all the options displayed in the dropdown before, and are copied from the catalogue directory in the root of the repo.
The extra cats are just the Arm ARM releases that I copied under herd-www/catalogue/$(book). Ultimately they need to be included in shelf.json (which is created by running catalogue_to_json.py) too, so that JavaScript picks them up and displays them in the dropdown.

In the future they could be included in the catalogue under the root of the repo too, which would make this logic redundant (since the Makefile copies whatever is in there). But I didn't want to make such a change without consulting with everyone.

@TiberiuBucur
Copy link
Copy Markdown
Contributor Author

TiberiuBucur commented Jan 29, 2026

@fsestini FYI

I would rather much prefer if use_memoization were instead implemented as a
functor parameter of ParseModel.Make, i.e. as a field of ParseModel.Config.
This way, each parser instance's caching configuration is immutable, local, and
there's no risk of forgetting to set that flag in the correct order, because
the type system will tell you.

That was my first idea, but since that required changing the parameter of the functor, it meant code in many more places (including herd) had to be modified. I prioritised the least intrusive method. But I see how this can become an issue if one chooses to re-enable caching inside herd. I think a better idea would have been just to give the option of flushing the cache in between runs, instead of enabling or disabling caching.

PS2: not in scope for this PR, but in the future, if more fine-grained control
over caching is needed, it could also be worth exploring some refactoring
around ParseModel so that the cache reference is created per-instance
(i.e., every invocation of ParseModel.Make creates its own cache reference
that is specific to that parser), rather than once globally.

That is contrary to the intended behaviour developed by Luc, as I understand it. Since ParseModel.Make is instantiated in multiple places (eg. herd.ml, getModel.ml, interpreter.ml etc.), and they might all call the same file, if the Hashtbl were per-module, the caching would not be as effective. Perhaps when this was developed no one thought multiple instantiations of ParseModel.Make would use different libdirs.

@fsestini
Copy link
Copy Markdown
Collaborator

@TiberiuBucur @relokin I had a second look at ParseModel based on our offline discussion. Hopefully I can provide some context to understand the libdir caching issue better.

As @TiberiuBucur mentioned, ParseModel.Make.find_parse
does not seem to be caching files based on a path that includes the libdir component, but only based on the "base name".
We can highlight this with some debug logging:

  let find_parse fname =
    try Hashtbl.find t fname
    with Not_found ->
      let key = fname in
      Format.printf "find_parse: key: %s@." fname;  (* log the Hashtbl key *)
      let fname = O.libfind fname in
      Format.printf "find_parse: fname: %s@." fname;   (* log the file being read from disk *)
      let r = Misc.input_protect (do_parse fname) fname in
      Hashtbl.add t key (fname,r) ;
      fname,r

Then if we run this

dune exec herd/herd.exe -- -set-libdir ./herd/libdir/aarch64/ArmARM-L.b MP.litmus

we get something like:

find_parse: key: aarch64.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/aarch64.cat
find_parse: key: stdlib.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/stdlib.cat
find_parse: key: aarch64hwreqs.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/aarch64hwreqs.cat
...

(A similar result is printed if we use the -I option instead.)

What the logs seem to suggest is that the file being read from disk is indeed resolved using the libdir, via let fname = O.libfind fname, however, the key that is used to index into the cache is not resolved using the libdir.
This means that if herd fetches aarch64.cat from ./herd/libdir/aarch64/ArmARM-L.b (caching it under the key aarch64.cat), and subsequently we change the libdir to be ./herd/libdir/aarch64/ArmARM-M.a, any calls to find_parse asking for aarch64.cat will still end up returning the old, incorrect, cached file from ./herd/libdir/aarch64/ArmARM-L.b.
The implementation of find_parse is fine in a typical CLI run of herd7, because in that scenario we assume the libdir won't change mid-execution. However, when the same instance of herd is invoked multiple times with different
libdirs (which, if I understand correctly, is what jerd does), then find_parse will not work correctly.

This PR proposes one possible solution: introducing extra configuration to selectively disable ParseModel’s caching. Ideally, find_parse itself would be able to handle libdir changes without extra config, but either way this seems to imply touching ParseModel/find_parse somehow. I realise @relokin had concerns about doing so, and I’m not necessarily attached to any particular approach, though these are the best options I can see right now given my current understanding.

@TiberiuBucur
Copy link
Copy Markdown
Contributor Author

As @TiberiuBucur mentioned, ParseModel.Make.find_parse does not seem to be caching files based on a path that includes the libdir component, but only based on the "base name". We can highlight this with some debug logging:

  let find_parse fname =
    try Hashtbl.find t fname
    with Not_found ->
      let key = fname in
      Format.printf "find_parse: key: %s@." fname;  (* log the Hashtbl key *)
      let fname = O.libfind fname in
      Format.printf "find_parse: fname: %s@." fname;   (* log the file being read from disk *)
      let r = Misc.input_protect (do_parse fname) fname in
      Hashtbl.add t key (fname,r) ;
      fname,r

Then if we run this

dune exec herd/herd.exe -- -set-libdir ./herd/libdir/aarch64/ArmARM-L.b MP.litmus

we get something like:

find_parse: key: aarch64.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/aarch64.cat
find_parse: key: stdlib.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/stdlib.cat
find_parse: key: aarch64hwreqs.cat
find_parse: fname: ./herd/libdir/aarch64/ArmARM-L.b/aarch64hwreqs.cat
...

That's exactly what I was saying in my comment above. We discovered it individually at the same time :))

@fsestini
Copy link
Copy Markdown
Collaborator

fsestini commented Jan 30, 2026

That was my first idea, but since that required changing the parameter of the functor, it meant code in many more places (including herd) had to be modified. I prioritised the least intrusive method.

Thanks for clarifying. That makes sense, and I can see why minimising the surface area of the change was a priority. I’d just be a bit cautious about equating “less intrusive” purely with the size or spread of the patch. For instance, I personally tend to view a larger amount of straightforward, easy-to-review code as less intrusive (in a maintenance sense) than a very small change that introduces behaviour which is more subtle or harder to reason about.

That said, your more recent patch to ParseModel looks like a better direction than the approach being discussed here, so this may be largely moot. Still, I’d be interested to hear whether you experimented with the functor configuration approach and found it to be particularly complex in practice.

But I see how this can become an issue if one chooses to re-enable caching inside herd. I think a better idea would have
been just to give the option of flushing the cache in between runs, instead of enabling or disabling caching.

I can see how a cache-flushing option would help in this scenario. My concern is that it may only be a relatively small improvement over the current global use_memoization switch, in that it still exposes control over a globally stateful and order-dependent API.

In particular, code structured along the lines of:

flush cache;
call herd;  // assumes the cache is empty

would rely on fairly strong assumptions about execution order. In a concurrent setting, you’d need to be careful that no other task runs in between and repopulates the global cache before call herd executes.

Even if jerd isn’t invoking multiple herd tasks concurrently as of now, browser-based JS code tends to be fairly concurrent and event-driven, and it seems plausible that future changes could end up violating these assumptions in subtle ways.

@@ -1,13 +1,16 @@
CAT2HTML7=$(if $(shell which cat2html7), cat2html7, ../_build/default/tools/cat2html.exe)
BOOKS=aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults bpf x86 linux
AARCH64_BOOKS= aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The white spaces here are a bit strange, some have two some have one. Can you also unify it, since you are changing this line of code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I assumed they were there on purpose, to signify some level of separation between the vanilla aarch64, the extensions, and other non-arm architectures.

@TiberiuBucur
Copy link
Copy Markdown
Contributor Author

On localhost:8000, I can use the dropdown menu to select things like ArmARM-M.a/aarch64.cat, however when I click on the "Show me" button I get a link to http://localhost:8000/weblib/aarch64.cat.html no matter what version/snapshot of the model I select. I would assume the html page to be different based on the selected snapshot.

@fsestini @ShaleXIONG I fixed this. Now we're copying the ArmARM releases dependencies into weblib and making the right requests as well. As I suspected, the weblib is only used for the "Show me" button, and not while running herd.

@TiberiuBucur
Copy link
Copy Markdown
Contributor Author

TiberiuBucur commented Feb 6, 2026

OCaml version

It seem that on opam switches using an older versions of the OCaml compiler, there ends up being a conflict between zarith 1.13 (needed for lib/SVEScalar.ml) and zarith_stubs_js (needed by herd-www). I think we should at least add a note to the README that the project will only build on newer compilers (I think it has to be at least >=4.14.0 judging from here).

@fsestini I checked this on my machine as well and yes, it seems like the oldest Ocaml compiler that can satisfy both the top level dependencies and the herd-www dependencies is version 4.14.0. I understand we might not want to update the whole project to rely on this newer Ocaml version, but I believe we can have a .opam file inside herd-www with the required dependencies, which would at least make this mismatch less probable for automatic builds. A comment in a README might not be enough for whoever is responsible for the deployment of herd-www. However, it obviously has not been a problem so far, since the herd interface on developer.arm.com is functional. What do you think?

@fsestini fsestini self-requested a review February 17, 2026 15:56
Copy link
Copy Markdown
Collaborator

@fsestini fsestini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

In a follow up PR, it would be good to add some debugging/sanity-check logs showing which cat files are being loaded by herd for any given execution. I temporarily hacked such logging locally for the purpose of reviewing this PR, but it would be better if this functionality was already available out of the box under some debug flag.

As for the OCaml version, I don't mind either extending the README or adding an opam file.

Comment on lines +28 to +33
$(if $(filter $1,$(AARCH64_BOOKS)), \
$(foreach d,$(ARM_ARM_RELEASES), \
mkdir -p $(WWW_CATALOGUE)/$1/cats/$(notdir $(d)) && \
cp $(d)/aarch64.cat $(WWW_CATALOGUE)/$1/cats/$(notdir $(d))/aarch64.cat; \
) \
,)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow this.

With this statement we end up making three copies of some cat files. For example, let's take herd/libdir/aarch64/M.a/aarch64.cat. The Makefile copies it to:

  • herd-www/www/catalogue/${book}/
  • herd-www/www/weblib

and then it also inlines it into a StringMap in catIncludes.ml. Have we figured out why do we need to have three copies?

Copy link
Copy Markdown
Contributor Author

@TiberiuBucur TiberiuBucur Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. herd-www/www/catalogue/$(book)/cats - this is where the top level cat file(s) is/are fetched from when loading the website.
  2. herd-www/www/weblib contains all dependencies loaded when clicking "Show me" in the UI. This is populated by all files in every $(book)/cats (including aarch64-v0*.cat), as well as the ones in herd/libdir. Since the aarch64.cat in $(book)/cats is a symlink of the one in libdir, we could get rid of /cats and serve files only from /weblib.
  3. catIncludes.ml is used when running herd (in browser) resolves dependencies. It acts as an in-memory pseudo-filesystem, due to herd's inability to issue requests to the server. Getting rid of this would be a significant piece of work, and my personal opinion is the costs would outweigh the benefits. I'd be happy to have a conversation about this to better understand the needs here.

This structure was very much the status-quo before the start of this project. Should we target some improvements in this PR or in a future one?

Copy link
Copy Markdown
Collaborator

@fsestini fsestini Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On point (3): while getting rid of the in-memory filesystem entirely would require some serious changes in herd7 that we might want to consider carefully, we don't necessarily need to go that far right now. IMO it would still be a step forward if such in-memory filesystem could be populated at runtime by fetching files from the weblib. In other words, the in-memory fs would still there as a data structure, but the actual .cat assets would be uniquely stored only in weblib and served from there to the whole webapp. This setup would also allow to fetch assets on demand based on user requests, rather than having a single monolithic file that embeds everything.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsestini I'm not sure I fully understand what you envision here. Since herd7 cannot issue requests, are you suggesting we would issue requests to the whole weblib (or only the appropriate path if, say, we are using ArmARM-L.b) when we click run, but before we switch to the Ocaml side? If so, how exactly would you build the in-memory filesystem from there?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fsestini I'm not sure I fully understand what you envision here. Since herd7 cannot issue requests, are you suggesting we would issue requests to the whole weblib (or only the appropriate path if, say, we are using ArmARM-L.b) when we click run, but before we switch to the Ocaml side? If so, how exactly would you build the in-memory filesystem from there?

I’m not sure that distinguishing between an “OCaml side” and a “non-OCaml side” is particularly helpful here. In practice, all OCaml code is compiled to JavaScript via js_of_ocaml, so when the web app runs in the browser there isn’t really a separate “OCaml side.”
A more meaningful distinction might be between source code that is part of herd-www and can be changed easily, and source code that lives outside of herd-www.

The code that builds the pseudo-filesystem lives on herd-www side, in webInput.ml:

let autoloader ~prefix ~path = ... (* lookup file from hardcoded map in catIncludes.ml *)

let register_autoloader () =
  Js_of_ocaml.Sys_js.unmount ~path:webpath ;
  Js_of_ocaml.Sys_js.mount ~path:webpath autoloader

Provided we ensure that the weblib directory is populated with all required files and that the server exposes those files as static assets, we could adjust the autoloader to fetch them from the server instead of looking them up in the hardcoded map in catIncludes.ml:

let autoloader ~prefix ~path = ... (* issue HTTP GET request to server for URL /weblib/prefix/path *)

Files can be fetched from the server via standard browser APIs (like the Fetch API).

In terms of infrastructure, for this approach to work, we would need:

  1. The weblib directory to contain all necessary files
  2. The server to expose the contents of /weblib under a known URL (e.g. /weblib/...).

Point (2) is already the case when running locally via python -m http.server. It also seems to be the case in the deployed environment (for example: https://developer.arm.com/herd7/weblib/aarch64.cat).

Hopefully this clarifies things a bit.

Copy link
Copy Markdown
Contributor Author

@TiberiuBucur TiberiuBucur Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for your suggestion @fsestini. I managed to get rid of the auto-generated StringMap altogether. We are now sending GET requests to the server instead. Let me know how this looks now. I'm thinking we could refine the Makefile a little bit, but I would do this in a different PR, so that we can get this merged in time for M.b. @relokin FYI

Comment on lines +40 to +47
@echo '** INSTALL ' $(WWW_LIB) && mkdir -p $(WWW_LIB)
@ocaml generate_names.ml -norec $(ENV_PAIRS) $(CATINCLUDES) | \
while IFS=$$'\t' read -r rel src; do \
mkdir -p "$(WWW_LIB)/$$(dirname "$$rel")" && \
cp "$$src" "$(WWW_LIB)/$$rel" && \
$(CAT2HTML7) "$(WWW_LIB)/$$rel" ; \
done
@find $(WWW_LIB) -type d -exec cp ./cat.css {}/cat.css \;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not changing anything is it?

I wouldn't be against rewriting this code but I don't think you are making it simpler - which would be an ideal outcome. For example, why did the the copy of cat.css become a find?

Copy link
Copy Markdown
Contributor Author

@TiberiuBucur TiberiuBucur Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doing something slightly different. Since dependencies for Arm ARM releases are different, they are kept under weblib/$(Arm_ARM_RELEASE). The rel argument represents the release, while the src argument is the path from which the cat file is copied. They are returned as a pair by generate_names.ml. So this copies, for example, herd/libdir/aarch64/ArmARM-L.b/aarch64hwreqs.cat (the src) into herd-www/www/weblib/ArmARM-L.b (the $(WWW_LIB)/$$rel).

The cp has become a find because the find now looks for all directories under $(WWW_LIB) - these should be $(WWW_LIB) itself, as well as all ArmARM releases - and copies cat.css in each one of them. If you'd prefer, I could make this clearer by explicitly copying cat.css in the ArmARM release directories, which could be defined in a variable.

Copy link
Copy Markdown
Collaborator

@fsestini fsestini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TiberiuBucur . The HTTP requests code looks good, although missing some details: see my comments in the code.

However, it seems we are still fetching some of the cat files from catalogue/XXX/cats, and I'm not quite sure why that still needs to be the case, considering we can now get all the necessary cats from weblib. Since IIUC the idea is to have the litmus tests selection and the model selection to be orthogonal, why can't we just get rid of the catalogue cats completely?

Moreover I think generate_includes.py is not needed anymore, given these latest changes to webInput.ml. I also wonder why we need cat_includes as well.

Some (Js.to_string txt)
else
None
with _ -> None in
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid this "catch all" with pattern here? What is the specific exception we want to handle?

req##send Js.null;
if req##.status = 200 then
let txt =
Js.Opt.get req##.responseText (fun () -> Js.string "") in
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we returning an empty string on missing response text? If the response text is null and we are not expecting it to be, then I think we should raise some error and return None instead of silently setting the path to an empty string.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually setting the file content, not the path. Initially I thought it would be better to fail gracefully, since this is a web interface, but now I notice that it would probably still fail ungracefully because herd expects the cat file it loads here to contain something, so it might return "Relation x undefined" or something like that.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, s/path/file.

I don’t follow why it would be preferable to let herd fail later, indirectly, with an error message that may not even be relevant to the actual problem. If responseText being null is an error in this context, then it seems better to handle it here reliably and report it explicitly, rather than silently turning it into an empty string and hoping for a downstream failure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think about the code now and feel free to resolve the comments if it's okay with you.

Js.Opt.get req##.responseText (fun () -> Js.string "") in
Some (Js.to_string txt)
else
None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, I think we should print some informative error to the browser console, on failed requests.

@TiberiuBucur
Copy link
Copy Markdown
Contributor Author

However, it seems we are still fetching some of the cat files from catalogue/XXX/cats, and I'm not quite sure why that still needs to be the case, considering we can now get all the necessary cats from weblib. Since IIUC the idea is to have the litmus tests selection and the model selection to be orthogonal, why can't we just get rid of the catalogue cats completely?

As mentioned before, this is a separate issue, since the JavaScript code reads the models to be loaded for each "book" from the shelf.json file. Basically, if we are to use weblib for everything (which is ideal), the paths in shelf.py would have to refer backwards in the file system (they would include ../), which is not ideal in browsers. To do this properly, we would need to modify the JavaScript code quite a bit. I am planning on changing it, but I have an idea this will cause some difficult merge conflicts with the Oban update branch, therefore I wanted to implement this optimisation after that is merged.

Moreover I think generate_includes.py is not needed anymore, given these latest changes to webInput.ml. I also wonder why we need cat_includes as well.

Turns out that these have been orphaned a long while ago (2019 and 2020 respectively). I assume it's safe to delete them. @relokin , do you have any opinions?

@fsestini
Copy link
Copy Markdown
Collaborator

As mentioned before, this is a separate issue, since the JavaScript code reads the models to be loaded for each "book" from the shelf.json file. Basically, if we are to use weblib for everything (which is ideal), the paths in shelf.py would have to refer backwards in the file system (they would include ../), which is not ideal in browsers. To do this properly, we would need to modify the JavaScript code quite a bit. I am planning on changing it, but I have an idea this will cause some difficult merge conflicts with the Oban update branch, therefore I wanted to implement this optimisation after that is merged.

I understand the concern about merge conflicts, and I’m fine with waiting for the Oban patch to go in first.
But I think it’s worth being explicit that we should work on the catalogue refactor with high priority. The broader goal here is to converge on a single source of truth for cat files, which I think we agree should be weblib. Your patch to remove the hardcoded map is a very good step forward, but we are not really done until we also remove the catalogue as a separate source of cats. Another reason for prioritising it is that as you rightly say it will likely affect the architecture of the webapp in significant ways, so we should get it done early to avoid more potential merge conflicts.

Copy link
Copy Markdown
Collaborator

@fsestini fsestini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of tiny suggestions, but otherwise I think this is fine to merge as is, subject to the follow-up PRs discussed above.

@relokin
Copy link
Copy Markdown
Member

relokin commented Mar 27, 2026

@TiberiuBucur can you please rebase on master and then I am going to merge this PR.

Before, the hashtbl used to only contain as a key the name
of the file to be looked up. This causes issues when multiple
libdirs with conflicting file names are used across the same
run of herd. This eliminates this issues by adapting the keys
of this hashtbl so that they contain the full path of the
file, thus avoiding conflicts.
Note that this also removes the auto-generated catIncludes.ml
in favour of a fetch API written through js_of_ocaml, which
requests the dependencies from the server whilst running herd.
A pseudo-filesystem is still mounted, but opening it now sends
the aforementioned GET requests, instead of looking up the file
contents in a StringMap.
@relokin relokin merged commit 8170ef3 into herd:master Mar 30, 2026
5 checks passed
@relokin
Copy link
Copy Markdown
Member

relokin commented Mar 30, 2026

Thanks Tiberiu, merged into master. Unfortunately, I didn't realise but this branch was a few commits behind master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants