[herd-www] Support every Arm ARM release for all catalogues #1686
relokin merged 2 commits into herd:master
Force-pushed: 9811ecf to a07760c
Some general comments down below, plus some minor comments in the code itself.
Models caching
I see the issue with the caching behaviour, and I think adding a way to
configure this would make sense. However, I have reservations about it being
configured via a globally stateful API (namely use_memoization).
Global mutable state makes the caching configuration order‑dependent and
non‑local: any call to use_memoization affects all uses of any
ParseModel.Make instance everywhere. This means in particular that if some
lib or herd code down the call stack decides to set use_memoization to
true, it will end up invalidating your assumptions about jerd's parser
caching behaviour.
I would much prefer it if use_memoization were instead implemented as a
functor parameter of ParseModel.Make, i.e. as a field of ParseModel.Config.
This way, each parser instance's caching configuration is immutable, local, and
there's no risk of forgetting to set that flag in the correct order, because
the type system will tell you.
For jerd itself, the key issue is that some internal lib/herd modules
(e.g. lib/interpreter and herd/getModel) instantiate their own parsers via
ParseModel.Make, so we need a way to control or disable their caching. If
caching becomes a functor parameter, we can thread that flag through their
config types (e.g. add use_memoization to Interpreter.Config) and pass it along
when they build their ParseModel.Make instance. This way, jerd can disable
caching for herd invocations by setting the appropriate value in Config
here.
PS: nitpick, but personally I would say use_cache rather than
use_memoization, as in my experience "memoization" usually implies the
function being memoized is deterministic.
PS2: not in scope for this PR, but in the future, if more fine-grained control
over caching is needed, it could also be worth exploring some refactoring
around ParseModel so that the cache reference is created per-instance
(i.e., every invocation of ParseModel.Make creates its own cache reference
that is specific to that parser), rather than once globally.
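To make the functor-parameter idea a bit more concrete, here is a minimal sketch. It is not herd's actual ParseModel: the Config signature, the parse function and its ~read argument are all hypothetical, and use_cache is just the name suggested in the PS above. The table is created inside the functor body, so each application of Make owns its own cache, along the lines of PS2.

```ocaml
(* Hypothetical sketch: none of these signatures are taken from herd's sources. *)
module type Config = sig
  val use_cache : bool
end

module Make (C : Config) : sig
  (* [parse ~read fname] returns the contents of [fname], using [read] on a miss. *)
  val parse : read:(string -> string) -> string -> string
  (* Number of times [read] was actually called by this instance. *)
  val reads : unit -> int
end = struct
  (* Evaluated once per functor application, so the cache is per-instance. *)
  let cache : (string, string) Hashtbl.t = Hashtbl.create 17
  let count = ref 0

  let load read fname = incr count; read fname

  let parse ~read fname =
    if not C.use_cache then load read fname
    else
      match Hashtbl.find_opt cache fname with
      | Some contents -> contents
      | None ->
        let contents = load read fname in
        Hashtbl.add cache fname contents;
        contents

  let reads () = !count
end

(* The caching policy is fixed at instantiation time and cannot change later. *)
module Cached = Make (struct let use_cache = true end)
module Uncached = Make (struct let use_cache = false end)
```

With this shape, a module such as Interpreter would only need to forward the flag from its own config when it applies Make, and forgetting to do so is a type error rather than a silent ordering bug.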
OCaml version
It seems that on opam switches using older versions of the OCaml compiler, there is a conflict between zarith 1.13 (needed for lib/SVEScalar.ml) and zarith_stubs_js (needed by herd-www). I think we should at least add a note to the README that the project only builds on newer compilers (I think it has to be at least 4.14.0, judging from here).
Cat links in the web UI
Maybe I'm making some mistakes in the build process, but I couldn't quite get the links to the cat files to work.
I'm building and running with:
cd herd-www
make
cd www
python3 -m http.server
On localhost:8000, I can use the dropdown menu to select things like ArmARM-M.a/aarch64.cat. However, when I click the "Show me" button, I get a link to http://localhost:8000/weblib/aarch64.cat.html no matter which version/snapshot of the model I select. I would expect the HTML page to differ based on the selected snapshot.
Perhaps related: if I list the contents of weblib, I get a flat list of files where names like aarch64.cat.html or aarch64hwreqs.cat.html appear only once. I was instead expecting to see different subdirectories, one per model version, each with its own copy of aarch64hwreqs, aarch64deps, etc.
let _ =
  let mr = ref StringMap.empty in
  let k0,do_rec =
I get this is just a script, but I find that this argument parsing block could
be a bit easier to follow. If we are ok with setting environment directories
with something like a repeated -env name=dir option, then that block could be
replaced by a, IMO, simpler and more declarative Arg-based list:
(* Assumes do_rec, envs and dirs are refs defined earlier in the script. *)
let opts =
  [
    ("-rec", Arg.Unit (fun () -> do_rec := true), "recurse");
    ("-norec", Arg.Unit (fun () -> do_rec := false), "no recurse");
    ("-env", Arg.String (fun s ->
      match String.split_on_char '=' s with
      | [name; dir] -> envs := (name, dir) :: !envs
      | _ -> Printf.eprintf "-env expects name=dir, got %S\n" s), "...")
  ]
let () = Arg.parse opts (fun dir -> dirs := dir :: !dirs) "..."
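For reference, the Arg-based suggestion can be fleshed out into something that runs on its own. This is only a sketch under assumptions: the parse wrapper, the doc strings and the choice to reject malformed -env values with Arg.Bad are illustrative and not taken from generate_includes.ml.

```ocaml
(* Hypothetical stand-alone version of the option parsing discussed above. *)
let do_rec = ref true
let envs : (string * string) list ref = ref []
let dirs : string list ref = ref []

(* Split "name=dir" at the first '='; reject anything else. *)
let add_env s =
  match String.index_opt s '=' with
  | Some i when i > 0 && i < String.length s - 1 ->
    let name = String.sub s 0 i in
    let dir = String.sub s (i + 1) (String.length s - i - 1) in
    envs := (name, dir) :: !envs
  | _ -> raise (Arg.Bad (Printf.sprintf "-env expects name=dir, got %S" s))

let opts =
  [ ("-rec", Arg.Set do_rec, " recurse into sub-directories");
    ("-norec", Arg.Clear do_rec, " do not recurse");
    ("-env", Arg.String add_env, "<name>=<dir> add a cat environment (repeatable)") ]

(* Parse an explicit argv (argv.(0) is the program name) and return the result. *)
let parse argv =
  do_rec := true;
  envs := [];
  dirs := [];
  Arg.parse_argv ~current:(ref 0) argv opts
    (fun d -> dirs := d :: !dirs)
    "usage: generate_includes [options] dirs";
  (!do_rec, List.rev !envs, List.rev !dirs)
```

Splitting at the first '=' rather than with String.split_on_char means a directory that itself contains '=' is still accepted.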
It was a tradeoff between clarity in the script option parsing, or in the Makefile that invokes the script. Since the cat environments are stored as a list in a Makefile variable, it would have been a bit more indirect to pass them using repeated -env name=dir options. What do you think the best tradeoff is?
There is also a question of consistency with other tools (herd7, diy*7, etc.). Those tools use the style @fsestini mentioned. If you can align with those tools, that would be helpful.
@TiberiuBucur We can still let ENV_PAIRS be a list, in the Makefile. We'd be creating a list of -env arguments instead. The change would amount to changing the lines:
ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),$(notdir $(d)) $(d))
ocaml ./generate_includes.ml -norec -envs $(ENV_PAIRS) -- $(CATINCLUDES) > $@
to be instead something like:
ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),-env $(notdir $(d))=$(d))
ocaml ./generate_includes.ml -norec $(ENV_PAIRS) $(CATINCLUDES) > $@
Personally I think the latter version is equally clear, but YMMV.
In light of our latest meeting, however, I'd say we can probably park this for now until the PR stabilises. The reason being that since we discussed potentially revisiting the webapp and build process to only use files from the weblib, we might end up getting rid of some of these scripts anyway.
I made the changes as you suggested. We can revisit this in the future if we decide to only use /weblib for loading any sub-model.
  j = json.dumps({
      'record' : m.record,
-     'cats' : m.cats,
+     'cats' : base_cats + extra_cats,
Can you please help me understand what these extra_cats are about? I was under the impression that the web UI would just select the whole cat library from one of the snapshot dirs in herd/libdir/aarch64, so I don't see why we would need to fetch cat files from other places.
This is about the available top-level cats that are present in the dropdown for each catalogue.
If you look inside herd-www/catalogue/aarch64 (or any other book), you will see a shelf.py script which defines a bunch of cat files:
cats = [
"cats/aarch64.cat",
"cats/aarch64-v08.cat",
"cats/aarch64-v07.cat",
"cats/aarch64-v06.cat",
"cats/aarch64-v05.cat",
"cats/aarch64-v04.cat",
"cats/aarch64-v03.cat",
"cats/aarch64-v02.cat",
"cats/aarch64-v01.cat",
"cats/aarch64-v00.cat",
"cats/sc.cat",
]
These were all the options displayed in the dropdown before, and are copied from the catalogue directory in the root of the repo.
The extra cats are just the Arm ARM releases that I copied under herd-www/catalogue/$(book). Ultimately they need to be included in shelf.json (which is created by running catalogue_to_json.py) too, so that JavaScript picks them up and displays them in the dropdown.
In the future they could be included in the catalogue under the root of the repo too, which would make this logic redundant (since the Makefile copies whatever is in there). But I didn't want to make such a change without consulting with everyone.
@fsestini FYI
That was my first idea, but since that required changing the parameter of the functor, it meant code in many more places (including
That is contrary to the intended behaviour developed by Luc, as I understand it. Since
@TiberiuBucur @relokin I had a second look at
As @TiberiuBucur mentioned,
Then if we run this we get something like:
(A similar result is printed if we use the
What the logs seem to suggest is that the file being read from disk is indeed resolved using the libdir, via
This PR proposes one possible solution: introducing extra configuration to selectively disable
Force-pushed: a07760c to bce9aa6
That's exactly what I was saying in my comment above. We discovered it individually at the same time :))
Thanks for clarifying. That makes sense, and I can see why minimising the surface area of the change was a priority. I’d just be a bit cautious about equating “less intrusive” purely with the size or spread of the patch. For instance, I personally tend to view a larger amount of straightforward, easy-to-review code as less intrusive (in a maintenance sense) than a very small change that introduces behaviour which is more subtle or harder to reason about. That said, your more recent patch to
I can see how a cache-flushing option would help in this scenario. My concern is that it may only be a relatively small improvement over the current global
In particular, code structured along the lines of:
would rely on fairly strong assumptions about execution order. In a concurrent setting, you’d need to be careful that no other task runs in between and repopulates the global cache before
Even if
herd-www/Makefile
@@ -1,13 +1,16 @@
CAT2HTML7=$(if $(shell which cat2html7), cat2html7, ../_build/default/tools/cat2html.exe)
BOOKS=aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults bpf x86 linux
AARCH64_BOOKS= aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults
The whitespace here is a bit inconsistent: some gaps have two spaces, some have one. Could you unify it, since you are changing this line anyway?
Oh, I assumed they were there on purpose, to signify some level of separation between the vanilla aarch64, the extensions, and other non-arm architectures.
Force-pushed: bce9aa6 to 120c511
@fsestini @ShaleXIONG I fixed this. Now we're copying the ArmARM releases dependencies into
Force-pushed: 120c511 to f59e35f
@fsestini I checked this on my machine as well and yes, it seems like the oldest OCaml compiler that can satisfy both the top-level dependencies and the herd-www dependencies is version
Force-pushed: f59e35f to 05844c0
fsestini left a comment
LGTM.
In a follow up PR, it would be good to add some debugging/sanity-check logs showing which cat files are being loaded by herd for any given execution. I temporarily hacked such logging locally for the purpose of reviewing this PR, but it would be better if this functionality was already available out of the box under some debug flag.
As for the OCaml version, I don't mind either extending the README or adding an opam file.
$(if $(filter $1,$(AARCH64_BOOKS)), \
  $(foreach d,$(ARM_ARM_RELEASES), \
    mkdir -p $(WWW_CATALOGUE)/$1/cats/$(notdir $(d)) && \
    cp $(d)/aarch64.cat $(WWW_CATALOGUE)/$1/cats/$(notdir $(d))/aarch64.cat; \
  ) \
,)
I am not sure I follow this.
With this statement we end up making three copies of some cat files. For example, let's take herd/libdir/aarch64/M.a/aarch64.cat. The Makefile copies it to:
- herd-www/www/catalogue/${book}/
- herd-www/www/weblib
and then it also inlines it into a StringMap in catIncludes.ml. Have we figured out why we need to have three copies?
1. herd-www/www/catalogue/$(book)/cats - this is where the top-level cat file(s) are fetched from when loading the website.
2. herd-www/www/weblib contains all dependencies loaded when clicking "Show me" in the UI. This is populated by all files in every $(book)/cats (including aarch64-v0*.cat), as well as the ones in herd/libdir. Since the aarch64.cat in $(book)/cats is a symlink to the one in libdir, we could get rid of /cats and serve files only from /weblib.
3. catIncludes.ml is used when herd, running in the browser, resolves dependencies. It acts as an in-memory pseudo-filesystem, due to herd's inability to issue requests to the server. Getting rid of this would be a significant piece of work, and my personal opinion is that the costs would outweigh the benefits. I'd be happy to have a conversation about this to better understand the needs here.
This structure was very much the status-quo before the start of this project. Should we target some improvements in this PR or in a future one?
On point (3): while getting rid of the in-memory filesystem entirely would require some serious changes in herd7 that we might want to consider carefully, we don't necessarily need to go that far right now. IMO it would still be a step forward if the in-memory filesystem could be populated at runtime by fetching files from the weblib. In other words, the in-memory fs would still be there as a data structure, but the actual .cat assets would be stored only in weblib and served from there to the whole webapp. This setup would also allow fetching assets on demand based on user requests, rather than having a single monolithic file that embeds everything.
@fsestini I'm not sure I fully understand what you envision here. Since herd7 cannot issue requests, are you suggesting we would issue requests to the whole weblib (or only the appropriate path if, say, we are using ArmARM-L.b) when we click run, but before we switch to the Ocaml side? If so, how exactly would you build the in-memory filesystem from there?
> @fsestini I'm not sure I fully understand what you envision here. Since herd7 cannot issue requests, are you suggesting we would issue requests to the whole weblib (or only the appropriate path if, say, we are using ArmARM-L.b) when we click run, but before we switch to the Ocaml side? If so, how exactly would you build the in-memory filesystem from there?
I’m not sure that distinguishing between an “OCaml side” and a “non-OCaml side” is particularly helpful here. In practice, all OCaml code is compiled to JavaScript via js_of_ocaml, so when the web app runs in the browser there isn’t really a separate “OCaml side.”
A more meaningful distinction might be between source code that is part of herd-www and can be changed easily, and source code that lives outside of herd-www.
The code that builds the pseudo-filesystem lives on the herd-www side, in webInput.ml:
let autoloader ~prefix ~path = ... (* lookup file from hardcoded map in catIncludes.ml *)
let register_autoloader () =
  Js_of_ocaml.Sys_js.unmount ~path:webpath ;
  Js_of_ocaml.Sys_js.mount ~path:webpath autoloader
Provided we ensure that the weblib directory is populated with all required files and that the server exposes those files as static assets, we could adjust the autoloader to fetch them from the server instead of looking them up in the hardcoded map in catIncludes.ml:
let autoloader ~prefix ~path = ... (* issue HTTP GET request to server for URL /weblib/prefix/path *)
Files can be fetched from the server via standard browser APIs (like the Fetch API).
In terms of infrastructure, for this approach to work, we would need:
1. The weblib directory to contain all necessary files.
2. The server to expose the contents of /weblib under a known URL (e.g. /weblib/...).
Point (2) is already the case when running locally via python -m http.server. It also seems to be the case in the deployed environment (for example: https://developer.arm.com/herd7/weblib/aarch64.cat).
Hopefully this clarifies things a bit.
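As a rough illustration of the autoloader change, the sketch below keeps the same ~prefix ~path shape but takes the fetching function as a parameter, so it runs without a browser. Everything here is an assumption for illustration: fetch stands in for a synchronous HTTP GET, url_of and trim_slashes are hypothetical helpers, and the way prefix and path are combined under /weblib simply follows the comment above rather than the real mount semantics.

```ocaml
(* Drop leading and trailing '/' so components can be joined safely. *)
let trim_slashes s =
  let n = String.length s in
  let i = ref 0 and j = ref n in
  while !i < n && s.[!i] = '/' do incr i done;
  while !j > !i && s.[!j - 1] = '/' do decr j done;
  String.sub s !i (!j - !i)

(* Build the URL under /weblib for a file requested by herd (assumed layout). *)
let url_of ~prefix ~path =
  [ "weblib"; prefix; path ]
  |> List.map trim_slashes
  |> List.filter (fun s -> s <> "")
  |> String.concat "/"
  |> fun s -> "/" ^ s

(* [fetch url] is a stand-in for an HTTP GET returning the body, or None on failure. *)
let autoloader ~fetch ~prefix ~path = fetch (url_of ~prefix ~path)
```

Partially applying `autoloader ~fetch` yields a function of the `~prefix ~path` shape that register_autoloader passes to mount in the snippet above.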
Thanks very much for your suggestion @fsestini. I managed to get rid of the auto-generated StringMap altogether. We are now sending GET requests to the server instead. Let me know how this looks now. I'm thinking we could refine the Makefile a little bit, but I would do this in a different PR, so that we can get this merged in time for M.b. @relokin FYI
	@echo '** INSTALL ' $(WWW_LIB) && mkdir -p $(WWW_LIB)
	@ocaml generate_names.ml -norec $(ENV_PAIRS) $(CATINCLUDES) | \
	while IFS=$$'\t' read -r rel src; do \
	  mkdir -p "$(WWW_LIB)/$$(dirname "$$rel")" && \
	  cp "$$src" "$(WWW_LIB)/$$rel" && \
	  $(CAT2HTML7) "$(WWW_LIB)/$$rel" ; \
	done
	@find $(WWW_LIB) -type d -exec cp ./cat.css {}/cat.css \;
This is not changing anything, is it?
I wouldn't be against rewriting this code but I don't think you are making it simpler, which would be an ideal outcome. For example, why did the copy of cat.css become a find?
It's doing something slightly different. Since dependencies for Arm ARM releases are different, they are kept under weblib/$(Arm_ARM_RELEASE). The rel argument represents the release, while the src argument is the path from which the cat file is copied. They are returned as a pair by generate_names.ml. So this copies, for example, herd/libdir/aarch64/ArmARM-L.b/aarch64hwreqs.cat (the src) into herd-www/www/weblib/ArmARM-L.b (the $(WWW_LIB)/$$rel).
The cp has become a find because the find now looks for all directories under $(WWW_LIB) - these should be $(WWW_LIB) itself, as well as all ArmARM releases - and copies cat.css in each one of them. If you'd prefer, I could make this clearer by explicitly copying cat.css in the ArmARM release directories, which could be defined in a variable.
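To illustrate the rel/src pairs consumed by the while loop, here is a small sketch of how such tab-separated lines could be produced. It is not the real generate_names.ml: pairs_for_env and to_line are hypothetical names, and it assumes rel is the release name joined with the file name, which is what the dirname call in the recipe suggests.

```ocaml
(* For one (release name, source dir) environment, pair each file's
   destination path relative to weblib with its source path. *)
let pairs_for_env (name, dir) files =
  List.map (fun f -> (Filename.concat name f, Filename.concat dir f)) files

(* One line per pair, tab-separated, matching `IFS=$'\t' read -r rel src`. *)
let to_line (rel, src) = rel ^ "\t" ^ src

let lines envs files =
  List.concat_map (fun env -> List.map to_line (pairs_for_env env files)) envs
```

Calling `List.iter print_endline (lines envs files)` would emit the stream that the recipe reads.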
fsestini left a comment
Thanks @TiberiuBucur. The HTTP requests code looks good, although it is missing some details: see my comments in the code.
However, it seems we are still fetching some of the cat files from catalogue/XXX/cats, and I'm not quite sure why that still needs to be the case, considering we can now get all the necessary cats from weblib. Since IIUC the idea is to have the litmus tests selection and the model selection to be orthogonal, why can't we just get rid of the catalogue cats completely?
Moreover I think generate_includes.py is not needed anymore, given these latest changes to webInput.ml. I also wonder why we need cat_includes as well.
herd-www/webInput.ml
        Some (Js.to_string txt)
      else
        None
    with _ -> None in
Can we avoid this "catch all" with pattern here? What is the specific exception we want to handle?
herd-www/webInput.ml
      req##send Js.null;
      if req##.status = 200 then
        let txt =
          Js.Opt.get req##.responseText (fun () -> Js.string "") in
Why are we returning an empty string on missing response text? If the response text is null and we are not expecting it to be, then I think we should raise some error and return None instead of silently setting the path to an empty string.
This is actually setting the file content, not the path. Initially I thought it would be better to fail gracefully, since this is a web interface, but now I notice that it would probably still fail ungracefully because herd expects the cat file it loads here to contain something, so it might return "Relation x undefined" or something like that.
Yes, sorry, s/path/file.
I don’t follow why it would be preferable to let herd fail later, indirectly, with an error message that may not even be relevant to the actual problem. If responseText being null is an error in this context, then it seems better to handle it here reliably and report it explicitly, rather than silently turning it into an empty string and hoping for a downstream failure.
Let me know what you think about the code now and feel free to resolve the comments if it's okay with you.
          Js.Opt.get req##.responseText (fun () -> Js.string "") in
        Some (Js.to_string txt)
      else
        None
Similarly here, I think we should print some informative error to the browser console, on failed requests.
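One possible shape for the explicit error handling asked for in these comments, written as plain OCaml so it can be read independently of the browser bindings. The response record, contents_of_response and load are hypothetical; in webInput.ml the status and body would come from the request object, and log would be whatever console binding the project uses.

```ocaml
(* Hypothetical, browser-independent view of a finished request. *)
type response = { status : int; body : string option }

(* Distinguish the failure modes instead of collapsing them into "" or None. *)
let contents_of_response ~url r =
  match r with
  | { status = 200; body = Some txt } -> Ok txt
  | { status = 200; body = None } ->
    Error (Printf.sprintf "GET %s: status 200 but no response text" url)
  | { status; _ } ->
    Error (Printf.sprintf "GET %s: unexpected status %d" url status)

(* Report the error at the point of failure, then signal "file not found". *)
let load ~log ~url r =
  match contents_of_response ~url r with
  | Ok txt -> Some txt
  | Error msg -> log msg; None
```

This keeps the string option result that the autoloader needs, while the message names the URL and the actual cause instead of leaving herd to fail later with an unrelated error.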
As mentioned before, this is a separate issue, since the JavaScript code reads the models to be loaded for each "book" from the
Turns out that these were orphaned a long while ago (2019 and 2020 respectively). I assume it's safe to delete them. @relokin, do you have any opinions?
I understand the concern about merge conflicts, and I’m fine with waiting for the Oban patch to go in first.
fsestini left a comment
Left a couple of tiny suggestions, but otherwise I think this is fine to merge as is, subject to the follow-up PRs discussed above.
@TiberiuBucur can you please rebase on master? Then I will merge this PR.
Before, the hashtbl used only the name of the file to be looked up as its key. This caused issues when multiple libdirs with conflicting file names were used within the same run of herd. This change eliminates the issue by making the keys of the hashtbl the full path of the file, thus avoiding conflicts.
Note that this also removes the auto-generated catIncludes.ml in favour of a fetch API written through js_of_ocaml, which requests the dependencies from the server whilst running herd. A pseudo-filesystem is still mounted, but opening it now sends the aforementioned GET requests, instead of looking up the file contents in a StringMap.
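The key change in the first commit message can be illustrated with a small sketch. It is not the actual herd code: cached_read, name_key and path_key are hypothetical, and read stands in for whatever loads a file from a libdir.

```ocaml
(* Look up [fname] from [libdir], caching under the key computed by [key]. *)
let cached_read tbl ~key ~read libdir fname =
  let k = key libdir fname in
  match Hashtbl.find_opt tbl k with
  | Some contents -> contents
  | None ->
    let contents = read (Filename.concat libdir fname) in
    Hashtbl.add tbl k contents;
    contents

(* Old behaviour: the key ignores the libdir, so same-named files collide. *)
let name_key _libdir fname = fname

(* New behaviour: the key is the full path, so each libdir gets its own entry. *)
let path_key libdir fname = Filename.concat libdir fname
```

With name_key, loading aarch64hwreqs.cat from ArmARM-L.b and then from ArmARM-M.a returns the L.b contents twice; with path_key the second lookup misses and reads the M.a file.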
Thanks Tiberiu, merged into master. Unfortunately, I didn't realise that this branch was a few commits behind master.
This change adds the aarch64.cat file in every Arm ARM release (so far L.b and M.a) as options in the dropdown menu for all the relevant books in the herd-www UI. In turn, upon selecting this file, jerd.ml includes the appropriate directory of dependencies to use for the selected cat model (in the case of ArmARM-L.b/aarch64.cat, it will be herd/libdir/aarch64/ArmARM-L.b).
In order to make this work, we also needed to add a mechanism to disable caching of cat file contents inside the ParseModel module. This is because, in the browser, herd is invoked from OCaml compiled to JavaScript through the use of Js_of_ocaml, rather than from a CLI (which obviously does not exist in a browser). Therefore, upon reading different versions of the same file (with the same name), herd looks up the file name in an in-memory cache and loads the cached version if it has seen it already. This becomes an issue when switching between cat files that require different dependencies.