[herd-www] Support every Arm ARM release for all catalogues#1686
[herd-www] Support every Arm ARM release for all catalogues#1686TiberiuBucur wants to merge 2 commits intoherd:masterfrom
Conversation
9811ecf to
a07760c
Compare
There was a problem hiding this comment.
Some general comments down below, plus some minor comments in the code itself.
Models caching
I see the issue with the caching behaviour, and I think adding a way to
configure this would make sense. However, I have reservations about it being
configured via a globally stateful API (namely use_memoization).
Global mutable state makes the caching configuration order‑dependent and
non‑local, therefore any call to use_memoization affects all uses of any
ParseModel.Make instance everywhere. This means in particular that if some
lib or herd code down the call stack decides to set use_memoization to
true, it will end up invalidating your assumptions about jerd's parser
caching behaviour.
I would rather much prefer if use_memoization were instead implemented as a
functor parameter of ParseModel.Make, i.e. as a field of ParseModel.Config.
This way, each parser instance's caching configuration is immutable, local, and
there's no risk of forgetting to set that flag in the correct order, because
the type system will tell you.
For jerd itsel, the key issue is that some internal lib/herd modules
(e.g. lib/interpreter and herd/getModel) instantiate their own parsers via
ParseModel.Make, so we need a way to control or disable their caching. If
caching becomes a functor parameter, we can thread that flag through their
config types (e.g. add use_memoization to Interpreter.Config) and pass it along
when they build their ParseModel.Make instance. This way, jerd can disable
caching for herd invocations by setting the appropriate value in Config
here.
PS: nitpick, but personally I would say use_cache rather than
use_memoization, as in my experience "memoization" usually implies the
function being memoized is deterministic.
PS2: not in scope for this PR, but in the future, if more fine-grained control
over caching is needed, it could also be worth exploring some refactoring
around ParseModel so that the cache reference is created per-instance
(i.e., every invocation of ParseModel.Make creates its own cache reference
that is specific to that parser), rather than once globally.
OCaml version
It seem that on opam switches using an older versions of the OCaml compiler, there ends up being a conflict between zarith 1.13 (needed for lib/SVEScalar.ml) and zarith_stubs_js (needed by herd-www). I think we should at least add a note to the README that the project will only build on newer compilers (I think it has to be at least >=4.14.0 judging from here).
Cat links in the web UI
Maybe I'm making some mistakes in the build process, but I couldn't quite get the links to the cat files to work.
I'm building and running with:
cd herd-www
make
cd www
python3 -m http.server
On localhost:8000, I can use the dropdown menu to select things like ArmARM-M.a/aarch64.cat, however when I click on the "Show me" button I get a link to http://localhost:8000/weblib/aarch64.cat.html no matter what version/snapshot of the model I select. I would assume the html page to be different based on the selected snapshot.
Perhaps related, but if I list the contents of weblib, I get a flat list of files where names like aarch64.cat.html or aarch64hwreqs.cat.html appear only once. I was instead expecting to see different subdirectories, one per model version, each with its own copy of aarch64hwreqs, aarch64deps, etc..
|
|
||
| let _ = | ||
| let mr = ref StringMap.empty in | ||
| let k0,do_rec = |
There was a problem hiding this comment.
I get this is just a script, but I find that this argument parsing block could
be a bit easier to follow. If we are ok with setting environment directories
with something like a repeated -env name=dir option, then that block could be
replaced by a, IMO, simpler and more declarative Arg-based list:
let opts =
[
("-rec", Arg.Unit (fun () -> do_rec := true), "recurse");
("-norec", Arg.Unit (fun () -> do_rec := false), "no recurse");
("-env", Arg.String (fun s ->
match String.split_on_char '=' s with
| [name; dir] -> envs := (name, dir) :: !envs
| _ -> Printf.eprintf), "...")
]
let () = Arg.parse opts (fun dir -> dirs := dir :: !dirs) "..."There was a problem hiding this comment.
It was a tradeoff between clarity in the script option parsing, or in the Makefile that invokes the script. Since the cat environments are stored as a list in a Makefile variable, it would have been a bit more indirect to pass them using repeated -env name=dir options. What do you think the best tradeoff is?
There was a problem hiding this comment.
There is also a question of consistency with other tools herd7, diy*7 etc. Those tools use the style @fsestini mentioned. If you can align up with those tool, it will be helpful.
There was a problem hiding this comment.
@TiberiuBucur We can still let ENV_PAIRS be a list, in the Makefile. We'd be creating a list of -env arguments instead. The change would amount to changing the lines:
ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),$(notdir $(d)) $(d))
ocaml ./generate_includes.ml -norec -envs $(ENV_PAIRS) -- $(CATINCLUDES) > $@
to be instead something like:
ENV_PAIRS := $(foreach d,$(ARM_ARM_RELEASES),-env $(notdir $(d))=$(d))
ocaml ./generate_includes.ml -norec $(ENV_PAIRS) $(CATINCLUDES) > $@
Personally I think the latter version is equally clear, but YMMV.
In light of our latest meeting, however, I'd say we can probably park this for now until the PR stabilises. The reason being that since we discussed potentially revisiting the webapp and build process to only use files from the weblib, we might end up getting rid of some of these scripts anyway.
There was a problem hiding this comment.
I made the changes as you suggested. We can revisit this in the future if we decide to only use /weblib for loading any sub-model.
| j = json.dumps({ | ||
| 'record' : m.record, | ||
| 'cats' : m.cats, | ||
| 'cats' : base_cats + extra_cats, |
There was a problem hiding this comment.
Can you please help me understand what these extra_cats are about? I was under the impression that the web UI would just select the whole cat library from one of the snapshot dirs in herd/libdir/aarch64, so I don't see why we would need to fetch cat files from other places.
There was a problem hiding this comment.
This is about the available top-level cats that are present in the dropdown for each catalogue.
If you look inside herd-www/catalogue/aarch64 (or any other book), you will see a shelf.py script which defines a bunch of cat files:
cats = [
"cats/aarch64.cat",
"cats/aarch64-v08.cat",
"cats/aarch64-v07.cat",
"cats/aarch64-v06.cat",
"cats/aarch64-v05.cat",
"cats/aarch64-v04.cat",
"cats/aarch64-v03.cat",
"cats/aarch64-v02.cat",
"cats/aarch64-v01.cat",
"cats/aarch64-v00.cat",
"cats/sc.cat",
]
These were all the options displayed in the dropdown before, and are copied from the catalogue directory in the root of the repo.
The extra cats are just the Arm ARM releases that I copied under herd-www/catalogue/$(book). Ultimately they need to be included in shelf.json (which is created by running catalogue_to_json.py) too, so that JavaScript picks them up and displays them in the dropdown.
In the future they could be included in the catalogue under the root of the repo too, which would make this logic redundant (since the Makefile copies whatever is in there). But I didn't want to make such a change without consulting with everyone.
|
@fsestini FYI
That was my first idea, but since that required changing the parameter of the functor, it meant code in many more places (including
That is contrary to the intended behaviour developed by Luc, as I understand it. Since |
|
@TiberiuBucur @relokin I had a second look at As @TiberiuBucur mentioned, Then if we run this we get something like: (A similar result is printed if we use the What the logs seem to suggest is that the file being read from disk is indeed resolved using the libdir, via This PR proposes one possible solution: introducing extra configuration to selectively disable |
| try Hashtbl.find t fname | ||
| with Not_found -> | ||
| let key = fname in |
There was a problem hiding this comment.
Aaaah, I see what the issue is!
The Hashtbl lookup was performed with the pure file name. That is, before path resolution through libfind was performed. The solution is just to add let fname = O.libfind fname before the lookup and it works 😄
FYI @fsestini @ShaleXIONG
There was a problem hiding this comment.
I think you would also at least need adjust the cache insertion (Hashtbl.add t key (fname,r)) to use a libfind-qualified key. Otherwise, the cache is never used.
Before, the hashtbl used to only contain as a key the name of the file to be looked up. This causes issues when multiple libdirs with conflicting file names are used across the same run of herd. This eliminates this issues by adapting the keys of this hashtbl so that they contain the full path of the file, thus avoiding conflicts.
a07760c to
bce9aa6
Compare
That's exactly what I was saying in my comment above. We discovered it individually at the same time :)) |
Thanks for clarifying. That makes sense, and I can see why minimising the surface area of the change was a priority. I’d just be a bit cautious about equating “less intrusive” purely with the size or spread of the patch. For instance, I personally tend to view a larger amount of straightforward, easy-to-review code as less intrusive (in a maintenance sense) than a very small change that introduces behaviour which is more subtle or harder to reason about. That said, your more recent patch to
I can see how a cache-flushing option would help in this scenario. My concern is that it may only be a relatively small improvement over the current global In particular, code structured along the lines of: would rely on fairly strong assumptions about execution order. In a concurrent setting, you’d need to be careful that no other task runs in between and repopulates the global cache before Even if |
herd-www/Makefile
Outdated
| @@ -1,13 +1,16 @@ | |||
| CAT2HTML7=$(if $(shell which cat2html7), cat2html7, ../_build/default/tools/cat2html.exe) | |||
| BOOKS=aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults bpf x86 linux | |||
| AARCH64_BOOKS= aarch64 aarch64-ifetch aarch64-mixed aarch64-MTE aarch64-MTE-mixed aarch64-VMSA aarch64-ETS2 aarch64-faults | |||
There was a problem hiding this comment.
The white spaces here are a bit strange, some have two some have one. Can you also unify it, since you are changing this line of code.
There was a problem hiding this comment.
Oh, I assumed they were there on purpose, to signify some level of separation between the vanilla aarch64, the extensions, and other non-arm architectures.
bce9aa6 to
120c511
Compare
@fsestini @ShaleXIONG I fixed this. Now we're copying the ArmARM releases dependencies into |
120c511 to
f59e35f
Compare
This change adds the
aarch64.catfile in every Arm ARM release (so far L.b and M.a) as options in the dropdown menu for all the relevant books in the herd-www UI. In turn, upon selecting this file,jerd.mlincludes the appropriate directory of dependencies to use for the selected cat model (in the case ofArmARM-L.b/aarch64.cat, it will beherd/libdir/aarch64/ArmARM-L.b).In order to make this work, we also needed to add the mechanism to disable caching of cat file contents inside the
ParseModelmodule. This is because, in the browser, herd is invoked from the Ocaml compiled to JavaScript through the use ofJs_of_ocaml, rather than from a CLI (which obviously does not exist in a browser). Therefore, upon reading different versions of the same file (with the same name), herd looks up the file name in an in-memory cache, and loads that version, if it has seen it already. This becomes an issue when switching between cat files that require different dependencies.