@@ -8,6 +8,8 @@ This developer guide includes more complex interactions like contributing
 registry entries and building containers. If you haven't read :ref:`getting_started-installation`
 you should do that first.

+.. _getting_started-developer-environment:
+

 Environment
 ===========
@@ -34,6 +36,7 @@ You can also install as a hook:

     $ pre-commit install

+.. _getting_started-developer-commands:


 Developer Commands
@@ -65,6 +68,7 @@ And you could easily pipe this to a file. Here is how we generate this programma
         shpc docgen --registry ../shpc-registry --registry-url https://github.com/singularityhub/shpc-registry $module > "_library/${name}.md"
     done

+.. _getting_started-creating-filesystem-registry:

 Creating a FileSystem Registry
 ==============================
@@ -145,6 +149,9 @@ It's reasonable that you can store your recipes alongside these files, in the ``
 folder. If you see a conflict and want to request allowing for a custom install path
 for recipes, please open an issue.

+.. _getting_started-creating-remote-registry:
+
+
 Creating a Remote Registry
 ==========================

@@ -239,6 +246,9 @@ a separate directory based on version.

 So different versions could exist alongside one another.

+.. _getting_started-development-registry-yaml-files:
+
+
 Registry Yaml Files
 ===================

@@ -841,6 +851,8 @@ or an admin - it all comes down to who has permission to write to the modules
 and containers folder, and of course use it.


+.. _getting_started-development-github-action:
+
 GitHub Action
 =============

@@ -904,4 +916,252 @@ The reason we allow this additional listing is because the cache often misses be
 to extract a listing of aliases for some container, and we still want to add it to the registry
 (albeit without aliases).

-We will have a full developer tutorial coming soon - stay tuned!
+
+Developer Tutorial
+==================
+
+This short tutorial revisits some of the lessons above and shows you how to:
+
+1. Create a new remote registry on GitHub with automated updates.
+2. Create a new container executable cache.
+3. Automate updates of the cache to your registry.
+
+Prepare a Remote Registry
+-------------------------
+
+To start, `create a new repository <https://docs.github.com/en/get-started/quickstart/create-a-repo>`_
+and follow the instructions in :ref:`getting_started-creating-remote-registry` to
+create a remote registry. Here we briefly show the most basic steps: cloning a template
+registry, emptying it, and adding a few entries of your own.
+
+.. code-block:: console
+
+    # Clone the shpc-registry as a template
+    $ git clone https://github.com/singularityhub/shpc-registry /tmp/my-registry
+    $ cd /tmp/my-registry
+
+The easiest way to delete the entries (to make way for your own) is to use shpc itself!
+Here is how you can use ``shpc remove`` to remove the entries. First, make sure that
+shpc is installed (:ref:`getting_started-installation`) and that your registry
+is the only one in the ``registry`` section of the config. You can use ``shpc config edit``
+to check quickly. It should look like this:
+
+.. code-block:: yaml
+
+    # Please preserve the flat list format for the yaml loader
+    registry: [/tmp/my-registry]
+
+Do a sanity check to make sure your active config is the one you think it is:
+
+.. code-block:: console
+
+    $ shpc config get registry
+    registry ['/tmp/my-registry']
+
+Next, use ``shpc remove`` to remove all registry entries. We recommend deleting
+``quay.io/biocontainers`` by hand first. Most entries live there, so removing that
+directory directly speeds up the subsequent operation.
+
+.. code-block:: console
+
+    $ rm -rf quay.io/biocontainers
+    $ shpc remove # answer yes to confirmation
+
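Before committing, you may want to confirm that nothing was left behind. The following is a minimal shell sketch, not an shpc feature. The function name ``list_registry_entries`` is hypothetical. It assumes, as in the shpc-registry layout, that every entry is a directory containing a ``container.yaml`` file.

```shell
# Hypothetical helper (not part of shpc): print every registry entry under a
# registry root, one per line, relative to that root
# (e.g. quay.io/biocontainers/samtools).
# Assumption: each entry is a directory that holds a container.yaml file.
list_registry_entries() {
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "${1:-.}" || exit 1
        find . -type f -name container.yaml \
            | sed -e 's|^\./||' -e 's|/container\.yaml$||' \
            | sort
    )
}
```

After ``shpc remove``, running ``list_registry_entries /tmp/my-registry`` should print nothing. Any path it does print is an entry that is still present.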
+Commit your changes:
+
+.. code-block:: console
+
+    $ git commit -a -s -m 'emptying template registry'
+
+After this you will have only a skeleton set of files and, most importantly,
+the ``.github`` directory with the automation workflows. Feel free to remove or edit files
+such as ``FUNDING.yml`` and ``ISSUE_TEMPLATE``. Next, fetch from the original remote so
+that you have the ``gh-pages`` branch used for GitHub Pages:
+
+.. code-block:: console
+
+    $ git fetch
+
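+At this point you can edit ``.git/config`` so that the ``origin`` remote points to your new repository.
+
+.. code-block:: console
+
+    # Update the remote to be your new repository
+    $ vim .git/config
+    # or, equivalently, without opening an editor:
+    $ git remote set-url origin https://github.com/<username>/<repository>
+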
+Only do this after you've fetched, because you will no longer be connected to the original
+remote! Now that you have changed the remote and committed, push to your main branch. We push
+``main`` before ``gh-pages`` so that ``main`` becomes the primary branch.
+
+.. code-block:: console
+
+    $ git push origin main
+
+Then check out the ``gh-pages`` branch to do the same cleanup and push.
+
+.. code-block:: console
+
+    $ git checkout gh-pages
+
+This cleanup is easier: just delete the markdown files in ``_library``.
+
+.. code-block:: console
+
+    $ rm -rf _library/*.md
+
+Then commit and push to ``gh-pages``:
+
+.. code-block:: console
+
+    $ git commit -a -s -m 'emptying template registry gh-pages'
+    $ git push origin gh-pages
+
+
+Manually Add Registry Entries
+-----------------------------
+
+Great! Now you have an empty registry on your filesystem that will serve as a remote.
+Make sure you are back on the main branch:
+
+.. code-block:: console
+
+    $ git checkout main
+
+While it's possible to add entries manually (e.g., ``shpc add docker://python``),
+this misses out on aliases. Instead, navigate to your GitHub repository, run
+``Actions --> Generate New Container --> Run Workflow``, and enter your container
+name (with tag), a URL, and a description. This runs a workflow that derives aliases
+and opens a pull request to your repository (make sure your repository settings
+allow GitHub Actions to create pull requests).
+
+Remember that any container, once it goes into the registry, will have its tags
+and digests updated automatically by the "Update Containers" workflow.
+
+Creating a Cache
+----------------
+
+Instead of manually adding entries, let's create an automated way to populate
+entries from a cache. You can read more about the algorithm we use to derive aliases,
+along with cache generation details, in the
+`shpc-registry-cache <https://github.com/singularityhub/shpc-registry-cache>`_
+repository. You will primarily need two things:
+
+1. A text listing of containers to add to the cache, ideally generated automatically.
+2. A workflow that uses the listing to update your cache.
+
+Both of these files should live in a GitHub repository that you create, for example:
+
+.. code-block:: text
+
+    containers.txt
+    .github/
+    └── workflows
+        └── update-cache.yaml
+
+For the main shpc registry cache linked above, we derive a ``biocontainers.txt`` listing
+on the fly from the current depot listing. You might do the same for a collection of
+interest or, just to try it out, create a small listing of your own containers
+in ``containers.txt``, for example:
+
+.. code-block:: text
+
+    python
+    rocker/r-ver
+    julia
+
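If you generate the listing with a script, it can help to normalize it before committing. Here is a minimal shell sketch. The function name ``write_listing`` is hypothetical. We do not know whether the action tolerates comments, blank lines, or duplicates, so the sketch simply removes them.

```shell
# Hypothetical helper: normalize a container listing read from stdin and write
# it to the file named by $1 (default: containers.txt). It strips "#" comments,
# trims surrounding whitespace, drops blank lines, and removes duplicates.
write_listing() {
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$' \
        | sort -u > "${1:-containers.txt}"
}
```

For example, ``printf 'python\nrocker/r-ver\njulia\npython\n' | write_listing containers.txt`` writes the three unique names in sorted order. In practice the input would come from whatever script enumerates your collection of interest.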
+You can find further examples in the `container-executable-discovery <https://github.com/singularityhub/container-executable-discovery/>`_
+repository, along with the variables that the action accepts. For the small text
+file above, the workflow might look like this:
+
+.. code-block:: yaml
+
+    name: Update Cache
+
+    on:
+      workflow_dispatch:
+      schedule:
+        # Twice weekly: 00:00 UTC on Monday and Thursday
+        - cron: "0 0 * * 1,4"
+
+    jobs:
+      update-cache:
+        runs-on: ubuntu-latest
+        steps:
+          - name: Checkout
+            uses: actions/checkout@v3
+
+          - name: Update Cache Action
+            uses: singularityhub/container-executable-discovery@main
+            with:
+              token: ${{ secrets.GITHUB_TOKEN }}
+              repo-letter-prefix: true
+              listing: ./containers.txt
+              dry_run: ${{ github.event_name == 'pull_request' }}
+
+
+This workflow uses our ``containers.txt`` listing to populate the cache in the repository
+we've created. Keep in mind that caches are useful beyond Singularity Registry HPC:
+knowing the paths and executables within a container is useful for other applied and
+research projects too!
+
+
+Updating a Registry from a Cache
+--------------------------------
+
+Once you have a cache, it's fairly easy to update your registry directly from it with
+another action provided by shpc, described above in :ref:`getting_started-development-github-action`.
+The full example provided there does two things:
+
+1. Updates your registry from the cache entries.
+2. Derives an additional listing to add containers that were missed in the cache.
+
+You will want to put this workflow alongside your newly created registry.
+The second point matters because, for a variety of reasons, we are sometimes unable to
+extract container binaries to the filesystem. When that fails, the container
+has no entry in the cache, yet we still want to add it to our registry!
+With the ``listing`` variable and the step that derives the listing
+of BioContainers in the example above, we can still add these missing
+containers, albeit without aliases. Here is an example that only updates
+from the cache (no extra listing):
+
+
+.. code-block:: yaml
+
+    name: Update BioContainers
+
+    on:
+      pull_request: []
+      schedule:
+        # Monthly: 00:00 UTC on the first day of the month
+        - cron: "0 0 1 * *"
+
+    jobs:
+      auto-scan:
+        runs-on: ubuntu-latest
+        steps:
+          - name: Checkout
+            uses: actions/checkout@v3
+
+          # registry defaults to PWD, branch defaults to main
+          - name: Update Containers
+            uses: singularityhub/singularity-hpc/actions/cache-update@main
+            with:
+              token: ${{ secrets.GITHUB_TOKEN }}
+              # Change this to your cache path
+              cache: https://github.com/singularityhub/shpc-registry-cache
+              min-count-inclusion: 10
+              max-count-inclusion: 1000
+              additional-count-inclusion: 25
+              # Defaults to shpc docs, this gets formatted to include the entry_name
+              url_format_string: "https://biocontainers.pro/tools/%s"
+              pull_request: "${{ github.event_name != 'pull_request' }}"
+
+
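The ``%s`` in ``url_format_string`` is a printf-style placeholder that receives the entry name. As a rough illustration only (this is not shpc's code), the shell's ``printf`` performs the same kind of substitution. The function name ``entry_url`` is hypothetical. We also assume, without confirmation from the source, that the entry name is the last path component of the container URI.

```shell
# Illustration only (not shpc's code): fill the %s placeholder of a
# printf-style URL format with an entry name.
# Assumption: the entry name is the last path component of the container URI.
entry_url() {
    name="${1##*/}"
    # The format is intentionally used as the printf format string; it should
    # contain exactly one %s and no other % directives.
    printf "$2\n" "$name"
}
```

For instance, ``entry_url quay.io/biocontainers/samtools 'https://biocontainers.pro/tools/%s'`` prints ``https://biocontainers.pro/tools/samtools``.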
+The ``url_format_string`` expects a placeholder (``%s``) for the container identifier;
+if you cannot build a per-container URL, feel free to link to your registry base instead.
+You will want to change ``cache`` to your remote cache repository, and then adjust the
+parameters to your liking:
+
+- **min-count-inclusion**: the threshold count under which we include ALL aliases. A rare alias is likely to appear fewer times across all containers.
+- **additional-count-inclusion**: an additional number of aliases to add after the initial set under ``min-count-inclusion`` is added (defaults to 25).
+- **max-count-inclusion**: never add aliases whose count exceeds this threshold (set to 1000 for BioContainers).
+
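To make the interplay of the three thresholds concrete, here is a hypothetical shell sketch of one way to read them. It is not the shpc implementation. It assumes each candidate executable comes with its global count (how often it appears across the whole cache). It includes everything below ``min-count-inclusion``, then adds up to ``additional-count-inclusion`` more (rarest first), and skips anything above ``max-count-inclusion``. The function name ``select_aliases`` is ours.

```shell
# Hypothetical reading of the inclusion thresholds (not shpc's code).
# Usage: select_aliases MIN MAX ADDITIONAL
# stdin: lines of "<executable> <global-count>"; stdout: selected executables.
select_aliases() {
    sort -k2,2n | awk -v min="$1" -v max="$2" -v extra="$3" '
        BEGIN { min += 0; max += 0; extra += 0 }
        # Rare aliases (count below min): always include.
        $2 + 0 < min { print $1; next }
        # Then up to "extra" more, rarest first, never above max.
        $2 + 0 <= max && added < extra { print $1; added++ }
    '
}
```

Suppose ``MIN=10``, ``MAX=1000`` and ``ADDITIONAL=2``, with input ``samtools 1``, ``bgzip 40``, ``tabix 40``, ``python 900`` and ``ls 5000``. The sketch selects ``samtools``, ``bgzip`` and ``tabix``. ``python`` misses the two extra slots, and ``ls`` is too common.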
+Because the cache generates global ``counts.json`` and ``skips.json`` files, the size of your cache
+can influence the aliases chosen. We recommend creating your entire cache first and only then
+using it to update your registry.