Commit 42ac0fb

add tutorial for creating registry, cache, and using it (#618)
* add tutorial for creating registry, cache, and using it Signed-off-by: vsoch <[email protected]>
1 parent 7b49fa6 commit 42ac0fb

File tree

12 files changed

+408
-14
lines changed

.circleci/config.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -110,7 +110,7 @@ jobs:
110110
test_singularity_hpc:
111111
working_directory: ~/repo
112112
machine:
113-
image: ubuntu-2004:202008-01
113+
image: ubuntu-2004:2022.10.1
114114
steps:
115115
- run: *exit_early
116116
- checkout

.github/workflows/test-container.yml

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ jobs:
6868
if: ${{ env.keepgoing == 'true' }}
6969
name: Install Singularity
7070
with:
71-
singularity-version: 3.6.4
71+
singularity-version: 3.7.1
7272

7373
- name: Create conda environment
7474
if: ${{ env.keepgoing == 'true' }}

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ jobs:
4444
if: ${{ matrix.container_tech == 'singularity' }}
4545
name: Install Singularity
4646
with:
47-
singularity-version: 3.6.4
47+
singularity-version: 3.7.1
4848

4949
- name: Create conda environment
5050
run: conda create --quiet -c conda-forge --name shpc spython

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ The versions coincide with releases on pip. Only major versions will be released
1515

1616
## [0.0.x](https://github.com/singularityhub/singularity-hpc/tree/main) (0.0.x)
1717
- GitHub action to update a registry from a cache or listing (0.1.17)
18+
- Support for "remove" command to more easily remove / uninstall entries
1819
- Fix bugs uninstalling all tags of a module (0.1.16)
1920
- support for install using registry recipe and local image (0.1.15)
2021
- fix views .view_module modulefile and loading (0.1.14)

docs/getting_started/developer-guide.rst

Lines changed: 261 additions & 1 deletion
@@ -8,6 +8,8 @@ This developer guide includes more complex interactions like contributing
88
registry entries and building containers. If you haven't read :ref:`getting_started-installation`
99
you should do that first.
1010

11+
.. _getting_started-developer-environment:
12+
1113

1214
Environment
1315
===========
@@ -34,6 +36,7 @@ You can also install as a hook:
3436
3537
$ pre-commit install
3638
39+
.. _getting_started-developer-commands:
3740

3841

3942
Developer Commands
@@ -65,6 +68,7 @@ And you could easily pipe this to a file. Here is how we generate this programma
6568
shpc docgen --registry ../shpc-registry --registry-url https://github.com/singularityhub/shpc-registry $module > "_library/${name}.md"
6669
done
6770
71+
.. _getting_started-creating-filesystem-registry:
6872

6973
Creating a FileSystem Registry
7074
==============================
@@ -145,6 +149,9 @@ It's reasonable that you can store your recipes alongside these files, in the ``
145149
folder. If you see a conflict and want to request allowing for a custom install path
146150
for recipes, please open an issue.
147151

152+
.. _getting_started-creating-remote-registry:
153+
154+
148155
Creating a Remote Registry
149156
==========================
150157

@@ -239,6 +246,9 @@ a separate directory based on version.
239246
240247
So different versions could exist alongside one another.
241248

249+
.. _getting_started-development-registry-yaml-files:
250+
251+
242252
Registry Yaml Files
243253
===================
244254

@@ -841,6 +851,8 @@ or an admin - it all comes down to who has permission to write to the modules
841851
and containers folder, and of course use it.
842852

843853

854+
.. _getting_started-development-github-action:
855+
844856
GitHub Action
845857
=============
846858

@@ -904,4 +916,252 @@ The reason we allow this additional listing is because the cache often misses be
904916
to extract a listing of aliases for some container, and we still wait to add it to the registry
905917
(albeit without aliases).
906918

907-
We will have a full developer tutorial coming soon - stay tuned!
919+
920+
Developer Tutorial
921+
==================
922+
923+
This is currently a small tutorial that will include some of the lessons above and
924+
show you how to:
925+
926+
1. Create a new remote registry on GitHub with automated updates
927+
2. Create a new container executable cache
928+
3. Automate updates of the cache to your registry
929+
930+
Prepare a Remote Registry
931+
-------------------------
932+
933+
To start, `create a new repository <https://docs.github.com/en/get-started/quickstart/create-a-repo>`_
934+
and follow the instructions in :ref:`getting_started-creating-remote-registry` to
935+
create a remote registry. We will briefly show you the most basic clone and adding
936+
a few entries to it here.
937+
938+
.. code-block:: console
939+
940+
# Clone the shpc-registry as a template
941+
$ git clone https://github.com/singularityhub/shpc-registry /tmp/my-registry
942+
$ cd /tmp/my-registry
943+
944+
The easiest way to delete the entries (to make way for your own) is to use shpc itself!
945+
Here is how we can use ``shpc remove`` to remove the entries. First, make sure that
946+
shpc is installed (:ref:`getting_started-installation`) and ensure your registry
947+
is the only one in the config registry section. You can use ``shpc config edit``
948+
to quickly see it. It should look like this:
949+
950+
.. code-block:: yaml
951+
952+
# Please preserve the flat list format for the yaml loader
953+
registry: [/tmp/my-registry]
954+
955+
Do a sanity check to make sure your active config is the one you think it is:
956+
957+
.. code-block:: console
958+
959+
$ shpc config get registry
960+
registry ['/tmp/my-registry']
961+
962+
Next, you can use ``shpc remove`` to remove all registry entries, and we
963+
recommend deleting quay.io first since most entries live there and it will
964+
speed up the subsequent operation.
965+
966+
.. code-block:: console
967+
968+
$ rm -rf quay.io/biocontainers
969+
$ shpc remove # answer yes to confirmation
970+
971+
Save your changes.
972+
973+
.. code-block:: console
974+
975+
$ git commit -a -s -m 'emptying template registry'
976+
977+
After this you will have only a skeleton set of files, and most importantly,
978+
the .github directory with automation workflows. Feel free to remove or edit files
979+
such as the ``FUNDING.yml`` and ``ISSUE_TEMPLATE``. Next, fetch so the gh-pages branch is available locally.
980+
981+
.. code-block:: console
982+
983+
$ git fetch
984+
985+
At this point you can edit the ``.git/config`` to be your new remote.
986+
987+
.. code-block:: console
988+
989+
# Update the remote to be your new repository
990+
vim .git/config
991+
992+
You should only do this after you've fetched, as you will no longer be connected to the original
993+
remote! Now that you've changed the remote and committed your changes, push to your main branch. We do this
994+
push before gh-pages so "main" becomes the primary branch.
995+
996+
$ git push origin main
997+
998+
Then you can checkout the gh-pages branch to do the same cleanup and push.
999+
1000+
.. code-block:: console
1001+
1002+
$ git checkout gh-pages
1003+
1004+
This cleanup is easier - just delete the markdown files in ``_library``.
1005+
1006+
.. code-block:: console
1007+
1008+
$ rm -rf _library/*.md
1009+
1010+
And then commit and push to gh-pages.
1011+
1012+
.. code-block:: console
1013+
1014+
$ git commit -a -s -m 'emptying template registry gh-pages'
1015+
$ git push origin gh-pages
1016+
1017+
1018+
Manually Add Registry Entries
1019+
-----------------------------
1020+
1021+
Great! Now you have an empty registry on your filesystem that will serve as a remote.
1022+
Make sure you are back on the main branch:
1023+
1024+
.. code-block:: console
1025+
1026+
$ git checkout main
1027+
1028+
While it's possible to manually add entries (e.g., ``shpc add docker://python``)
1029+
this will miss out on aliases. Instead, navigate to your GitHub repository
1030+
and try running the ``Actions --> Generate New Container --> Run Workflow`` and
1031+
enter your container name (with tag), and a URL and description. This will
1032+
run a workflow to derive aliases and open a pull request to your repository (make
1033+
sure in your repository settings you allow actions to open pull requests).
1034+
1035+
Remember that any container, once it goes into the registry, will have tags
1036+
and digests automatically updated via the "Update Containers" action workflow.
1037+
1038+
Creating a Cache
1039+
----------------
1040+
1041+
Instead of manually adding entries, let's create an automated way to populate
1042+
entries from a cache. You can read more about the algorithm we use to derive aliases
1043+
in the `shpc-registry-cache <https://github.com/singularityhub/shpc-registry-cache>`_
1044+
repository, along with cache generation details. You will primarily need two things:
1045+
1046+
1. A text listing of containers to add to the cache, ideally automatically generated
1047+
2. A workflow that uses it to update your cache.
1048+
1049+
Both of these files should be in a GitHub repository that you create. E.g.,:
1050+
1051+
.. code-block:: console
1052+
1053+
containers.txt
1054+
.github/
1055+
└── workflows
1056+
└── update-cache.yaml
1057+
1058+
For the main shpc registry cache linked above, we derive a list of biocontainers.txt
1059+
on the fly from the current depot listing. You might do the same for a collection of
1060+
interest, or just to try it out, create a small listing of your own containers
1061+
in a ``containers.txt`` e.g.,:
1062+
1063+
.. code-block:: console
1064+
1065+
python
1066+
rocker/r-ver
1067+
julia
1068+
1069+
You can find further dummy examples in the `container-executable-discovery <https://github.com/singularityhub/container-executable-discovery/>`_
1070+
repository along with variables that the action accepts. As an example of our
1071+
small text file above, we might have:
1072+
1073+
.. code-block:: yaml
1074+
1075+
name: Update Cache
1076+
1077+
on:
1078+
workflow_dispatch:
1079+
schedule:
1080+
# Twice weekly, Monday and Thursday
1081+
- cron: 0 0 * * 1,4
1082+
1083+
jobs:
1084+
update-cache:
1085+
runs-on: ubuntu-latest
1086+
steps:
1087+
- name: Checkout
1088+
uses: actions/checkout@v3
1089+
1090+
- name: Update Cache Action
1091+
uses: singularityhub/container-executable-discovery@main
1092+
with:
1093+
token: ${{ secrets.GITHUB_TOKEN }}
1094+
repo-letter-prefix: true
1095+
listing: ./containers.txt
1096+
dry_run: ${{ github.event_name == 'pull_request' }}
1097+
1098+
1099+
And this would use our containers.txt listing to populate the cache in the repository
1100+
we've created. Keep in mind that caches are useful beyond Singularity Registry HPC -
1101+
knowing the paths and executables within a container is useful for other applied and
1102+
research projects too!
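
The listing itself is just one container name per line, so generating it automatically can be very small. Below is a minimal sketch in Python, assuming you already have container names from some source of your own. The ``write_listing`` helper is hypothetical (it is not part of shpc or the discovery action); only the ``containers.txt`` file name comes from the example above.

```python
from pathlib import Path


def write_listing(names, path="containers.txt"):
    """Write one container name per line, skipping blanks and duplicates.

    Hypothetical helper for illustration; not part of shpc.
    """
    seen = set()
    lines = []
    for name in names:
        name = name.strip()
        # Skip empty entries and names we have already written
        if not name or name in seen:
            continue
        seen.add(name)
        lines.append(name)
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # The three containers from the small example listing
    write_listing(["python", "rocker/r-ver", "julia"])
```

Committing the resulting file next to the workflow gives the ``listing`` input something to read on each scheduled run.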
1103+
1104+
1105+
Updating a Registry from a Cache
1106+
--------------------------------
1107+
1108+
Once you have a cache, it's fairly easy to update your registry from it with another action
1109+
provided by shpc. This is the :ref:`getting_started-development-github-action` mentioned
1110+
above. The full example provided there does two things:
1111+
1112+
1. Updates your registry from the cache entries
1113+
2. Derives an additional listing to add containers that were missed in the cache.
1114+
1115+
And you will want to put the workflow alongside your newly created registry.
1116+
The second step exists because we are sometimes unable to extract
1117+
container binaries to the filesystem. In the case of any kind of failure, we might
1118+
not have an entry in the cache, however we still want to add it to our registry!
1119+
With the addition of the ``listing`` variable and the step to derive the listing
1120+
of BioContainers in the example above, we are still able to add these missing
1121+
containers, albeit without aliases. Here is an example just updating
1122+
from the cache (no extra listing):
1123+
1124+
1125+
.. code-block:: yaml
1126+
1127+
name: Update BioContainers
1128+
1129+
on:
1130+
pull_request: []
1131+
schedule:
1132+
- cron: 0 0 1 * *
1133+
1134+
jobs:
1135+
auto-scan:
1136+
runs-on: ubuntu-latest
1137+
steps:
1138+
- name: Checkout
1139+
uses: actions/checkout@v3
1140+
1141+
# registry defaults to PWD, branch defaults to main
1142+
- name: Update Containers
1143+
uses: singularityhub/singularity-hpc/actions/cache-update@main
1144+
with:
1145+
token: ${{ secrets.GITHUB_TOKEN }}
1146+
# Change this to your cache path
1147+
cache: https://github.com/singularityhub/shpc-registry-cache
1148+
min-count-inclusion: 10
1149+
max-count-inclusion: 1000
1150+
additional-count-inclusion: 25
1151+
# Defaults to shpc docs, this gets formatted to include the entry_name
1152+
url_format_string: "https://biocontainers.pro/tools/%s"
1153+
pull_request: "${{ github.event_name != 'pull_request' }}"
1154+
1155+
1156+
The url format string expects a container identifier somewhere, and feel free
1157+
to link to your registry base if you are unable to do this. You will want to change
1158+
the ``cache`` to be your remote cache repository, and then adjust the parameters to
1159+
your liking:
1160+
1161+
- **min-count-inclusion**: the count threshold under which we include ALL aliases. A rare alias is likely to appear fewer times across all containers.
1162+
- **additional-count-inclusion**: an additional number of containers to add after the initial set under ``min-count-inclusion`` is added (defaults to 25)
1163+
- **max-count-inclusion**: don't add counts over this threshold (set to 1000 for biocontainers).
1164+
1165+
Since the cache will generate a global counts.json and skips.json, this means the size of your cache
1166+
can influence the aliases chosen. It's recommended to create your entire cache first and then to
1167+
add it to your registry to update.
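
To make the three thresholds more concrete, here is one possible reading of them as a small Python sketch. This is not the implementation used by the action. The ``select_aliases`` function is hypothetical, and it assumes the global counts can be viewed as a mapping from alias name to the number of containers that provide it. It also assumes that aliases between the two thresholds are added rarest first.

```python
def select_aliases(aliases, counts, min_count=10, max_count=1000, additional=25):
    """Choose aliases for one container from global alias counts.

    Hypothetical sketch: `counts` maps alias name -> how many containers
    in the cache provide that alias (an assumed data layout).
    """
    # Rare aliases (count under min_count) are all included
    rare = sorted(a for a in aliases if counts.get(a, 0) < min_count)

    # Aliases between the thresholds are candidates, rarest first
    middle = sorted(
        (a for a in aliases if min_count <= counts.get(a, 0) <= max_count),
        key=lambda a: (counts.get(a, 0), a),
    )

    # Anything above max_count is never added; take a limited number of the rest
    return rare + middle[:additional]


if __name__ == "__main__":
    counts = {"samtools": 3, "bgzip": 12, "python": 500, "ls": 5000}
    print(select_aliases(["ls", "python", "bgzip", "samtools"], counts, additional=1))
```

Under these assumptions a very common executable such as ``ls`` is dropped, a rare one is always kept, and the size of the cache changes the counts and therefore the selection, which is why building the whole cache first is recommended.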

docs/getting_started/installation.rst

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ you use pip or another setup approach, and to install a `known release <https://
2020

2121
.. code:: console
2222
23-
# Install release 0.0.24
24-
$ git clone -b 0.0.24 git@github.com:singularityhub/singularity-hpc
23+
# Install release ${RELEASE}
24+
$ git clone -b ${RELEASE} git@github.com:singularityhub/singularity-hpc
2525
$ cd singularity-hpc
2626
$ pip install -e .[all]
2727

docs/getting_started/user-guide.rst

Lines changed: 29 additions & 0 deletions
@@ -1263,6 +1263,7 @@ Namespaces currently work with:
12631263
- uninstall
12641264
- show
12651265
- add
1266+
- remove
12661267
- check
12671268

12681269

@@ -1890,6 +1891,34 @@ And that's it! The container module will use the same namespace, ``vanessa/pokem
18901891
and we do this purposefully as a design decision. Note that ``add`` previously would add the container directly to the module
18911892
directory, and as of version 0.0.49 it's been updated to generate the container.yaml first.
18921893

1894+
.. _getting_started-commands-remove:
1895+
1896+
Remove
1897+
------
1898+
1899+
As of version ``0.1.17`` you can easily remove a container.yaml entry too!
1900+
This remove command takes a pattern, and not providing one will remove all entries
1901+
from the registry (useful if you want to create a new one but preserve the automation).
1902+
Here is how to remove a specific namespace of container yamls:
1903+
1904+
.. code-block:: console
1905+
1906+
$ shpc remove quay.io/biocontainers
1907+
Searching for container.yaml matching quay.io/biocontainers to remove...
1908+
Are you sure you want to remove 8367 container.yaml recipes? (yes/no)?
1909+
1910+
1911+
To remove all entries:
1912+
1913+
.. code-block:: console
1914+
1915+
$ shpc remove
1916+
Searching for container yaml to remove...
1917+
Are you sure you want to remove 264 container.yaml recipes? (yes/no)? yes
1918+
Removal complete!
1919+
1920+
This command can be useful if you want to start with a populated registry
1921+
as a template for your own registry.
18931922

18941923
Get
18951924
---
