Conversation
|
Related to http://daccs-jenkins.crim.ca/job/DACCS-iac-birdhouse/1293/ |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1293/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-67.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1295/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-36.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
@fmigneault Is it related to recent CanarieAPI changes? |
birdhouse/pavics-compose.sh
Outdated
| for extra_compose in "$adir"/docker-compose-extra-*.yml; do | ||
| service_name=$(basename "$extra_compose") | ||
| service_name=${service_name#docker-compose-extra-} | ||
| service_name=${service_name#.yml} | ||
| if echo "$CONFIGURED_COMPONENTS" | grep " $service_name "; then | ||
| COMPOSE_CONF_LIST="${COMPOSE_CONF_LIST} -f $extra_compose" | ||
| fi |
There was a problem hiding this comment.
I'd like to propose using the same directory-based strategy as used for component configs instead of using derived components parts in file names of docker-compose extra fragments.
Assuming let's say that enabled components are [magpie, twitcher, weaver, finch].
Instead of looking for the following:
birdhouse/components/finch/docker-compose-extra.yml
birdhouse/components/finch/docker-compose-extra-magpie.yml
birdhouse/components/weaver/docker-compose-extra.yml
birdhouse/components/weaver/docker-compose-extra-finch.yml
birdhouse/components/weaver/docker-compose-extra-magpie.yml
The script would look for:
birdhouse/components/finch/config/finch/docker-compose-extra.yml
birdhouse/components/finch/config/magpie/docker-compose-extra.yml
birdhouse/components/weaver/config/weaver/docker-compose-extra.yml
birdhouse/components/weaver/config/finch/docker-compose-extra.yml
birdhouse/components/weaver/config/magpie/docker-compose-extra.yml
The reason why I want to make this proposition is because:
- It is easier to know which part of the path corresponds to which component activation logic when parsing.
- It is easier mentally to always think with directories, not having to switch between dir-based/named-based method since other files in configs, nginx, etc. are using this method.
- Multiple components (eg:
weaver,cowbird) already have nestedconfig/<dependency>directories for other configurations they augment. Moving thedocker-compose-extra.ymlin those directories produces fewer changes.
There was a problem hiding this comment.
Sure that makes sense. I'll change that over today.
Not sure in this case, but there is no change of CanarieAPI since #284 is not integrated. |
I don't dislike the idea of improving shared dependencies, but better roles should be defined for databases. Everything is too easily accessible/mixed together, and they should be kept separated in some critical cases, such as user logins and permissions with magpie. Technically, each service should have its own user to query the database with limited table/collections accessible. I would not separate components into "components" and "shared_dependencies". weaver:
- mongodb
- "[ finch | malleefowl | hummingbird | catagog ]":
- postgres
magpie:
- postgres
- "[ finch | malleefowl | hummingbird | catagog ]":
- postgres
|
|
@fmigneault I like the idea of using "services" and "extended-configurations" as labels. Services can then depend on other services and we don't need to make a distinction between optional and non-optional. As well as databases, we also need to think about shared volumes, specifically the What if we did the following:
At this point, I propose that we make these changes in a separate PR though. This PR is already making a ton of changes and I think we could separate these changes out. I really like the idea of having each service have an associated user that interacts with magpie to limit what it can access. I'm going to add a comment to the relevant issue so that we can make sure that we implement that. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1297/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
Same issue related to proxy <> raven |
|
@matprov I'm hoping to get a local test machine up and running today so I should be able to more easily debug the proxy <> raven issue and fix that shortly |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1298/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1311/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-88.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/986/NOTEBOOK TEST RESULTS |
|
I have a few question about the failing tests:
|
|
|
|
@fmigneault I'm actually able to see the jenkins report myself now but thanks for posting the relevant errors. Some followup questions:
Thanks! Also, let me know if it's easier to answer these questions in person and I can give you a call |
|
A few thoughts:
- Sometimes the NASA server fails;
- pyesgf was recently added to the jupyter environment;
If only those two are failing, I'm not worried you broke anything or that
anything crucial is missing.
…On Fri, Mar 3, 2023 at 9:16 AM mishaschwartz ***@***.***> wrote:
@fmigneault <https://github.com/fmigneault> I'm actually able to see the
jenkins report myself now but thanks for posting the relevant errors.
Some followup questions:
- WebMapService("https://neo.gsfc.nasa.gov/wms/wms") seems to fail
everywhere (this branch, master, pavics.ouranos.ca), are there
additional credentials I need to add manually to the tutorial notebooks to
access the wms at this URL. Is this passing or failing on your pavics
deployment at CRIM as well?
- ModuleNotFoundError: No module named 'pyesgf' indicates that there
is a jupyter kernel image that isn't being created is that correct? Can you
help me figure out where the jupyterhub service finds and loads these
kernel images? There may be a step that this PR is skipping.
- hummingbird isn't responding because the magpie permissions form
admin users for the wps birds are all empty so weaver is not allowed to
connect. Where are admin permissions set? There doesn't seem to be a
permissions.cfg file that I can find that specifies admin permissions
specifically unless I'm missing something.
Thanks! Also, let me know if it's easier to answer these questions in
person and I can give you a call
—
Reply to this email directly, view it on GitHub
<#296 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAT2Q6JUADB2E47VA6ALODW2H4KVANCNFSM6AAAAAAVDQSQQQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
David Huard, PhD
Spécialiste Scénarios et services climatiques, Ouranos
|
@mishaschwartz David was correct,
So these 2 notebooks failure can be safely ignored for now. |
|
@mishaschwartz A note about backward compat. It is okay to rename However, please add symlinks back at the old locations so existing I think CRIM automated pipeline are also activating some of those |
Are the tests supposed to be run with the |
I think yes if the pipeline is setup following this procedure: https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/README.rst#optional-prepare-instance-to-run-automated-end-to-end-test-suite Also need |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1317/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-88.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/992/NOTEBOOK TEST RESULTS |
That makes sense, but I would like to make those changes in a different PR. I think that this one is complex enough already |
Indeed.
These are the components we enable to run the E2E tests export EXTRA_CONF_DIRS="
./components/monitoring
./components/weaver
./components/cowbird
./components/stac
./optional-components/canarie-api-full-monitoring
./optional-components/all-public-access
./optional-components/testthredds
./optional-components/secure-thredds
./optional-components/secure-data-proxy
./optional-components/wps-healthchecks
./optional-components/database-external-ports
./optional-components/test-weaver
./optional-components/test-geoserver-secured-access
"I would rather avoid creating symlinks. It feels this will only complicate and pollute the structure. The |
Totally agree to delay the rename of It's okay to not do the rename as well since it's something rather disruptive and we can simply clarify the intend of But I wonder, you've created all the component support files ( Seeing how disruptive a component path rename can be, I'd rather have them all at their final destination under |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1361/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-35.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
|
I still have double I thought you fixed that. But it's not major/blocking, can be fixed in another PR. |
It should be automatically registered by each bird (fixed now). The user can set it in env.local too if they want.
Turns out I fixed it poorly. Should be properly fixed now |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1362/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-67.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1026/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1363/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-166.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1027/NOTEBOOK TEST RESULTS |
|
Jenkins finally works on my machine, not sure what happened, I just keep pulling your newer changes. I previously had Thredds, Finch, Humming misbehaving. Going to finally be able to test autodeploy now. |
|
|
||
| services: | ||
| malleefowl: | ||
| generic_bird: |
| ALREADY_EVALED='' | ||
| for i in ${DELAYED_EVAL}; do | ||
| if echo "ALREADY_EVALED" | grep -qE "^\s*$i\s*$"; then | ||
| if echo "$ALREADY_EVALED" | grep -qE "^\s*$i\s*$"; then |
There was a problem hiding this comment.
LOL, I made this same kind of mistake often too :)
There was a problem hiding this comment.
I'm a little embarrassed by that one 😞
| PROXY_SECURE_PORT=443 HOSTNAME=${PAVICS_FQDN} docker-compose ${COMPOSE_CONF_LIST} restart proxy | ||
|
|
||
| # run postgres post-startup setup script | ||
| # Note: this must run before the post-docker-compose-up scripts since some may expect postgres databases to exist |
There was a problem hiding this comment.
We still have an exception here humm. Maybe if this scenario come back often, we can have post-docker-compose-up-preparation plugin entrypoint for this kind of scenario.
tlvu
left a comment
There was a problem hiding this comment.
A few more reviews while waiting for autodeploy to trigger :)
| $WEAVER_WPS_WORKDIR | ||
| $WEAVER_MANAGER_LOG_LEVEL | ||
| $WEAVER_WORKER_LOG_LEVEL | ||
| $WEAVER_WPS_PROVIDERS |
There was a problem hiding this comment.
I forgot the reason why WEAVER_WPS_PROVIDERS is not needed in EXTRA_VARS anymore.
Each bird can add itself but that does not means WEAVER_WPS_PROVIDERS is not needed to be in EXTRA_VARS anymore.
There was a problem hiding this comment.
It's not actually used in any of the template files. Instead it is used in the post-docker-compose-up script which runs in the same environment as pavics-compose.sh so this environment variable is already present (if set by a bird or env.local).
If it isn't set (when there are no birds in the stack for example) then the post-docker-compose-up script just skips the step where it registers the wps providers for weaver.
| 'stats': { | ||
| 'method': '.*', | ||
| 'route': '/(magpie|geoserver|thredds|wpsoutputs|jupyter)/.*' | ||
| 'route': '(?!)' |
There was a problem hiding this comment.
To help the next reader, add a comment # will be the value of CANARIE_STATS_ROUTES
| 'stats': { | ||
| 'method': '.*', | ||
| 'route': '/project-api/.*' | ||
| 'route': '(?!)' |
There was a problem hiding this comment.
Maybe just revert it to the original value, since we could keep old components but simply not enable them by default. Just document in the components/README.rst they are legacy components, not maintained anymore.
| - ./config/flyingpigeon/wps.cfg:/wps.cfg | ||
| - /tmp | ||
| depends_on: | ||
| - postgres |
There was a problem hiding this comment.
This should technically be in a postgres fragment but since ./config/postgres is specified in COMPONENT_DEPENDENCIES, I think we can keep this here. Not going to create a compose fragment file just for 2 lines.
There was a problem hiding this comment.
This is odd, I wonder why this duplicate logic for creating database !
Is it possible postgres-setup.sh only execute on empty DB startup? Or vice-versa? And this duplicate logic is to ensure when DB already running and we add a new bird, its DB is also created? I gotta hunt down for that PR.
There was a problem hiding this comment.
Found that PR https://github.com/Ouranosinc/PAVICS/pull/173. I think we can drop this create-wps-databases.sql as it is only for when the DB is completely empty. postgres-setup.sh can handle incremental addition of new birds.
| # Useful to have a custom homepage. | ||
| export PROXY_ROOT_LOCATION="return 302 https://\$host/jupyter/hub/login;" | ||
|
|
||
| export INCLUDE_FOR_PORT_80='$([ x"$ALLOW_UNSECURE_HTTP" = x"True" ] && echo "include /etc/nginx/conf.d/all-services.include;" || echo "include /etc/nginx/conf.d/redirect-to-https.include;")' |
There was a problem hiding this comment.
So you like this delayed eval feature? :D
Had delayed eval not exist, probably pre-docker-compose-up might be able to do this.
birdhouse/config/proxy/default.env
Outdated
| COMPONENT_DEPENDENCIES=" | ||
| $COMPONENT_DEPENDENCIES | ||
| ./config/canarie-api | ||
| ./config/twitcher |
There was a problem hiding this comment.
Typo from previous edit? I do not think proxy needs twitcher. Both of them can function independently but they are just nicer together. It's not the same level of coupling like between twitcher and magpie.
|
|
||
| docker exec -t postgres psql --username postgres -f /docker-entrypoint-initdb.d/create-wps-databases.sql | ||
|
|
||
| docker exec -t postgres /postgres-setup.sh |
There was a problem hiding this comment.
Nice, unified both ways to initializes the various DB.
There was a problem hiding this comment.
This is really odd, this config/postgres-magpie/.gitignore file is deleted, the old files are there but git status do not report them unknown !
There was a problem hiding this comment.
I added an entry for postgres-magpie in birdhouse/config/.gitignore
tlvu
left a comment
There was a problem hiding this comment.
@mishaschwartz I am done with my Autodeploy test. All good.
I've documented what I wanted to test and all the results directly in the PR description so we can easily refer back later if some further investigations needed.
I have not tried to pick and choose yet but even if I find something that's not blocking this one. We can fix later.
Even my new comments, none of them blocking except the one about Proxy depending on Twitcher which I find quite strange, hopefully just a typo left. All the rest can be fixed later.
This PR, although 100% back-compat, deserves a minor bump (make bump minor) before merge, since the structure is drastically changed.
Again thanks a tons for your immense effort in making this much more modular. I think all the orgs deploying this stack will greatly benefit from this change.
|
@tlvu Thanks for the long review and especially catching all my mistakes! I've made the changes you suggested and bumped the version to 1.24. As soon as the tests are finished running it should be good to go. Since this is my first PR for this project.. what is the procedure for merging PRs? Do you want to do it or should I? Should I make a new tag for version 1.24 once it's merged? |
Release instruction here https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/README.rst#release-procedure You are at the step "ready to merge the PR immediately". You don't have to wait for the tests to finish, unless you suspect your last minute code change might break something. If you wanted to wait, you should have not done the Let me know if the procedure is clear enough. Else I can do the merge for you if you prefer. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1365/Result : failure BIRDHOUSE_DEPLOY_BRANCH : pluggable-components-part-2 DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-67.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1029/NOTEBOOK TEST RESULTS |
|
@mishaschwartz Congrats ! Making the grand entrance with a gigantic PR as your first PR ! :D |
|
Went live on Ouranos production at 2023-03-23 10h. List of re-created containers: The rest unchanged. |
| [1.24.0](https://github.com/bird-house/birdhouse-deploy/tree/1.24.0) (2023-03-22) | ||
| ------------------------------------------------------------------------------------------------------------------ | ||
| ## Fixes | ||
| - The default stack was not configurable. This meant that if someone wanted to deploy a |
There was a problem hiding this comment.
@mishaschwartz Note for next PR, repeat the title of the PR in CHANGES.md for a one-line summary of the change.
|
Jenkins run after go-live on Ouranos prod: only known error |
Overview
Makes all components part of the "pluggable architecture" meaning that no components are required to be configured, each components can be added to the stack as needed.
See previous discussion for this PR at #295
Changes
Non-breaking changes
The components that will be added to the stack are only those whose configuration directory is listed in either
DEFAULT_CONF_DIRSorEXTRA_CONF_DIRS.Breaking changes
None
Related Issue / Discussion
Additional Information
In a separate PR we should:
configs,components, andoptional_componentsdirectories. Now that these distinctions are no longer relevant it might make more sense to divide them as follows:components: all components (optional or otherwise; eg: finch, weaver, twitcher)shared_dependencies: docker containers required to run multiple components (eg: postgres, mongodb)component_enhancements: additional configurations that modify an existing component (eg: all-public-access, database-external-ports)Tests
masterbranch pass Jenkins to prepare for upgrade testmasterdo not know about, which would break autodeploy delete-ignore.sh.txt./pavics-compose.sh up -dmaster-up.txt./pavics-compose.sh psnot all container healthy master-ps.txt./pavics-compose.sh up -dfor all services responding so Weaver can add provider master2-up.txt./pavics-compose.sh psready to run Jenkins master-ps-2.txtesgf-dap.ipynbandWMS_example.ipynbbaseline-master-PAVICS-e2e-workflow-tests-current-production-version-114-consoleText.txt./pavics-compose.sh up -dautodeploy.log./pavics-compose.sh psafter autodeploy new-ps-after-autodeploy.txtesgf-dap.ipynbandWMS_example.ipynbnew-branch-PAVICS-e2e-workflow-tests-current-production-version-115-consoleText.txtmaster-ps-2.txtandnew-ps-after-autodeploy.txt, same number of containers so nothing missing (default enabled components are good).gitignore