
@mrnicegyu11 (Member) commented Dec 1, 2025

What do these changes do?

  • Adds dedicated machines for the storage and API workers, so they can max out their resources (CPU in particular) without impacting the rest of the platform
  • The resource specs of these machines are best guesses
  • Open question: can osparc-simcore smoothly handle having no worker present for, e.g., 24 h (say, if the machine is down over the weekend) without losing Celery jobs? To keep costs in check, there is for now no high availability for these simcoreworker machines
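Part of the 24 h outage question is about capacity: even if the broker keeps every queued job while no worker is consuming, the backlog still has to drain once the machine is back. The Python sketch below estimates that with a simple constant-rate model. The arrival and service rates are hypothetical placeholders, not measured oSparc numbers, and the model assumes (rather than checks) that the broker persists queued messages; whether jobs are actually lost depends on the broker and Celery configuration.

```python
# Back-of-the-envelope backlog model for a worker outage.
# Assumption: the broker retains all queued jobs while no worker is running.
# All rates are hypothetical placeholders, not measured oSparc numbers.


def backlog_after_outage(arrival_per_hour: float, outage_hours: float) -> float:
    """Jobs that pile up in the queue while no worker is consuming."""
    return arrival_per_hour * outage_hours


def hours_to_drain(backlog: float, arrival_per_hour: float, service_per_hour: float) -> float:
    """Time for restarted workers to clear the backlog while new jobs keep arriving."""
    net_rate = service_per_hour - arrival_per_hour
    if net_rate <= 0:
        return float("inf")  # the workers can never catch up
    return backlog / net_rate


if __name__ == "__main__":
    arrival = 50.0   # hypothetical jobs/hour arriving over the weekend
    service = 200.0  # hypothetical jobs/hour one worker machine can process
    backlog = backlog_after_outage(arrival, 24.0)
    print(backlog, hours_to_drain(backlog, arrival, service))
```

With these placeholder numbers, a 24 h outage leaves 1200 queued jobs that take 8 h to clear; plugging in real rates would show whether weekend downtime is acceptable.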

This is a draft for discussion. Check out the linked ops-config PR as well.

Bonus:

  • Removes unused "pgbackup.enable=true" label

Related issue/s

#1229

Related PR/s

https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/merge_requests/1699

Checklist

  • I tested and it works

mrnicegyu11 and others added 30 commits October 15, 2024 16:18
Merge remote-tracking branch 'upstream/main'
…oundation#979)

* Introduce longhorn chart

* Further longhorn configuration

* Longhorn: further settings configuration

* Fix longhorn configuration bugs

Extra: introduce longhorn pv vales for portainer

* Add comment for deletion longhorn

* Further longhorn configuration

* Add README.md for Longhorn wit FAQ

* Update Longhorn readme

* Update readme

* Futher LH configuration

* Update LH's Readme

* Update Longhorn Readme

* Improve LH's Readme

* LH: Reduce reserved default disk space to 5%

Since we use a dedicated disk for LH, we can go ahead with 5%

* Use values to set Longhorn storage class

* Update LH's Readme

* LH Readme: add requirements reference

* PR Review: bring back portainer s3 pv

* LH: decrease portinaer volume size
Merge remote-tracking branch 'upstream/main'
mrnicegyu11 and others added 13 commits October 8, 2025 11:51
…ndation#1223)

* wip

* Add csi-s3 and have portainer use it

* Change request @Hrytsuk 1GB max portainer volume size

* Arch Linux Certificates Customization

* Fix pgsql exporter failure

* [Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉  (ITISFoundation#979)

* Introduce longhorn chart

* Further longhorn configuration

* Longhorn: further settings configuration

* Fix longhorn configuration bugs

Extra: introduce longhorn pv vales for portainer

* Add comment for deletion longhorn

* Further longhorn configuration

* Add README.md for Longhorn wit FAQ

* Update Longhorn readme

* Update readme

* Futher LH configuration

* Update LH's Readme

* Update Longhorn Readme

* Improve LH's Readme

* LH: Reduce reserved default disk space to 5%

Since we use a dedicated disk for LH, we can go ahead with 5%

* Use values to set Longhorn storage class

* Update LH's Readme

* LH Readme: add requirements reference

* PR Review: bring back portainer s3 pv

* LH: decrease portinaer volume size

* Experimental: Try to add tracing to simcore-traefik on master

* Fixes ITISFoundation/osparc-simcore#7363

* Arch Linux Certificates Customization - 2

* Send docker logs directly to graylog

* revert arch linux customization

---------

Co-authored-by: Dustin Kaiser <[email protected]>
Co-authored-by: YH <[email protected]>
* wip

* Add csi-s3 and have portainer use it

* Change request @Hrytsuk 1GB max portainer volume size

* Arch Linux Certificates Customization

* Fix pgsql exporter failure

* [Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉  (ITISFoundation#979)

* Introduce longhorn chart

* Further longhorn configuration

* Longhorn: further settings configuration

* Fix longhorn configuration bugs

Extra: introduce longhorn pv vales for portainer

* Add comment for deletion longhorn

* Further longhorn configuration

* Add README.md for Longhorn wit FAQ

* Update Longhorn readme

* Update readme

* Futher LH configuration

* Update LH's Readme

* Update Longhorn Readme

* Improve LH's Readme

* LH: Reduce reserved default disk space to 5%

Since we use a dedicated disk for LH, we can go ahead with 5%

* Use values to set Longhorn storage class

* Update LH's Readme

* LH Readme: add requirements reference

* PR Review: bring back portainer s3 pv

* LH: decrease portinaer volume size

* Experimental: Try to add tracing to simcore-traefik on master

* Fixes ITISFoundation/osparc-simcore#7363

* Arch Linux Certificates Customization - 2

* Remove frontend vendor chatbot service

* wip

---------

Co-authored-by: Dustin Kaiser <[email protected]>
Co-authored-by: YH <[email protected]>
* wip

* Add csi-s3 and have portainer use it

* Change request @Hrytsuk 1GB max portainer volume size

* Arch Linux Certificates Customization

* Fix pgsql exporter failure

* [Kubernetes] Introduce on-prem persistent Storage (Longhorn) 🎉  (ITISFoundation#979)

* Introduce longhorn chart

* Further longhorn configuration

* Longhorn: further settings configuration

* Fix longhorn configuration bugs

Extra: introduce longhorn pv vales for portainer

* Add comment for deletion longhorn

* Further longhorn configuration

* Add README.md for Longhorn wit FAQ

* Update Longhorn readme

* Update readme

* Futher LH configuration

* Update LH's Readme

* Update Longhorn Readme

* Improve LH's Readme

* LH: Reduce reserved default disk space to 5%

Since we use a dedicated disk for LH, we can go ahead with 5%

* Use values to set Longhorn storage class

* Update LH's Readme

* LH Readme: add requirements reference

* PR Review: bring back portainer s3 pv

* LH: decrease portinaer volume size

* Experimental: Try to add tracing to simcore-traefik on master

* Fixes ITISFoundation/osparc-simcore#7363

* Arch Linux Certificates Customization - 2

* wip

* wip

* this might work

* k8s wip

* wip

* wip

---------

Co-authored-by: Dustin Kaiser <[email protected]>
Co-authored-by: YH <[email protected]>
@mrnicegyu11 marked this pull request as ready for review December 1, 2025 12:43

@wvangeit left a comment


Thanks

@sanderegg (Member) left a comment


not sure I follow what this is about

@YuryHrytsuk (Collaborator) left a comment


This breaks HA, right? Or do I miss something?

@giancarloromeo (Contributor) left a comment


Ok

@mrnicegyu11 (Member, Author) commented

> This breaks HA, right? Or do I miss something?

This breaks HA for the workers, true. And it is OK that you block the merge for it; as I said, this was up for discussion. HA can easily be achieved for the separated simcoreworkers by adding one more machine of each kind, but at a higher cost. Maybe we can discuss this with the whole team, at the retro or so? We could also put this in once the persistent GPU machines go away, so that the cost decrease and the cost increase compensate each other. @YuryHrytsuk, please let me know what you prefer; I want to work towards closing this ticket (which came from an incident :O)
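The HA trade-off in this thread can be illustrated with a toy model of label-based placement, in the spirit of Docker Swarm placement constraints of the form `node.labels.<key>==<value>`. This Python sketch is a simplification, not the actual deployment: the label key and value and all node names are hypothetical. With a single dedicated machine per worker type, taking that machine down leaves no eligible node, so the worker cannot be scheduled anywhere; a second labeled machine keeps one eligible node available.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool
    labels: dict[str, str] = field(default_factory=dict)


def eligible_nodes(nodes: list[Node], label: str, value: str) -> list[str]:
    """Names of nodes that are up and satisfy a `label == value` placement constraint."""
    return [n.name for n in nodes if n.up and n.labels.get(label) == value]


# Hypothetical label key/value; the real ones are defined in the deployment config.
LABEL, VALUE = "simcoreworker", "storage"

# One dedicated machine (down for the weekend) plus a general-purpose node without the label.
single = [Node("worker-a", up=False, labels={LABEL: VALUE}), Node("general-1", up=True)]
# The same cluster with a second dedicated machine added for HA.
paired = single + [Node("worker-b", up=True, labels={LABEL: VALUE})]

print(eligible_nodes(single, LABEL, VALUE))  # [] -> the worker has nowhere to run
print(eligible_nodes(paired, LABEL, VALUE))  # ['worker-b'] -> one replica can still run
```

The general-purpose node never qualifies, which is the point of the dedicated machines: workers cannot spill over onto the rest of the platform, but they also cannot fall back to it when their own machine is down.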


6 participants