Releases: libopenstorage/stork
Stork 2.10.0
New Features
- Added support for Object Lock enabled buckets. This feature is currently available only with the PX-Backup product. #1047
Improvements
- Use `cmdexecutorimage` based on the Stork deployment image. #1084
- Use `kopiaexecutorimage` based on the Stork deployment image. #1076
Bug Fixes
- Issue: CSI-based backups failing on Kubernetes 1.22+.
  User Impact: Kubernetes 1.22 removed the `CSIDriver` `v1beta1` APIs, which caused CSI backups to fail.
  Resolution: Added support for the `v1` CSI driver APIs. #1079
- Issue: The restore size for EBS volumes was displayed incorrectly.
  User Impact: This led to ambiguous behavior, since the restore size appeared smaller than the backup size for EBS volumes.
  Resolution: The restore size for EBS volumes is now reported correctly. #1069
- Issue: Generic backup options triggered a generic backup instead of a CSI backup.
  Resolution: Added the missing `BackupType` field in the ApplicationBackup CR to trigger the appropriate backup type. #1066
Stork 2.9.0
New Features
- The migration controller on the primary cluster can now auto-detect applications activated on the paired DR cluster and suspend their respective migration schedules. Users can enable this option by setting the `autoSuspend` option in the MigrationSchedule spec. #845
- `storkctl` will now report volume/resource and data transfer information in the migration summary. #1030
- Added support for child-server activate/deactivate for CR resources via multiple suspend options. #1034
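As a sketch of the new `autoSuspend` option (#845) — only the `autoSuspend` field comes from the release note; the names, namespaces, schedule policy, and the rest of the spec below are illustrative assumptions:

```shell
# Hypothetical MigrationSchedule using the autoSuspend option (#845).
# All names and the surrounding spec fields are illustrative.
cat > migration-schedule.yaml <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: mysql-migration-schedule
  namespace: mysql
spec:
  autoSuspend: true          # suspend this schedule if the app is activated on the DR cluster
  schedulePolicyName: daily-policy
  template:
    spec:
      clusterPair: remotecluster
      namespaces:
        - mysql
      includeResources: true
      startApplications: false
EOF
# Apply with: kubectl apply -f migration-schedule.yaml
```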
Improvements
- Improved resource migration time by migrating resources in parallel. #1024
- Updated the Kubernetes libraries in Stork to v1.20.7 to fix CVEs reported by dependabot. #1027
- Stork scheduler/pod placement improvements
- Enabled the webhook controller by default. #1038
- Users can now disable Stork scheduler scoring by providing the `stork.libopenstorage.org/disableHyperconvergence` flag. #1038
- The Stork scheduler now gives degraded nodes half the score of an online healthy node. Degraded nodes are nodes that are online but are not serving I/O locally. This is done so that healthy online nodes get prioritized over degraded ones, giving application pods a higher chance of hyperconvergence. #1042
- Stork now waits for up to 2 minutes for a storage node to come back online before deleting the application pods running on that node. By default, Stork continues to poll offline nodes every 2 minutes, so the maximum time before a pod running on an offline node is deleted is now 4 minutes. #1028
- A migration schedule will now generate an event if an invalid ClusterPair configuration is provided. #1053
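The release note above calls `stork.libopenstorage.org/disableHyperconvergence` a flag; a minimal sketch, assuming it is applied as a pod annotation (the pod spec, image, and the annotation's placement are assumptions, not confirmed by the note):

```shell
# Hypothetical pod opting out of Stork scheduler scoring (#1038).
# Assumption: the disableHyperconvergence flag is set as a pod annotation.
cat > no-hyperconvergence-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
  annotations:
    stork.libopenstorage.org/disableHyperconvergence: "true"
spec:
  schedulerName: stork       # still scheduled by stork, but without hyperconvergence scoring
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "3600"]
EOF
# Apply with: kubectl apply -f no-hyperconvergence-pod.yaml
```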
Bug Fixes
- Issue: Unable to resize migrated PVCs.
  User Impact: From Stork 2.6.4 onward, the StorageClass field of migrated PVCs was removed to support PVC size reflection on the DR site. Users could not resize migrated PVCs since the StorageClass was missing, and had to manually point the PVC spec to a StorageClass.
  Resolution: Upgrading to Stork 2.9.0 resolves this issue. Note that StorageClass migration is not supported by Stork, so if the StorageClass is missing on the DR site or does not have `allowVolumeExpansion` set, users will have to create or modify the StorageClass to perform the PVC resize operation. #1029
- Issue: The `storkctl activate/deactivate migration` command would not cleanly shut down all the pods created by the PerconaXtraDBCluster CR. This was caused by an incorrect suspend option provided in this CR's application registration.
  User Impact: Unable to activate/deactivate a PerconaXtraDBCluster gracefully.
  Resolution: Since PerconaXtraDBCluster has multiple child servers, the proper way to shut it down gracefully is via the `spec.Pause` path. Users can either change the AppReg option for PerconaXtraDBCluster or delete the entry and allow Stork to recreate the AppReg for them. #1035
- Issue: The SnapshotSchedule CR would show an incorrect sync status if a snapshot was triggered through the SnapshotCreate API while the underlying driver went offline.
  User Impact: The SnapshotSchedule would have an Error status while the Snapshot object had a successful status for the same snapshot.
  Resolution: The VolumeSnapshotSchedule and VolumeSnapshot object statuses are now always kept in sync. #1036
- Issue: An empty volume `driverName` in the PrepareResource API would cause application clones to fail.
  User Impact: ApplicationClone failed at the resource-apply stage with the error `VolumeDriver with UID/Name: not found`.
  Resolution: A correct `driverName` is now passed as part of the PrepareResource API. #1046
- Issue: The `storkctl deactivate clusterdomain` command caused successful migrations to be marked as failed.
  User Impact: Successful migrations got marked as failed on the deactivated cluster domain.
  Resolution: Migrations in their final stage are no longer marked as failed if the local domain is deactivated. #1052
- Issue: The underlying storage pair got deleted if multiple ClusterPairs pointed to the same RemoteStorageID and the user deleted any one of them.
  User Impact: The system would have a ClusterPair in Ready state while the underlying storage pair no longer existed, which could result in migration failures.
  Resolution: Only the ClusterPair is deleted; the underlying storage pair is kept as is if another ClusterPair points to the same RemoteStorageID. #1052
- Issue: Unable to recognize the storage driver for FA/FB PVCs.
  User Impact: Pods using FA/FB-backed PVCs were getting scheduled on non-PX nodes.
  Resolution: Pods using FA/FB-backed PVCs are now scheduled by the Stork scheduler. #1056
Release Stork v2.8.2
Improvements
- Improvements were made to the KDMP driver in Stork. This driver is currently used only through PX-Backup. All the improvements can be found on the PX-Backup release notes page.
Bug Fixes
- The following vulnerability has been fixed in this Stork version: CVE-2021-42574
- Issue: While parsing rules provided as part of backups/restores, Stork was handling a non-existent error.
  User Impact: Backups and restores would succeed even if the provided rule was incorrect.
  Resolution: The incorrect error handling has been removed. (#973)
- Issue: The ResourceCollector package in Stork would collect the ClusterRole and ClusterRoleBinding of a service account even if the caller of the resource collector did not have permissions on that namespace.
  User Impact: Other libraries using the ResourceCollector package would get the ClusterRole and ClusterRoleBinding in their responses.
  Resolution: The caller's permission to access the service account is now checked before the ClusterRole and ClusterRoleBinding are returned. (#1001)
- Issue: RoleBindings that had a Subject with no namespace in it were not restored.
  User Impact: An application using such a RoleBinding would fail to start after being restored on the destination cluster.
  Resolution: Stork will not retain a Subject in a RoleBinding if it does not have a namespace in it. (#1017)
- Issue: RoleBindings that had a `system:` prefix in OCP environments were not backed up by Stork.
  User Impact: An application using a RoleBinding with an associated system SCC in OpenShift would not start up after restore, since the associated RoleBinding was not backed up on the source side.
  Resolution: RoleBindings with the prefix `system:openshift:scc` are now collected and backed up by Stork. (#1021)
- Issue: When a user selected a partial restore of a few PVCs from a backup, the internal volume list maintained in the Backup object was passed as is, without filtering out the unwanted PVCs.
  User Impact: The restore operation would hang and not complete even for the selected PVCs.
  Resolution: Unwanted PVCs are filtered out of the `IncludeResources` list to ensure only the selected PVCs get restored. (#998)
Release Stork v2.8.1
Improvements
- KDMP Driver: The KDMP driver and its CRDs now support Kubernetes v1.22+. Generic backup and restore is now supported on Kubernetes v1.22 and above.
Release Stork v2.8.0
New Features
- A new driver, KDMP, has been added to Stork for taking generic backups and restores of PVCs on any underlying storage provider. Currently this driver is supported only via Portworx PX-Backup.
Note: Generic backup/restore is not supported on Kubernetes 1.22+. To deploy Stork 2.8.0 on Kubernetes 1.22+, make sure to disable the KDMP driver in the Stork spec by adding the following argument: `kdmp-controller: false`
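A minimal sketch of disabling the KDMP controller, assuming the note's `kdmp-controller: false` maps to a container argument on the Stork deployment (the deployment layout, driver flag, and image tag below are illustrative assumptions):

```shell
# Hypothetical excerpt of a Stork deployment with the KDMP driver disabled.
# Assumption: "kdmp-controller: false" is passed as a container argument.
cat > stork-kdmp-disabled.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stork
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: stork
  template:
    metadata:
      labels:
        name: stork
    spec:
      containers:
        - name: stork
          image: openstorage/stork:2.8.0
          command:
            - /stork
            - --driver=pxd
            - --kdmp-controller=false   # disable generic (KDMP) backup/restore on k8s 1.22+
EOF
# Apply with: kubectl apply -f stork-kdmp-disabled.yaml
```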
Release Stork v2.7.0
New Features
- Stork now supports kubernetes v1.22 and above.
- [Portworx Driver]: Added support for backups and restores of volumes from a PX-Security enabled cluster. Stork will use the standard auth storage class parameters and annotations to determine which token to use for backing up and restoring Portworx volumes. (#895)
Improvements
- If a storage provider does not have the concept of replica nodes for a volume, do not error out the filter request while scheduling pods using that storage provider's volumes. (#897)
- `storkctl clusterdomain` commands would silently fail if the provided input domain was invalid. storkctl will now fail the operation if the input cluster domain is not one of the Stork-detected cluster domains. (#898)
- [Portworx Driver] Carry over the auth-related annotations from MigrationSchedules to the actual Migration objects. (#899)
- Added short-name support for fetching the Stork VolumeSnapshot and VolumeSnapshotData CRs. The following shorthand notations can be used to fetch snapshot and snapshot data objects (#907): `kubectl get stork-volumesnapshot` or `kubectl get svs`, and `kubectl get stork-volumesnapshotdata` or `kubectl get svsd`
Bug Fixes
- Issue: In-place VolumeSnapshotRestore issued a "force" delete of the pods using the PVC to be restored. The force delete gave a false indication that the PVC was not being used by any apps, and the subsequent restore command would fail since the volume was still in use.
  User Impact: In-place VolumeSnapshotRestore would fail intermittently.
  Resolution: Perform a regular delete of the pods and wait for pod deletion while performing an in-place VolumeSnapshotRestore. (#878)
- Issue: Backups and/or restores of namespaces with both CSI and non-CSI PVCs would fail: the non-CSI PVCs would finish their backup first, while the CSI PVCs took longer. The successful non-CSI PVC backups would update the Backup CR status to successful, causing the CSI drivers to prematurely fail their backup.
  User Impact: Backups of namespaces that had both CSI and non-CSI PVCs would fail.
  Resolution: When a backup involves two different drivers, each driver now handles only its own PVCs. (#885) (#892) (#894) (#901)
- Issue: Stork was not updating the finish timestamp on an ApplicationBackup CR in certain failed backup scenarios.
  User Impact: Prometheus metrics were not reported for failed backups.
  Resolution: The finish timestamp is now set even when an application backup fails. (#896)
- Issue: Stork would ignore the replace policy and always update the namespace metadata on ApplicationRestore.
  User Impact: Even if the replace policy on an ApplicationRestore was set to retain, Stork would override the annotations on a namespace.
  Resolution: On ApplicationRestore, the namespace metadata is now replaced only if the replace policy is not set to retain. (#896)
- Issue: The `storkctl activate/update` command would show an incorrect message on a successful operation.
  User Impact: The `storkctl activate/update` command would report that it had set the MongoDB CR's `spec.Replicas` field to `false`, whereas it had actually set the value to `0`.
  Resolution: The `storkctl activate` command now shows a proper message based on the update it performed. (#896)
- Issue: CSI Portworx PVs have a `volumeHandle` field set to a volume ID, which could change if a failback operation is executed on the source cluster.
  User Impact: Applications using CSI PVCs on the source cluster could not start after a failback operation.
  Resolution: The `volumeHandle` field of a CSI Portworx PV is now fixed up during failback migration. (#943)
- Issue: The Portworx driver in Stork did not handle pod specifications that use Portworx PVs directly instead of PVCs.
  User Impact: Stork pods could hit a nil panic and restart when Portworx PVs were used directly in a pod specification.
  Resolution: Fixed a nil panic in the Portworx driver's GetPodVolumes implementation. (#926)
- Issue: Backups would fail with older versions of Stork on GKE 1.21 clusters.
  User Impact: Backups would fail with older versions of Stork on GKE 1.21 clusters.
  Resolution: The new label for the zone failure domain is now used while handling GCE PDs. (#930)
- Issue: PX-Security annotations on MigrationSchedule objects were not propagated to the respective Migration objects.
  User Impact: Migrations triggered as part of a MigrationSchedule would fail on a PX-Security enabled Portworx cluster.
  Resolution: The annotations from the MigrationSchedule are now added to the respective Migration object. (#899)
- Issue: The PVC UID mappings were not updated after a migration, causing CSI PVCs to stay in an Unbound state.
  User Impact: Portworx CSI PVCs would stay in an Unbound state after a migration.
  Resolution: The PVC UID mapping is now updated while migrating PV objects. (#919)
- Issue: Stork would hit a nil panic while handling an ApplicationRestore object if the source namespace had no labels.
  User Impact: ApplicationRestores would time out if namespaces had no labels.
  Resolution: Stork now handles empty labels on a namespace when performing ApplicationRestores. (#918)
- Issue: Stork was not able to handle an in-place volume snapshot restore when the Portworx driver was initializing or unable to handle restore requests.
  User Impact: VolumeSnapshotRestore would fail if the Portworx driver was temporarily unhealthy or initializing.
  Resolution: Stork now waits for the Portworx driver to be up and healthy, handles `resource temporarily unavailable` errors from Portworx, and does not fail the restore request. (#875)
Docker Hub Image: openstorage/stork:2.7.0
Release Stork v2.6.5
Bug Fixes
- Issue: In-place VolumeSnapshotRestore issued a "force" delete of the pods using the PVC to be restored. The force delete gave a false indication that the PVC was not being used by any apps, and the subsequent restore command would fail since the volume was still in use.
  User Impact: In-place VolumeSnapshotRestore would fail intermittently.
  Resolution: Perform a regular delete of the pods and wait for pod deletion while performing an in-place VolumeSnapshotRestore; perform a force delete only if the regular pod deletion fails. (#878)
- Issue: On certain cloud providers, the associated ports and IPs are not released immediately after a Kubernetes service is deleted. This caused migrations of service objects to fail, since the service objects are deleted and recreated as part of migration.
  User Impact: Migrations would fail intermittently.
  Resolution: Stork will not recreate a service on the destination/DR cluster if it has not changed on the primary cluster during migration. (#874)
Release Stork v2.6.4
New Features
- Added a `storkctl` command to create a bidirectional cluster pair on the source and destination clusters. (#787)

      storkctl create clusterpair testpair -n kube-system --src-kube-file <src-kubeconfig> --dest-kube-file <dest-kubeconfig> --src-ip <src_ip> --dest-ip <dest_ip> --src-token <src_token> --dest-token <dest_token>
      ClusterPair testpair created successfully on source cluster
      ClusterPair testpair created successfully on destination cluster
- Added NamespacedSchedulePolicy, which can be used in all the schedule objects. All schedules will first try to find a policy in their namespace; if it doesn't exist, they fall back to the cluster-scoped policy. (#832)
- Added support for custom resource selection per namespace when backing up multiple namespaces as part of a single ApplicationBackup. (#848)
- Added support for backup and migration of MongoDB Community CRs. (#856)
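A sketch of a NamespacedSchedulePolicy (#832). The kind name comes from the release note; the `apiVersion` and the policy fields, which mirror the cluster-scoped SchedulePolicy, are assumptions:

```shell
# Hypothetical NamespacedSchedulePolicy (#832): same shape as a cluster-scoped
# SchedulePolicy (assumed), but resolved from the object's own namespace first.
cat > namespaced-policy.yaml <<'EOF'
apiVersion: stork.libopenstorage.org/v1alpha1
kind: NamespacedSchedulePolicy
metadata:
  name: daily-policy
  namespace: mysql          # looked up before any cluster-scoped policy of the same name
policy:
  daily:
    time: "10:14PM"
EOF
# Apply with: kubectl apply -f namespaced-policy.yaml
```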
Improvements
- Application backups no longer fail when the storage provider returns a busy error code; instead, Stork backs off and retries the operation after some time. (#847)
- Added support for backing up NetworkPolicy and PodDisruptionBudget objects. (#841)
- While applying ServiceAccount resources as part of ApplicationRestores or Migrations, merge the annotations from the source and destination. (#844) (#858)
- Use the annotation `stork.libopenstorage.org/skipSchedulerScoring` on PVCs whose replicas should not be considered while scoring nodes in the Stork scheduler. (#846)
- ApplicationClone now sets the `ports` field of a service to nil/empty before cloning it to the destination namespace. (#870)
- Create a ConfigMap with details about the current working Stork version. (#855)
- Added an option `skipServiceUpdate` to skip updating service objects during ApplicationBackups. (#844)
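A sketch of the two knobs above. The `skipSchedulerScoring` annotation and the `skipServiceUpdate` option come from the notes, but their exact placement and value types, along with all names, namespaces, and surrounding spec fields, are assumptions:

```shell
# Hypothetical manifests for the skipSchedulerScoring annotation (#846) and
# the skipServiceUpdate option (#844). Names and namespaces are illustrative.
cat > stork-options.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scratch-pvc
  annotations:
    stork.libopenstorage.org/skipSchedulerScoring: "true"   # ignore this PVC's replicas when scoring nodes
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: stork.libopenstorage.org/v1alpha1
kind: ApplicationBackup
metadata:
  name: app-backup
  namespace: demo
spec:
  backupLocation: my-backup-location
  namespaces:
    - demo
  skipServiceUpdate: true   # assumed spec field: leave service objects untouched
EOF
# Apply with: kubectl apply -f stork-options.yaml
```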
Bug Fixes
- Issue: During migrations, if the size of a PVC on the source cluster changed, the change was not reflected on the corresponding PVC on the target cluster.
  User Impact: Target PVCs showed an incorrect size even though the backing volume had the updated size.
  Resolution: In every migration cycle, the PVC on the target cluster now gets updated. (#835)
- Issue: When static IPs were used in Kubernetes service objects, Stork would clear out the IP field from the service object after migration.
  User Impact: On ApplicationBackup/Restore of service objects that use static IPs, the restored service objects would lose the IP.
  Resolution: The ApplicationBackup CR now has a new boolean field, `skipServiceUpdate`, that instructs Stork to skip any Kubernetes service object processing. If set to `true`, Stork will migrate the service resource as is to the target cluster. (#844)
- Issue: ApplicationClones were failing if the target namespace already existed.
  User Impact: ApplicationClones would fail when the target namespace into which the applications need to be cloned already exists. (#834)
  Resolution: The AlreadyExists error is now ignored when checking target namespaces instead of marking the ApplicationClone failed. (#834)
- Issue: The Stork webhook controller would not work because the associated certificate did not have a valid SAN set.
  User Impact: With the Stork webhook enabled, Stork would not automatically set the `schedulerName` on pods using PVCs of Stork-supported storage drivers.
  Resolution: Stork now creates a valid certificate and also updates certificates that were already created, so the Kubernetes API server recognizes the Stork webhook controller's certificate as valid and forwards the webhook requests. (#859)
- Issue: The `storkctl activate` command would panic with the error `cannot deep copy int32`.
  User Impact: `storkctl activate` would fail to fail over the applications on a target cluster.
  Resolution: The `storkctl activate` command now uses the right replica field value while activating migrations. (#862)
Docker Hub Image: openstorage/stork:2.6.4
Release Stork v2.6.3
New Features
- Added support to watch newly registered CRDs and auto-create an `ApplicationRegistration` resource for them. (#792)
- Added suspend option support for the following new apps (#811):
  - perconadb
  - prometheus
  - rabbitmq
  - kafka (strimzi)
  - postgres (acid.zalan.do)
Improvements
- Added support for specifying a ResourceType in ApplicationBackup, which allows selecting specific resources while backing up a namespace. (#799)
- Support migrating PVC objects when the migration spec has application resource migration disabled. (#783)
- All Stork Prometheus metrics now have a `stork_` prefix, e.g. `stork_application_backup_status`, `stork_migration_status`, `stork_hyperconverged_pods_total`. All existing metrics move over to the `stork_` prefix convention. (#816)
Bug Fixes
- Configured the side-effect parameter for the Stork webhook. This allows running `kubectl --dry-run` commands when the Stork webhook is enabled. (#802)
- Restores of certain applications would partially succeed on Kubernetes 1.20 due to the addition of the new `ClusterIPs` field to Services. This new field is now handled in restores. (#800)
- Allow excluding resources from the GetResources API. This allows handling ApplicationRestores on Kubernetes 1.20 and above, where certain resources like the kube-root-ca.crt ConfigMap are always created when a new namespace is created. (#798)
- Fixed an issue with failed snapshots being retried; the correct error state is now printed for snapshots. (#786)
Docker Hub Image: openstorage/stork:2.6.3
Release Stork v2.6.0
New Features
- Added the ability to backup and restore CSI volumes (#697).
- Added Prometheus metrics and Grafana dashboards for the scheduler extender and health monitor (#710).
Improvements
- Added a mechanism to change the log level during runtime. You can do this by writing the log level to `/tmp/loglevel` and then sending SIGUSR1 to the process (#735).
- Pruned migrations that are in the `PartialSuccess` state. Previously, Stork retained all migrations in this state, causing the `MigrationSchedule` object to become very large (#742).
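The runtime log-level change above can be sketched as follows. The `/tmp/loglevel` path and the SIGUSR1 signal come from the release note; the namespace, the `name=stork` label, and PID 1 being the Stork process are assumptions about the install:

```shell
# Sketch: raise the Stork log level at runtime (#735). Pod lookup assumes
# stork runs in kube-system with label name=stork (an assumption).
POD=$(kubectl get pods -n kube-system -l name=stork -o jsonpath='{.items[0].metadata.name}')
# Write the desired level and signal the process (PID 1 assumed to be stork):
kubectl exec -n kube-system "$POD" -- sh -c 'echo debug > /tmp/loglevel && kill -SIGUSR1 1'
```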
Bug Fixes
- Stork now triggers volume backups in batches of 10 and retries update failures due to conflicts. This behavior prevents issues where a backup is triggered, but Stork fails to update the CR (#707).
- Added support for disabling cronjob objects upon migration to a remote cluster. You can now also activate/deactivate a cronjob object using the `storkctl activate/deactivate migration <migration_namespace>` command (#731).
- Stork now fetches ApplicationRegistration objects only once when preparing resources. Previously, Stork made multiple calls to the kube-apiserver, causing delays during migration and backup (#732).
- Fixed issue where cluster-scoped CRs were being collected for migration and backup (#732).
- Fixed the volume size for EBS backups, which were previously incorrectly reported in GiB (#733).