Conversation

@welansari commented Dec 19, 2024

  • Issue 1: In Artesca, "SITE_NAME" is never passed, so we always
    trigger replication of objects to the first storageClass in the
    replication rule.

  • Issue 2: We check the global replication status when verifying
    whether or not an object should be retriggered. This does not work
    reliably, especially when replicating to multiple destinations: if
    one destination fails, the global status becomes FAILED, which
    makes it impossible, for example, to retrigger objects whose status
    is COMPLETED.

  • Issue 3: The replication info is completely overwritten when it
    does not contain info about a specific site. This is a problem when
    replicating to multiple destinations, because the script can only
    be launched for one site at a time: for an object with
    uninitialized replication info, we cannot set the replication info
    properly for all destinations. (See the metadata sketch below.)
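
For context, here is a rough sketch of the replicationInfo structure these fixes operate on. Field names come from the cloudserver snippet quoted later in this thread; the exact values and the shape of the backend entries are illustrative assumptions:

```js
// Illustrative sketch (not actual code from this PR): object metadata
// carries one global replication status plus one per-site entry in
// `backends`. Issue 2 is about filtering on the global status instead
// of the per-site one; Issue 3 is about overwriting this whole
// structure when a site entry is missing.
const replicationInfo = {
    status: 'FAILED',       // global status; FAILED if any site failed
    backends: [
        // assumed per-site entry shape
        { site: 'aws-location', status: 'COMPLETED', dataStoreVersionId: '' },
        { site: 'azure-blob', status: 'FAILED', dataStoreVersionId: '' },
    ],
    content: ['DATA', 'METADATA'],
    destination: 'arn:aws:s3:::destination-bucket',
    storageClass: 'aws-location,azure-blob',   // comma-joined list of sites
    role: 'arn:aws:iam::123456789012:role/replication',
    storageType: 'aws_s3,azure',               // comma-joined backend types
};
```

Here the object completed replication to aws-location, but the global status is FAILED because azure-blob failed, which is exactly the situation Issue 2 describes.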

Issue: S3UTILS-184

@bert-e (Contributor) commented Dec 19, 2024

Hello kerkesni,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description |
| --- | --- |
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. |
| /bypass_author_approval | Bypass the pull request author's approval. |
| /bypass_build_status | Bypass the build and test status. |
| /bypass_commit_size | Bypass the check on the size of the changeset (TBA). |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix. |
| /bypass_jira_check | Bypass the Jira issue check. |
| /bypass_peer_approval | Bypass the pull request peers' approval. |
| /bypass_leader_approval | Bypass the pull request leaders' approval. |
| /approve | Instruct Bert-E that the author has approved the pull request. ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. |
| /create_integration_branches | Allow the creation of integration branches. |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merges instead. |
| /unanimity | Change review acceptance criteria from at least one reviewer to all reviewers. |
| /wait | Instruct Bert-E not to run until further notice. |

Available commands

| name | description |
| --- | --- |
| /help | Print Bert-E's manual in the pull request. |
| /status | Print Bert-E's current status in the pull request (TBA). |
| /clear | Remove all comments from Bert-E from the history (TBA). |
| /retry | Re-start a fresh build (TBA). |
| /build | Re-start a fresh build (TBA). |
| /force_reset | Delete integration branches & pull requests, and restart the merge process from the beginning. |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. |

Status report is not available.

@scality deleted a comment from bert-e Dec 19, 2024
@bert-e (Contributor) commented Dec 19, 2024

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@bert-e (Contributor) commented Dec 19, 2024

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_pull_requests

@welansari changed the base branch from development/1.14 to development/1.15 on December 20, 2024 13:46
@bert-e (Contributor) commented Dec 20, 2024

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_pull_requests

@scality deleted a comment from bert-e Dec 20, 2024
@scality deleted a comment from bert-e Dec 20, 2024
```js
let replicationInfo = objMD.getReplicationInfo();
if (!replicationInfo || !replicationInfo.status) {
    const { Rules, Role } = repConfig;
    const destination = Rules[0].Destination.Bucket;
    // ...
```
@francoisferrand (Contributor), Dec 20, 2024

This may not be correct if there is more than one destination...

Author

crrExistingObjects doesn't currently support triggering replication for multiple sites at once, so this works: the replication info is initialized when triggering the first site, and the other sites' info is then appended to the existing fields when those sites are triggered.
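
A minimal sketch of that init-then-append behavior, with hypothetical helper and field names (this is not the actual script code):

```js
// Sketch: initialize replication info on the first run, append a
// per-site entry on later runs for other sites.
function setSiteReplicationInfo(objMD, site) {
    let info = objMD.getReplicationInfo();
    if (!info || !info.status) {
        // first triggered site: initialize the whole structure
        // (currently seeded from the first rule's destination, which
        // is what this thread is discussing)
        info = { status: 'PENDING', backends: [], content: ['DATA', 'METADATA'] };
    }
    if (!info.backends.some(b => b.site === site)) {
        // subsequent sites: append to the existing structure
        info.backends.push({ site, status: 'PENDING', dataStoreVersionId: '' });
    }
    objMD.setReplicationInfo(info);
}
```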

Contributor

Even without triggering multiple sites at once: if a user requests replication to the second "site", this will initialize the info with the 1st destination (--> trigger replication!), and only then will we add the second one...

When there is no replicationInfo (i.e. typically when it is empty), should we not just initialize an empty replicationInfo, and let the next block ("Update replication info with site specific info") fill in the details for the requested destination? (See the sketch below.)
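
Something like the following, as a sketch of the suggestion (not actual code):

```js
// Sketch of the suggestion: seed an empty structure instead of the
// first rule's destination, so that replication is never triggered
// for a site the user did not request.
if (!replicationInfo || !replicationInfo.status) {
    replicationInfo = {
        status: '',
        backends: [],
        content: [],
        destination: '',
        storageClass: '',
        role: Role,
        storageType: '',
    };
}
// ...then the existing "Update replication info with site specific
// info" block fills in only the entry for the requested site.
```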

@francoisferrand (Contributor), Dec 26, 2024

In cloudserver, it seems this is initialized to bucketMD.replicationConfig.destination: should we do the same?

@welansari (Author), Dec 26, 2024

This is a weird one... We currently only support a single destination bucket for all replication rules of a bucket. When creating the rules/workflows via the UI it's even worse: the destination bucket becomes the name of the bucket we are replicating from.
The value of this field is not used in Zenko. When creating a location, Cloudserver initializes a client class for the respective backend (aws/azure/...) that keeps the name of the destination bucket in memory; that's the value we use when replicating, not what's in the replication rule (only the storageClass is used to know which client to use).

Comment on lines +197 to +199

```diff
+            storageClass: '',
             role: Role,
-            storageType: this.storageType,
+            storageType: '',
```
Contributor

what are these 2 fields (storageClass and storageType) used for?

Author

storageClass is what each location's replication queue processor uses to check whether it should replicate an object.
storageType is used by Cloudserver in the Backbeat routes to do some pre-checks (check that versioning is supported on the backend and that the location is valid).
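
For illustration, a hedged sketch of how a per-site queue processor might use the comma-joined storageClass field to decide whether an entry is for it (hypothetical function, not the actual backbeat code):

```js
// Hypothetical filter: a queue processor configured for one site only
// replicates entries whose replicationInfo lists that site in its
// comma-joined storageClass field.
function isEntryForSite(replicationInfo, mySite) {
    return replicationInfo.storageClass
        .split(',')
        // a class may carry a ':preferred_read' suffix (see the
        // cloudserver snippet quoted below)
        .some(c => c.split(':')[0] === mySite);
}
```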

@francoisferrand (Contributor), Dec 26, 2024

That is how these fields are used... My question is really what information they actually store, what these fields represent (for example, the storageClass of the object is already known, and stored in .location[] and dataStoreName; and this cannot be the storageClass of the 'remote' object, since there may be multiple destinations...).

Same for destination, btw: I don't understand this field being initialized with a single bucket name :-/ Or maybe this is a left-over from the first replication implementation (which did not support multi-targets)?

@francoisferrand (Contributor), Dec 26, 2024

The code which initializes these in cloudserver:

```js
function _getReplicationInfo(rule, replicationConfig, content, operationType,
    objectMD, bucketMD) {
    const storageTypes = [];
    const backends = [];
    const storageClasses = _getStorageClasses(rule);
    if (!storageClasses) {
        return undefined;
    }
    storageClasses.forEach(storageClass => {
        // strip the ':preferred_read' suffix to get the location name
        const storageClassName =
              storageClass.endsWith(':preferred_read') ?
              storageClass.split(':')[0] : storageClass;
        const location = s3config.locationConstraints[storageClassName];
        if (location && replicationBackends[location.type]) {
            storageTypes.push(location.type);
        }
        // one backend entry (with per-site status) per storage class
        backends.push(_getBackend(objectMD, storageClassName));
    });
    if (storageTypes.length > 0 && operationType) {
        content.push(operationType);
    }
    return {
        status: 'PENDING',
        backends,
        content,
        destination: replicationConfig.destination,
        // note: comma-joined lists, not single values
        storageClass: storageClasses.join(','),
        role: replicationConfig.role,
        storageType: storageTypes.join(','),
        isNFS: bucketMD.isNFS(),
    };
}
```

--> It seems these fields (along with role) are not multi-destination/rule aware?
--> They should be mostly useless for Zenko's multi-backend replication (otherwise we probably have bugs), but may still be needed for CRR (and may cause bugs if multiple rules/destinations were used in that case).

@welansari (Author), Dec 26, 2024

storageClass and storageType are multi-destination aware (although their names suggest otherwise): they contain the list of all storage classes and storage types we are replicating to.
Example:

storageClass: "aws-location,azure-blob"
storageType: "aws_s3,azure"

In Zenko, these fields are duplicates of information we already have: backbeat/cloudserver have a list of all location information, so we could just use the storage class stored in the rules to get the info we want. I think these are more of a relic from S3C that we can't really remove right now, as S3C uses them.

In CRR, the role is also a list. I don't think this works in Zenko though.
The destination field is a weird one; I explained how it works in the previous comment.

Contributor

> these are more of a relic from S3C that we can't really remove right now as S3C uses them
> In CRR, the role is also a list. I don't think this works in Zenko though.

"Soon", we will have a single arsenal, cloudserver & backbeat: so we can start planning to clean this up, possibly removing (deprecating) redundant fields and aligning the way CRR and multi-backend replication store/manage the state?

Can you please create a tech debt ticket, to track this (future) work and document the problem/situation/state of the analysis?

Comment on lines +126 to +129

```diff
+        || !objMD.getReplicationSiteStatus(site));
     }
     return (objMD.getReplicationInfo()
-        && objMD.getReplicationInfo().status === filter);
+        && objMD.getReplicationSiteStatus(site) === filter);
```
@francoisferrand (Contributor), Jan 8, 2025

Is this true as well for CRR (in S3C)?

Esp. we should check that there is no "legacy" data format where we would not have the per-site status, and where we should rely only on the global state...

(e.g. maybe CRR sometimes has objects which are marked with replicationInfo.status === 'COMPLETE' and the site listed in storageClass, but no site status...)
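
If such a legacy format did exist, a defensive variant of the filter (hypothetical, not part of this PR) could fall back to the global status when no per-site status is present:

```js
// Hypothetical defensive check: prefer the per-site status, fall back
// to the global status for (hypothetical) legacy metadata that has no
// per-site entries.
function matchesFilter(objMD, site, filter) {
    const info = objMD.getReplicationInfo();
    if (!info) {
        return false;
    }
    const siteStatus = objMD.getReplicationSiteStatus(site);
    return (siteStatus || info.status) === filter;
}
```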

Contributor

@nicolas2bert @jonathan-gramain what do you think?

Author

Replication metadata is managed the same way in both S3C and Zenko, so yes, this should work fine.
I looked back at the initial format of the metadata; there was always a global status.

@francoisferrand (Contributor)

Code-wise LGTM, but there may be some corner cases with S3C/CRR, to be confirmed...

@scality deleted a comment from bert-e Jan 13, 2025
@bert-e (Contributor) commented Jan 13, 2025

Build failed

The build for commit did not succeed in branch bugfix/S3UTILS-184

The following options are set: approve

@bert-e (Contributor) commented Jan 14, 2025

Build failed

The build for commit did not succeed in branch bugfix/S3UTILS-184

The following options are set: approve

@bert-e (Contributor) commented Jan 14, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

oras was removed from Ubuntu 24.04 GitHub runner images

Issue: S3UTILS-184
@welansari force-pushed the bugfix/S3UTILS-184 branch 4 times, most recently from 2603ce3 to bb1ad79, on January 14, 2025 12:16
Ubuntu 24.04 doesn't have libssl1.1, which is needed by mongo-memory-server

Issue: S3UTILS-184
@welansari (Author)

/approve

@bert-e (Contributor) commented Jan 14, 2025

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/1.15

The following branches have NOT changed:

  • development/1.13
  • development/1.14
  • development/1.4

Please check the status of the associated issue S3UTILS-184.

Goodbye kerkesni.

The following options are set: approve

@bert-e merged commit e82f3ea into development/1.15 on Jan 14, 2025
10 checks passed
@bert-e deleted the bugfix/S3UTILS-184 branch on January 14, 2025 13:48