Conversation

@welansari commented Dec 19, 2024

  • Issue 1: In Artesca, "SITE_NAME" is never passed, so we always
    trigger replication of objects to the first storageClass in the
    replication rule.

  • Issue 2: We check the global replication status when verifying
    whether or not an object should be retriggered. This does not work
    reliably, especially when replicating to multiple destinations: if
    one destination fails, the global status becomes FAILED, which
    makes it impossible, for example, to retrigger objects whose status
    is COMPLETED.

  • Issue 3: The replication info is completely overwritten when it
    does not contain info about a specific site. This is a problem when
    replicating to multiple destinations, because the script can only
    be launched for one site at a time: for an object with
    uninitialized replication info, we cannot set the replication info
    properly for all destinations. (See the metadata sketch below.)
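
For context, here is a rough sketch of the replicationInfo structure these fixes operate on. Field names come from the cloudserver snippet quoted later in this thread; the exact values and the shape of the backend entries are illustrative assumptions:

```js
// Illustrative sketch (not actual code from this PR): object metadata
// carries one global replication status plus one per-site entry in
// `backends`. Issue 2 is about filtering on the global status instead
// of the per-site one; Issue 3 is about overwriting this whole
// structure when a site entry is missing.
const replicationInfo = {
    status: 'FAILED',       // global status; FAILED if any site failed
    backends: [
        // assumed per-site entry shape
        { site: 'aws-location', status: 'COMPLETED', dataStoreVersionId: '' },
        { site: 'azure-blob', status: 'FAILED', dataStoreVersionId: '' },
    ],
    content: ['DATA', 'METADATA'],
    destination: 'arn:aws:s3:::destination-bucket',
    storageClass: 'aws-location,azure-blob',   // comma-joined list of sites
    role: 'arn:aws:iam::123456789012:role/replication',
    storageType: 'aws_s3,azure',               // comma-joined backend types
};
```

Here the object completed replication to aws-location, but the global status is FAILED because azure-blob failed, which is exactly the situation Issue 2 describes.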

Issue: S3UTILS-184

@bert-e (Contributor) commented Dec 19, 2024

Hello kerkesni,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description |
| --- | --- |
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. |
| /bypass_author_approval | Bypass the pull request author's approval. |
| /bypass_build_status | Bypass the build and test status. |
| /bypass_commit_size | Bypass the check on the size of the changeset (TBA). |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix. |
| /bypass_jira_check | Bypass the Jira issue check. |
| /bypass_peer_approval | Bypass the pull request peers' approval. |
| /bypass_leader_approval | Bypass the pull request leaders' approval. |
| /approve | Instruct Bert-E that the author has approved the pull request. ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. |
| /create_integration_branches | Allow the creation of integration branches. |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merges instead. |
| /unanimity | Change review acceptance criteria from at least one reviewer to all reviewers. |
| /wait | Instruct Bert-E not to run until further notice. |

Available commands

| name | description |
| --- | --- |
| /help | Print Bert-E's manual in the pull request. |
| /status | Print Bert-E's current status in the pull request (TBA). |
| /clear | Remove all comments from Bert-E from the history (TBA). |
| /retry | Re-start a fresh build (TBA). |
| /build | Re-start a fresh build (TBA). |
| /force_reset | Delete integration branches & pull requests, and restart the merge process from the beginning. |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. |

Status report is not available.

@scality deleted a comment from bert-e Dec 19, 2024
@bert-e (Contributor) commented Dec 19, 2024

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@bert-e (Contributor) commented Dec 19, 2024

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_pull_requests

@welansari changed the base branch from development/1.14 to development/1.15 on December 20, 2024 13:46
@bert-e (Contributor) commented Dec 20, 2024

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following options are set: create_pull_requests

@scality deleted a comment from bert-e Dec 20, 2024
@scality deleted a comment from bert-e Dec 20, 2024
```js
let replicationInfo = objMD.getReplicationInfo();
if (!replicationInfo || !replicationInfo.status) {
    const { Rules, Role } = repConfig;
    const destination = Rules[0].Destination.Bucket;
    // ...
```
@francoisferrand (Contributor), Dec 20, 2024

This may not be correct if there is more than one destination...

Author

crrExistingObjects doesn't currently support triggering replication for multiple sites at once, so this works: the replication info is initialized when triggering the first site, and the other sites' info is then appended to the existing fields when those sites are triggered.
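
A minimal sketch of that init-then-append behavior, with hypothetical helper and field names (this is not the actual script code):

```js
// Sketch: initialize replication info on the first run, append a
// per-site entry on later runs for other sites.
function setSiteReplicationInfo(objMD, site) {
    let info = objMD.getReplicationInfo();
    if (!info || !info.status) {
        // first triggered site: initialize the whole structure
        // (currently seeded from the first rule's destination, which
        // is what this thread is discussing)
        info = { status: 'PENDING', backends: [], content: ['DATA', 'METADATA'] };
    }
    if (!info.backends.some(b => b.site === site)) {
        // subsequent sites: append to the existing structure
        info.backends.push({ site, status: 'PENDING', dataStoreVersionId: '' });
    }
    objMD.setReplicationInfo(info);
}
```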

Contributor

Even without triggering multiple sites at once: if a user requests replication to the second "site", this will initialize the info with the 1st destination (--> trigger replication!), and only then will we add the second one...

When there is no replicationInfo (i.e. typically when it is empty), should we not just initialize an empty replicationInfo, and let the next block ("Update replication info with site specific info") fill in the details for the requested destination? (See the sketch below.)
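
Something like the following, as a sketch of the suggestion (not actual code):

```js
// Sketch of the suggestion: seed an empty structure instead of the
// first rule's destination, so that replication is never triggered
// for a site the user did not request.
if (!replicationInfo || !replicationInfo.status) {
    replicationInfo = {
        status: '',
        backends: [],
        content: [],
        destination: '',
        storageClass: '',
        role: Role,
        storageType: '',
    };
}
// ...then the existing "Update replication info with site specific
// info" block fills in only the entry for the requested site.
```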

@francoisferrand (Contributor), Dec 26, 2024

In cloudserver, it seems this is initialized to bucketMD.replicationConfig.destination: should we do the same?

@welansari (Author), Dec 26, 2024

This is a weird one... We currently only support a single destination bucket for all replication rules of a bucket. When creating the rules/workflows via the UI it's even worse: the destination bucket becomes the name of the bucket we are replicating from.
The value of this field is not used in Zenko. When creating a location, Cloudserver initializes a client class for the respective backend (aws/azure/...) that keeps the name of the destination bucket in memory; that's the value we use when replicating, not what's in the replication rule (only the storageClass is used to know which client to use).

Comment on lines +197 to +199

```diff
+            storageClass: '',
             role: Role,
-            storageType: this.storageType,
+            storageType: '',
```
Contributor

what are these 2 fields (storageClass and storageType) used for?

Author

storageClass is what each location's replication queue processor uses to check whether it should replicate an object.
storageType is used by Cloudserver in the Backbeat routes to do some pre-checks (check that versioning is supported on the backend and that the location is valid).
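
For illustration, a hedged sketch of how a per-site queue processor might use the comma-joined storageClass field to decide whether an entry is for it (hypothetical function, not the actual backbeat code):

```js
// Hypothetical filter: a queue processor configured for one site only
// replicates entries whose replicationInfo lists that site in its
// comma-joined storageClass field.
function isEntryForSite(replicationInfo, mySite) {
    return replicationInfo.storageClass
        .split(',')
        // a class may carry a ':preferred_read' suffix (see the
        // cloudserver snippet quoted below)
        .some(c => c.split(':')[0] === mySite);
}
```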

@francoisferrand (Contributor), Dec 26, 2024

That is how these fields are used... My question is really what information they actually store, what these fields represent (for example, the storageClass of the object is already known, and stored in .location[] and dataStoreName; and this cannot be the storageClass of the 'remote' object, since there may be multiple destinations...).

Same for destination, btw: I don't understand this field being initialized with a single bucket name :-/ Or maybe this is a left-over from the first replication implementation (which did not support multi-targets)?

@francoisferrand (Contributor), Dec 26, 2024

The code which initializes these in cloudserver:

```js
function _getReplicationInfo(rule, replicationConfig, content, operationType,
    objectMD, bucketMD) {
    const storageTypes = [];
    const backends = [];
    const storageClasses = _getStorageClasses(rule);
    if (!storageClasses) {
        return undefined;
    }
    storageClasses.forEach(storageClass => {
        // strip the ':preferred_read' suffix to get the location name
        const storageClassName =
              storageClass.endsWith(':preferred_read') ?
              storageClass.split(':')[0] : storageClass;
        const location = s3config.locationConstraints[storageClassName];
        if (location && replicationBackends[location.type]) {
            storageTypes.push(location.type);
        }
        // one backend entry (with per-site status) per storage class
        backends.push(_getBackend(objectMD, storageClassName));
    });
    if (storageTypes.length > 0 && operationType) {
        content.push(operationType);
    }
    return {
        status: 'PENDING',
        backends,
        content,
        destination: replicationConfig.destination,
        // note: comma-joined lists, not single values
        storageClass: storageClasses.join(','),
        role: replicationConfig.role,
        storageType: storageTypes.join(','),
        isNFS: bucketMD.isNFS(),
    };
}
```

--> It seems these fields (along with role) are not multi-destination/rule aware?
--> They should be mostly useless for Zenko's multi-backend replication (otherwise we probably have bugs), but may still be needed for CRR (and may cause bugs if multiple rules/destinations were used in that case).

@welansari (Author), Dec 26, 2024

storageClass and storageType are multi-destination aware (although their names suggest otherwise): they contain the list of all storage classes and storage types we are replicating to.
Example:

storageClass: "aws-location,azure-blob"
storageType: "aws_s3,azure"

In Zenko, these fields are duplicates of information we already have: backbeat/cloudserver have a list of all location information, so we could just use the storage class stored in the rules to get the info we want. I think these are more of a relic from S3C that we can't really remove right now, as S3C uses them.

In CRR, the role is also a list. I don't think this works in Zenko though.
The destination field is a weird one; I explained how it works in the previous comment.

Contributor

> these are more of a relic from S3C that we can't really remove right now as S3C uses them
> In CRR, the role is also a list. I don't think this works in Zenko though.

"Soon", we will have a single arsenal, cloudserver & backbeat: so we can start planning to clean this up, possibly removing (deprecating) redundant fields and aligning the way CRR and multi-backend replication store/manage the state?

Can you please create a tech debt ticket, to track this (future) work and document the problem/situation/state of the analysis?

Comment on lines +126 to +129

```diff
+        || !objMD.getReplicationSiteStatus(site));
     }
     return (objMD.getReplicationInfo()
-        && objMD.getReplicationInfo().status === filter);
+        && objMD.getReplicationSiteStatus(site) === filter);
```
@francoisferrand (Contributor), Jan 8, 2025

Is this true as well for CRR (in S3C)?

Esp. we should check that there is no "legacy" data format where we would not have the per-site status, and where we should rely only on the global state...

(e.g. maybe CRR sometimes has objects which are marked with replicationInfo.status === 'COMPLETE' and the site listed in storageClass, but no site status...)
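
If such a legacy format did exist, a defensive variant of the filter (hypothetical, not part of this PR) could fall back to the global status when no per-site status is present:

```js
// Hypothetical defensive check: prefer the per-site status, fall back
// to the global status for (hypothetical) legacy metadata that has no
// per-site entries.
function matchesFilter(objMD, site, filter) {
    const info = objMD.getReplicationInfo();
    if (!info) {
        return false;
    }
    const siteStatus = objMD.getReplicationSiteStatus(site);
    return (siteStatus || info.status) === filter;
}
```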

Contributor

@nicolas2bert @jonathan-gramain what do you think?

Author

Replication metadata is managed the same way in both S3C and Zenko, so yes, this should work fine.
I looked back at the initial format of the metadata; there was always a global status.

@francoisferrand (Contributor)

Code-wise LGTM, but there may be some corner cases with S3C/CRR, to be confirmed...

@scality deleted a comment from bert-e Jan 13, 2025
@bert-e (Contributor) commented Jan 13, 2025

Build failed

The build for commit did not succeed in branch bugfix/S3UTILS-184

The following options are set: approve

@bert-e (Contributor) commented Jan 14, 2025

Build failed

The build for commit did not succeed in branch bugfix/S3UTILS-184

The following options are set: approve

@bert-e (Contributor) commented Jan 14, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

oras was removed from Ubuntu 24.04 GitHub runner images

Issue: S3UTILS-184
@welansari force-pushed the bugfix/S3UTILS-184 branch 4 times, most recently from 2603ce3 to bb1ad79, on January 14, 2025 12:16
Ubuntu 24.04 doesn't have libssl1.1, which is needed by mongo-memory-server

Issue: S3UTILS-184
@welansari (Author)

/approve

@bert-e (Contributor) commented Jan 14, 2025

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/1.15

The following branches have NOT changed:

  • development/1.13
  • development/1.14
  • development/1.4

Please check the status of the associated issue S3UTILS-184.

Goodbye kerkesni.

The following options are set: approve

@bert-e merged commit e82f3ea into development/1.15 on Jan 14, 2025
10 checks passed
@bert-e deleted the bugfix/S3UTILS-184 branch on January 14, 2025 13:48