Commit d53ffb1

Merge pull request ceph#59311 from soumyakoduri/wip-skoduri-cloud-restore
rgw/cloudtier: initial MVP of the feature RestoreObject from cloud

Reviewed-by: Daniel Gryniewicz
Reviewed-by: Matt Benjamin
2 parents f185578 + 1b66bc5 commit d53ffb1

28 files changed (+1228 -54 lines changed)

src/common/options/rgw.yaml.in

Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -448,6 +448,19 @@ options:
448448
services:
449449
- rgw
450450
with_legacy: true
451+
- name: rgw_restore_debug_interval
452+
type: int
453+
level: dev
454+
desc: The number of seconds that simulate one "day" in order to debug RGW CloudRestore.
455+
Do *not* modify for a production cluster.
456+
long_desc: For debugging RGW Cloud Restore, the number of seconds that are equivalent to
457+
one simulated "day". Values less than 1 are ignored and do not change Restore behavior.
458+
For example, during debugging if one wanted every 10 minutes to be equivalent to one day,
459+
then this would be set to 600, the number of seconds in 10 minutes.
460+
default: -1
461+
services:
462+
- rgw
463+
with_legacy: true
451464
- name: rgw_mp_lock_max_time
452465
type: int
453466
level: advanced
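
The effect of `rgw_restore_debug_interval` described above can be illustrated with a minimal sketch. The helper names `seconds_per_day` and `restore_lifetime_seconds` are hypothetical and are not taken from the Ceph source; the only behaviour assumed is what the option text states, namely that values less than 1 are ignored.

```cpp
#include <cstdint>

// Hypothetical helper: length of one "day" in seconds.
// A debug interval below 1 is ignored, as the option's long_desc states.
constexpr std::int64_t seconds_per_day(std::int64_t debug_interval) {
  constexpr std::int64_t real_day = 24 * 60 * 60;
  return debug_interval >= 1 ? debug_interval : real_day;
}

// Hypothetical helper: how long a copy restored for `days` would live.
constexpr std::int64_t restore_lifetime_seconds(std::int64_t days,
                                                std::int64_t debug_interval) {
  return days * seconds_per_day(debug_interval);
}
```

With the example from the option text (600 seconds per simulated day), a restore requested for 3 days would last 1800 seconds under this sketch.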

src/doc/rgw/cloud-restore.md

Lines changed: 127 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,127 @@
1+
# cloud-restore
2+
3+
## Introduction
4+
5+
The [`cloud-transition`](https://docs.ceph.com/en/latest/radosgw/cloud-transition) feature enables data transition to a remote cloud service as part of Lifecycle Configuration via Storage Classes. However, the transition is unidirectional; data cannot be transitioned back from the remote zone.
6+
7+
The `cloud-restore` feature enables restoration of those transitioned objects from the remote cloud S3 endpoints back into RGW.
8+
9+
The objects can be restored either by using the S3 `restore-object` CLI or via `read-through`. The restored copies can be either temporary or permanent.
10+
11+
## S3 restore-object CLI
12+
13+
The goal here is to implement the minimal functionality of the [`S3RestoreObject`](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html) API so that users can restore cloud-transitioned objects.
14+
15+
```sh
16+
aws s3api restore-object \
17+
--bucket <value> \
18+
--key <value> ( can be object name or * for Bulk restore) \
19+
[--version-id <value>] \
20+
--restore-request (structure) {
21+
// for temporary restore
22+
{ "Days": integer, }
23+
// if Days not provided, it will be considered as permanent copy
24+
}
25+
```
26+
27+
This CLI may be extended in the future to include custom parameters (such as target-bucket, storage-class, etc.) specific to RGW.
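
The rule that `Days` selects a temporary copy and its absence selects a permanent one can be sketched as follows. `RestoreKind` and `restore_kind_for` are hypothetical names used only for illustration; the `std::optional<uint64_t>` parameter mirrors the `days` argument of `restore_obj_from_cloud` in this commit.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical enum for this sketch only.
enum class RestoreKind : std::uint8_t { Temporary, Permanent };

// A restore request that carries "Days" asks for a temporary copy;
// a request without it asks for a permanent copy.
inline RestoreKind restore_kind_for(const std::optional<std::uint64_t>& days) {
  return days.has_value() ? RestoreKind::Temporary : RestoreKind::Permanent;
}
```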
28+
29+
## read-through
30+
31+
With the cloud-transition feature alone, cloud-transitioned objects cannot be read: `GET` on those objects fails with an `InvalidObjectState` error.
32+
33+
With this restore feature, transitioned objects can be restored and read. Two new tier-config options, `allow_read_through` and `read_through_restore_days`, are added for this purpose. `GET` on a transitioned object restores the object from the S3 endpoint only when `allow_read_through` is enabled.
34+
35+
Note: The object copy restored via `read-through` is temporary and is retained only for the duration of `read_through_restore_days`.
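
A minimal sketch of the read-through decision described above. The `ReadThroughConfig` struct and `read_through_days` function are hypothetical and do not reflect how RGW stores tier-config; the default value shown for `read_through_restore_days` is an assumption made only so the example compiles.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical view of the two tier-config options named in this section.
struct ReadThroughConfig {
  bool allow_read_through = false;
  std::uint64_t read_through_restore_days = 1;  // assumed default, illustration only
};

// Returns the number of days a read-through restore keeps the temporary copy,
// or std::nullopt when GET should keep failing with InvalidObjectState.
inline std::optional<std::uint64_t> read_through_days(const ReadThroughConfig& cfg) {
  if (!cfg.allow_read_through) {
    return std::nullopt;
  }
  return cfg.read_through_restore_days;
}
```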
36+
37+
## Design
38+
39+
* Similar to the cloud-transition feature, this feature currently works **only with S3-compatible cloud endpoints**.
40+
* This feature works only for **cloud-transitioned objects**. To validate this, the `retain_head_object` option must be set to true so that the object’s `HEAD` object can be verified before restoring the object.
41+
42+
* **Request flow:**
43+
* Once the `HEAD` object is verified, its cloudtier storage class config details are fetched.
44+
Note: In case the cloudtier storage-class is deleted or updated, the object may not be restored.
45+
* RestoreStatus for the `HEAD` object is marked `RestoreAlreadyInProgress`.
46+
* Object restore is done asynchronously by issuing either an S3 `GET` or an S3 `RESTORE` request to the remote endpoint.
47+
* Once the object is restored, RestoreStatus is updated to `CloudRestored` and RestoreType is set to either `Temporary` or `Permanent`.
48+
* In case the operation fails, RestoreStatus is marked as `RestoreFailed`.
49+
50+
* **New attrs:** Below are the new attrs being added
51+
* `user.rgw.restore-status`: <Restore operation Status>
52+
* `user.rgw.restore-type`: <Type of Restore>
53+
* `user.rgw.restored-at`: <Restoration Time>
54+
* `user.rgw.restore-expiry-date`: <Expiration time in case of temporary copies>
55+
* `user.rgw.cloudtier_storage_class`: <CloudTier storage class used in case of temporarily restored copies>
56+
57+
```cpp
58+
enum RGWRestoreStatus : uint8_t {
59+
None = 0,
60+
RestoreAlreadyInProgress = 1,
61+
CloudRestored = 2,
62+
RestoreFailed = 3
63+
};
64+
enum class RGWRestoreType : uint8_t {
65+
None = 0,
66+
Temporary = 1,
67+
Permanent = 2
68+
};
69+
```
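
Using enums shaped like the ones above, the end of the request flow can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in this commit: the `RestoreOutcome` struct and `finish_restore` function are hypothetical, and mapping a failed restore to `RestoreType::None` is an assumption based on the statement that all new attrs other than the status are removed on failure.

```cpp
#include <cstdint>

// Local copies of the documented values, renamed to keep the sketch standalone.
enum class RestoreStatus : std::uint8_t {
  None = 0, RestoreAlreadyInProgress = 1, CloudRestored = 2, RestoreFailed = 3
};
enum class RestoreType : std::uint8_t { None = 0, Temporary = 1, Permanent = 2 };

// Hypothetical pair of values recorded on the HEAD object.
struct RestoreOutcome {
  RestoreStatus status;
  RestoreType type;
};

// What gets recorded once the asynchronous restore finishes.
constexpr RestoreOutcome finish_restore(bool succeeded, bool has_days) {
  if (!succeeded) {
    // Assumption: no restore type is kept for a failed restore.
    return {RestoreStatus::RestoreFailed, RestoreType::None};
  }
  return {RestoreStatus::CloudRestored,
          has_days ? RestoreType::Temporary : RestoreType::Permanent};
}
```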
70+
71+
* **Response:**
72+
* `S3 restore-object CLI` returns SUCCESS - either the 200 OK or 202 Accepted status code.
73+
* If the object is not previously restored, then RGW returns 202 Accepted in the response.
74+
* If the object is previously restored, RGW returns 200 OK in the response.
75+
* Special errors:
76+
Code: RestoreAlreadyInProgress (Cause: object restore is already in progress)
77+
Code: ObjectNotFound (if Object is not found in cloud endpoint)
78+
Code: I/O error (for any other I/O errors during restore)
79+
* `GET request` continues to return an `InvalidObjectState` error until the object is successfully restored.
80+
* S3 head-object can be used to verify if the restore is still in progress.
81+
* Once the object is restored, GET will return the object data.
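
The response rules above can be sketched as a mapping from the object's current restore status to a reply. `RestoreReply` and `reply_for` are hypothetical names. Two points are assumptions rather than statements from this document: the 409 status used for `RestoreAlreadyInProgress` follows the AWS S3 error definition, and an object whose earlier restore failed is treated like one that was never restored.

```cpp
#include <cstdint>
#include <string_view>

enum class RestoreStatus : std::uint8_t {
  None = 0, RestoreAlreadyInProgress = 1, CloudRestored = 2, RestoreFailed = 3
};

// Hypothetical reply shape: HTTP status plus an S3 error code (empty on success).
struct RestoreReply {
  int http_status;
  std::string_view error_code;
};

constexpr RestoreReply reply_for(RestoreStatus current) {
  switch (current) {
    case RestoreStatus::CloudRestored:
      return {200, ""};  // previously restored
    case RestoreStatus::RestoreAlreadyInProgress:
      return {409, "RestoreAlreadyInProgress"};  // assumed status, as in AWS S3
    default:
      return {202, ""};  // not previously restored: restore is accepted
  }
}
```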
82+
83+
* **StorageClass**: By default, the objects are restored to the `STANDARD` storage class. However, as per [AWS S3 Restore](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html), the storage class remains the same for restored objects. Hence, for temporary copies, the `x-amz-storage-class` returned contains the original cloudtier storage class.
84+
* Note: A new tier-config option may be added to select the storage-class to restore the objects to.
85+
86+
* **mtime**: If the restored object is temporary, the object is still marked `RGWObj::CloudTiered` and its mtime is not changed, i.e., it is still set to the transition time. If the object is a permanent copy, it is marked `RGWObj::Main` and its mtime is updated to the restore time (now()).
87+
88+
* **Lifecycle**:
89+
* `Temporary` copies are not subjected to any further transition to the cloud. However, as is the case with cloud-transitioned objects, they can be deleted via regular LC expiration rules or via an external S3 Delete request.
90+
* `Permanent` copies are treated as any regular objects and are subjected to any LC rules applicable.
91+
92+
* **Replication**: The restored objects (both temporary and permanent) are also replicated like regular objects and will be deleted across the zones post expiration.
93+
94+
* **VersionedObjects**: With versioning, any object that was cloud-transitioned would have been non-current. After restore, the same non-current object is updated with the downloaded data and its HEAD object is updated accordingly, as is the case with regular objects.
95+
96+
* **Temporary Object Expiry**: This is done via Object Expirer
97+
* When the object is restored as temporary, `user.rgw.restore-expiry-date` is set accordingly and the `delete_at` attr is also updated with the same value.
98+
* This object is then added to the list used by `ObjectExpirer`.
99+
* The `LC` worker thread scans through that list and, after expiry, resets the objects back to the cloud-transitioned state, i.e.,
100+
* HEAD object with size=0
101+
* new attrs removed
102+
* `delete_at` reset
103+
* Note: A new RGW option `rgw_restore_debug_interval` is added. When set, it gives the number of seconds treated as one day of the `Days` value (similar to `rgw_lc_debug_interval`).
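
The reset to the cloud-transitioned state listed above can be sketched on a simplified stand-in for the head object. `HeadStub` and `reset_to_cloud_transitioned` are hypothetical and far simpler than the RGW object model; the attr names are the ones listed under "New attrs", and removing all five of them is an assumption drawn from the "new attrs removed" item.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for an object's head metadata.
struct HeadStub {
  std::uint64_t size = 0;
  std::optional<std::int64_t> delete_at;     // expiry, seconds since epoch
  std::map<std::string, std::string> attrs;  // xattr name -> value
};

// After a temporary copy expires: size back to 0, restore attrs removed,
// delete_at cleared. Other attrs are left untouched.
inline void reset_to_cloud_transitioned(HeadStub& head) {
  static const char* const restore_attrs[] = {
      "user.rgw.restore-status",      "user.rgw.restore-type",
      "user.rgw.restored-at",         "user.rgw.restore-expiry-date",
      "user.rgw.cloudtier_storage_class",
  };
  for (const char* name : restore_attrs) {
    head.attrs.erase(name);
  }
  head.size = 0;
  head.delete_at.reset();
}
```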
104+
105+
* **FAILED Restore**: In case the restore operation fails,
106+
* The HEAD object will be updated accordingly, i.e., the storage class is reset to the original cloud-tier storage class.
107+
* All the newly added attrs will be removed, except for `user.rgw.restore-status`, which will be updated to `RestoreFailed`.
108+
109+
* **Check Restore Progress**: Users can issue an S3 `head-object` request to check whether the restore of an object is done or still in progress.
110+
111+
* **RGW down/restarts** - Since the restore operation is asynchronous, we need to keep track of the objects being restored. In case RGW is down/restarts, this data will be used to retrigger on-going restore requests or do appropriate cleanup for the failed requests.
112+
113+
* **Compression** - If the placement target to which the objects are being restored has compression enabled, the data will be compressed accordingly (bug2294512)
114+
115+
* **Encryption** - If the restored object is encrypted, the old sse-related xattrs/keys from the HEAD stub will be copied back into object metadata (bug2294512)
116+
117+
* **Delete cloud object post restore** - Once the object is successfully restored, the object at the remote endpoint is still retained. However, we could choose to delete it for permanently restored copies by adding a new tier-config option.
118+
119+
## Future work
120+
121+
* **Bulk Restore**: In the case of BulkRestore, some of the objects may not be restored. The user needs to manually check which objects were restored and which are still in progress.
122+
123+
* **Admin CLIs**: Admin debug commands will be provided to start restore operations, check their status, and cancel them.
124+
125+
* **Admin Ops**
126+
127+
* **Restore Notifications**

src/rgw/driver/daos/rgw_sal_daos.cc

Lines changed: 16 additions & 0 deletions
@@ -1028,6 +1028,22 @@ int DaosObject::transition_to_cloud(
10281028
return DAOS_NOT_IMPLEMENTED_LOG(dpp);
10291029
}
10301030

1031+
int DaosObject::restore_obj_from_cloud(Bucket* bucket,
1032+
rgw::sal::PlacementTier* tier,
1033+
rgw_placement_rule& placement_rule,
1034+
rgw_bucket_dir_entry& o,
1035+
CephContext* cct,
1036+
RGWObjTier& tier_config,
1037+
real_time& mtime,
1038+
uint64_t olh_epoch,
1039+
std::optional<uint64_t> days,
1040+
const DoutPrefixProvider* dpp,
1041+
optional_yield y,
1042+
uint32_t flags)
1043+
{
1044+
return DAOS_NOT_IMPLEMENTED_LOG(dpp);
1045+
}
1046+
10311047
bool DaosObject::placement_rules_match(rgw_placement_rule& r1,
10321048
rgw_placement_rule& r2) {
10331049
/* XXX: support single default zone and zonegroup for now */

src/rgw/driver/daos/rgw_sal_daos.h

Lines changed: 12 additions & 0 deletions
@@ -649,6 +649,18 @@ class DaosObject : public StoreObject {
649649
CephContext* cct, bool update_object,
650650
const DoutPrefixProvider* dpp,
651651
optional_yield y) override;
652+
virtual int restore_obj_from_cloud(Bucket* bucket,
653+
rgw::sal::PlacementTier* tier,
654+
rgw_placement_rule& placement_rule,
655+
rgw_bucket_dir_entry& o,
656+
CephContext* cct,
657+
RGWObjTier& tier_config,
658+
real_time& mtime,
659+
uint64_t olh_epoch,
660+
std::optional<uint64_t> days,
661+
const DoutPrefixProvider* dpp,
662+
optional_yield y,
663+
uint32_t flags) override;
652664
virtual bool placement_rules_match(rgw_placement_rule& r1,
653665
rgw_placement_rule& r2) override;
654666
virtual int dump_obj_layout(const DoutPrefixProvider* dpp, optional_yield y,

src/rgw/driver/posix/rgw_sal_posix.cc

Lines changed: 16 additions & 0 deletions
@@ -3039,6 +3039,22 @@ int POSIXObject::transition_to_cloud(Bucket* bucket,
30393039
return -ERR_NOT_IMPLEMENTED;
30403040
}
30413041

3042+
int POSIXObject::restore_obj_from_cloud(Bucket* bucket,
3043+
rgw::sal::PlacementTier* tier,
3044+
rgw_placement_rule& placement_rule,
3045+
rgw_bucket_dir_entry& o,
3046+
CephContext* cct,
3047+
RGWObjTier& tier_config,
3048+
real_time& mtime,
3049+
uint64_t olh_epoch,
3050+
std::optional<uint64_t> days,
3051+
const DoutPrefixProvider* dpp,
3052+
optional_yield y,
3053+
uint32_t flags)
3054+
{
3055+
return -ERR_NOT_IMPLEMENTED;
3056+
}
3057+
30423058
bool POSIXObject::placement_rules_match(rgw_placement_rule& r1, rgw_placement_rule& r2)
30433059
{
30443060
return (r1 == r2);

src/rgw/driver/posix/rgw_sal_posix.h

Lines changed: 12 additions & 0 deletions
@@ -681,6 +681,18 @@ class POSIXObject : public StoreObject {
681681
bool update_object,
682682
const DoutPrefixProvider* dpp,
683683
optional_yield y) override;
684+
virtual int restore_obj_from_cloud(Bucket* bucket,
685+
rgw::sal::PlacementTier* tier,
686+
rgw_placement_rule& placement_rule,
687+
rgw_bucket_dir_entry& o,
688+
CephContext* cct,
689+
RGWObjTier& tier_config,
690+
real_time& mtime,
691+
uint64_t olh_epoch,
692+
std::optional<uint64_t> days,
693+
const DoutPrefixProvider* dpp,
694+
optional_yield y,
695+
uint32_t flags) override;
684696
virtual bool placement_rules_match(rgw_placement_rule& r1, rgw_placement_rule& r2) override;
685697
virtual int dump_obj_layout(const DoutPrefixProvider *dpp, optional_yield y, Formatter* f) override;
686698
virtual int swift_versioning_restore(const ACLOwner& owner, const rgw_user& remote_user, bool& restored,

src/rgw/driver/rados/rgw_lc_tier.cc

Lines changed: 77 additions & 16 deletions
@@ -14,6 +14,7 @@
1414
#include "rgw_common.h"
1515
#include "rgw_rest.h"
1616
#include "svc_zone.h"
17+
#include "rgw_rados.h"
1718

1819
#include <boost/algorithm/string/split.hpp>
1920
#include <boost/algorithm/string.hpp>
@@ -231,18 +232,38 @@ static void init_headers(map<string, bufferlist>& attrs,
231232
}
232233
}
233234

234-
/* Read object or just head from remote endpoint. For now initializes only headers,
235-
* but can be extended to fetch etag, mtime etc if needed.
235+
struct generic_attr {
236+
const char *http_header;
237+
const char *rgw_attr;
238+
};
239+
240+
/*
241+
* mapping between http env fields and rgw object attrs
242+
*/
243+
static const struct generic_attr generic_attrs[] = {
244+
{ "CONTENT_TYPE", RGW_ATTR_CONTENT_TYPE },
245+
{ "HTTP_CONTENT_LANGUAGE", RGW_ATTR_CONTENT_LANG },
246+
{ "HTTP_EXPIRES", RGW_ATTR_EXPIRES },
247+
{ "HTTP_CACHE_CONTROL", RGW_ATTR_CACHE_CONTROL },
248+
{ "HTTP_CONTENT_DISPOSITION", RGW_ATTR_CONTENT_DISP },
249+
{ "HTTP_CONTENT_ENCODING", RGW_ATTR_CONTENT_ENC },
250+
{ "HTTP_X_ROBOTS_TAG", RGW_ATTR_X_ROBOTS_TAG },
251+
{ "ETAG", RGW_ATTR_ETAG },
252+
};
253+
254+
/* Read object or just head from remote endpoint.
236255
*/
237-
static int cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
238-
std::map<std::string, std::string>& headers) {
256+
int rgw_cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
257+
std::map<std::string, std::string>& headers,
258+
real_time* pset_mtime, std::string& etag,
259+
uint64_t& accounted_size, rgw::sal::Attrs& attrs,
260+
void* cb) {
239261
RGWRESTConn::get_obj_params req_params;
240262
std::string target_obj_name;
241263
int ret = 0;
242264
rgw_lc_obj_properties obj_properties(tier_ctx.o.meta.mtime, tier_ctx.o.meta.etag,
243265
tier_ctx.o.versioned_epoch, tier_ctx.acl_mappings,
244266
tier_ctx.target_storage_class);
245-
std::string etag;
246267
RGWRESTStreamRWRequest *in_req;
247268

248269
rgw_bucket dest_bucket;
@@ -261,20 +282,57 @@ static int cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
261282
req_params.rgwx_stat = true;
262283
req_params.sync_manifest = true;
263284
req_params.skip_decrypt = true;
285+
req_params.cb = (RGWHTTPStreamRWRequest::ReceiveCB *)cb;
264286

265-
ret = tier_ctx.conn.get_obj(tier_ctx.dpp, dest_obj, req_params, true /* send */, &in_req);
266-
if (ret < 0) {
267-
ldpp_dout(tier_ctx.dpp, 0) << "ERROR: " << __func__ << "(): conn.get_obj() returned ret=" << ret << dendl;
268-
return ret;
287+
ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): fetching object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl;
288+
289+
static constexpr int NUM_ENPOINT_IOERROR_RETRIES = 20;
290+
for (int tries = 0; tries < NUM_ENPOINT_IOERROR_RETRIES; tries++) {
291+
ret = tier_ctx.conn.get_obj(tier_ctx.dpp, dest_obj, req_params, true /* send */, &in_req);
292+
if (ret < 0) {
293+
ldpp_dout(tier_ctx.dpp, 0) << "ERROR: " << __func__ << "(): conn.get_obj() returned ret=" << ret << dendl;
294+
return ret;
295+
}
296+
297+
/* fetch headers */
298+
// accounted_size in complete_request() reads from RGWX_OBJECT_SIZE which is set
299+
// only for internal ops/sync. So instead read from headers[CONTENT_LEN].
300+
// Same goes for pattrs.
301+
ret = tier_ctx.conn.complete_request(tier_ctx.dpp, in_req, &etag, pset_mtime, nullptr, nullptr, &headers, null_yield);
302+
if (ret < 0) {
303+
if (ret == -EIO && tries < NUM_ENPOINT_IOERROR_RETRIES - 1) {
304+
ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): failed to fetch object from remote. retries=" << tries << dendl;
305+
continue;
306+
}
307+
return ret;
308+
}
309+
break;
269310
}
270311

271-
/* fetch headers */
272-
ret = tier_ctx.conn.complete_request(tier_ctx.dpp, in_req, nullptr, nullptr, nullptr, nullptr, &headers, null_yield);
273-
if (ret < 0 && ret != -ENOENT) {
274-
ldpp_dout(tier_ctx.dpp, 20) << "ERROR: " << __func__ << "(): conn.complete_request() returned ret=" << ret << dendl;
275-
return ret;
312+
static map<string, string> generic_attrs_map;
313+
for (const auto& http2rgw : generic_attrs) {
314+
generic_attrs_map[http2rgw.http_header] = http2rgw.rgw_attr;
276315
}
277-
return 0;
316+
317+
for (auto header: headers) {
318+
const char* name = header.first.c_str();
319+
const string& val = header.second;
320+
bufferlist bl;
321+
bl.append(val.c_str(), val.size());
322+
323+
const auto aiter = generic_attrs_map.find(name);
324+
if (aiter != std::end(generic_attrs_map)) {
325+
ldpp_dout(tier_ctx.dpp, 20) << __func__ << " Received attrs aiter->first = " << aiter->first << ", aiter->second = " << aiter->second << ret << dendl;
326+
attrs[aiter->second] = bl;
327+
}
328+
329+
if (header.first == "CONTENT_LENGTH") {
330+
accounted_size = atoi(val.c_str());
331+
}
332+
}
333+
334+
ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): Sucessfully fetched object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl;
335+
return ret;
278336
}
279337

280338
static bool is_already_tiered(const DoutPrefixProvider *dpp,
@@ -1184,9 +1242,12 @@ static int cloud_tier_multipart_transfer(RGWLCCloudTierCtx& tier_ctx) {
11841242
static int cloud_tier_check_object(RGWLCCloudTierCtx& tier_ctx, bool& already_tiered) {
11851243
int ret;
11861244
std::map<std::string, std::string> headers;
1245+
std::string etag;
1246+
uint64_t accounted_size;
1247+
rgw::sal::Attrs attrs;
11871248

11881249
/* Fetch Head object */
1189-
ret = cloud_tier_get_object(tier_ctx, true, headers);
1250+
ret = rgw_cloud_tier_get_object(tier_ctx, true, headers, nullptr, etag, accounted_size, attrs, nullptr);
11901251

11911252
if (ret < 0) {
11921253
ldpp_dout(tier_ctx.dpp, 0) << "ERROR: failed to fetch HEAD from cloud for obj=" << tier_ctx.obj << " , ret = " << ret << dendl;
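
The retry behaviour added to `rgw_cloud_tier_get_object` in this file can be shown in isolation with a generic sketch. `retry_on_eio` is a hypothetical helper, not part of the commit, and it simplifies the diff by folding the request and its completion into a single callable; as in the diff, only `-EIO` is retried and the last attempt's result is returned as-is.

```cpp
#include <cerrno>
#include <functional>

// Calls `attempt` up to `max_tries` times. Only -EIO triggers another try;
// any other result (success or a different error) is returned immediately.
// Returns -EINVAL if max_tries is not positive, since nothing was attempted.
inline int retry_on_eio(const std::function<int()>& attempt, int max_tries) {
  int ret = -EINVAL;
  for (int tries = 0; tries < max_tries; ++tries) {
    ret = attempt();
    if (ret == -EIO && tries < max_tries - 1) {
      continue;  // transient I/O error, try again
    }
    break;
  }
  return ret;
}
```

Bounding the loop keeps a persistently failing endpoint from blocking the caller indefinitely, while still absorbing a few transient I/O errors.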

src/rgw/driver/rados/rgw_lc_tier.h

Lines changed: 6 additions & 0 deletions
@@ -49,3 +49,9 @@ struct RGWLCCloudTierCtx {
4949

5050
/* Transition object to cloud endpoint */
5151
int rgw_cloud_tier_transfer_object(RGWLCCloudTierCtx& tier_ctx, std::set<std::string>& cloud_targets);
52+
53+
int rgw_cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
54+
std::map<std::string, std::string>& headers,
55+
real_time* pset_mtime, std::string& etag,
56+
uint64_t& accounted_size, rgw::sal::Attrs& attrs,
57+
void* cb);

src/rgw/driver/rados/rgw_object_expirer_core.cc

Lines changed: 1 addition & 5 deletions
@@ -219,13 +219,9 @@ int RGWObjectExpirer::garbage_single_object(const DoutPrefixProvider *dpp, objex
219219
}
220220

221221
rgw_obj_key key = hint.obj_key;
222-
if (key.instance.empty()) {
223-
key.instance = "null";
224-
}
225222

226223
std::unique_ptr<rgw::sal::Object> obj = bucket->get_object(key);
227-
obj->set_atomic();
228-
ret = obj->delete_object(dpp, null_yield, rgw::sal::FLAG_LOG_OP, nullptr, nullptr);
224+
ret = static_cast<rgw::sal::RadosObject*>(obj.get())->handle_obj_expiry(dpp, null_yield);
229225

230226
return ret;
231227
}
