
Commit 0e0222d

rgw/cloudtier: Restore object from cloud endpoint
1) Add functionality to restore cloud-transitioned objects on demand. This commit:
   * Given <bucket,object>, fetches the object from the cloud endpoint.
   * If days are provided and > 0, marks the restore as temporary with an expiry date.
   * Without <days>, marks it as a permanent restore.

2) Use the ObjectExpirer/delete_at attr to delete temporary objects.
   For temporarily restored objects, set the delete_at attr to the expiration time. This adds those objects to the ObjectExpirer list, and the LC worker thread scans that list and deletes expired objects. "Delete" here means deleting the restored object data and resetting the HEAD object to the cloud-transitioned state it was in before the restore.

   In addition, the following changes are made:
   * If temporary, the object is still marked RGWObj::CloudTiered and its mtime is set to the transition time.
   * If permanent, the object is marked RGWObj::Main and its mtime is set to the restore time (now()).
   * The rgw_restore_debug_interval option is added to configure the length of a restore "day" (similar to rgw_lc_debug_interval).

   There was an issue in the ObjectExpirer code: if an object was added to the ObjectExpirer list and then re-written, it was not removed from the expirer list, so the new object could get deleted. This is fixed, and minor review comments are addressed.

3) Design doc added.

4) ObjCategory should be set to CloudTiered only for cloud-transitioned objects and temporarily restored objects. Permanent copies are to be treated as regular objects.

Signed-off-by: Soumya Koduri <[email protected]>
1 parent 4df972a commit 0e0222d
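
The temporary/permanent split described in the commit message can be summarized as a small decision function. This is a minimal C++ sketch, not RGW code: `ObjCategory`, `RestoreType`, `RestoreDecision` and `decide_restore()` are hypothetical names, a plain 24-hour day is assumed (the `rgw_restore_debug_interval` override is ignored), and treating `days == 0` like an absent value is an assumption the message does not spell out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for RGWObj::Main/CloudTiered and the restore type;
// these are not the real RGW definitions.
enum class ObjCategory { Main, CloudTiered };
enum class RestoreType { Temporary, Permanent };

using Clock = std::chrono::system_clock;

struct RestoreDecision {
  RestoreType type;
  ObjCategory category;
  Clock::time_point mtime;
  std::optional<Clock::time_point> delete_at;  // set only for temporary copies
};

// days > 0  -> temporary copy with an expiry (delete_at).
// otherwise -> permanent copy (assumption: days == 0 behaves like "no days").
RestoreDecision decide_restore(std::optional<std::uint64_t> days,
                               Clock::time_point transition_time,
                               Clock::time_point now) {
  if (days && *days > 0) {
    // Temporary: keeps the CloudTiered category and the transition mtime;
    // delete_at is what places it on the ObjectExpirer list.
    const auto expiry = now + std::chrono::hours(24) * static_cast<std::int64_t>(*days);
    return {RestoreType::Temporary, ObjCategory::CloudTiered, transition_time, expiry};
  }
  // Permanent: a regular object whose mtime is the restore time.
  return {RestoreType::Permanent, ObjCategory::Main, now, std::nullopt};
}
```

Keeping the transition time as mtime for temporary copies means that, in this sketch, nothing about a temporary copy looks "newer" than the cloud-transitioned object it came from.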

20 files changed (+842 / -23 lines changed)

src/common/options/rgw.yaml.in

Lines changed: 13 additions & 0 deletions
```diff
@@ -448,6 +448,19 @@ options:
   services:
   - rgw
   with_legacy: true
+- name: rgw_restore_debug_interval
+  type: int
+  level: dev
+  desc: The number of seconds that simulate one "day" in order to debug RGW CloudRestore.
+    Do *not* modify for a production cluster.
+  long_desc: For debugging RGW Cloud Restore, the number of seconds that are equivalent to
+    one simulated "day". Values less than 1 are ignored and do not change Restore behavior.
+    For example, during debugging if one wanted every 10 minutes to be equivalent to one day,
+    then this would be set to 600, the number of seconds in 10 minutes.
+  default: -1
+  services:
+  - rgw
+  with_legacy: true
 - name: rgw_mp_lock_max_time
   type: int
   level: advanced
```
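
The option's description implies a simple rule for the length of a restore "day". The C++ sketch below expresses that rule; `restore_day_length()` and `restore_expiry_offset()` are hypothetical helpers, not functions from this commit, and only the "values less than 1 are ignored" behavior comes from the `long_desc` above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: how long one restore "day" lasts for a given value of
// rgw_restore_debug_interval. Values < 1 are ignored (the default is -1).
std::chrono::seconds restore_day_length(std::int64_t debug_interval) {
  if (debug_interval > 0) {
    return std::chrono::seconds(debug_interval);  // simulated day
  }
  return std::chrono::hours(24);  // real day
}

// Hypothetical helper: offset from the restore time to the expiry of a
// temporary copy restored for `days` days.
std::chrono::seconds restore_expiry_offset(std::uint64_t days,
                                           std::int64_t debug_interval) {
  return restore_day_length(debug_interval) * static_cast<std::int64_t>(days);
}
```

With the option set to 600, as in the description's example, a 3-day temporary restore in this sketch expires 30 minutes after the restore.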

src/doc/rgw/cloud-restore.md

Lines changed: 127 additions & 0 deletions
```diff
@@ -0,0 +1,127 @@
+# cloud-restore
+
+## Introduction
+
+The [`cloud-transition`](https://docs.ceph.com/en/latest/radosgw/cloud-transition) feature enables data transition to a remote cloud service as part of Lifecycle Configuration via Storage Classes. However, the transition is unidirectional; data cannot be transitioned back from the remote zone.
+
+The `cloud-restore` feature enables restoration of those transitioned objects from the remote cloud S3 endpoints back into RGW.
+
+The objects can be restored either by using the S3 `restore-object` CLI or via `read-through`. The restored copies can be either temporary or permanent.
+
+## S3 restore-object CLI
+The goal here is to implement the minimal functionality of the [`S3RestoreObject`](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html) API so that users can restore cloud-transitioned objects.
+
+```sh
+aws s3api restore-object \
+  --bucket <value> \
+  --key <value> (can be object name or * for Bulk restore) \
+  [--version-id <value>] \
+  --restore-request (structure) {
+    // for temporary restore
+    { "Days": integer, }
+    // if Days not provided, it will be considered as permanent copy
+  }
+```
+This CLI may be extended in the future to include custom parameters (like target-bucket/storage-class etc.) specific to RGW.
+
+
+## read-through
+With the cloud-transition feature, cloud-transitioned objects cannot be read: `GET` on those objects fails with an `InvalidObjectState` error.
+
+With this restore feature, transitioned objects can be restored and read. The new tier-config options `allow_read_through` and `read_through_restore_days` are added for this purpose. Only when `allow_read_through` is enabled does `GET` on transitioned objects restore them from the S3 endpoint.
+
+Note: The object copy restored via `read-through` is temporary and is retained only for the duration of `read_through_restore_days`.
+
+## Design
+
+* Similar to the cloud-transition feature, this feature currently works **only for S3-compatible cloud endpoints**.
+* This feature works only for **cloud-transitioned objects**. To validate this, the `retain_head_object` option should be set to true so that the object's `HEAD` object can be verified before restoring the object.
+
+* **Request flow:**
+  * Once the `HEAD` object is verified, its cloudtier storage class config details are fetched.
+    Note: In case the cloudtier storage class is deleted/updated, the object may not be restored.
+  * RestoreStatus for the `HEAD` object is marked `RestoreAlreadyInProgress`.
+  * Object restore is done asynchronously by issuing either an S3 `GET` or an S3 `RESTORE` request to the remote endpoint.
+  * Once the object is restored, RestoreStatus is updated to `CloudRestored` and RestoreType is set to either `Temporary` or `Permanent`.
+  * In case the operation fails, RestoreStatus is marked as `RestoreFailed`.
+
+* **New attrs:** The following new attrs are added:
+  * `user.rgw.restore-status`: <Restore operation status>
+  * `user.rgw.restore-type`: <Type of restore>
+  * `user.rgw.restored-at`: <Restoration time>
+  * `user.rgw.restore-expiry-date`: <Expiration time in case of temporary copies>
+  * `user.rgw.cloudtier_storage_class`: <CloudTier storage class used in case of temporarily restored copies>
+  ```cpp
+  enum RGWRestoreStatus : uint8_t {
+    None = 0,
+    RestoreAlreadyInProgress = 1,
+    CloudRestored = 2,
+    RestoreFailed = 3
+  };
+  enum class RGWRestoreType : uint8_t {
+    None = 0,
+    Temporary = 1,
+    Permanent = 2
+  };
+  ```
+
+* **Response:**
+  * The S3 `restore-object` CLI returns success with either a 200 OK or a 202 Accepted status code.
+    * If the object has not been restored before, RGW returns 202 Accepted in the response.
+    * If the object has been restored before, RGW returns 200 OK in the response.
+  * Special errors:
+    Code: RestoreAlreadyInProgress (Cause: Object restore is already in progress.)
+    Code: ObjectNotFound (if the object is not found in the cloud endpoint)
+    Code: I/O error (for any other I/O errors during restore)
+  * A `GET` request continues to return an `InvalidObjectState` error until the object is successfully restored.
+  * S3 head-object can be used to verify whether the restore is still in progress.
+  * Once the object is restored, GET returns the object data.
+
+
+* **StorageClass**: By default, objects are restored to the `STANDARD` storage class. However, as per [AWS S3 Restore](https://docs.aws.amazon.com/cli/latest/reference/s3api/restore-object.html), the storage class remains the same for restored objects. Hence, for temporary copies, the returned `x-amz-storage-class` contains the original cloudtier storage class.
+  * Note: A new tier-config option may be added to select the storage class that objects are restored to.
+
+* **mtime**: If the restored object is temporary, the object is still marked `RGWObj::CloudTiered` and its mtime is not changed, i.e., it is still set to the transition time. If the object is a permanent copy, it is marked `RGWObj::Main` and its mtime is updated to the restore time (now()).
+
+* **Lifecycle**:
+  * `Temporary` copies are not subject to any further transition to the cloud. However, as is the case with cloud-transitioned objects, they can be deleted via regular LC expiration rules or via an external S3 Delete request.
+  * `Permanent` copies are treated as regular objects and are subject to any applicable LC rules.
+
+* **Replication**: The restored objects (both temporary and permanent) are replicated like regular objects and are deleted across the zones after expiration.
+
+* **VersionedObjects**: With versioning, a cloud-transitioned object would have been non-current. After the restore, the same non-current object is updated with the downloaded data and its HEAD object is updated accordingly, as is the case with regular objects.
+
+* **Temporary Object Expiry**: This is done via the ObjectExpirer.
+  * When the object is restored as temporary, `user.rgw.restore-expiry-date` is set accordingly and the `delete_at` attr is also updated with the same value.
+  * This object is then added to the list used by `ObjectExpirer`.
+  * The `LC` worker thread scans that list and, after expiry, resets the objects back to the cloud-transitioned state, i.e.,
+    * HEAD object with size=0
+    * new attrs removed
+    * `delete_at` reset
+  * Note: A new RGW option `rgw_restore_debug_interval` is added which, when set, defines the number of seconds treated as one `Days` unit (similar to `rgw_lc_debug_interval`).
+
+* **FAILED Restore**: In case the restore operation fails,
+  * The HEAD object is updated accordingly, i.e., the storage class is reset to the original cloud-tier storage class.
+  * All the newly added attrs are removed, except for `user.rgw.restore-status`, which is updated to `RestoreFailed`.
+
+* **Check Restore Progress**: Users can issue an S3 `head-object` request to check whether the restore of any object is done or still in progress.
+
+* **RGW down/restarts**: Since the restore operation is asynchronous, we need to keep track of the objects being restored. If RGW goes down or restarts, this data will be used to retrigger ongoing restore requests or to clean up failed requests.
+
+* **Compression**: If the placement target to which the objects are being restored has compression enabled, the data is compressed accordingly (bug2294512).
+
+* **Encryption**: If the restored object is encrypted, the old SSE-related xattrs/keys from the HEAD stub are copied back into the object metadata (bug2294512).
+
+* **Delete cloud object post restore**: Once the object is successfully restored, the object at the remote endpoint is still retained. However, a new tier-config option could be added to delete it for permanently restored copies.
+
+
+## Future work
+
+* **Bulk Restore**: In the case of BulkRestore, some of the objects may not be restored. Users need to manually cross-check which objects are restored or still in progress.
+
+* **Admin CLIs**: Admin debug commands will be provided to start, check the status of, and cancel restore operations.
+
+* **Admin Ops**
+
+* **Restore Notifications**
+
```
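
The **Response** rules in the design doc map the object's current `RGWRestoreStatus` to a reply. A minimal C++ sketch of that mapping follows; the enum values are copied from the doc, while `RestoreReply` and `handle_restore_request()` are hypothetical. Treating a `RestoreFailed` object like a never-restored one (a new restore is started and 202 is returned) is an assumption the doc does not state.

```cpp
#include <cassert>
#include <cstdint>

// Same values as the RGWRestoreStatus enum in the design doc.
enum RGWRestoreStatus : uint8_t {
  None = 0,
  RestoreAlreadyInProgress = 1,
  CloudRestored = 2,
  RestoreFailed = 3
};

// Hypothetical reply type; names are illustrative only.
enum class RestoreReply {
  Accepted202,        // restore started asynchronously
  Ok200,              // object was already restored
  AlreadyInProgress,  // "RestoreAlreadyInProgress" error
};

RestoreReply handle_restore_request(RGWRestoreStatus current) {
  switch (current) {
    case RestoreAlreadyInProgress:
      return RestoreReply::AlreadyInProgress;
    case CloudRestored:
      return RestoreReply::Ok200;
    case None:
    case RestoreFailed:  // assumption: a failed restore may be retried
    default:
      return RestoreReply::Accepted202;
  }
}
```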

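The read-through rules in the design doc reduce to a small decision on `GET`. The sketch assumes a hypothetical `TierConfig` struct holding the two tier-config options the doc names (`allow_read_through`, `read_through_restore_days`); `GetAction` and `handle_get()` are likewise illustrative, and whether RGW serves the data within the same request after a read-through restore is not settled by the doc, so the sketch only reports that a temporary restore should start.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical holder for the two tier-config options named in the doc.
struct TierConfig {
  bool allow_read_through;
  std::uint64_t read_through_restore_days;
};

enum class GetAction {
  ServeData,           // regular object or already-restored copy
  InvalidObjectState,  // cloud-transitioned and read-through disabled
  RestoreTemporary,    // read-through: start a temporary restore
};

// is_cloud_transitioned: the HEAD object is still a cloud-transitioned stub.
// restore_days is set only when a read-through restore should be started.
GetAction handle_get(bool is_cloud_transitioned, const TierConfig& cfg,
                     std::optional<std::uint64_t>& restore_days) {
  restore_days.reset();
  if (!is_cloud_transitioned) {
    return GetAction::ServeData;
  }
  if (!cfg.allow_read_through) {
    return GetAction::InvalidObjectState;
  }
  // Read-through copies are always temporary.
  restore_days = cfg.read_through_restore_days;
  return GetAction::RestoreTemporary;
}
```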
src/rgw/driver/daos/rgw_sal_daos.cc

Lines changed: 16 additions & 0 deletions
```diff
@@ -1028,6 +1028,22 @@ int DaosObject::transition_to_cloud(
   return DAOS_NOT_IMPLEMENTED_LOG(dpp);
 }
 
+int DaosObject::restore_obj_from_cloud(Bucket* bucket,
+                          rgw::sal::PlacementTier* tier,
+                          rgw_placement_rule& placement_rule,
+                          rgw_bucket_dir_entry& o,
+                          CephContext* cct,
+                          RGWObjTier& tier_config,
+                          real_time& mtime,
+                          uint64_t olh_epoch,
+                          std::optional<uint64_t> days,
+                          const DoutPrefixProvider* dpp,
+                          optional_yield y,
+                          uint32_t flags)
+{
+  return DAOS_NOT_IMPLEMENTED_LOG(dpp);
+}
+
 bool DaosObject::placement_rules_match(rgw_placement_rule& r1,
                                        rgw_placement_rule& r2) {
   /* XXX: support single default zone and zonegroup for now */
```

src/rgw/driver/daos/rgw_sal_daos.h

Lines changed: 12 additions & 0 deletions
```diff
@@ -649,6 +649,18 @@ class DaosObject : public StoreObject {
                                   CephContext* cct, bool update_object,
                                   const DoutPrefixProvider* dpp,
                                   optional_yield y) override;
+  virtual int restore_obj_from_cloud(Bucket* bucket,
+                          rgw::sal::PlacementTier* tier,
+                          rgw_placement_rule& placement_rule,
+                          rgw_bucket_dir_entry& o,
+                          CephContext* cct,
+                          RGWObjTier& tier_config,
+                          real_time& mtime,
+                          uint64_t olh_epoch,
+                          std::optional<uint64_t> days,
+                          const DoutPrefixProvider* dpp,
+                          optional_yield y,
+                          uint32_t flags) override;
   virtual bool placement_rules_match(rgw_placement_rule& r1,
                                      rgw_placement_rule& r2) override;
   virtual int dump_obj_layout(const DoutPrefixProvider* dpp, optional_yield y,
```

src/rgw/driver/posix/rgw_sal_posix.cc

Lines changed: 16 additions & 0 deletions
```diff
@@ -3039,6 +3039,22 @@ int POSIXObject::transition_to_cloud(Bucket* bucket,
   return -ERR_NOT_IMPLEMENTED;
 }
 
+int POSIXObject::restore_obj_from_cloud(Bucket* bucket,
+                          rgw::sal::PlacementTier* tier,
+                          rgw_placement_rule& placement_rule,
+                          rgw_bucket_dir_entry& o,
+                          CephContext* cct,
+                          RGWObjTier& tier_config,
+                          real_time& mtime,
+                          uint64_t olh_epoch,
+                          std::optional<uint64_t> days,
+                          const DoutPrefixProvider* dpp,
+                          optional_yield y,
+                          uint32_t flags)
+{
+  return -ERR_NOT_IMPLEMENTED;
+}
+
 bool POSIXObject::placement_rules_match(rgw_placement_rule& r1, rgw_placement_rule& r2)
 {
   return (r1 == r2);
```

src/rgw/driver/posix/rgw_sal_posix.h

Lines changed: 12 additions & 0 deletions
```diff
@@ -681,6 +681,18 @@ class POSIXObject : public StoreObject {
                                   bool update_object,
                                   const DoutPrefixProvider* dpp,
                                   optional_yield y) override;
+  virtual int restore_obj_from_cloud(Bucket* bucket,
+                          rgw::sal::PlacementTier* tier,
+                          rgw_placement_rule& placement_rule,
+                          rgw_bucket_dir_entry& o,
+                          CephContext* cct,
+                          RGWObjTier& tier_config,
+                          real_time& mtime,
+                          uint64_t olh_epoch,
+                          std::optional<uint64_t> days,
+                          const DoutPrefixProvider* dpp,
+                          optional_yield y,
+                          uint32_t flags) override;
   virtual bool placement_rules_match(rgw_placement_rule& r1, rgw_placement_rule& r2) override;
   virtual int dump_obj_layout(const DoutPrefixProvider *dpp, optional_yield y, Formatter* f) override;
   virtual int swift_versioning_restore(const ACLOwner& owner, const rgw_user& remote_user, bool& restored,
```

src/rgw/driver/rados/rgw_lc_tier.cc

Lines changed: 77 additions & 16 deletions
```diff
@@ -14,6 +14,7 @@
 #include "rgw_common.h"
 #include "rgw_rest.h"
 #include "svc_zone.h"
+#include "rgw_rados.h"
 
 #include <boost/algorithm/string/split.hpp>
 #include <boost/algorithm/string.hpp>
@@ -231,18 +232,38 @@ static void init_headers(map<string, bufferlist>& attrs,
   }
 }
 
-/* Read object or just head from remote endpoint. For now initializes only headers,
- * but can be extended to fetch etag, mtime etc if needed.
+struct generic_attr {
+  const char *http_header;
+  const char *rgw_attr;
+};
+
+/*
+ * mapping between http env fields and rgw object attrs
+ */
+static const struct generic_attr generic_attrs[] = {
+  { "CONTENT_TYPE", RGW_ATTR_CONTENT_TYPE },
+  { "HTTP_CONTENT_LANGUAGE", RGW_ATTR_CONTENT_LANG },
+  { "HTTP_EXPIRES", RGW_ATTR_EXPIRES },
+  { "HTTP_CACHE_CONTROL", RGW_ATTR_CACHE_CONTROL },
+  { "HTTP_CONTENT_DISPOSITION", RGW_ATTR_CONTENT_DISP },
+  { "HTTP_CONTENT_ENCODING", RGW_ATTR_CONTENT_ENC },
+  { "HTTP_X_ROBOTS_TAG", RGW_ATTR_X_ROBOTS_TAG },
+  { "ETAG", RGW_ATTR_ETAG },
+};
+
+/* Read object or just head from remote endpoint.
  */
-static int cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
-                                 std::map<std::string, std::string>& headers) {
+int rgw_cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
+                              std::map<std::string, std::string>& headers,
+                              real_time* pset_mtime, std::string& etag,
+                              uint64_t& accounted_size, rgw::sal::Attrs& attrs,
+                              void* cb) {
   RGWRESTConn::get_obj_params req_params;
   std::string target_obj_name;
   int ret = 0;
   rgw_lc_obj_properties obj_properties(tier_ctx.o.meta.mtime, tier_ctx.o.meta.etag,
         tier_ctx.o.versioned_epoch, tier_ctx.acl_mappings,
         tier_ctx.target_storage_class);
-  std::string etag;
   RGWRESTStreamRWRequest *in_req;
 
   rgw_bucket dest_bucket;
```
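
The `generic_attrs` table pairs HTTP environment names with RGW attribute names, and the next hunk walks the received headers through it while also reading `CONTENT_LENGTH` into `accounted_size`. A reduced, standalone C++ sketch of that translation follows. It is illustrative only: the two-entry table and the literal `user.rgw.*` strings are assumed expansions standing in for the `RGW_ATTR_*` macros, plain `std::string` values replace `bufferlist`, and `headers_to_attrs()` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

struct generic_attr {
  const char* http_header;
  const char* rgw_attr;
};

// Two-entry illustration; the "user.rgw.*" strings are assumed expansions
// of RGW_ATTR_CONTENT_TYPE / RGW_ATTR_ETAG.
static const generic_attr kGenericAttrs[] = {
  {"CONTENT_TYPE", "user.rgw.content_type"},
  {"ETAG", "user.rgw.etag"},
};

// Copies mapped headers into attrs; returns CONTENT_LENGTH (0 if absent).
std::uint64_t headers_to_attrs(const std::map<std::string, std::string>& headers,
                               std::map<std::string, std::string>& attrs) {
  std::map<std::string, std::string> table;
  for (const auto& a : kGenericAttrs) {
    table[a.http_header] = a.rgw_attr;
  }

  std::uint64_t size = 0;
  for (const auto& [name, val] : headers) {
    if (auto it = table.find(name); it != table.end()) {
      attrs[it->second] = val;  // only mapped headers become attrs
    }
    if (name == "CONTENT_LENGTH") {
      size = std::strtoull(val.c_str(), nullptr, 10);
    }
  }
  return size;
}
```

The sketch parses the length with `std::strtoull` rather than `atoi`, since `atoi` returns an `int` and cannot represent sizes above `INT_MAX`.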
```diff
@@ -261,20 +282,57 @@ static int cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
   req_params.rgwx_stat = true;
   req_params.sync_manifest = true;
   req_params.skip_decrypt = true;
+  req_params.cb = (RGWHTTPStreamRWRequest::ReceiveCB *)cb;
 
-  ret = tier_ctx.conn.get_obj(tier_ctx.dpp, dest_obj, req_params, true /* send */, &in_req);
-  if (ret < 0) {
-    ldpp_dout(tier_ctx.dpp, 0) << "ERROR: " << __func__ << "(): conn.get_obj() returned ret=" << ret << dendl;
-    return ret;
+  ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): fetching object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl;
+
+  static constexpr int NUM_ENPOINT_IOERROR_RETRIES = 20;
+  for (int tries = 0; tries < NUM_ENPOINT_IOERROR_RETRIES; tries++) {
+    ret = tier_ctx.conn.get_obj(tier_ctx.dpp, dest_obj, req_params, true /* send */, &in_req);
+    if (ret < 0) {
+      ldpp_dout(tier_ctx.dpp, 0) << "ERROR: " << __func__ << "(): conn.get_obj() returned ret=" << ret << dendl;
+      return ret;
+    }
+
+    /* fetch headers */
+    // accounted_size in complete_request() reads from RGWX_OBJECT_SIZE which is set
+    // only for internal ops/sync. So instead read from headers[CONTENT_LEN].
+    // Same goes for pattrs.
+    ret = tier_ctx.conn.complete_request(tier_ctx.dpp, in_req, &etag, pset_mtime, nullptr, nullptr, &headers, null_yield);
+    if (ret < 0) {
+      if (ret == -EIO && tries < NUM_ENPOINT_IOERROR_RETRIES - 1) {
+        ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): failed to fetch object from remote. retries=" << tries << dendl;
+        continue;
+      }
+      return ret;
+    }
+    break;
   }
 
-  /* fetch headers */
-  ret = tier_ctx.conn.complete_request(tier_ctx.dpp, in_req, nullptr, nullptr, nullptr, nullptr, &headers, null_yield);
-  if (ret < 0 && ret != -ENOENT) {
-    ldpp_dout(tier_ctx.dpp, 20) << "ERROR: " << __func__ << "(): conn.complete_request() returned ret=" << ret << dendl;
-    return ret;
+  static map<string, string> generic_attrs_map;
+  for (const auto& http2rgw : generic_attrs) {
+    generic_attrs_map[http2rgw.http_header] = http2rgw.rgw_attr;
   }
-  return 0;
+
+  for (auto header: headers) {
+    const char* name = header.first.c_str();
+    const string& val = header.second;
+    bufferlist bl;
+    bl.append(val.c_str(), val.size());
+
+    const auto aiter = generic_attrs_map.find(name);
+    if (aiter != std::end(generic_attrs_map)) {
+      ldpp_dout(tier_ctx.dpp, 20) << __func__ << " Received attrs aiter->first = " << aiter->first << ", aiter->second = " << aiter->second << ret << dendl;
+      attrs[aiter->second] = bl;
+    }
+
+    if (header.first == "CONTENT_LENGTH") {
+      accounted_size = atoi(val.c_str());
+    }
+  }
+
+  ldpp_dout(tier_ctx.dpp, 20) << __func__ << "(): Sucessfully fetched object from cloud bucket:" << dest_bucket << ", object: " << target_obj_name << dendl;
+  return ret;
 }
 
 static bool is_already_tiered(const DoutPrefixProvider *dpp,
```
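
The retry loop above retries only when `complete_request()` fails with `-EIO`, for at most `NUM_ENPOINT_IOERROR_RETRIES` (20) attempts; an error from `get_obj()`, or any error code other than `-EIO`, is returned immediately. The same policy, pulled out into a hypothetical standalone helper (`retry_on_eio()` is not part of RGW), looks roughly like the C++ below. Unlike the commit, it folds the send and completion steps into one callable, so a send failure returning `-EIO` would also be retried here.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

// Hypothetical helper: run `op` up to `max_tries` times, retrying only while it
// returns -EIO. Success (>= 0) or any other error is returned immediately;
// if every attempt fails with -EIO, -EIO is returned.
int retry_on_eio(const std::function<int()>& op, int max_tries) {
  int ret = -EIO;  // also the result if max_tries <= 0
  for (int tries = 0; tries < max_tries; ++tries) {
    ret = op();
    if (ret != -EIO) {
      break;  // success or a non-retryable error
    }
  }
  return ret;
}
```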
```diff
@@ -1184,9 +1242,12 @@ static int cloud_tier_multipart_transfer(RGWLCCloudTierCtx& tier_ctx) {
 static int cloud_tier_check_object(RGWLCCloudTierCtx& tier_ctx, bool& already_tiered) {
   int ret;
   std::map<std::string, std::string> headers;
+  std::string etag;
+  uint64_t accounted_size;
+  rgw::sal::Attrs attrs;
 
   /* Fetch Head object */
-  ret = cloud_tier_get_object(tier_ctx, true, headers);
+  ret = rgw_cloud_tier_get_object(tier_ctx, true, headers, nullptr, etag, accounted_size, attrs, nullptr);
 
   if (ret < 0) {
     ldpp_dout(tier_ctx.dpp, 0) << "ERROR: failed to fetch HEAD from cloud for obj=" << tier_ctx.obj << " , ret = " << ret << dendl;
```

src/rgw/driver/rados/rgw_lc_tier.h

Lines changed: 6 additions & 0 deletions
```diff
@@ -49,3 +49,9 @@ struct RGWLCCloudTierCtx {
 
 /* Transition object to cloud endpoint */
 int rgw_cloud_tier_transfer_object(RGWLCCloudTierCtx& tier_ctx, std::set<std::string>& cloud_targets);
+
+int rgw_cloud_tier_get_object(RGWLCCloudTierCtx& tier_ctx, bool head,
+                              std::map<std::string, std::string>& headers,
+                              real_time* pset_mtime, std::string& etag,
+                              uint64_t& accounted_size, rgw::sal::Attrs& attrs,
+                              void* cb);
```

src/rgw/driver/rados/rgw_object_expirer_core.cc

Lines changed: 1 addition & 5 deletions
```diff
@@ -219,13 +219,9 @@ int RGWObjectExpirer::garbage_single_object(const DoutPrefixProvider *dpp, objex
   }
 
   rgw_obj_key key = hint.obj_key;
-  if (key.instance.empty()) {
-    key.instance = "null";
-  }
 
   std::unique_ptr<rgw::sal::Object> obj = bucket->get_object(key);
-  obj->set_atomic();
-  ret = obj->delete_object(dpp, null_yield, rgw::sal::FLAG_LOG_OP, nullptr, nullptr);
+  ret = static_cast<rgw::sal::RadosObject*>(obj.get())->handle_obj_expiry(dpp, null_yield);
 
   return ret;
 }
```
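
With this change, `garbage_single_object()` hands the expired entry to `RadosObject::handle_obj_expiry()` instead of deleting the object, and the commit message states that an object re-written after being queued should no longer be deleted by the expirer. The body of `handle_obj_expiry()` is not part of this excerpt, so the C++ below is only a sketch of one way to get both behaviors, assuming the queued hint carries the expiry time it was queued with: a stale hint (one whose time no longer matches the object's `delete_at`) is ignored, and an expired temporary copy is reset to its cloud-transitioned state as the design doc describes (size 0, new attrs removed, `delete_at` reset). `HeadObject` and `handle_expiry_hint()` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <string>

using Clock = std::chrono::system_clock;

// Hypothetical in-memory model of a restored HEAD object; not RGW's type.
struct HeadObject {
  std::uint64_t size = 0;
  std::map<std::string, std::string> attrs;
  Clock::time_point delete_at{};  // epoch == not scheduled for expiry
};

// Returns true if the object was reset to its cloud-transitioned state.
bool handle_expiry_hint(HeadObject& obj, Clock::time_point hint_time,
                        Clock::time_point now) {
  // Stale hint: the object was re-written (delete_at changed) after queuing.
  if (obj.delete_at != hint_time) return false;
  // Not expired yet.
  if (now < hint_time) return false;

  obj.size = 0;  // restored data dropped; only the stub remains
  for (const char* name : {"user.rgw.restore-status", "user.rgw.restore-type",
                           "user.rgw.restored-at", "user.rgw.restore-expiry-date",
                           "user.rgw.cloudtier_storage_class"}) {
    obj.attrs.erase(name);  // the "new attrs" listed in the design doc
  }
  obj.delete_at = Clock::time_point{};  // delete_at reset
  return true;
}
```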
