YapDatabaseCloudCore

Robbie Hanson edited this page Oct 26, 2016 · 14 revisions

I can sync to that.

YapDatabaseCloudCore is a general cloud syncing system that allows you to support a wide variety of cloud systems. For example: Dropbox, Box, Google Drive, Amazon Cloud Drive, ownCloud ... (you get the idea)

Overview

There are a lot of cloud services out there. And from a consumer's perspective, there's a lot of variation. Some services integrate better with different operating systems. Some are more tailored for enterprise. And then there's pricing.

But what about from a developer's perspective? What if we ignored the client apps, and the pricing tiers, and the marketing? What if we just looked at the developer APIs for each cloud service? What would we find?

We'd find an awful lot of similarity. All of these services offer a REST API. Most are file based (although some are record based). And they all provide some kind of revision system. For example, if we're going to upload a modified version of a file, our HTTP request might have this header field: "If-Match: PreviousETag". And the server would reject us if we're out-of-date. (Just like a git system would reject a "push", if we're out-of-date and need to do a "pull" first.)

YapDatabaseCloudCore is cloud service agnostic. That is, it's designed to be able to support just about any type of cloud service you can throw at it. And in addition to objects (key/value pairs), it also supports regular files (jpg, pdf, mp4, etc).

Operation Based

A common misconception is that a record-based cloud service is required in order to support syncing objects (records with key/value pairs). This is untrue. YapDatabaseCloudCore helps you support either record-based cloud services OR file-based cloud services.

There's a whole bunch of (cheap & efficient) file-based sync services out there. But developers primarily work with objects. And objects aren't files, right? ... But they could be. We already know how to serialize our objects for storage in a database. It's not difficult to extend this concept of serialization for storage in a file in the cloud. We could use JSON, or XML, or protocol buffers, or something custom. So, theoretically, we could store all of our objects as individual files in the cloud.

Locally we're still using a database. We just need to adapt the object for storage in the cloud, in a format which the cloud service supports.

You'll note that this is true for file-based cloud services, as well as record-based cloud services. For example, many record-based cloud services only support JSON, or only support a handful of data types. Thus, to store something like a color in the cloud, you'd need to convert it to/from a string.
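For example, serializing an object into a cloud-friendly format can be as simple as building a dictionary and converting it to JSON. Here's a minimal sketch using Foundation (the MYContact property names are illustrative):

```objc
// Sketch: serializing a contact object to JSON for a file-based cloud
// service. The MYContact properties here are illustrative assumptions.
NSDictionary *dict = @{
  @"uuid"      : contact.uuid,
  @"firstName" : contact.firstName ?: @"",
  @"lastName"  : contact.lastName  ?: @""
};

NSError *error = nil;
NSData *jsonData =
  [NSJSONSerialization dataWithJSONObject:dict options:0 error:&error];

// jsonData is now ready to be uploaded as a file to the cloud.
```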

YapDatabaseCloudCore assists you by storing a "queue of operations". That is, an ordered list of changes that you need to push to the server. You have full control over what information gets stored in an "operation", and you're the one responsible for executing the "operation". Thus just about any kind of cloud service can be supported.

Basics

Imagine we want to support FooBarCloud, the latest imaginary cloud service provider. Since YapDatabaseCloudCore is kind of a "bring your own cloud API" system, we can support it.

To break this down, let's go through all the various steps one would need to take in order to sync an object to the FooBarCloud. For example, let's say we have a contact book app, and the user has just modified the firstName of a contact, and hit 'Save'. This translates into a modified MYContact object that gets saved into the local database. Here's everything that needs to happen:

  • Within the same atomic transaction that modified the MYContact object, we need to record the fact that this contact needs to be pushed to the cloud.
  • Also within the same atomic transaction, we should record what changes were made. So something like: "old:{firstName: Robby}, new:{firstName: Robbie}". (This will allow us to properly merge changes in the event of a conflict. More on this topic later.)
  • After the transaction, we need to generate the file-based representation of the contact object.
  • Then we need to fetch, from our local database, the latest revision tag for the contact. Since FooBarCloud uses the etag system, this means the latest known etag we have for the corresponding URL.
  • Then we need to perform the proper HTTP REST call to upload the file-based representation to the cloud. For our FooBarCloud, this would mean performing a PUT to /contacts/the_contact_uuid.json. (And we have to specify: If-Match: PreviousETag)
  • Assuming we get a 200 OK response from the server, we need to execute a new atomic transaction to remove the "flag" in the database that says the upload needs to be performed. And remove the saved information regarding what changed. (The "old:{}, new:{}" stuff.)
  • Also in this same transaction we need to record the new etag.
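The HTTP portion of the steps above might be sketched like this, assuming a hypothetical FooBarCloud endpoint and a previously stored etag. (Everything *around* this request, such as queueing, retries, and etag bookkeeping, is what YapDatabaseCloudCore manages.)

```objc
// Sketch of the upload step only. The URL, previousETag & contactJSONData
// variables are assumptions from the example above.
NSURL *url = [NSURL URLWithString:
  @"https://api.foobarcloud.com/contacts/the_contact_uuid.json"];

NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
request.HTTPMethod = @"PUT";
[request setValue:previousETag forHTTPHeaderField:@"If-Match"];
request.HTTPBody = contactJSONData; // the file-based representation

NSURLSessionDataTask *task =
  [[NSURLSession sharedSession] dataTaskWithRequest:request
                                  completionHandler:
  ^(NSData *data, NSURLResponse *response, NSError *error)
{
	NSInteger statusCode = [(NSHTTPURLResponse *)response statusCode];
	if (statusCode == 200) {
		// Success: clear the queued operation & record the new etag.
	}
	else if (statusCode == 412) {
		// Precondition failed: our etag is stale.
		// Download the latest version, merge, and retry.
	}
}];
[task resume];
```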

So ... that's a lot of stuff that needs to happen. But you'll notice that the actual HTTP request was only a small portion of it. And in fact, if you switched from FooBarCloud to MooCowCloud, the only thing that would change would be the HTTP request.

So all that other stuff above that isn't the HTTP request, that's exactly what YapDatabaseCloudCore is for.

Operations

At the heart of the system is YapDatabaseCloudCoreOperation. When you need to perform some REST operation, you record that fact with one of these objects. That is, you use a YapDatabaseCloudCoreOperation instance to record whatever information you'll need in order to perform the REST operation in the future. And this instance gets stored in the database. So even if the user quits the app, and relaunches it tomorrow, the operation instance will automatically be restored.

Think about it like this: In a magical ideal world, the user's device would always be connected to the Internet. And the connection would be so fast, every network operation would be instantaneous. And there would never be upload errors or merge conflicts. And there would be rainbows and unicorns and rivers made out of chocolate. But in the real world we have to accept certain facts. The user might not have an Internet connection. And it might be slow. And there will be a delay between the moment we save something in the local database, and when that information hits the cloud. And there will be interrupted uploads and merge conflicts and death and taxes.

Long story short: You cannot simply perform the REST operation at the moment you need to, because you cannot guarantee it will succeed. Instead you need to record information about the REST operation that needs to be performed, wait for it to succeed (resolving conflicts as necessary), and then delete the recorded operation. And YapDatabaseCloudCoreOperation is what helps you perform this task.

There's a lot more to be said about operations (dependencies, priorities, graphs, pipelines, etc). But let's start with the basics.

HandlerBlock

The handlerBlock is one mechanism that can be used to create operations. (There are others.) That is, as you make changes to your own custom data model objects, you can use the handlerBlock to tell YapDatabaseCloudCore about the REST operations that need to be performed. Here's the general idea:

  • You update an object in the database via the normal setObject:forKey:inCollection: method
  • Since YapDatabaseCloudCore is an extension, it's automatically notified that you modified an object
  • YapDatabaseCloudCore then invokes your handlerBlock, passes you the modified object, and asks you to create any needed operations.
  • Afterwards, the extension will automatically process the operation(s) you created, save them to the database, and notify you when they're ready to be executed.

This all sounds very abstract, so let's jump ahead a few steps and show some code. Then we'll go over the code and fill in all the details.

YapDatabaseCloudCoreHandler *handler =
  [YapDatabaseCloudCoreHandler withObjectBlock:
  ^(YapDatabaseReadTransaction *transaction,
    NSMutableArray *operations,
    NSString *collection, NSString *key, id object)
{
	// We're only syncing todo items (for now)
	if (![object isKindOfClass:[MyTodo class]])
	{
		// We don't sync this type of object.
		return;
	}
	
	MyTodo *todo = (MyTodo *)object;
	
	if (!todo.hasChangedCloudProperties)
	{
		// Nothing changed that affects the cloud file.
		return;
	}
	
	NSString *path = todo.cloudPath;
	YapDatabaseCloudCoreRecordOperation *operation =
	  [YapDatabaseCloudCoreRecordOperation uploadWithCloudPath:path];
	operation.originalValues = todo.originalCloudValues;
	operation.updatedValues = todo.updatedCloudValues;
	operation.persistentUserInfo = @{ @"todoID": todo.uuid };

	[operations addObject:operation];
}];

The handlerBlock is where you hand off operations to YapDatabaseCloudCore. If this were English, and not code, the handlerBlock would be the following conversation:

  • Hey, I noticed that you just modified that Todo item.
  • Are there any corresponding REST operations that need to be performed?
  • If so, just fill them out, hand them to me, and I'll take care of storing them, sorting them, and letting you know when it's safe to perform them.

If you inspected the code carefully, you probably have a rather important question at this point:

How do I know which properties were changed on an object, within the context of the handlerBlock ?

It is your responsibility to provide this functionality. But rather than throwing this over the fence at you, I will at least provide sample code that you can integrate into your project in order to easily achieve this functionality: MyDatabaseObject class. This is a base class that you can use to make things incredibly easy for you. But, of course, you're under no obligation to use it. As you might recall from the Storing Objects wiki page, YapDatabase doesn't care what kind of objects you use. As long as you can serialize & deserialize the object, YapDatabase will happily store it for you. In other words, MyDatabaseObject is just one possible solution. You're welcome to substitute your own.
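The general idea behind such change tracking can be sketched as follows. (This is NOT the actual MyDatabaseObject class; it's a minimal hand-rolled version, with illustrative property names, to show the concept.)

```objc
// Sketch of manual change tracking: remember each property's original
// value so operation.originalValues can be populated for conflict merging.
@interface MYContact : NSObject
@property (nonatomic, copy) NSString *firstName;
@property (nonatomic, readonly) NSDictionary *originalCloudValues;
@end

@implementation MYContact
{
	NSMutableDictionary *originalValues;
}

- (instancetype)init
{
	if ((self = [super init])) {
		originalValues = [NSMutableDictionary dictionary];
	}
	return self;
}

- (void)setFirstName:(NSString *)firstName
{
	// Remember the original value only once (the first change wins).
	if (originalValues[@"firstName"] == nil) {
		originalValues[@"firstName"] = _firstName ?: [NSNull null];
	}
	_firstName = [firstName copy];
}

- (NSDictionary *)originalCloudValues
{
	return [originalValues copy];
}

@end
```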

Alternative to HandlerBlock

Do I have to use the handlerBlock? In my situation it would be easier if I could just straight hand you a YDBCloudCoreOperation myself.

That's supported too!

I mentioned the handlerBlock first because that's usually the simplest system for generating operations. But it's not always feasible or optimal. There are situations in which you may be storing objects you didn't design. Or delicate multi-step operations. Or perhaps there are some weird migration issues you need to handle. Or you just have some one-off development tasks that you need to run and be done with.

And so the API allows you to directly queue a YDBCloudCoreOperation whenever you want:

@interface YapDatabaseCloudCoreTransaction
- (BOOL)addOperation:(YapDatabaseCloudCoreOperation *)operation;
...
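Queueing an operation directly might look like this. (A sketch: it assumes the extension was registered under the name @"cloud", and the cloudPath is illustrative.)

```objc
// Sketch: queueing an operation manually, inside a normal readwrite
// transaction. The extension name @"cloud" & path are assumptions.
[databaseConnection readWriteWithBlock:
  ^(YapDatabaseReadWriteTransaction *transaction)
{
	YapDatabaseCloudCoreFileOperation *op =
	  [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/migrations/v2.json"];
	
	[[transaction ext:@"cloud"] addOperation:op];
}];
```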

Operation Order

When syncing objects to a file-based cloud service provider, one of the most difficult tasks is properly ordering all the operations.

Your local database (such as YapDatabase) supports atomic transactions. This means you can change multiple objects simultaneously, and save them all to the database in one single atomic transaction. Your cloud service provider, however, likely does NOT support atomic transactions involving multiple files.

This can make things complicated when our objects:

  • have references to other objects
  • and those references are expected to be valid

For example, a new PurchaseOrder object may point to a new Customer object. With a local database, we can store both the new PurchaseOrder object and new Customer object within the same atomic transaction. But when pushing these objects to the cloud, we can only push one at a time. So, in this case, we'd like to push the Customer object first, and then the PurchaseOrder.

YapDatabaseCloudCore solves these problems using 3 technologies:

  • Operation dependencies
  • Graphs
  • Pipelines

Operation Dependencies

Every operation can be assigned a set of dependencies. That is, you can specify that operationB depends on operationA.

NSString *pathA = @"/files/topSecret.txt.key";
NSString *pathB = @"/files/topSecret.txt.encrypted";

opA = [YapDatabaseCloudCoreFileOperation
                     uploadWithCloudPath:pathA];
opB = [YapDatabaseCloudCoreFileOperation
                     uploadWithCloudPath:pathB];

[opB addDependency:opA.uuid];

The system will then ensure that opB is not started until opA has completed.

Graphs

Every database commit may generate zero or more YDBCloudCoreOperations. If one or more operations are created, then a YapDatabaseCloudCoreGraph instance is created to manage all the operations (for the commit). The graph will take into account each operation's dependencies (and priorities), and it will conceptually create a graph of the operations.

For example, say the following operations are created:

  • opA (priority=100, dependencies={})
  • opB (priority=0, dependencies={})
  • opC (priority=0, dependencies={opB})
  • opD (priority=0, dependencies={opB})

The graph will deduce:

  • opA should be executed first, as it has no dependencies and the highest priority
  • opB can be executed next. opB does NOT need to wait for opA to finish. So opB can be executed in parallel (while opA is still in flight).
  • opC & opD cannot be started until opB completes.
  • opC & opD can be executed in any order, and can execute in parallel.
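The example above might be constructed in code like this (the cloud paths are illustrative):

```objc
// Sketch: the opA/opB/opC/opD example from above. Paths are illustrative.
YapDatabaseCloudCoreFileOperation *opA, *opB, *opC, *opD;

opA = [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/a.txt"];
opB = [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/b.txt"];
opC = [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/c.txt"];
opD = [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/d.txt"];

opA.priority = 100;           // no dependencies + highest priority

[opC addDependency:opB.uuid]; // opC & opD won't be started
[opD addDependency:opB.uuid]; // until opB completes
```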

Further, you do NOT need to worry about dependencies between commits. (For example, if objectA was created in commit #4, and objectB was created in commit #5, and objectB references objectA...) This is a non-issue in YapDatabaseCloudCore, because each commit gets its own graph. And the graph for commit #4 MUST complete in its entirety before the graph for commit #5 can start.

(In the future, there will be advanced alternative systems without this constraint. But this is how it currently functions, and this functionality will remain available in the future.)

Pipelines

A pipeline is simply an array of graphs. The pipeline ensures that graph4 (representing all operations from commit #4) completes before graph5 (representing all operations from commit #5) is started.

Every pipeline has a delegate (that's you), which must implement a single method:

@protocol YapDatabaseCloudCorePipelineDelegate
@required

- (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
           forPipeline:(YapDatabaseCloudCorePipeline *)pipeline;

@end

So all you have to do is execute the operation when it's handed to you via this delegate method. (The pipeline & graph ensure that it's safe to execute the operation before it's handed to you.)
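An implementation might look something like this. (A sketch: requestForOperation: is a hypothetical helper you'd write yourself, and the exact method for telling the pipeline an operation finished is an assumption here; check the header for the precise completion API.)

```objc
// Sketch of a pipeline delegate. The request-building helper & the
// completion call are assumptions, not the library's confirmed API.
- (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
           forPipeline:(YapDatabaseCloudCorePipeline *)pipeline
{
	// The pipeline guarantees this operation is safe to execute now.
	NSURLRequest *request = [self requestForOperation:operation]; // your code
	
	NSURLSessionDataTask *task =
	  [[NSURLSession sharedSession] dataTaskWithRequest:request
	                                  completionHandler:
	  ^(NSData *data, NSURLResponse *response, NSError *error)
	{
		if (error == nil) {
			// Tell the pipeline the operation finished, so it can
			// hand out any operations that were waiting on this one.
			// (Assumed method name; verify against the header.)
			[pipeline completeOperationWithUUID:operation.uuid];
		}
	}];
	[task resume];
}
```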

And since the pipeline supports executing operations in parallel, you can configure the pipeline with a 'maxConcurrentOperationCount':

/**
 * This value is the maximum number of operations that will be
 * assigned to the delegate at any one time.
 * 
 * The pipeline keeps track of operations that have been assigned to
 * the delegate (via startOperation:forPipeline:), and will delay
 * assigning any more operations once the maxConcurrentOperationCount
 * has been reached.
 * 
 * Once an operation is completed (or skipped), the pipeline will
 * automatically resume.
 * 
 * Of course, the delegate is welcome to perform its own concurrency
 * restriction. For example, via
 * NSURLSessionConfiguration.HTTPMaximumConnectionsPerHost.
 * In which case it may simply set this to a high enough value that
 * it won't interfere with its own implementation.
 * 
 * This value may be changed at anytime.
 *
 * The default value is 8.
**/
@property (atomic, assign, readwrite) NSInteger maxConcurrentOperationCount;

Most applications will simply use a single pipeline (the default pipeline). However, YapDatabaseCloudCore supports multiple pipelines, which opens up some interesting possibilities.

For example, let's say that we're making a recipe app. Which means that we're syncing Recipe objects and Photos (that the user takes of the prepared recipe).

Uploading a recipe is quick, as the recipe object/file is rather small. However, uploading a photo of the recipe is going to take a lot longer. Since we want to store full-size photos, this means we're uploading several megabyte files. This isn't a problem, but it may have an effect on our syncing.

For example, imagine the user performs the following actions (in this order):

  • adds a photo to a recipe
  • creates a new recipe (appetizer)
  • creates a new recipe (cookies)

This results in 3 new operations:

  • upload photo (8 megabytes) (graph #1)
  • upload new recipe (appetizer) (graph #2)
  • upload new recipe (cookies) (graph #3)

Which means the recipes won't hit the server until after the photo has been uploaded. Maybe this is what you want. But what if it's not? What if you want the large photos to not block the recipe objects?

The solution is to move the photos to a different pipeline.

Here's how it works:

  • Every operation must be assigned to exactly one pipeline.
  • Every pipeline operates independently.
  • Operations cannot have cross-pipeline dependencies.

So if you moved all photo operations to their own pipeline, then the upload of these large files won't block the upload of changes to recipe objects.
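Routing an operation to a non-default pipeline might look like this. (A sketch: the pipeline name @"photos" and the property used to assign it are assumptions; check the header for the exact registration and assignment API.)

```objc
// Sketch: assigning a photo upload to its own pipeline, so large files
// don't block smaller recipe uploads. Names here are assumptions.
YapDatabaseCloudCoreFileOperation *photoOp =
  [YapDatabaseCloudCoreFileOperation uploadWithCloudPath:@"/photos/cookies.jpg"];

photoOp.pipeline = @"photos"; // recipe ops stay on the default pipeline
```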
