
YapDatabaseCloudCore

Robbie Hanson edited this page Oct 26, 2016 · 14 revisions

I can sync to that.

YapDatabaseCloudCore is a general cloud syncing system that allows you to support a wide variety of cloud systems. For example: Dropbox, Box, Google Drive, Amazon Cloud Drive, ownCloud ... (you get the idea)

Overview

There are a lot of cloud services out there. And from a consumer's perspective, there's a lot of variation. Some services integrate better with different operating systems. Some are more tailored for enterprise. And then there's pricing.

But what about from a developer's perspective? What if we ignored the client apps, and the pricing tiers, and the marketing? What if we just looked at the developer APIs for each cloud service? What would we find?

We'd find an awful lot of similarity. All of these services offer a REST API. Most are file based (although some are record based). And they all provide some kind of revision system. For example, if we're going to upload a modified version of a file, our HTTP request might have this header field: "If-Match: PreviousETag". And the server would reject us if we're out-of-date. (Just like a git system would reject a "push", if we're out-of-date and need to do a "pull" first.)

YapDatabaseCloudCore is cloud service agnostic. That is, it's designed to be able to support just about any type of cloud service you can throw at it. And in addition to objects (key/value pairs), it also supports regular files (jpg, pdf, mp4, etc).

Operation Based

A common misconception is that a record-based cloud service is required in order to support syncing objects (records with key/value pairs). This is untrue. And YapDatabaseCloudCore helps you support either record-based cloud services OR file-based cloud services.

There's a whole bunch of (cheap & efficient) file-based sync services out there. But developers primarily work with objects. And objects aren't files, right? ... But they could be. We already know how to serialize our objects for storage in a database. It's not difficult to extend this concept of serialization for storage in a file in the cloud. We could use JSON, or XML, or protocol buffers, or something custom. So, theoretically, we could store all of our objects as individual files in the cloud.

Locally we're still using a database. We just need to adapt the object for storage in the cloud, in a format which the cloud service supports.

You'll note that this is true for file-based cloud services, as well as record-based cloud services. For example, many record-based cloud services only support JSON, or only support a handful of data types. Thus, to store something like a color in the cloud, you'd need to convert it to/from a string.
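For example, a hypothetical model object with a color property could be serialized like so. (This is a sketch; MYColorSwatch and its properties are made up for illustration.)

```objc
// Sketch: serializing a model object to JSON for a file-based cloud service.
// (MYColorSwatch is a hypothetical class, not part of YapDatabaseCloudCore.)
@interface MYColorSwatch : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, strong) UIColor *color; // not JSON-representable as-is
@end

@implementation MYColorSwatch

- (NSData *)cloudFileRepresentation
{
	// JSON has no color type, so we convert the color to/from a string.
	CGFloat r, g, b, a;
	[self.color getRed:&r green:&g blue:&b alpha:&a];
	
	NSDictionary *json = @{
		@"name"  : self.name,
		@"color" : [NSString stringWithFormat:@"%.3f %.3f %.3f %.3f", r, g, b, a]
	};
	return [NSJSONSerialization dataWithJSONObject:json options:0 error:NULL];
}

@end
```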

YapDatabaseCloudCore assists you by storing a "queue of operations". That is, an ordered list of changes that you need to push to the server. You have full control over what information gets stored in an "operation", and you're the one that's responsible for executing the "operation". Thus just about any kind of cloud service can be supported.

Basics

Imagine we want to support FooBarCloud, the latest imaginary cloud service provider. Since YapDatabaseCloudCore is kind of a "bring your own cloud API" system, we can support it.

To break this down, let's go through all the various steps one would need to take in order to sync an object to the FooBarCloud. For example, let's say we have a contact book app, and the user has just modified the firstName of a contact, and hit 'Save'. This translates into a modified MYContact object that gets saved into the local database. Here's everything that needs to happen:

  • Within the same atomic transaction that modified the MYContact object, we need to record the fact that this contact needs to be pushed to the cloud.
  • Also within the same atomic transaction, we should record what changes were made. So something like: "old:{firstName: Robby}, new:{firstName: Robbie}". (This will allow us to properly merge changes in the event of a conflict. More on this topic later.)
  • After the transaction, we need to generate the file-based representation of the contact object.
  • Then we need to fetch, from our local database, the latest revision tag for the contact. Since FooBarCloud uses the etag system, this means the latest known etag we have for the corresponding URL.
  • Then we need to perform the proper HTTP REST call to upload the file-based representation to the cloud. For our FooBarCloud, this would mean performing a PUT to /contacts/the_contact_uuid.json. (And we have to specify: If-Match: PreviousETag)
  • Assuming we get a 200 OK response from the server, we need to execute a new atomic transaction to remove the "flag" in the database that says the upload needs to be performed. And remove the saved information regarding what changed. (The "old:{}, new:{}" stuff.)
  • Also in this same transaction we need to record the new etag.
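The first step above (recording, within the same atomic transaction, that the contact needs to be pushed) might look like the following sketch. The extension name "foobar", the FooBarCloudOperation class, and its initializer are assumptions for illustration:

```objc
// Sketch: recording an upload operation in the same atomic transaction
// that saves the modified contact.
[databaseConnection asyncReadWriteWithBlock:^(YapDatabaseReadWriteTransaction *transaction) {
	
	// Save the modified contact locally.
	[transaction setObject:contact forKey:contact.uuid inCollection:@"contacts"];
	
	// And, atomically, record the fact that it needs to be pushed to the cloud.
	// (FooBarCloudOperation is our hypothetical YapDatabaseCloudCoreOperation subclass.)
	NSString *cloudPath = [NSString stringWithFormat:@"/contacts/%@.json", contact.uuid];
	FooBarCloudOperation *op = [FooBarCloudOperation uploadWithCloudPath:cloudPath];
	
	[[transaction ext:@"foobar"] addOperation:op];
}];
```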

So ... that's a lot of stuff that needs to happen. But you'll notice that the actual HTTP request was only a small portion of it. And in fact, if you switched from FooBarCloud to MooCowCloud, the only thing that would change would be the HTTP request.

So all that other stuff above that isn't the HTTP request, that's exactly what YapDatabaseCloudCore is for.

Operations

At the heart of the system is YapDatabaseCloudCoreOperation. When you need to perform some REST operation, you record that fact with one of these objects. That is, you use a YapDatabaseCloudCoreOperation instance to record whatever information you'll need in order to perform the REST operation in the future. And this instance gets stored in the database. So even if the user quits the app, and relaunches it tomorrow, the operation instance will automatically be restored.

Think about it like this: In a magical ideal world, the user's device would always be connected to the Internet. And the connection would be so fast, every network operation would be instantaneous. And there would never be upload errors or merge conflicts. And there would be rainbows and unicorns and rivers made out of chocolate. But in the real world we have to accept certain facts. The user might not have an Internet connection. And it might be slow. And there will be a delay between the moment we save something in the local database, and when that information hits the cloud. And there will be interrupted uploads and merge conflicts and death and taxes.

Long story short: You cannot simply perform the REST operation at the moment you need to, because you cannot guarantee it will succeed. Instead you need to record information about the REST operation that needs to be performed, wait for it to succeed (resolving conflicts as necessary), and then delete the recorded operation. And YapDatabaseCloudCoreOperation is what helps you perform this task.

There's a lot more to be said about operations (dependencies, priorities, graphs, pipelines, etc). But let's start with the basics.

YapDatabaseCloudCoreOperation

YapDatabaseCloudCoreOperation is a bare-bones class that provides the very minimal functionality required by the YapDatabaseCloudCore extension. As such, you're encouraged to subclass it. And then you're free to add your own properties. Whatever you need to facilitate the REST operations and your sync logic.

Just don't forget to properly support NSCoding & NSCopying in your subclass.
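A minimal subclass might look like this sketch, which adds a single cloudPath property. (The property is our own addition; the key point is invoking super in each NSCoding/NSCopying method so the base class state survives.)

```objc
// Sketch: a minimal YapDatabaseCloudCoreOperation subclass.
// The cloudPath property is hypothetical — add whatever your sync logic needs.
@interface FooBarCloudOperation : YapDatabaseCloudCoreOperation
@property (nonatomic, copy) NSString *cloudPath;
@end

@implementation FooBarCloudOperation

- (instancetype)initWithCoder:(NSCoder *)decoder
{
	if ((self = [super initWithCoder:decoder])) // don't skip super
	{
		_cloudPath = [decoder decodeObjectForKey:@"cloudPath"];
	}
	return self;
}

- (void)encodeWithCoder:(NSCoder *)coder
{
	[super encodeWithCoder:coder]; // don't skip super
	[coder encodeObject:_cloudPath forKey:@"cloudPath"];
}

- (instancetype)copyWithZone:(NSZone *)zone
{
	FooBarCloudOperation *copy = [super copyWithZone:zone]; // don't skip super
	copy->_cloudPath = _cloudPath;
	return copy;
}

@end
```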

Operation Order

When syncing objects to a file-based cloud service provider, one of the most difficult tasks is properly ordering all the operations.

Your local database (such as YapDatabase) supports atomic transactions. This means you can change multiple objects simultaneously, and save them all to the database in one single atomic transaction. Your cloud service provider, however, likely does NOT support atomic transactions involving multiple files.

This can make things complicated when our objects:

  • have references to other objects
  • and those references are expected to be valid

For example, a new PurchaseOrder object may point to a new Customer object. With a local database, we can store both the new PurchaseOrder object and new Customer object within the same atomic transaction. But when pushing these objects to the cloud, we can only push one at a time. So, in this case, we'd like to push the Customer object first, and then the PurchaseOrder.

YapDatabaseCloudCore solves these problems using 3 technologies:

  • Operation dependencies
  • Graphs
  • Pipelines

Operation Dependencies

Every operation can be assigned a set of dependencies. That is, you can specify that operationB depends on operationA.

NSString *pathA = @"/files/topSecret.txt.key";
NSString *pathB = @"/files/topSecret.txt.encrypted";

FooBarCloudOperation *opA = [FooBarCloudOperation uploadWithCloudPath:pathA];
FooBarCloudOperation *opB = [FooBarCloudOperation uploadWithCloudPath:pathB];

[opB addDependency:opA.uuid];

The system will then ensure that opB is not started until opA has completed.

Graphs

Every database commit may generate zero or more operations. If one or more operations are created, then a YapDatabaseCloudCoreGraph instance is created to manage all the operations (for the commit). The graph will take into account each operation's dependencies (and priorities), and it will conceptually create a graph of the operations.

For example, say the following operations are created:

  • opA (priority=100, dependencies={})
  • opB (priority=0, dependencies={})
  • opC (priority=0, dependencies={opB})
  • opD (priority=0, dependencies={opB})

The graph will deduce:

  • opA should be executed first, as it has no dependencies and the highest priority
  • opB can be executed next. opB does NOT need to wait for opA to finish. So opB can be executed in parallel (while opA is still in flight).
  • opC & opD cannot be started until opB completes.
  • opC & opD can be executed in any order, and can execute in parallel.
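The four operations above could be created like so. (A sketch, reusing the hypothetical FooBarCloudOperation class; the priority and dependency properties are the ones the graph consults.)

```objc
// Sketch: four operations matching the example graph above.
FooBarCloudOperation *opA = [FooBarCloudOperation uploadWithCloudPath:@"/a.json"];
opA.priority = 100; // highest priority → started first

FooBarCloudOperation *opB = [FooBarCloudOperation uploadWithCloudPath:@"/b.json"];
FooBarCloudOperation *opC = [FooBarCloudOperation uploadWithCloudPath:@"/c.json"];
FooBarCloudOperation *opD = [FooBarCloudOperation uploadWithCloudPath:@"/d.json"];

// opC & opD must wait for opB to complete.
[opC addDependency:opB.uuid];
[opD addDependency:opB.uuid];
```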

Further, you do NOT need to worry about dependencies between commits. (For example, if objectA was created in commit #4, and objectB was created in commit #5, and objectB references objectA...) This is a non-issue in YapDatabaseCloudCore, because each commit gets its own graph. And the graph for commit #4 MUST complete in its entirety before the graph for commit #5 can start.

(In the future, there will be advanced alternative systems without this constraint. But this is how it currently functions, and this functionality will remain available in the future.)

Pipelines

A pipeline is simply an array of graphs. The pipeline ensures that graph4 (representing all operations from commit #4) completes before graph5 (representing all operations from commit #5) is started.

Every pipeline has a delegate (that's you), which must implement a single method:

@protocol YapDatabaseCloudCorePipelineDelegate
@required

- (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
           forPipeline:(YapDatabaseCloudCorePipeline *)pipeline;

@end

So all you have to do is execute the operation when it's handed to you via this delegate method. (The pipeline & graph ensure that it's safe to execute the operation before it's handed to you.)
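A delegate implementation for our hypothetical FooBarCloud might look like the following sketch. The cloudClient object and its upload method are made up; the completion call is the part that tells the pipeline it may hand you the next operation(s).

```objc
// Sketch: executing operations handed to us by the pipeline.
- (void)startOperation:(YapDatabaseCloudCoreOperation *)operation
           forPipeline:(YapDatabaseCloudCorePipeline *)pipeline
{
	// Safe to cast: we only put FooBarCloudOperation instances in the pipeline.
	FooBarCloudOperation *op = (FooBarCloudOperation *)operation;
	
	// Perform the actual HTTP request (hypothetical networking code).
	[self.cloudClient uploadFileAtCloudPath:op.cloudPath completion:^(NSError *error) {
		
		if (error) {
			// Handle the error / merge conflict, possibly re-queue...
		}
		else {
			// Tell the pipeline the operation completed,
			// so it can dequeue the next one(s).
			[pipeline completeOperationWithUUID:op.uuid];
		}
	}];
}
```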

And since the pipeline supports executing operations in parallel, you can configure the pipeline with a 'maxConcurrentOperationCount':

/**
 * This value is the maximum number of operations that will be
 * assigned to the delegate at any one time.
 * 
 * The pipeline keeps track of operations that have been assigned to
 * the delegate (via startOperation:forPipeline:), and will delay
 * assigning any more operations once the maxConcurrentOperationCount
 * has been reached.
 * 
 * Once an operation is completed (or skipped), the pipeline will
 * automatically resume.
 * 
 * Of course, the delegate is welcome to perform its own concurrency
 * restriction. For example, via
 * NSURLSessionConfiguration.HTTPMaximumConnectionsPerHost.
 * In which case it may simply set this to a high enough value that
 * it won't interfere with its own implementation.
 * 
 * This value may be changed at any time.
 *
 * The default value is 8.
**/
@property (atomic, assign, readwrite) NSInteger maxConcurrentOperationCount;

Most applications will simply use a single pipeline (the default pipeline). However, YapDatabaseCloudCore supports multiple pipelines, which opens up some interesting possibilities.

For example, let's say that we're making a recipe app. Which means that we're syncing Recipe objects and Photos (that the user takes of the prepared recipe).

Uploading a recipe is quick, as the recipe object/file is rather small. However, uploading a photo of the recipe is going to take a lot longer. Since we want to store full-size photos, this means we're uploading several megabyte files. This isn't a problem, but it may have an effect on our syncing.

For example, imagine the user performs the following actions (in this order):

  • adds a photo to a recipe
  • creates a new recipe (appetizer)
  • creates a new recipe (cookies)

This results in 3 new operations:

  • upload photo (8 megabytes) (graph #1)
  • upload new recipe (appetizer) (graph #2)
  • upload new recipe (cookies) (graph #3)

Which means the recipes won't hit the server until after the photo has been uploaded. Maybe this is what you want. But what if it's not? What if you want the large photos to not block the recipe objects?

The solution is to move the photos to a different pipeline.

Here's how it works:

  • Every operation must be assigned to exactly one pipeline.
  • Every pipeline operates independently.
  • Operations cannot have cross-pipeline dependencies.

So if you moved all photo operations to their own pipeline, then the upload of these large files won't block the upload of changes to recipe objects.
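Setting this up might look like the following sketch. The pipeline name, initializer, registration call, and the way an operation is assigned to a pipeline are illustrative; check the header files for the exact API:

```objc
// Sketch: registering a secondary pipeline for large photo uploads.
YapDatabaseCloudCorePipeline *photoPipeline =
  [[YapDatabaseCloudCorePipeline alloc] initWithName:@"photos" delegate:self];

[cloudCoreExt registerPipeline:photoPipeline];

// Later, when creating a photo operation,
// assign it to the photo pipeline instead of the default one:
FooBarCloudOperation *photoOp = [FooBarCloudOperation uploadWithCloudPath:photoPath];
photoOp.pipeline = @"photos";
```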
