
Service Dependencies

One problem we've started to encounter with services is that they sometimes need to be aware of what other services are doing. For example, HPC needs to learn when a user adds a Sliver to the HPC Slice, so it can deploy a new cache instance. As another example, RR needs to learn when HPC creates a new cache instance, so it can begin directing requests to it and listening for its heartbeats. As a third example, Syndicate needs to learn when any user adds a new Sliver to a Slice, so it can grant the Sliver access to the Slice's private volume.

Proposal 1: Encode Dependencies in the Data Model

One way to address this is to have each service explicitly list the services it depends on. Then, when XOS enacts any changes to a service, it also walks the dependency tree of service-specific Observers in a breadth-first manner to propagate the change through all interested services. Once an Observer handles the state change, it generates a "handle" to the database object(s) it modified so subsequent dependent Observers can read and act on them.
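
As a rough Python sketch of how this propagation might work (the DEPENDENCIES table, the on_change callback, and the observers registry are illustrative assumptions, not an existing XOS API):

from collections import deque

# Each service declares the services it depends on; XOS walks the resulting
# dependency tree breadth-first, handing each Observer the handle produced
# by the Observer upstream of it.
DEPENDENCIES = {          # dependent service -> services it depends on
    "HPC": ["XOS"],
    "RR":  ["HPC"],
}

def dependents_of(service):
    """Return the services that list 'service' as a dependency."""
    return [s for s, deps in DEPENDENCIES.items() if service in deps]

def propagate_change(observers, initial_handle):
    """Deliver each handle to the Observers of every service that depends
    on the service that generated it, breadth-first."""
    queue = deque([initial_handle])
    while queue:
        handle = queue.popleft()
        # "ServiceName" matches the handle structure described below.
        for service in dependents_of(handle["ServiceName"]):
            # The Observer reacts to the change and returns its own handle,
            # which is then propagated to *its* dependents.
            new_handle = observers[service].on_change(handle)
            if new_handle is not None:
                queue.append(new_handle)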

For example, the RR service would depend on the HPC service, and the HPC service would depend on XOS. When the user adds a Sliver to HPC's Slice, XOS takes the following steps:

  1. XOS creates the new Sliver, and generates a handle to the new Sliver.
  2. XOS calls the HPC Observer's "change" callback with the Sliver handle.
  3. HPC's observer sets up the Sliver by creating an OriginServer for it, and generates a handle to the OriginServer.
  4. XOS calls the RR Observer's "change" callback with the OriginServer handle.
  5. RR installs new request routes in its redirection maps, and gives back a handle to each affected redirection map.

Handle Structure

The handle would be a lightweight data structure that lets another service find the set of information affected by a recent change. It should not encode the changed data itself, since the data could be quite large in practice.

At a minimum, it needs to encode the service that generated it and the objects the change affected. For each object, it should also encode the object's type, its instance name (so the service Observer can query it), and the action taken on it (i.e. CREATE, UPDATE, or DELETE).

The handle generated in step (1) might look something like:

{
   "ServiceName": "XOS",
   "Objects": [
       {"Type": "Sliver", "Name": "slice-XXX", "Action": "CREATE"}
   ]
}

The handle generated in step (3) might look like:

{
   "ServiceName": "HPC",
   "Objects": [
       {"Type": "OriginServer", "Name": "originserver-XXX", "Action": "CREATE"}
   ]
}

The handle generated in step (5) might look like:

{ 
   "ServiceName": "RR",
   "Objects": [
       {"Type": "RedirectionMap", "Name": "rrmap-XXX", "Action": "UPDATE"},
       {"Type": "RedirectionMap", "Name": "rrmap-YYY", "Action": "UPDATE"}
   ]
}

Challenges

  • There needs to be a secure API for a service's Observer to read a handle's objects. It needs to ensure that the Observer can only pull the objects and fields the handle specifies.
    • Maybe the handle could be cryptographically signed by the Observer that generates it? (A sketch of one such scheme follows this list.)
    • Maybe each object's schema encodes which fields are readable to other services?
  • Depending on how unwieldy things get, service Observers may need to filter on specific objects and fields instead of on whole services (i.e. we may discover later that services are too coarse-grained). This could lead to tight coupling between services if we're not careful, which would make service composition difficult.
  • What happens if a service Observer goes offline and misses some notifications? Can changes be "re-propagated"?
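
If we went the signing route, a minimal sketch might look like the following; the shared-key HMAC scheme and the function names are purely illustrative assumptions, not a decided design:

import hashlib
import hmac
import json

def sign_handle(handle, secret_key):
    """Attach an HMAC over the serialized handle; secret_key is the
    generating Observer's key (bytes)."""
    payload = json.dumps(handle, sort_keys=True).encode()
    signed = dict(handle)
    signed["Signature"] = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return signed

def verify_handle(signed_handle, secret_key):
    """Recompute the HMAC over everything except the signature and compare."""
    handle = dict(signed_handle)
    claimed = handle.pop("Signature", "")
    payload = json.dumps(handle, sort_keys=True).encode()
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)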

Proposal 2: Keep a Read-Only Ledger of Changes

Another way to address this is to maintain a global, append-only ledger of all enacted changes that XOS alone writes and that other Observers can only read (e.g. a global persistent message queue). Each record takes the form of (Service, Object, Operation, Metadata), where:

  • Service is a well-known name of an XOS-hosted service.
  • Object is a well-known name of a type of object.
  • Operation is one of a small set of well-known operations (i.e. CREATE, UPDATE, DELETE).
  • Metadata is a service-specific record that describes the operation.
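
For concreteness, such a record might be modeled as a simple named tuple; the field names below just mirror the list above and are not a committed schema:

from collections import namedtuple
import json

# Illustrative record type mirroring (Service, Object, Operation, Metadata).
# Metadata stays an opaque, service-specific JSON string.
LedgerRecord = namedtuple("LedgerRecord", ["service", "obj", "operation", "metadata"])

record = LedgerRecord(
    service="XOS",
    obj="Sliver",
    operation="CREATE",
    metadata=json.dumps({"Name": "sliver-XXX", "Owned-By": "slice-xxx"}),
)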

In this scenario, each service Observer defines a record filter that describes the records it is interested in. Since XOS knows this for each service, and since it is the single writer to this ledger, it can "wake up" each interested service Observer in parallel once it finishes appending a new record. Service Observers that depend on multiple changes can simply accumulate records until they have enough knowledge to carry out the requested operation.
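
A rough sketch of the filter matching and wake-up, treating a subscription as a 4-tuple in which "*" matches anything (the wildcard convention, the SUBSCRIPTIONS registry, and the notify call are assumptions):

# A subscription mirrors a record: (Service, Object, Operation, Metadata),
# with "*" acting as a wildcard in any position.
SUBSCRIPTIONS = {
    "HPC": ("XOS", "Sliver", "*", "*"),
    "RR":  ("HPC", "OriginServer", "*", "*"),
}

def matches(subscription, record):
    return all(pattern == "*" or pattern == field
               for pattern, field in zip(subscription, record))

def wake_interested_observers(record, observers):
    """After appending 'record' to the ledger, notify every Observer whose
    filter matches it (conceptually, these notifications run in parallel)."""
    for service, subscription in SUBSCRIPTIONS.items():
        if matches(subscription, record):
            observers[service].notify(record)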

Using the example from Proposal #1 above, the HPC service would subscribe to ("XOS", "Sliver", "*", "*"), and the RR service would subscribe to ("HPC", "OriginServer", "*", "*"). Then, the order of operations becomes:

  1. XOS appends ("XOS", "Slice", "UPDATE", "{'Name': 'slice-xxx', 'Owned-By': 'HPC'}") and ("XOS", "Sliver", "CREATE", "{'Name': 'sliver-XXX', 'Owned-By': 'slice-xxx'}") to the public ledger.
  2. XOS reads the head of the log, and runs the HPC Observer so it can react to ("XOS", "Sliver", "CREATE", "{'Name': 'sliver-XXX', 'Owned-By': 'slice-xxx'}").
  3. The HPC Observer sets up sliver-XXX.
  4. The HPC Observer emits ("HPC", "OriginServer", "CREATE", "{'Name': 'originserver-xxx', 'Hostname': 'node33.princeton.vicci.org', 'Port': '8888'}"), which XOS captures and appends to the ledger.
  5. XOS reads the head of the log, and runs the RR Observer so it can react to ("HPC", "OriginServer", "CREATE", "{'Name': 'originserver-xxx', 'Hostname': 'node33.princeton.vicci.org', 'Port': '8888'}").
  6. The RR Observer updates two of its redirection maps: rrmap-XXX and rrmap-YYY.
  7. The RR observer emits ("RR", "RedirectionMap", "UPDATE", "{'Name': 'rrmap-XXX'}") and ("RR", "RedirectionMap", "UPDATE", "{'Name': 'rrmap-YYY'}").

The advantage this proposal has over Proposal #1 is that the ledger goes back to the beginning of time, so service Observers that go offline and miss messages can automatically replay them, and make themselves consistent with XOS.
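
As a rough sketch of replay, assuming the ledger is exposed as an indexable sequence of records and each Observer durably tracks the index of the last record it processed (both are assumptions):

def catch_up(observer, ledger, last_seen_index):
    """Replay everything appended since the Observer last ran; in practice
    the updated index would be persisted durably after each record."""
    for index in range(last_seen_index + 1, len(ledger)):
        record = ledger[index]
        if observer.interested_in(record):
            observer.notify(record)
        last_seen_index = index
    return last_seen_index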

Challenges

  • We'll need to compress/snapshot the ledger over time, to keep it from eating our disks.
  • We'll need to find a way to suppress "flapping" on-the-fly, so service Observers don't do things like bill users multiple times or trigger cascading failures from replaying a flap.
