
Scaling Sync Gateway


The Sync Gateway can be scaled up by running it as a cluster. This simply means running an instance of the gateway on each of several machines, all with identical configuration, and load-balancing them by directing each incoming HTTP request to a random node. Really, that's pretty much it. The gateway nodes are "shared-nothing", so they don't need to coordinate any state or even know about each other. Everything they know is contained in the central Couchbase bucket.
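To make the load-balancing step concrete, here is a minimal sketch (not part of Sync Gateway itself) of a Go reverse proxy that forwards each incoming request to a randomly chosen gateway node. The node hostnames are placeholders, and in a real deployment you would more likely use a dedicated load balancer such as nginx or HAProxy.

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical gateway nodes, all running the identical configuration.
	// 4984 is the Sync Gateway default public port.
	addrs := []string{
		"http://sg1.example.com:4984",
		"http://sg2.example.com:4984",
		"http://sg3.example.com:4984",
	}
	targets := make([]*url.URL, len(addrs))
	for i, a := range addrs {
		u, err := url.Parse(a)
		if err != nil {
			log.Fatal(err)
		}
		targets[i] = u
	}

	// Pick a random node for every incoming request; the nodes are
	// shared-nothing, so any of them can serve any request.
	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			t := targets[rand.Intn(len(targets))]
			req.URL.Scheme = t.Scheme
			req.URL.Host = t.Host
		},
	}

	log.Fatal(http.ListenAndServe(":4984", proxy))
}
```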

Clearly, all of the gateway nodes will be talking to the same Couchbase Server bucket. This can of course be hosted by a cluster of Couchbase nodes. The gateway uses the standard Couchbase "smart-client" APIs and works with database clusters of any size.

Notes On Performance

  • The gateway nodes don't keep any local state, so they don't require any disk.
  • Nor do they cache much in RAM; every request is handled independently. Go does use garbage collection, so memory usage may be somewhat higher than for C code, but it shouldn't be excessive provided the number of simultaneous requests per node is kept within reasonable limits.
  • Go is good at using multiple cores: it uses lightweight threads (goroutines) and asynchronous I/O. Throwing more CPU cores at a gateway node should definitely speed it up (see the note on GOMAXPROCS after this list).
  • As is typical with databases, writes are going to put a lot more load on the system than reads. In particular, replication and channels imply that there's a lot of fan-out, where making a change will trigger sending notifications to many other clients, who will then perform reads to get the new data.
  • We don't currently have any guidelines for how many gateway or database nodes you might need for particular workloads. We'll know more once we do more testing and tuning and get experience with real use cases.
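On the CPU point above: whether the gateway actually uses all available cores depends on the Go release it was built with. Older Go releases default GOMAXPROCS to 1 (a single core), while newer ones default to using every core; the value can also be set via the GOMAXPROCS environment variable. As a general illustration (not a description of what the gateway itself does), a Go program can opt into all cores programmatically:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Use every available core. Newer Go releases already default to this,
	// but older ones default GOMAXPROCS to 1.
	runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Printf("GOMAXPROCS = %d\n", runtime.GOMAXPROCS(0))
}
```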

Managing TCP Connections

Very large-scale deployments may run into challenges managing large numbers of simultaneous open TCP connections. The CouchDB replication protocol uses a "hanging-GET" technique to enable the server to push notifications of changes. This means that an active client running a continuous pull replication will always have an open TCP connection on the server. (This is similar to other applications that use server-push aka "Comet" techniques, as well as protocols like XMPP and IMAP.)
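To illustrate the hanging-GET pattern, here is a minimal Go sketch of the client side: it opens a continuous _changes feed (the CouchDB-style changes API that the gateway serves) and reads change notifications as the server pushes them. The host, port, and database name are placeholders.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// feed=continuous asks the server to keep the response open and write
	// one line of JSON per change, so this TCP connection stays up until
	// the client or the server closes it.
	url := "http://sg1.example.com:4984/db/_changes?feed=continuous&since=0"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		if line := scanner.Text(); line != "" {
			fmt.Println(line) // one change notification per line
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```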

These sockets remain idle most of the time (unless documents are being modified at a very high rate!) so the actual data traffic is low; the issue is simply managing that many sockets. This is commonly known as the "C10k Problem" and it's been pretty well analyzed in the last few years. Go uses asynchronous I/O, so it's capable of listening on large numbers of sockets, provided the OS is tuned accordingly and you have enough network interfaces (and hence IP addresses) to provide a sufficiently large pool of TCP port numbers per node.
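One concrete piece of that OS tuning is the per-process open-file-descriptor limit, since every open TCP connection consumes a descriptor. The sketch below shows how a Go process can raise its own soft limit up to the hard limit; the hard limit itself, and any kernel-wide limits, still have to be raised through the OS (for example via limits.conf or sysctl on Linux). This is an illustration of the general technique, not a description of what the gateway itself does.

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	lim.Cur = lim.Max // raise the soft limit to the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open-file limit: soft=%d hard=%d\n", lim.Cur, lim.Max)
}
```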
