-
Notifications
You must be signed in to change notification settings - Fork 739
Databus for MySQL
## Introduction A frequently question asked on the Databus open source mailing list is about the possibility of running Databus for MySQL. Such a feature is useful for audience who want to use MySQL for Desktop development, but use Oracle for production or use MySQL as a primary database and use Databus as an asynchronous change-capture mechanism.
In what follows, we describe the details of building such a MySQL adapter for Databus. It uses an open-source library called “OpenReplicator” which provides the ability to parse, filter and broadcast binlog events in a real time manner. The library, written in Java, is available here.
## How to build? The code will shortly be available on github. It may be built with the following commands:
$ gradle -Dopen_source=true assemble
$ cd build/databus2-relay-example-pkg/distributions
$ tar -zxvf databus2-example-relay-pkg.tar.gz
$ ./bin/create_person.sh : The script assumes that MySQL is started on port 33066; please change it appropriately for your setup. It creates a database called ‘or_test’, a table called ‘person’ within it, and it inserts 9 sample rows in that table.
$ ./bin/start-example-relay.sh or_person -Y ./conf/sources-or-person.json : This script starts a Databus relay and subscribes for change-capture for the table ‘or_test.person’.
Test if the relay has successfully been started. A quick way is to issue a curl command to the relay’s HTTP port as specified in conf/relay_or_person.properties (11115). The logical source id specified for the table ‘Person’ in conf/sources-or-person.json is 40:
$ curl -s http://localhost:11115/sources
[{"name":"com.linkedin.events.example.or_test.Person","id":40}]
Insert an event into the table with a command like:
update person set first_name='Balaji' where id=1;
Check if the relay has received the event from the database with the following command:
$ curl -s http://localhost:11115/containerStats/inbound/events/total?pretty | grep -m1 numDataEvents "numDataEvents" : 1,
The relevant code is available in databus2-relay/databus2-event-producer-or
## Logical Timeline The current version is a prototype implementation of a MySQL adapter for Databus. It has been designed for a single-node MySQL setup, and it has been tested in a single-database (with multiple tables) scenario. It has not yet been deployed in our production clusters. Further, we describe the limitations in the current design and would like to note that the design/implementation may change significantly going forward. Hence subsequent versions may not be backward-compatible.
SCN (System Change Number) is used as a timeline for ordering events from various transactions occurring at the database. Please refer to Section 3.2 in our paper titled “All Aboard the Databus”. here. In the current implementation, SCN is represented as a long (64 bits). Of the 64 bits, the high-order 32 bits are used to represent the binlog file number, and the low-order 32 bits are used to represent the binlog offset. Therefore, an event starting in binary log file mysql-bin.000001 at binlog offset 4 is represented as (1 << 32) | 4 = 4294967300 .
- Simple to understand and easy to implement
- Works with Vanilla MySQL-5.5.8
- Works well for non-clustered environments
- Not resilient to DBA commands like ‘reset master’.
Why? The command ‘reset master’ resets the binlog file/sequence number generation to initial values of
mysql-bin.000001and binlog offset 4. This puts the Databus services in an inconsistent state where they are unable to differentiate the new incoming events from events they have seen in the past. A simple workaround is to just use ‘purge logs’. - Does not work in clustered setups with Databus replication enabled on SLAVES. Why? MySql replication does not guarantee that binlog file numbers and sequences match on the slave nodes with corresponding masters in the cluster. In such a case, if Databus is connected to a slave storage node, the binlog co-ordinates of a transaction may be different than that on the master. A Databus client switching from a Databus relay connected to such a master node to a slave relay that is connected to the corresponding slave node will get no/inconsistent data.
- Does not work in clustered setups with mastership transfers. Why? A mastership transfer for a database from node n1 to node n2 breaks consistency on the Databus client side. This is because the client may have consumed from node n1 up through scn1. It would look for transactions greater than scn1 on the new node n2. But due to the nature of MySQL replication, the newly committed transaction for the database can actually be numerically smaller than scn1, which will cause the new transaction to be missed and thereby to affect consistency.
- Real-time change-capture from MySQL
- Binlog file rotation on MySQL
- Tested for single / multiple tables
- Tested for MySQL-5.5.8
- Automatic Avro schema file generation for a given MySQL table
- Support for consistent change-capture in a clustered MySQL environment with mastership transfers
- Support for global TXID in MySQL-5.6
- Multi-tenancy optimizations (w.r.t. number of fetches of binlog files from master and server-side filtering)
- Composite keys