
Commit c63df25

Shlomi Noach committed

Merge pull request #43 from github/documentation

Beginning documentation

2 parents e723909 + ed81a42, commit c63df25

10 files changed (+306 / -1 lines changed)

.github/CONTRIBUTING.md

Lines changed: 26 additions & 0 deletions
## Contributing

Hi there! We're thrilled that you'd like to contribute to this project. Your help is essential for keeping it great.

This project adheres to the [Open Code of Conduct](http://todogroup.org/opencodeofconduct/#gh-ost/[email protected]). By participating, you are expected to uphold this code.

## Submitting a pull request

1. [Fork](https://github.com/github/gh-ost/fork) and clone the repository
2. Create a new branch: `git checkout -b my-branch-name`
3. Make your change, add tests, and make sure the tests still pass
4. Push to your fork and [submit a pull request](https://github.com/github/gh-ost/compare)
5. Pat yourself on the back and wait for your pull request to be reviewed and merged.

Here are a few things you can do that will increase the likelihood of your pull request being accepted:

- Follow the [style guide](https://golang.org/doc/effective_go.html#formatting).
- Write tests.
- Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

## Resources

- [Contributing to Open Source on GitHub](https://guides.github.com/activities/contributing-to-open-source/)
- [Using Pull Requests](https://help.github.com/articles/using-pull-requests/)
- [GitHub Help](https://help.github.com)

.github/ISSUE_TEMPLATE.md

Whitespace-only changes.

.github/PULL_REQUEST_TEMPLATE.md

Whitespace-only changes.

README.md

Lines changed: 29 additions & 1 deletion
# gh-ost
~~GitHub's Online Schema Change for MySQL~~

#### GitHub's online schema migration for MySQL

`gh-ost` allows for online schema migrations in MySQL which are:
- Triggerless
- Testable
- Pausable
- Operations-friendly

## How?

WORK IN PROGRESS

In the meantime, please refer to the [docs](doc) for more information.

## What's in a name?

Originally this was named `gh-osc`: GitHub Online Schema Change, along the lines of [Facebook online schema change](https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/) and [pt-online-schema-change](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).

But then a rare genetic mutation happened, and the `s` transformed into `t`. And that sent us down the path of trying to figure out a new acronym. Right now, `gh-ost` (pronounced _Ghost_) stands for:
- GitHub Online Schema Translator/Transformer/Transfigurator

## Authors

`gh-ost` is designed, authored, reviewed and tested by the database infrastructure team at GitHub:
- [@jonahberquist](https://github.com/jonahberquist)
- [@ggunson](https://github.com/ggunson)
- [@tomkrouper](https://github.com/tomkrouper)
- [@shlomi-noach](https://github.com/shlomi-noach)

doc/command-line-flags.md

Lines changed: 16 additions & 0 deletions
# Command line flags

A more in-depth discussion of various `gh-ost` command line flags: implementation, implication, use cases.

##### exact-rowcount

A `gh-ost` execution needs to copy whatever rows you have in your existing table onto the ghost table. This can be, and often is, a large number. What exactly is that number?

`gh-ost` initially estimates the number of rows in your table by issuing an `explain select * from your_table`. This uses the statistics on your table and returns a rough estimate. How rough? It might go as low as half or as high as double the actual number of rows in your table. This is the same method as used in [`pt-online-schema-change`](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).

`gh-ost` also supports the `--exact-rowcount` flag. When this flag is given, two things happen:
- An initial, authoritative `select count(*) from your_table`.
  This query may take a long time to complete, but is performed before we begin the massive operations.
- A continuous update to the estimate as we make progress applying events.
  We heuristically update the number of rows based on the queries we process from the binlogs.

While the ongoing estimated number of rows is still heuristic, it's almost exact, such that the reported [ETA](understanding-output.md) or percentage progress is typically accurate to the second throughout a multiple-hour operation.
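
The heuristic update can be pictured with a small shell sketch. This is a hypothetical illustration, not `gh-ost`'s own code: it assumes a simplified event stream with one DML type (`INSERT`, `UPDATE` or `DELETE`) per line, and the function name `estimate_rows` is made up. Inserts add a row, deletes remove one, and updates leave the count unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a row-count estimate kept up to date from DML events.
# Assumption: stdin carries one event type per line (INSERT, UPDATE or DELETE);
# real binlog events are richer, and gh-ost's own logic is not reproduced here.
estimate_rows() {
  local initial="$1"   # e.g. the result of an initial `select count(*)`
  awk -v n="$initial" '
    $1 == "INSERT" { n++ }   # a row was added to the table
    $1 == "DELETE" { n-- }   # a row was removed from the table
                             # UPDATE: row count unchanged, nothing to do
    END { print n }
  '
}

printf 'INSERT\nINSERT\nUPDATE\nDELETE\n' | estimate_rows 1000
```

With `--exact-rowcount` the starting point would be the authoritative count, for example `estimate_rows "$(mysql -N -e 'select count(*) from mydb.mytable')" < events.txt`, where `events.txt` is a hypothetical file of event types.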

doc/migrating-with-sbr.md

Lines changed: 24 additions & 0 deletions
# Migrating with Statement Based Replication

Even though `gh-ost` relies on Row Based Replication (RBR), that does not mean you can't keep your Statement Based Replication (SBR).

`gh-ost` is happy to connect to a replica, and in fact prefers and suggests doing so. On this replica, it is happy to:
- issue the heavyweight `INFORMATION_SCHEMA` queries that analyze the table structure
- issue a `select count(*) from mydb.mytable`, should `--exact-rowcount` be provided
- connect itself as a fake replica to get the binary log stream

All of the above can be executed on the master, but we're more comfortable having them execute on a replica.

Please note the third item: `gh-ost` connects as a fake replica and pulls the binary logs. This is how `gh-ost` finds the table's changelog: it looks up entries in the binary log.

The magic is that your master can still produce SBR, but if you have a replica with `log-slave-updates`, you can also configure it to have `binlog_format='ROW'`. Such a replica accepts SBR statements from its master, and writes RBR events to its own binary logs.

`gh-ost` is happy to modify the `binlog_format` on the replica for you:
- If you supply `--switch-to-rbr`, `gh-ost` will convert the binlog format for you, and restart replication to make sure this takes effect.
- If your replica is an intermediate master, i.e. it further serves as a master to other replicas, `gh-ost` will not convert the `binlog_format`.
- In any case, `gh-ost` **will not** convert back to `STATEMENT` (SBR). This is because you may be running multiple migrations concurrently. Being able to run concurrent migrations is one of the design goals of this tool. It's your own responsibility to switch back to SBR once all pending migrations are complete.

### Summary

- If you're already using RBR, all is well for you
- If not, convert one of your replicas to `binlog_format='ROW'`, or let `gh-ost` do this for you.
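
If you would rather convert a replica yourself than use `--switch-to-rbr`, the statements involved can be sketched as below. These are hypothetical helpers, not part of `gh-ost`: the function names and the host `replica.example.com` are placeholders. They print SQL instead of running it, so you can review it before piping it into `mysql` on the replica. Replication is restarted because a new global `binlog_format` only applies to sessions that start afterwards, and the replication SQL thread picks it up when it restarts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the SQL for inspecting and switching a replica to RBR.
# Run the output on the replica only, never on the master.

# Current binlog format, and whether replicated changes are written to this server's binlog.
rbr_check_sql() {
  echo "SELECT @@global.binlog_format, @@global.log_slave_updates;"
}

# Switch to ROW format, then restart replication so the SQL thread uses the new global value.
rbr_switch_sql() {
  printf '%s\n' \
    "STOP SLAVE;" \
    "SET GLOBAL binlog_format = 'ROW';" \
    "START SLAVE;"
}

rbr_switch_sql
# Usage against a real replica (placeholder host):
#   rbr_check_sql  | mysql -h replica.example.com
#   rbr_switch_sql | mysql -h replica.example.com
```

As noted above, switching back to `STATEMENT` once all migrations are done remains up to you.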

doc/swapping-tables.md

Lines changed: 14 additions & 0 deletions
# Swapping the tables

The table swap is the final major step of the migration: it's the moment where your original table is pushed aside, and the ghost table (the one we secretly altered and operated on throughout the process) takes its place.

MySQL imposes some limitations on how the table swap can take place. While it supports an atomic swap, it does not allow for a swap under a controlled lock.

The [Facebook OSC](https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/) tool documents this nicely. Look for **"Cut-over phase"**.

`gh-ost` supports various types of table-swap / cut-over options:

- `--quick-and-bumpy-swap-tables`: this method is similar to the one taken by the Facebook OSC. It's non-blocking but also non-atomic. The original table is first renamed and pushed aside, then the ghost table is renamed to take its place. In between the two renames there's a brief period of time where your table simply does not exist, and queries will fail.
- Voluntary lock based solution (default at this time): as depicted in [Solving the Facebook-OSC non-atomic table swap problem](http://code.openark.org/blog/mysql/solving-the-facebook-osc-non-atomic-table-swap-problem), this solution uses voluntary MySQL locks and makes for a blocking swap, where your queries do not fail, but block until the operation is complete. This effect is desired. There is danger in this solution: a connection failure in either of the two sessions involved in creating the lock would result in a premature swap of the tables, and hence in potentially corrupted data.
- We are currently working on a blocking, safe, atomic solution, using wait conditions and User Defined Functions, which will need to be dynamically loaded onto your MySQL server.
- With [`--test-on-replica`](testing-on-replica.md) there is no table swap.
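
The difference between the two-step rename of `--quick-and-bumpy-swap-tables` and MySQL's atomic multi-table `RENAME TABLE` can be sketched as the SQL each approach would issue. This is a hypothetical illustration, not `gh-ost`'s code: the helper names are made up, the `_<table>_gst` ghost-table name follows the naming used in the testing-on-replica doc, and the `_<table>_old` name for the pushed-aside table is an assumption. Neither variant shows the locking that makes a swap blocking.

```shell
#!/usr/bin/env bash
# Hypothetical helpers printing the SQL for two ways of swapping tables.
# Assumed names: ghost table _<table>_gst, pushed-aside table _<table>_old.

# Non-atomic: two statements. Between them, <db>.<table> does not exist,
# so queries against it fail.
bumpy_swap_sql() {
  local db="$1" tbl="$2"
  printf 'RENAME TABLE %s.%s TO %s._%s_old;\n' "$db" "$tbl" "$db" "$tbl"
  printf 'RENAME TABLE %s._%s_gst TO %s.%s;\n' "$db" "$tbl" "$db" "$tbl"
}

# Atomic: a single statement renaming both tables; other sessions never
# observe a state where <db>.<table> is missing.
atomic_swap_sql() {
  local db="$1" tbl="$2"
  printf 'RENAME TABLE %s.%s TO %s._%s_old, %s._%s_gst TO %s.%s;\n' \
    "$db" "$tbl" "$db" "$tbl" "$db" "$tbl" "$db" "$tbl"
}

bumpy_swap_sql mydb mytable
atomic_swap_sql mydb mytable
```

The atomic form avoids the "table does not exist" window, but on its own it gives no control over when the swap happens relative to pending writes, which is the gap the lock-based approaches above address.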

doc/testing-on-replica.md

Lines changed: 59 additions & 0 deletions
# Testing on replica

`gh-ost`'s design allows for trusted and reliable tests of the migration without compromising production data integrity.

Test on replica if you:
- Are unsure of `gh-ost` and have not yet gained confidence in its workings
- Just want to experiment with a real migration without affecting production (maybe measure migration time?)
- Wish to observe data change impact

## What testing on replica means

TL;DR `gh-ost` will make all changes on a replica and leave both original and ghost tables for you to compare.

## Issuing a test drive

Apply `--test-on-replica --host=<a.replica>`.
- `gh-ost` will connect to the indicated server
- Will verify this is indeed a replica and not a master
- Will perform _everything_ on this replica. Other than checking who the master is, it will not touch the master at all.
  - All `INFORMATION_SCHEMA` and `SELECT` queries run on the replica
  - The ghost table is created on the replica
  - Rows are copied onto the ghost table on the replica
  - Binlog events are read from the replica and applied to the ghost table on the replica
  - So... everything

`gh-ost` will sync the ghost table with the original table.
- When it is satisfied, it will issue a `STOP SLAVE IO_THREAD`, effectively stopping replication
- Will finalize the last few statements
- Will terminate. No table swap takes place. No table is dropped.

You are now left with the original table **and** the ghost table. They _should_ be identical.

You now have the time to verify the tool works correctly. You may checksum the entire table data if you like.
- e.g.
  - `mysql -e 'select * from mydb.mytable order by id' | md5sum`
  - `mysql -e 'select * from mydb._mytable_gst order by id' | md5sum`
- or, of course, only select the columns shared before and after the migration
- We use the trivial `engine=innodb` for `alter` when testing. This way the resulting ghost table is identical in structure to the original table (including indexes) and we expect data to be completely identical. We use `md5sum` on the entire dataset to confirm the test result.
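
The two checksum commands above can be wrapped in a small helper so the results are compared for you rather than by eye. This is a hypothetical sketch: `same_checksum` is a made-up name, and it assumes GNU `md5sum` is available, as in the commands above. Each argument is a file; with bash process substitution the `mysql` output can be passed directly, as shown in the usage comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the md5 checksums of two data dumps.
# Prints "match <sum>" and returns 0 when identical; otherwise prints both sums and returns 1.
same_checksum() {
  local a b
  a=$(md5sum < "$1" | cut -d' ' -f1)
  b=$(md5sum < "$2" | cut -d' ' -f1)
  if [ "$a" = "$b" ]; then
    echo "match $a"
  else
    echo "MISMATCH $a $b"
    return 1
  fi
}

# Usage on the test replica (bash process substitution):
#   same_checksum \
#     <(mysql -e 'select * from mydb.mytable order by id') \
#     <(mysql -e 'select * from mydb._mytable_gst order by id')
```

The `order by id` in both queries matters: without a deterministic order, identical tables could still produce different checksums.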

### Cleanup

It's your job to:
- Drop the ghost table (at your leisure; be aware that a `DROP` can be a lengthy operation)
- Restart replication (via `START SLAVE`)
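
The two cleanup steps can be sketched as SQL to run on the test replica once you are done comparing. This is a hypothetical helper, not part of `gh-ost`: the name `cleanup_sql` and the host `replica.example.com` are placeholders, and the `_<table>_gst` ghost-table name follows the `mydb._mytable_gst` naming used in the checksum commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cleanup SQL for a finished --test-on-replica run.
# Assumes the ghost table is named _<table>_gst, as in the checksum commands.
cleanup_sql() {
  local db="$1" tbl="$2"
  # May take a while on a large table; run it whenever convenient.
  printf 'DROP TABLE IF EXISTS %s._%s_gst;\n' "$db" "$tbl"
  # gh-ost stopped the IO thread; START SLAVE resumes replication.
  printf 'START SLAVE;\n'
}

cleanup_sql mydb mytable
# Usage against the replica used for the test (placeholder host):
#   cleanup_sql mydb mytable | mysql -h replica.example.com
```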
45+
46+
### Examples
47+
48+
Simple:
49+
```shell
50+
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --verbose --execute
51+
```
52+
53+
Elaborate:
54+
```shell
55+
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --switch-to-rbr --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --postpone-swap-tables-flag-file=/tmp/ghost-postpone.flag --exact-rowcount --allow-nullable-unique-key --verbose --execute
56+
```
57+
- Count exact number of rows (makes ETA estimation very good). This goes at the expense of paying the time for issuing a `SELECT COUNT(*)` on your table. We use this lovingly.
58+
- Automatically switch to `RBR` if replica is configured as `SBR`. See also: [migrating with SBR](migrating-with-sbr.md)
59+
- allow iterating on a `UNIQUE KEY` that has `NULL`able columns (at your own risk)

doc/triggerless-design.md

Whitespace-only changes.
