Closed
32 commits
66894d3
resurrection: dump/restore of migration context cross executions
Dec 18, 2016
75b6f9e
encoding range values as base64
Dec 20, 2016
776c8d3
Merge branch 'master' into resurrect
Dec 20, 2016
6999b4e
exporting to changelog table, not to file
Dec 20, 2016
5e0f38c
Merge branch 'resurrect' of github.com:github/gh-ost into resurrect
Dec 20, 2016
3223a93
context dump serialized with table writes; avoiding sync problems
Dec 20, 2016
6f81d62
storing and updating streamer binlog coordinates
Dec 20, 2016
4c6f42f
passwords not exported in MigrationContext
Dec 20, 2016
c72851e
initial support for --resurrect flag
Dec 20, 2016
171cad2
sanity checks for resurrection
Dec 21, 2016
47d8306
comment typo
Dec 21, 2016
bad30a8
sanity checks on --resurrection; skipping some normal-mode operations
Dec 21, 2016
5f25f74
something that works! True resurrection applied
Dec 21, 2016
89ca346
instead of loading the entire context, only updating particular field…
Dec 21, 2016
1080b11
binlog event listeners accept coordinates.
Dec 23, 2016
e50361a
at resurrection, pointing streamer back at last known applied coordin…
Dec 24, 2016
6128076
some cleanup
Dec 24, 2016
45b63f6
applying IsResurrected flag
Dec 24, 2016
fa399e0
added context test, JSON export/import
Dec 24, 2016
7dfb740
format
Dec 24, 2016
0e8e5de
added on-resurrecting hook
Dec 25, 2016
af74e8c
Resurrection documentation
Dec 25, 2016
874cf24
typo
Dec 25, 2016
8952e24
rewinding resurrecting at beginning of known logfile; more verbose
Dec 28, 2016
738270a
more verbose on resurrection
Dec 28, 2016
90f61f8
resurrected execution does not apply migration range from terminated …
Dec 28, 2016
e4874c8
making sure to dump context before row-copy, so we always have some i…
Dec 28, 2016
e9e9d6d
allowing EOF result for loadJSON
Dec 28, 2016
24f5c6d
ght/ghr suffix -> delr suffix
Dec 29, 2016
7bdfd1b
not applying range if nil
Dec 29, 2016
0b6d834
Merge branch 'master' into resurrect
Dec 29, 2016
856d0d4
Merge branch 'master' into resurrect
Dec 30, 2016
9 changes: 4 additions & 5 deletions README.md
@@ -30,6 +30,7 @@ In addition, it offers many [operational perks](doc/perks.md) that make it safer
- Auditing: you may query `gh-ost` for status. `gh-ost` listens on unix socket or TCP.
- Control over cut-over phase: `gh-ost` can be instructed to postpone what is probably the most critical step: the swap of tables, until such time that you're comfortably available. No need to worry about ETA being outside office hours.
- External [hooks](doc/hooks.md) can couple `gh-ost` with your particular environment.
- [Resurrection](doc/resurrect.md) can resume a failed migration, proceeding from last known good position.

Please refer to the [docs](doc) for more information. No, really, read the [docs](doc).

@@ -76,19 +77,17 @@ But then a rare genetic mutation happened, and the `c` transformed into `t`. And

## Community

`gh-ost` is released at a stable state, but with mileage to go. We are [open to pull requests](https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md). Please first discuss your intentions via [Issues](https://github.com/github/gh-ost/issues).
`gh-ost` is released at a stable state, and still with mileage to go. We are [open to pull requests](https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md). Please first discuss your intentions via [Issues](https://github.com/github/gh-ost/issues).

We develop `gh-ost` at GitHub and for the community. We may have different priorities than others. From time to time we may suggest a contribution that is not on our immediate roadmap but which may appeal to others.

## Download/binaries/source

`gh-ost` is now GA and stable.

`gh-ost` is available in binary format for Linux and Mac OS/X
`gh-ost` is GA and stable, available in binary format for Linux and Mac OS/X

[Download latest release here](https://github.com/github/gh-ost/releases/latest)

`gh-ost` is a Go project; it is built with Go 1.5 with "experimental vendor". Soon to migrate to Go 1.6. See and use [build file](https://github.com/github/gh-ost/blob/master/build.sh) for compiling it on your own.
`gh-ost` is a Go project; it is built with Go 1.7. See and use [build file](https://github.com/github/gh-ost/blob/master/build.sh) for compiling it on your own.

Generally speaking, `master` branch is stable, but only [releases](https://github.com/github/gh-ost/releases) are to be used in production.

2 changes: 1 addition & 1 deletion RELEASE_VERSION
@@ -1 +1 @@
1.0.32
1.1.0
8 changes: 8 additions & 0 deletions doc/command-line-flags.md
@@ -111,6 +111,14 @@ See also: [Sub-second replication lag throttling](subsecond-lag.md)

Typically `gh-ost` is used to migrate tables on a master. If you wish to only perform the migration in full on a replica, connect `gh-ost` to said replica and pass `--migrate-on-replica`. `gh-ost` will briefly connect to the master but otherwise issue no changes on the master. Migration will be fully executed on the replica, while making sure to maintain a small replication lag.

### resurrect

It is possible to resurrect (resume) a failed migration. Such a migration is a valid execution that bailed out partway through the migration process, for example on meeting `--critical-load`, or because a user `kill -9`'d it.

Use `--resurrect` with the exact same other flags (same `--database`, `--table`, `--alter`) to resume a failed migration.

Read more in the [resurrection docs](resurrect.md).

### skip-foreign-key-checks

By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with a large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (the table does not reference other tables, nor is it referenced by other tables) and wish to save the check time, pass `--skip-foreign-key-checks`.
1 change: 1 addition & 0 deletions doc/hooks.md
@@ -38,6 +38,7 @@ The full list of supported hooks is best found in code: [hooks.go](https://githu

- `gh-ost-on-startup`
- `gh-ost-on-validated`
- `gh-ost-on-resurrecting`
- `gh-ost-on-rowcount-complete`
- `gh-ost-on-before-row-copy`
- `gh-ost-on-status`
42 changes: 42 additions & 0 deletions doc/resurrect.md
@@ -0,0 +1,42 @@
# Resurrection

`gh-ost` supports resurrection of a failed migration, continuing the migration from last known good position, potentially saving hours of clock-time.

A migration may fail as follows:

- On meeting with `--critical-load`
- On successively meeting with a specific error (e.g. recurring locks)
- Being `kill -9`'d by a user
- MySQL crash
- Server crash
- Robots taking over the world and other reasons.

### --resurrect

One may resurrect such a migration by running the exact same command, adding the `--resurrect` flag.

The terms for resurrection are:

- Exact same database/table/alter
- Previous migration ran for at least one minute
- Previous migration began looking at row-copy and event handling (by `1` minute of execution you may expect this to be the case)

### How does it work?

`gh-ost` dumps its migration status (context) once per minute, onto the _changelog table_. The changelog table is used for internal bookkeeping, and manages heartbeat and internal message passing.

When `--resurrect` is provided, `gh-ost` attempts to find such a status dump in the changelog table. Most interestingly, this status includes:

- Last handled binlog event coordinates (any event up to that point has been applied to _ghost_ table)
- Last copied chunk range
- Other useful information

Resurrection reconnects the streamer at the last handled binlog coordinates, and resumes row-copy from the last copied chunk range rather than from the start of the table.

Noteworthy is that it is not important to resume from the _exact same_ coordinates and chunk as last applied; the context dump only runs once per minute, so resurrection may re-apply up to a minute's worth of binary logs, and re-iterate up to a minute's worth of copied chunks.

Row-based replication has the property of being idempotent for DML events. There is no damage in reapplying contiguous binlog events starting at some point in the past.

Chunk-reiteration likewise poses no integrity concern and there is no harm in re-copying same range of rows.

The only concern is to never skip binlog events, and never skip a row range. By virtue of only dumping events and ranges that have been applied, and by virtue of only processing binlog events and chunks moving forward, `gh-ost` keeps integrity intact.
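
The idempotency argument can be illustrated with a minimal sketch. It is not `gh-ost` code: `rowEvent`, `apply` and the integer positions are hypothetical simplifications, and the ghost table is modeled as an in-memory map. The sketch replays a simulated binary log from a checkpoint that lags behind the events actually applied, then compares the result with an uninterrupted run.

```go
package main

import "fmt"

// rowEvent is a hypothetical, simplified row-based binlog event. It carries
// the full row image, so applying it twice leaves the same final state.
type rowEvent struct {
	pos      int    // position in the simulated binary log
	id       int    // primary key of the affected row
	value    string // full new row image; unused for deletes
	isDelete bool
}

// apply replays every event at or after fromPos onto the ghost table.
func apply(ghost map[int]string, events []rowEvent, fromPos int) {
	for _, ev := range events {
		if ev.pos < fromPos {
			continue
		}
		if ev.isDelete {
			delete(ghost, ev.id)
		} else {
			ghost[ev.id] = ev.value // overwrite, much like REPLACE INTO
		}
	}
}

// sameTable reports whether two simulated tables hold identical rows.
func sameTable(a, b map[int]string) bool {
	if len(a) != len(b) {
		return false
	}
	for id, v := range a {
		if w, ok := b[id]; !ok || w != v {
			return false
		}
	}
	return true
}

// resurrectMatchesCleanRun compares an uninterrupted run with a run that
// dies after position 4 and resumes from a stale checkpoint at position 2.
func resurrectMatchesCleanRun() bool {
	events := []rowEvent{
		{pos: 1, id: 1, value: "a"},
		{pos: 2, id: 2, value: "b"},
		{pos: 3, id: 1, value: "a2"},
		{pos: 4, id: 2, isDelete: true},
		{pos: 5, id: 3, value: "c"},
	}

	clean := map[int]string{}
	apply(clean, events, 1)

	resurrected := map[int]string{}
	apply(resurrected, events[:4], 1) // first execution dies after position 4
	apply(resurrected, events, 2)     // last dump only recorded position 2
	return sameTable(clean, resurrected)
}

func main() {
	fmt.Println("resurrected table matches clean run:", resurrectMatchesCleanRun())
}
```

Events 2 through 4 are applied twice in the resurrected run, yet both tables end up identical. Starting the replay at a position later than the last applied event would skip changes, which is the one case the text above rules out.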
119 changes: 107 additions & 12 deletions go/base/context.go
@@ -6,7 +6,9 @@
package base

import (
"encoding/json"
"fmt"
"io"
"os"
"regexp"
"strings"
@@ -81,14 +83,15 @@ type MigrationContext struct {
SkipRenamedColumns bool
IsTungsten bool
DiscardForeignKeys bool
Resurrect bool

config ContextConfig
configMutex *sync.Mutex
ConfigFile string
CliUser string
CliPassword string
cliPassword string
CliMasterUser string
CliMasterPassword string
cliMasterPassword string

HeartbeatIntervalMilliseconds int64
defaultNumRetries int64
@@ -161,7 +164,7 @@ type MigrationContext struct {
UserCommandedUnpostponeFlag int64
CutOverCompleteFlag int64
InCutOverCriticalSectionFlag int64
PanicAbort chan error
IsResurrected int64

OriginalTableColumnsOnApplier *sql.ColumnList
OriginalTableColumns *sql.ColumnList
@@ -177,8 +180,8 @@ type MigrationContext struct {
Iteration int64
MigrationIterationRangeMinValues *sql.ColumnValues
MigrationIterationRangeMaxValues *sql.ColumnValues

CanStopStreaming func() bool
EncodedRangeValues map[string]string
AppliedBinlogCoordinates mysql.BinlogCoordinates
}

type ContextConfig struct {
@@ -197,10 +200,10 @@ type ContextConfig struct {
var context *MigrationContext

func init() {
context = newMigrationContext()
context = NewMigrationContext()
}

func newMigrationContext() *MigrationContext {
func NewMigrationContext() *MigrationContext {
return &MigrationContext{
defaultNumRetries: 60,
ChunkSize: 1000,
@@ -214,8 +217,9 @@ func newMigrationContext() *MigrationContext {
throttleControlReplicaKeys: mysql.NewInstanceKeyMap(),
configMutex: &sync.Mutex{},
pointOfInterestTimeMutex: &sync.Mutex{},
AppliedBinlogCoordinates: mysql.BinlogCoordinates{},
ColumnRenameMap: make(map[string]string),
PanicAbort: make(chan error),
EncodedRangeValues: make(map[string]string),
}
}

@@ -224,6 +228,78 @@ func GetMigrationContext() *MigrationContext {
return context
}

// ToJSON exports this context as a JSON string
func (this *MigrationContext) ToJSON() (string, error) {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()

if this.MigrationRangeMinValues != nil {
this.EncodedRangeValues["MigrationRangeMinValues"], _ = this.MigrationRangeMinValues.ToBase64()
}
if this.MigrationRangeMaxValues != nil {
this.EncodedRangeValues["MigrationRangeMaxValues"], _ = this.MigrationRangeMaxValues.ToBase64()
}
if this.MigrationIterationRangeMinValues != nil {
this.EncodedRangeValues["MigrationIterationRangeMinValues"], _ = this.MigrationIterationRangeMinValues.ToBase64()
}
if this.MigrationIterationRangeMaxValues != nil {
this.EncodedRangeValues["MigrationIterationRangeMaxValues"], _ = this.MigrationIterationRangeMaxValues.ToBase64()
}
jsonBytes, err := json.Marshal(this)
if err != nil {
return "", err
}
return string(jsonBytes), nil
}

// LoadJSON treats given json as context-dump, and attempts to load this context's data.
func (this *MigrationContext) LoadJSON(jsonString string) error {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()

jsonBytes := []byte(jsonString)

if err := json.Unmarshal(jsonBytes, this); err != nil && err != io.EOF {
return err
}

var err error
if this.MigrationRangeMinValues, err = sql.NewColumnValuesFromBase64(this.EncodedRangeValues["MigrationRangeMinValues"]); err != nil {
return err
}
if this.MigrationRangeMaxValues, err = sql.NewColumnValuesFromBase64(this.EncodedRangeValues["MigrationRangeMaxValues"]); err != nil {
return err
}
if this.MigrationIterationRangeMinValues, err = sql.NewColumnValuesFromBase64(this.EncodedRangeValues["MigrationIterationRangeMinValues"]); err != nil {
return err
}
if this.MigrationIterationRangeMaxValues, err = sql.NewColumnValuesFromBase64(this.EncodedRangeValues["MigrationIterationRangeMaxValues"]); err != nil {
return err
}

return nil
}

// ApplyResurrectedContext loads resurrection-related info from given context
func (this *MigrationContext) ApplyResurrectedContext(other *MigrationContext) {
// this.MigrationRangeMinValues = other.MigrationRangeMinValues
// this.MigrationRangeMaxValues = other.MigrationRangeMaxValues
if other.MigrationIterationRangeMinValues != nil {
this.MigrationIterationRangeMinValues = other.MigrationIterationRangeMinValues
}
if other.MigrationIterationRangeMaxValues != nil {
this.MigrationIterationRangeMaxValues = other.MigrationIterationRangeMaxValues
}

this.RowsEstimate = other.RowsEstimate
this.RowsDeltaEstimate = other.RowsDeltaEstimate
this.TotalRowsCopied = other.TotalRowsCopied
this.TotalDMLEventsApplied = other.TotalDMLEventsApplied

this.Iteration = other.Iteration
this.AppliedBinlogCoordinates = other.AppliedBinlogCoordinates
}

// GetGhostTableName generates the name of ghost table, based on original table name
func (this *MigrationContext) GetGhostTableName() string {
return fmt.Sprintf("_%s_gho", this.OriginalTableName)
@@ -232,10 +308,10 @@ func (this *MigrationContext) GetGhostTableName() string {
// GetOldTableName generates the name of the "old" table, into which the original table is renamed.
func (this *MigrationContext) GetOldTableName() string {
if this.TestOnReplica {
return fmt.Sprintf("_%s_ght", this.OriginalTableName)
return fmt.Sprintf("_%s_delr", this.OriginalTableName)
}
if this.MigrateOnReplica {
return fmt.Sprintf("_%s_ghr", this.OriginalTableName)
return fmt.Sprintf("_%s_delr", this.OriginalTableName)
}
return fmt.Sprintf("_%s_del", this.OriginalTableName)
}
@@ -524,6 +600,13 @@ func (this *MigrationContext) SetNiceRatio(newRatio float64) {
this.niceRatio = newRatio
}

func (this *MigrationContext) SetAppliedBinlogCoordinates(binlogCoordinates *mysql.BinlogCoordinates) {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()

this.AppliedBinlogCoordinates = *binlogCoordinates
}

// ReadMaxLoad parses the `--max-load` flag, which is in multiple key-value format,
// such as: 'Threads_running=100,Threads_connected=500'
// It only applies changes in case there's no parsing error.
@@ -598,6 +681,18 @@ func (this *MigrationContext) AddThrottleControlReplicaKey(key mysql.InstanceKey
return nil
}

func (this *MigrationContext) SetCliPassword(password string) {
this.cliPassword = password
}

func (this *MigrationContext) SetCliMasterPassword(password string) {
this.cliMasterPassword = password
}

func (this *MigrationContext) GetCliMasterPassword() string {
return this.cliMasterPassword
}

// ApplyCredentials sorts out the credentials between the config file and the CLI flags
func (this *MigrationContext) ApplyCredentials() {
this.configMutex.Lock()
@@ -613,9 +708,9 @@ func (this *MigrationContext) ApplyCredentials() {
if this.config.Client.Password != "" {
this.InspectorConnectionConfig.Password = this.config.Client.Password
}
if this.CliPassword != "" {
if this.cliPassword != "" {
// Override
this.InspectorConnectionConfig.Password = this.CliPassword
this.InspectorConnectionConfig.Password = this.cliPassword
}
}
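
Review note on the `CliPassword` → `cliPassword` rename above: `encoding/json` only serializes exported struct fields, so lowercasing a field is enough to keep it out of a `json.Marshal` dump. The sketch below shows that behavior in isolation. The `credentials` type and `dump` helper are hypothetical stand-ins, not part of `gh-ost`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// credentials is a hypothetical stand-in for the relevant context fields.
type credentials struct {
	CliUser     string
	cliPassword string // unexported: encoding/json never serializes it
}

// dump marshals the struct the same way a context dump would.
func dump(c *credentials) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := dump(&credentials{CliUser: "gh-ost", cliPassword: "s3cret"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("leaks password:", strings.Contains(out, "s3cret"))
}
```

The same rule applies on load: `json.Unmarshal` cannot populate unexported fields, so a resurrected execution has to receive its passwords again through flags or the config file.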

55 changes: 55 additions & 0 deletions go/base/context_test.go
@@ -0,0 +1,55 @@
/*
Copyright 2016 GitHub Inc.
See https://github.com/github/gh-ost/blob/master/LICENSE
*/

package base

import (
"io"
"testing"

"github.com/outbrain/golib/log"
test "github.com/outbrain/golib/tests"

"github.com/github/gh-ost/go/mysql"
"github.com/github/gh-ost/go/sql"
)

func init() {
log.SetLevel(log.ERROR)
}

func TestContextToJSON(t *testing.T) {
context := NewMigrationContext()
jsonString, err := context.ToJSON()
test.S(t).ExpectNil(err)
test.S(t).ExpectNotEquals(jsonString, "")
}

func TestContextLoadJSON(t *testing.T) {
var jsonString string
var err error
{
context := NewMigrationContext()
context.AppliedBinlogCoordinates = mysql.BinlogCoordinates{LogFile: "mysql-bin.012345", LogPos: 6789}

abstractValues := []interface{}{31, "2016-12-24 17:04:32"}
context.MigrationRangeMinValues = sql.ToColumnValues(abstractValues)

jsonString, err = context.ToJSON()
test.S(t).ExpectNil(err)
test.S(t).ExpectNotEquals(jsonString, "")
}
{
context := NewMigrationContext()
err = context.LoadJSON(jsonString)
test.S(t).ExpectEqualsAny(err, nil, io.EOF)
test.S(t).ExpectEquals(context.AppliedBinlogCoordinates, mysql.BinlogCoordinates{LogFile: "mysql-bin.012345", LogPos: 6789})

abstractValues := context.MigrationRangeMinValues.AbstractValues()
test.S(t).ExpectEquals(len(abstractValues), 2)
test.S(t).ExpectEquals(abstractValues[0], 31)
test.S(t).ExpectEquals(abstractValues[1], "2016-12-24 17:04:32")
}
}
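
Review note: the test above expects `abstractValues[0]` to come back as the integer `31`, which a plain JSON round trip through `[]interface{}` would not give (numbers decode as `float64`). The diff does not show how `ToBase64` and `NewColumnValuesFromBase64` are implemented. The sketch below assumes a gob encoding wrapped in base64 as one way to preserve concrete types. `encodeValues` and `decodeValues` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/gob"
	"fmt"
)

// encodeValues serializes the values with gob, then wraps the bytes in
// base64 so the result can sit inside a JSON string field.
// Assumption: this mirrors the idea of ToBase64, not its actual code.
func encodeValues(values []interface{}) (string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(values); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeValues reverses encodeValues.
func decodeValues(encoded string) ([]interface{}, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	var values []interface{}
	if err := gob.NewDecoder(bytes.NewReader(raw)).Decode(&values); err != nil {
		return nil, err
	}
	return values, nil
}

func main() {
	encoded, err := encodeValues([]interface{}{31, "2016-12-24 17:04:32"})
	if err != nil {
		panic(err)
	}
	values, err := decodeValues(encoded)
	if err != nil {
		panic(err)
	}
	// %T shows that the concrete types survive the round trip.
	fmt.Printf("%T %v, %T %v\n", values[0], values[0], values[1], values[1])
}
```

gob registers Go's basic types by default, so `int` and `string` elements need no explicit `gob.Register` call. Custom types stored in the slice would.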