| 1 | +--- |
| 2 | +title: 'How to Migrate a Database from MySQL to PostgreSQL' |
| 3 | +--- |
| 4 | + |
| 5 | +## When you should consider migrating from MySQL to PostgreSQL |
| 6 | + |
| 7 | +### Advanced feature requirements |
| 8 | + |
| 9 | +- **Complex Data Types**: PostgreSQL provides robust support for JSON, arrays, hstore, and custom types, making it ideal for applications with complex data structures. |
| 10 | +- **Geospatial Support**: PostgreSQL with PostGIS offers superior geospatial capabilities compared to MySQL's spatial extensions. |
| 11 | + |
| 12 | +### Scalability needs |
| 13 | + |
| 14 | +- **Table Partitioning**: PostgreSQL's declarative partitioning is more flexible and powerful than MySQL's partitioning system. |
| 15 | +- **Parallel Query Execution**: PostgreSQL can utilize multiple CPU cores for single queries, improving performance for complex analytical workloads. |
| 16 | +- **Advanced Indexing**: PostgreSQL supports more index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN) and offers partial and expression indexes. |
| 17 | + |
| 18 | +### Licensing concerns |
| 19 | + |
| 20 | +PostgreSQL offers freedoms that MySQL's GPL version doesn't: |
| 21 | + |
| 22 | +- **Permissive License**: PostgreSQL uses the PostgreSQL License (similar to MIT/BSD), which: |
| 23 | + |
| 24 | + - Allows unrestricted use in proprietary applications |
| 25 | + - Doesn't require source code disclosure |
| 26 | + - Permits creating closed-source derivatives |
| 27 | + |
| 28 | +- **Unrestricted Embedding**: You can embed PostgreSQL in commercial products without licensing fees or source code obligations. |
| 29 | +- **Fork Freedom**: You can create proprietary forks of PostgreSQL without license obligations. |
| 30 | +- **No Corporate Control**: PostgreSQL is developed by a community organization rather than a single company, reducing concerns about commercial interests affecting the license. |
| 31 | + |
| 32 | +## When you should think twice |
| 33 | + |
| 34 | +- If your workload is write-heavy, particularly with frequent updates to rows in heavily indexed tables, PostgreSQL may perform less efficiently than MySQL because of write amplification. See [Uber's switch from PostgreSQL to MySQL](https://www.uber.com/en-SG/blog/postgres-to-mysql-migration/). |
| 35 | +- All-in-one vs best of breed. While PostgreSQL is like an all-in-one database thanks to its extensible architecture, it may be better to let your relational database handle transactional processing and use more specialized systems for analytical processing, full-text search, etc. If you are considering migrating databases, yours has likely reached a certain scale and encountered bottlenecks. The all-in-one approach is more desirable when you’re just getting started. |
| 36 | + |
| 37 | +## MySQL and PostgreSQL schema differences |
| 38 | + |
| 39 | +### Data types |
| 40 | + |
| 41 | +While many data types are similar, important differences exist: |
| 42 | + |
| 43 | +| MySQL Type | PostgreSQL Equivalent | Notes | |
| 44 | +| ------------------------------------ | ------------------------ | --------------------------------------------------------------- | |
| 45 | +| INT | INTEGER | Similar functionality | |
| 46 | +| BIGINT | BIGINT | Similar functionality | |
| 47 | +| FLOAT | REAL | PostgreSQL's REAL is equivalent to MySQL's FLOAT | |
| 48 | +| DOUBLE | DOUBLE PRECISION | Similar functionality | |
| 49 | +| DECIMAL | NUMERIC | Similar functionality | |
| 50 | +| DATETIME | TIMESTAMP | PostgreSQL has no ON UPDATE CURRENT_TIMESTAMP; use a trigger | |
| 51 | +| TIMESTAMP | TIMESTAMP WITH TIME ZONE | PostgreSQL handles time zones more explicitly | |
| 52 | +| ENUM | ENUM or CHECK constraint | PostgreSQL's ENUM is a custom type, not a string constraint | |
| 53 | +| SET | Array or JSONB | No direct equivalent; arrays or JSONB can replace functionality | |
| 54 | +| TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | TEXT | PostgreSQL has a single TEXT type with no practical size limit | |
| 55 | +| VARCHAR | VARCHAR | PostgreSQL's VARCHAR has no performance penalty for full length | |
| 56 | +| BLOB | BYTEA | Different functions for manipulation | |
| 57 | + |
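
The mapping in the table above can be applied mechanically when generating PostgreSQL DDL. The following is a minimal sketch, not a complete converter: `TYPE_MAP` and `mapColumnType` are hypothetical names introduced for illustration, and ENUM and SET are deliberately left out because they need a manual decision.

```javascript
// Illustrative mapping taken from the table above; extend it for your schema.
const TYPE_MAP = {
  INT: 'INTEGER',
  BIGINT: 'BIGINT',
  FLOAT: 'REAL',
  DOUBLE: 'DOUBLE PRECISION',
  DECIMAL: 'NUMERIC',
  DATETIME: 'TIMESTAMP',
  TIMESTAMP: 'TIMESTAMP WITH TIME ZONE',
  TINYTEXT: 'TEXT',
  TEXT: 'TEXT',
  MEDIUMTEXT: 'TEXT',
  LONGTEXT: 'TEXT',
  VARCHAR: 'VARCHAR',
  BLOB: 'BYTEA',
};

// Hypothetical helper: maps a MySQL column type such as "varchar(255)"
// to a PostgreSQL type. Modifiers after the type (e.g. UNSIGNED) are ignored.
function mapColumnType(mysqlType) {
  const match = /^\s*([a-z]+)\s*(\([^)]*\))?/i.exec(mysqlType);
  if (!match) throw new Error(`Unrecognized type: ${mysqlType}`);
  const base = match[1].toUpperCase();
  const args = match[2] || '';
  const target = TYPE_MAP[base];
  if (!target) throw new Error(`No mapping for ${base}; handle it manually`);
  // Only VARCHAR and NUMERIC keep their arguments; display widths such as
  // INT(11) have no PostgreSQL counterpart and are dropped.
  const keepsArgs = target === 'VARCHAR' || target === 'NUMERIC';
  return keepsArgs ? target + args : target;
}
```

For example, `mapColumnType('decimal(10,2)')` yields `NUMERIC(10,2)`, while an ENUM column raises an error so that it is reviewed by hand.
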
| 58 | +### Constraints and keys |
| 59 | + |
| 60 | +PostgreSQL handles constraints differently: |
| 61 | + |
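| 62 | +- **Primary Keys**: Both systems index primary keys automatically. InnoDB stores the table itself in primary-key order (a clustered index), while PostgreSQL keeps rows in a heap with a separate B-tree index. |
| 63 | +- **Foreign Keys**: PostgreSQL enforces foreign keys on every table and lets you defer checks to commit time with `DEFERRABLE` constraints. MySQL enforces them only on storage engines that support them, such as InnoDB, and always checks them immediately. |
| 64 | +- **CHECK Constraints**: PostgreSQL fully enforces CHECK constraints. MySQL parsed but ignored them before version 8.0.16, so data migrated from older servers may violate checks you add in PostgreSQL. |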
| 65 | +- **Unique Constraints**: Both support unique constraints, but PostgreSQL distinguishes between unique constraints and unique indexes. |
| 66 | + |
| 67 | +### Sequences and auto-increment |
| 68 | + |
| 69 | +- MySQL uses `AUTO_INCREMENT` for generating sequential values. |
| 70 | +- PostgreSQL uses sequences, typically with `SERIAL` or `IDENTITY` columns. |
| 71 | +- Migration requires converting `AUTO_INCREMENT` to PostgreSQL sequences or identity columns. |
| 72 | + |
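
One step that is easy to miss: when rows are loaded with their original ID values, the sequence behind a `SERIAL` or `IDENTITY` column does not advance, so the next insert can collide with an existing key. A minimal sketch of generating the realignment statement follows. `buildSequenceResyncSql` is a hypothetical helper; `setval` and `pg_get_serial_sequence` are built-in PostgreSQL functions.

```javascript
// Hypothetical helper: builds the SQL that realigns the sequence behind
// a SERIAL or IDENTITY column with the highest migrated value.
function buildSequenceResyncSql(table, column) {
  // Only plain lowercase identifiers are accepted in this sketch, which
  // sidesteps quoting rules and SQL injection through the arguments.
  const plain = /^[a-z_][a-z0-9_]*$/;
  if (!plain.test(table) || !plain.test(column)) {
    throw new Error('Expected plain lowercase identifiers');
  }
  // The third setval argument is false for an empty table, so the first
  // generated value is then 1 rather than 2.
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', '${column}'), ` +
    `COALESCE(MAX(${column}), 1), MAX(${column}) IS NOT NULL) FROM ${table};`
  );
}
```

Run the generated statement once per table after the data load and before the application starts writing to PostgreSQL.
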
| 73 | +### Default values |
| 74 | + |
| 75 | +- PostgreSQL supports more complex default values, including functions. |
| 76 | +- A `DEFAULT CURRENT_TIMESTAMP` clause carries over unchanged, but MySQL's `ON UPDATE CURRENT_TIMESTAMP` has no PostgreSQL equivalent and must be replaced with a trigger. |
| 77 | +- PostgreSQL allows defaults on TEXT columns, which some MySQL versions restricted. |
| 78 | + |
| 79 | +### Schema naming and case sensitivity |
| 80 | + |
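| 81 | +- PostgreSQL treats unquoted identifiers as case-insensitive by folding them to lowercase, and treats double-quoted identifiers as case-sensitive. MySQL's table-name case sensitivity depends on the operating system and the `lower_case_table_names` setting. |
| 82 | +- Because of this folding, a MySQL table named `UserAccounts` becomes `useraccounts` in PostgreSQL unless every reference quotes it, which can cause issues during migration. |
| 83 | +- In MySQL, a schema is a synonym for a database. PostgreSQL adds a separate schema layer (similar to namespaces) inside each database, so a MySQL database often maps to a PostgreSQL schema. |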
| 84 | + |
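
To see which existing names are affected by lowercase folding, it can help to model the rule directly. The sketch below assumes you already have a list of MySQL table or column names; `resolveIdentifier` and `namesNeedingQuotes` are hypothetical helpers, not part of any library.

```javascript
// Mimics how PostgreSQL resolves an identifier as written in SQL:
// double-quoted names keep their case, unquoted names fold to lowercase.
function resolveIdentifier(written) {
  const quoted = /^"((?:[^"]|"")*)"$/.exec(written);
  if (quoted) return quoted[1].replace(/""/g, '"');
  return written.toLowerCase();
}

// Lists the MySQL names that would need double quotes in PostgreSQL to
// keep their original case. Renaming them to lowercase is the alternative.
function namesNeedingQuotes(mysqlNames) {
  return mysqlNames.filter((name) => name !== name.toLowerCase());
}
```

Many teams choose to rename mixed-case identifiers to lowercase during migration rather than quote them everywhere.
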
| 85 | +### Stored procedures and functions |
| 86 | + |
| 87 | +- PostgreSQL uses PL/pgSQL as its primary procedural language, while MySQL uses its own syntax. |
| 88 | +- PostgreSQL supports multiple procedural languages (PL/pgSQL, PL/Python, PL/Perl, etc.). |
| 89 | + |
| 90 | +### Views and materialized views |
| 91 | + |
| 92 | +- Both support views, but PostgreSQL also offers materialized views that store data physically. |
| 93 | +- PostgreSQL's view updating capabilities are more advanced. |
| 94 | +- PostgreSQL allows indexing of materialized views. |
| 95 | + |
| 96 | +## Data migration strategies |
| 97 | + |
| 98 | +| Strategy | Pros | Cons | |
| 99 | +| --------------------------- | -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | |
| 100 | +| Direct export/import | Simple to implement for small databases | Requires downtime<br/>Challenging for large databases<br/>Manual conversion may be needed | |
| 101 | +| ETL process | Highly customizable<br/>Can handle complex transformations<br/>Can be parallelized | Requires more development effort<br/>Potentially complex to set up | |
| 102 | +| Replication-based migration | Minimal downtime<br/>Continuous validation possible<br/>Phased migration | More complex setup<br/>Requires monitoring<br/>Potential replication lag | |
| 103 | +| Cloud migration services | Managed service<br/>Often includes schema conversion<br/>Typically supports continuous replication | Vendor lock-in<br/>Potential costs<br/>May require cloud-to-cloud or on-premises-to-cloud networking | |
| 104 | + |
| 105 | +### Direct export/import |
| 106 | + |
| 107 | +The simplest approach involves exporting data from MySQL and importing it into PostgreSQL: |
| 108 | + |
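| 109 | +1. **Export MySQL data** using mysqldump. The `--compatible=postgresql` value is accepted by MySQL 5.7 and earlier; MySQL 8.0 accepts only `ansi`: |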
| 110 | + |
| 111 | + ```bash |
| 112 | + mysqldump --compatible=postgresql --default-character-set=utf8 \ |
| 113 | + --no-create-info --complete-insert --extended-insert --single-transaction \ |
| 114 | + --skip-triggers --routines=0 --skip-tz-utc \ |
| 115 | + database_name > mysql_data.sql |
| 116 | + ``` |
| 117 | + |
| 118 | +2. **Convert the SQL** to PostgreSQL format using custom scripts. Alternatively, skip the dump and let [pgloader](https://github.com/dimitri/pgloader) copy schema and data directly from the running MySQL server. |
| 119 | + |
| 120 | +3. **Import into PostgreSQL** using psql: |
| 121 | + ```bash |
| 122 | + psql -d database_name -f converted_data.sql |
| 123 | + ``` |
| 124 | + |
| 125 | +### ETL process |
| 126 | + |
| 127 | +For more complex migrations, an Extract-Transform-Load (ETL) process offers greater control: |
| 128 | + |
| 129 | +1. **Extract** data from MySQL into an intermediate format (CSV, JSON, etc.). |
| 130 | +2. **Transform** the data to match PostgreSQL's requirements (data types, constraints, etc.). |
| 131 | +3. **Load** the transformed data into PostgreSQL. |
| 132 | + |
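
The transform step is where MySQL-specific values get normalized. As a minimal sketch, assuming rows arrive as plain objects and that you know which columns were `TINYINT(1)` flags and which held dates, a per-row transform might look like this. `transformRow` and its option names are hypothetical.

```javascript
// Hypothetical per-row transform for the "Transform" step.
// booleanColumns: columns stored as 0/1 in MySQL, BOOLEAN in PostgreSQL.
// dateColumns: DATE/DATETIME columns that may contain MySQL "zero dates".
function transformRow(row, { booleanColumns = [], dateColumns = [] } = {}) {
  const out = { ...row };
  for (const col of booleanColumns) {
    // Leave NULL (and absent) values untouched; convert 0/1 to false/true.
    if (out[col] !== null && out[col] !== undefined) {
      out[col] = Number(out[col]) !== 0;
    }
  }
  for (const col of dateColumns) {
    // '0000-00-00' is not a valid PostgreSQL date; map it to NULL.
    if (typeof out[col] === 'string' && out[col].startsWith('0000-00-00')) {
      out[col] = null;
    }
  }
  return out;
}
```

Mapping zero dates to NULL is one possible policy; if the target column is `NOT NULL`, a sentinel date may be needed instead.
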
| 133 | +### Replication-based migration |
| 134 | + |
| 135 | +For minimal downtime, consider a replication-based approach: |
| 136 | + |
| 137 | +1. **Set up initial data copy** using tools like pgloader or AWS DMS. |
| 138 | +2. **Establish ongoing replication** from MySQL to PostgreSQL using tools like [Debezium](https://debezium.io/). |
| 139 | +3. **Validate data consistency** between the systems. |
| 140 | +4. **Cut over** to PostgreSQL when ready. |
| 141 | + |
| 142 | +### Cloud migration services |
| 143 | + |
| 144 | +Cloud providers offer specialized services for database migration: |
| 145 | + |
| 146 | +- **AWS Database Migration Service** |
| 147 | +- **Google Cloud Database Migration Service** |
| 148 | +- **Azure Database Migration Service** |
| 149 | + |
| 150 | +### Handling large datasets |
| 151 | + |
| 152 | +For very large databases, consider these additional strategies: |
| 153 | + |
| 154 | +- **Partitioned Migration**: Migrate data in chunks based on logical partitions. |
| 155 | +- **Parallel Processing**: Use multiple threads or processes for data extraction and loading. |
| 156 | +- **Incremental Migration**: Migrate historical data first, then recent data during cutover. |
| 157 | +- **Data Validation**: Implement checksums or row counts to verify migration completeness. |
| 158 | + |
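
Partitioned migration and row-count validation can be combined: split the primary-key range into chunks, copy each chunk, then compare per-chunk counts. The sketch below assumes a numeric primary key and that the counts were collected separately from each database. `keyRanges` and `findMismatchedChunks` are hypothetical helpers.

```javascript
// Splits an inclusive primary-key range into fixed-size chunks.
function keyRanges(minId, maxId, chunkSize) {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const ranges = [];
  for (let start = minId; start <= maxId; start += chunkSize) {
    ranges.push({ from: start, to: Math.min(start + chunkSize - 1, maxId) });
  }
  return ranges;
}

// Returns the chunks whose row counts differ between source and target.
// sourceCounts[i] and targetCounts[i] belong to ranges[i].
function findMismatchedChunks(ranges, sourceCounts, targetCounts) {
  return ranges.filter((_, i) => sourceCounts[i] !== targetCounts[i]);
}
```

Only the mismatched chunks need to be re-copied or inspected, which keeps retries cheap on large tables. Matching counts do not prove the contents match, so checksums remain worthwhile for critical tables.
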
| 159 | +## Application code changes |
| 160 | + |
| 161 | +### SQL syntax differences |
| 162 | + |
| 163 | +- **String Concatenation**: MySQL uses the `CONCAT()` function and, by default, treats `||` as logical OR. PostgreSQL uses `||` for concatenation and also provides `CONCAT()`. |
| 164 | +- **Date Functions**: Functions like `DATE_ADD()` in MySQL become `date + interval` in PostgreSQL. |
| 165 | +- **LIMIT/OFFSET**: MySQL uses `LIMIT x,y` while PostgreSQL uses `LIMIT y OFFSET x`. |
| 166 | +- **Group By Handling**: PostgreSQL requires all non-aggregated columns in the SELECT list to appear in the GROUP BY clause. |
| 167 | +- **Boolean Values**: MySQL's `BOOLEAN` is an alias for `TINYINT(1)` and stores 0/1; PostgreSQL has a native `BOOLEAN` type with `true`/`false`. |
| 168 | +- **REPLACE INTO**: PostgreSQL doesn't support this MySQL shorthand; use DELETE + INSERT or upsert with ON CONFLICT. |
| 169 | + |
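
Some of these differences can be rewritten mechanically, while others are better flagged for manual review. The following sketch shows both for hand-written SQL strings. It uses regular expressions rather than a SQL parser, so matching text inside string literals or comments would be affected too; `rewriteLimit` and `findManualRewrites` are hypothetical helpers.

```javascript
// Rewrites MySQL's "LIMIT offset, count" as "LIMIT count OFFSET offset".
// Handles numeric literals only, not placeholders or expressions.
function rewriteLimit(sql) {
  return sql.replace(/\bLIMIT\s+(\d+)\s*,\s*(\d+)/gi, 'LIMIT $2 OFFSET $1');
}

// Constructs from the list above that need a manual rewrite.
const MANUAL_PATTERNS = [
  { name: 'REPLACE INTO', pattern: /\bREPLACE\s+INTO\b/i },
  { name: 'DATE_ADD()', pattern: /\bDATE_ADD\s*\(/i },
];

// Returns the names of MySQL-only constructs found in a query.
function findManualRewrites(sql) {
  return MANUAL_PATTERNS
    .filter(({ pattern }) => pattern.test(sql))
    .map(({ name }) => name);
}
```

Running `findManualRewrites` over the queries in a codebase gives a rough checklist of statements to revisit before cutover.
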
| 170 | +### Connection string |
| 171 | + |
| 172 | +- **Connection Libraries**: Some libraries are database-specific and need replacement. |
| 173 | +- **Connection Strings**: Format differs between MySQL and PostgreSQL. |
| 174 | +- **Connection Pooling**: Configuration parameters differ between systems. |
| 175 | + |
| 176 | +Example MySQL connection string: |
| 177 | + |
| 178 | +```text |
| 179 | +mysql://user:password@host:3306/database |
| 180 | +``` |
| 181 | + |
| 182 | +Equivalent PostgreSQL connection string: |
| 183 | + |
| 184 | +```text |
| 185 | +postgresql://user:password@host:5432/database |
| 186 | +``` |
| 187 | + |
| 188 | +### ORM configurations |
| 189 | + |
| 190 | +- **Dialect Configuration**: Change the database dialect to PostgreSQL. |
| 191 | +- **Type Mappings**: Update custom type mappings to match PostgreSQL types. |
| 192 | +- **Query Generation**: Some ORMs generate different SQL for different databases. |
| 193 | + |
| 194 | +Example changes for popular ORMs: |
| 195 | + |
| 196 | +**Hibernate (Java)**: |
| 197 | + |
| 198 | +```java |
| 199 | +// MySQL |
| 200 | +properties.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect"); |
| 201 | +// PostgreSQL |
| 202 | +properties.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect"); |
| 203 | +``` |
| 204 | + |
| 205 | +**Sequelize (Node.js)**: |
| 206 | + |
| 207 | +```javascript |
| 208 | +// MySQL |
| 209 | +const mysqlSequelize = new Sequelize('database', 'username', 'password', { |
| 210 | + dialect: 'mysql', |
| 211 | +}); |
| 212 | +// PostgreSQL |
| 213 | +const postgresSequelize = new Sequelize('database', 'username', 'password', { |
| 214 | + dialect: 'postgres', |
| 215 | +}); |
| 216 | +``` |
| 217 | + |
| 218 | +### Transaction management |
| 219 | + |
| 220 | +- **Default Isolation Level**: PostgreSQL uses `Read Committed` by default, while MySQL's InnoDB engine defaults to `Repeatable Read`. |
| 221 | +- **Locking Behavior**: PostgreSQL's approach to row locking differs from MySQL's. MySQL uses next-key locking in InnoDB to prevent phantom reads. PostgreSQL uses multi-version concurrency control (MVCC) without next-key locking. This can lead to different behavior in highly concurrent applications. |
| 222 | +- **Serialization Failures**: PostgreSQL may throw serialization failures that MySQL wouldn't. |
| 223 | + |
| 224 | +### Error handling |
| 225 | + |
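| 226 | +- **Error Codes**: MySQL reports numeric error codes, while PostgreSQL reports five-character SQLSTATE codes. A duplicate key, for example, is error 1062 in MySQL and SQLSTATE `23505` in PostgreSQL. |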
| 227 | +- **Constraint Violations**: Different formats for constraint violation messages. |
| 228 | +- **Connection Errors**: Different error handling for connection issues. |
| 229 | + |
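
Application code that branches on database errors can be insulated from the change by classifying errors in one place. The sketch below assumes error objects shaped like those of the `mysql2` and `pg` Node.js drivers, where `mysql2` sets a numeric `errno` and `pg` sets the SQLSTATE string in `code`. `ERROR_KINDS` and `classifyDbError` are hypothetical names, and the table covers only a few common cases.

```javascript
// MySQL error numbers and PostgreSQL SQLSTATE codes for the same condition.
const ERROR_KINDS = [
  { kind: 'unique_violation', mysqlErrno: 1062, pgSqlState: '23505' },
  { kind: 'foreign_key_violation', mysqlErrno: 1452, pgSqlState: '23503' },
  { kind: 'not_null_violation', mysqlErrno: 1048, pgSqlState: '23502' },
  { kind: 'deadlock', mysqlErrno: 1213, pgSqlState: '40P01' },
];

// Maps a driver error to a database-neutral kind, or 'unknown'.
function classifyDbError(err) {
  const hit = ERROR_KINDS.find(
    (e) => e.mysqlErrno === err.errno || e.pgSqlState === err.code
  );
  return hit ? hit.kind : 'unknown';
}
```

With this in place, a handler can test `classifyDbError(err) === 'unique_violation'` both before and after the switch.
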
| 230 | +### Database-specific features |
| 231 | + |
| 232 | +If your application uses MySQL-specific features, alternatives must be implemented: |
| 233 | + |
| 234 | +- **Full-Text Search**: Replace MySQL's full-text search with PostgreSQL's text search capabilities. |
| 235 | +- **Stored Procedures**: Rewrite in PL/pgSQL syntax. |
| 236 | +- **User-Defined Functions**: Convert to PostgreSQL's function syntax. |
| 237 | +- **Triggers**: Update to PostgreSQL's trigger syntax. |
| 238 | + |
| 239 | +## Other notable compatibility issues |
| 240 | + |
| 241 | +Beyond schema and code changes, several other compatibility issues require attention: |
| 242 | + |
| 243 | +### Case sensitivity |
| 244 | + |
| 245 | +- In MySQL, column names are always case-insensitive, while table names are typically case-insensitive on Windows and case-sensitive on Unix/Linux. |
| 246 | +- PostgreSQL folds unquoted identifiers to lowercase, so they behave case-insensitively; quoted identifiers are always case-sensitive. |
| 247 | +- This can cause unexpected behavior if your application relies on case-insensitive identifiers. |
| 248 | + |
| 249 | +### NULL handling |
| 250 | + |
| 251 | +NULL value handling differs between the systems: |
| 252 | + |
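| 253 | +- In both systems, `NULL = NULL` evaluates to NULL rather than true. For NULL-safe comparison, MySQL provides the `<=>` operator, while PostgreSQL uses `IS NOT DISTINCT FROM`. |
| 254 | +- Both systems distinguish empty strings from NULL, but MySQL in non-strict SQL mode can silently substitute an implicit default such as `''` or `0` where PostgreSQL raises an error. By default, ascending sorts also place NULLs first in MySQL and last in PostgreSQL. |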
| 255 | +- These differences can affect query results and application logic. |
| 256 | + |
| 257 | +### Character sets and collations |
| 258 | + |
| 259 | +- MySQL uses character sets and collations at the server, database, table, and column levels. |
| 260 | +- PostgreSQL sets the encoding per database and lets you set collations at the database, column, and expression levels. |
| 261 | +- Default character sets differ: MySQL 5.7 and earlier default to `latin1` and MySQL 8.0 defaults to `utf8mb4`, while PostgreSQL typically uses `UTF-8`. Note that MySQL's `utf8` is a three-byte subset; use `utf8mb4` for full Unicode. |
| 262 | + |
| 263 | +## Cutover process |
| 264 | + |
| 265 | +The final phase of migration is the cutover: transitioning production traffic from MySQL to PostgreSQL. Here's a structured approach: |
| 266 | + |
| 267 | +### Cutover strategies |
| 268 | + |
| 269 | +| Strategy | Description | Steps | Pros | Cons | |
| 270 | +| -------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | |
| 271 | +| **Big Bang Cutover** | Switch all traffic at once | 1. Stop all write traffic to MySQL<br/>2. Perform final data synchronization<br/>3. Verify data consistency<br/>4. Update application configuration<br/>5. Restart application services<br/>6. Resume traffic | Simpler to implement;<br/>no need to maintain both systems simultaneously | Higher risk;<br/>longer downtime;<br/>all-or-nothing approach | |
| 272 | +| **Phased Cutover** | Transition traffic gradually | 1. Identify components for independent migration<br/>2. Migrate one component at a time<br/>3. Maintain data synchronization during transition<br/>4. Monitor each component before proceeding<br/>5. Complete when all components are transitioned | Lower risk;<br/>issues affect only part of the system;<br/>easier rollback | More complex;<br/>requires maintaining both systems;<br/>potential data consistency challenges | |
| 273 | +| **Read/Write Split** | Separate read and write operations | 1. Direct reads to PostgreSQL, writes to MySQL<br/>2. Maintain real-time replication<br/>3. Migrate write operations when confident<br/>4. Decommission MySQL after transition | Gradual transition;<br/>reduced risk for read-heavy applications;<br/>easier performance validation | Requires robust replication;<br/>potential replication lag;<br/>complex application changes | |
| 274 | + |
| 275 | +### Zero-downtime approaches |
| 276 | + |
| 277 | +For systems that cannot tolerate downtime: |
| 278 | + |
| 279 | +1. **Dual-Write Pattern**: |
| 280 | + |
| 281 | + - Modify application to write to both MySQL and PostgreSQL |
| 282 | + - Read from MySQL initially |
| 283 | + - Gradually shift reads to PostgreSQL |
| 284 | + - Once confident, stop writing to MySQL |
| 285 | + |
| 286 | +2. **Change Data Capture (CDC)**: |
| 287 | + |
| 288 | + - Use tools like Debezium to capture changes from MySQL |
| 289 | + - Apply changes to PostgreSQL in real-time |
| 290 | + - Switch application connection to PostgreSQL when ready |
| 291 | + |
| 292 | +3. **Proxy-Based Approach**: |
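| 293 | + - Implement a database proxy in front of each system (such as ProxySQL for MySQL or PgBouncer for PostgreSQL); these proxies do not translate between the two wire protocols |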
| 294 | + - Configure the proxy to route traffic appropriately during migration |
| 295 | + - Switch routing rules to complete migration |
| 296 | + |
| 297 | +## Common tools |
| 298 | + |
| 299 | +- [pgloader](https://github.com/dimitri/pgloader). A powerful and flexible PostgreSQL migration tool that excels at rapidly loading data into PostgreSQL databases. |
| 300 | +- [Ora2Pg](https://github.com/darold/ora2pg). While primarily designed for Oracle to PostgreSQL migrations, Ora2Pg can also be used to migrate from MySQL to PostgreSQL. |
| 301 | +- pg_dump and pg_restore. These core PostgreSQL utilities are often used in conjunction with other tools. |
| 302 | +- Cloud migration services like [AWS Database Migration Service (DMS)](https://aws.amazon.com/dms/). |