Commit c201063

docs: mysql to postgres
1 parent 48a887f commit c201063

2 files changed

+304
-0
lines changed

content/reference/migration/_layout.md

Lines changed: 2 additions & 0 deletions
@@ -2,3 +2,5 @@
22
---
33

44
## [Overview](/reference/migration/overview)
5+
6+
## [How to migrate from MySQL to Postgres](/reference/migration/how-to-migrate-database-from-mysql-to-postgres)
Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
1+
---
2+
title: 'How to Migrate database from MySQL to PostgreSQL'
3+
---
4+
5+
## When you should consider migrating from MySQL to PostgreSQL
6+
7+
### Advanced feature requirements
8+
9+
- **Complex Data Types**: PostgreSQL provides robust support for JSON, arrays, hstore, and custom types, making it ideal for applications with complex data structures.
10+
- **Geospatial Support**: PostgreSQL with PostGIS offers superior geospatial capabilities compared to MySQL's spatial extensions.
11+
12+
### Scalability needs
13+
14+
- **Table Partitioning**: PostgreSQL's declarative partitioning is more flexible and powerful than MySQL's partitioning system.
15+
- **Parallel Query Execution**: PostgreSQL can utilize multiple CPU cores for single queries, improving performance for complex analytical workloads.
16+
- **Advanced Indexing**: PostgreSQL supports more index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN) and offers partial and expression indexes.
17+
18+
### Licensing concerns
19+
20+
PostgreSQL offers freedoms that MySQL's GPL version doesn't:
21+
22+
- **Permissive License**: PostgreSQL uses a PostgreSQL License (similar to MIT/BSD), which:
23+
24+
- Allows unrestricted use in proprietary applications
25+
- Doesn't require source code disclosure
26+
- Permits creating closed-source derivatives
27+
28+
- **Unrestricted Embedding**: You can embed PostgreSQL in commercial products without licensing fees or source code obligations.
29+
- **Fork Freedom**: You can create proprietary forks of PostgreSQL without license obligations.
30+
- **No Corporate Control**: PostgreSQL is developed by a community organization rather than a single company, reducing concerns about commercial interests affecting the license.
31+
32+
## When you should think twice
33+
34+
- Write-heavy workloads deserve benchmarking before you commit: PostgreSQL's MVCC design can cause write amplification when heavily indexed rows are updated frequently. See [Uber's switch from PostgreSQL to MySQL](https://www.uber.com/en-SG/blog/postgres-to-mysql-migration/) for one well-known case.
35+
- All-in-one vs best of breed. While PostgreSQL is like an all-in-one database thanks to its extensible architecture, it may be better to let your relational database handle transactional processing and use more specialized systems for analytical processing, full-text search, etc. If you are considering migrating databases, yours has likely reached a certain scale and encountered bottlenecks. The all-in-one approach is more desirable when you’re just getting started.
36+
37+
## MySQL and PostgreSQL schema differences
38+
39+
### Data types
40+
41+
While many data types are similar, important differences exist:
42+
43+
| MySQL Type | PostgreSQL Equivalent | Notes |
| ------------------------------------ | ------------------------ | --------------------------------------------------------------- |
| INT | INTEGER | Similar functionality |
| BIGINT | BIGINT | Similar functionality |
| FLOAT | REAL | PostgreSQL's REAL is equivalent to MySQL's FLOAT |
| DOUBLE | DOUBLE PRECISION | Similar functionality |
| DECIMAL | NUMERIC | Similar functionality |
| DATETIME | TIMESTAMP | No equivalent of MySQL's `ON UPDATE CURRENT_TIMESTAMP`; use a trigger |
| TIMESTAMP | TIMESTAMP WITH TIME ZONE | PostgreSQL handles time zones more explicitly |
| ENUM | ENUM or CHECK constraint | PostgreSQL's ENUM is a custom type, not a string constraint |
| SET | Array or JSONB | No direct equivalent; arrays or JSONB can replace functionality |
| TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | TEXT | PostgreSQL has a single TEXT type with no practical size limit |
| VARCHAR | VARCHAR | PostgreSQL's VARCHAR has no performance penalty for full length |
| BLOB | BYTEA | Different functions for manipulation |
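
The table above can be turned into a lookup for a conversion script. This is a minimal sketch, not a complete converter: `mapMySqlType` is a hypothetical helper, it ignores modifiers such as `UNSIGNED`, and it deliberately refuses `ENUM` and `SET`, which need a manual decision.

```javascript
// Illustrative mapping taken from the table above; extend it for your schema.
const TYPE_MAP = new Map([
  ['int', 'INTEGER'],
  ['bigint', 'BIGINT'],
  ['float', 'REAL'],
  ['double', 'DOUBLE PRECISION'],
  ['decimal', 'NUMERIC'],
  ['datetime', 'TIMESTAMP'],
  ['timestamp', 'TIMESTAMP WITH TIME ZONE'],
  ['tinytext', 'TEXT'],
  ['text', 'TEXT'],
  ['mediumtext', 'TEXT'],
  ['longtext', 'TEXT'],
  ['varchar', 'VARCHAR'],
  ['blob', 'BYTEA'],
]);

// Hypothetical helper: accepts strings such as "varchar(255)" or "DECIMAL(10,2)".
function mapMySqlType(mysqlType) {
  const match = /^\s*([a-z]+)\s*(\([^)]*\))?/i.exec(mysqlType);
  if (!match) throw new Error(`Unrecognized type: ${mysqlType}`);
  const base = match[1].toLowerCase();
  const args = match[2] || '';
  const target = TYPE_MAP.get(base);
  if (!target) throw new Error(`No automatic mapping for ${base}; convert it manually`);
  // Carry length/precision over only where it means the same thing.
  const keepsArgs = target === 'VARCHAR' || target === 'NUMERIC';
  return keepsArgs ? target + args : target;
}
```

Display widths such as `INT(11)` are dropped on purpose, since they have no meaning in PostgreSQL.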
57+
58+
### Constraints and keys
59+
60+
PostgreSQL handles constraints differently:
61+
62+
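- **Primary Keys**: Both systems support primary keys and index them automatically. InnoDB stores the table itself as a clustered index on the primary key, while PostgreSQL keeps rows in a heap with a separate B-tree index.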
63+
- **Foreign Keys**: PostgreSQL enforces foreign key constraints more strictly and offers more deferral options.
64+
- **CHECK Constraints**: PostgreSQL fully enforces CHECK constraints. MySQL parsed but ignored them before version 8.0.16, so data from older servers may violate checks that PostgreSQL will enforce on import.
65+
- **Unique Constraints**: Both support unique constraints, but PostgreSQL distinguishes between unique constraints and unique indexes.
66+
67+
### Sequences and auto-increment
68+
69+
- MySQL uses `AUTO_INCREMENT` for generating sequential values.
70+
- PostgreSQL uses sequences, typically with `SERIAL` or `IDENTITY` columns.
71+
- Migration requires converting `AUTO_INCREMENT` to PostgreSQL sequences or identity columns.
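
After a bulk import, the sequence behind a `SERIAL` or `IDENTITY` column usually still sits at its initial value, so the next insert can collide with an imported key. The sketch below builds a statement that moves the sequence past the largest imported value. `buildSequenceResetSql` is a hypothetical helper, and it assumes plain lowercase table and column names.

```javascript
// Builds a statement that advances the sequence owned by table.column
// to the largest value already present in the table.
function buildSequenceResetSql(table, column) {
  const ident = /^[a-z_][a-z0-9_]*$/;
  if (!ident.test(table) || !ident.test(column)) {
    throw new Error('Expected plain lowercase identifiers');
  }
  // The third setval argument is false for an empty table, so the next
  // generated value is 1 rather than 2.
  return (
    `SELECT setval(pg_get_serial_sequence('${table}', '${column}'), ` +
    `COALESCE(MAX(${column}), 1), MAX(${column}) IS NOT NULL) FROM ${table};`
  );
}
```

Run the generated statement once per auto-incrementing column after the data load and before the application starts writing.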
72+
73+
### Default values
74+
75+
- PostgreSQL supports more complex default values, including functions.
76+
- A `DEFAULT CURRENT_TIMESTAMP` clause carries over unchanged, but MySQL's `ON UPDATE CURRENT_TIMESTAMP` has no PostgreSQL equivalent and must be replaced with a trigger.
77+
- PostgreSQL allows defaults on TEXT columns, which some MySQL versions restricted.
78+
79+
### Schema naming and case sensitivity
80+
81+
- PostgreSQL treats unquoted identifiers as case-insensitive and quoted identifiers as case-sensitive, while MySQL's case sensitivity for database and table names depends on the operating system and the `lower_case_table_names` setting.
82+
- PostgreSQL automatically converts unquoted identifiers to lowercase, which can cause issues during migration.
83+
- PostgreSQL uses schemas (similar to namespaces) more extensively than MySQL's databases.
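
The folding rule is easier to see in code. The sketch below imitates how PostgreSQL resolves an identifier as written in SQL text. Both function names are hypothetical, only ASCII names are considered, and reserved words (which also need quoting) are not checked.

```javascript
// Quoted identifiers keep their case; unquoted ones are folded to lowercase.
function resolveIdentifier(written) {
  const quoted = /^"(.*)"$/.exec(written);
  if (quoted) return quoted[1].replace(/""/g, '"');
  return written.toLowerCase();
}

// A name stored with uppercase or special characters can only be
// referenced with double quotes afterwards.
function needsQuoting(storedName) {
  return !/^[a-z_][a-z0-9_$]*$/.test(storedName);
}
```

A migration tool that quotes every name will preserve `UserAccounts` exactly, after which an unquoted `SELECT * FROM UserAccounts` looks for `useraccounts` and fails. Lowercasing names during migration avoids this.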
84+
85+
### Stored procedures and functions
86+
87+
- PostgreSQL uses PL/pgSQL as its primary procedural language, while MySQL uses its own syntax.
88+
- PostgreSQL supports multiple procedural languages (PL/pgSQL, PL/Python, PL/Perl, etc.).
89+
90+
### Views and materialized views
91+
92+
- Both support views, but PostgreSQL also offers materialized views that store data physically.
93+
- PostgreSQL's view updating capabilities are more advanced.
94+
- PostgreSQL allows indexing of materialized views.
95+
96+
## Data migration strategies
97+
98+
| Strategy | Pros | Cons |
| --------------------------- | -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| Direct export/import | Simple to implement for small databases | Requires downtime<br/>Challenging for large databases<br/>Manual conversion may be needed |
| ETL process | Highly customizable<br/>Can handle complex transformations<br/>Can be parallelized | Requires more development effort<br/>Potentially complex to set up |
| Replication-based migration | Minimal downtime<br/>Continuous validation possible<br/>Phased migration | More complex setup<br/>Requires monitoring<br/>Potential replication lag |
| Cloud migration services | Managed service<br/>Often includes schema conversion<br/>Typically supports continuous replication | Vendor lock-in<br/>Potential costs<br/>May require cloud-to-cloud or on-premises-to-cloud networking |
104+
105+
### Direct export/import
106+
107+
The simplest approach involves exporting data from MySQL and importing it into PostgreSQL:
108+
109+
1. **Export MySQL data** using mysqldump:
110+
111+
```bash
# --compatible=postgresql is accepted by MySQL 5.7 and earlier;
# mysqldump from MySQL 8.0 only accepts --compatible=ansi.
mysqldump --compatible=postgresql --default-character-set=utf8 \
  --no-create-info --complete-insert --extended-insert --single-transaction \
  --skip-triggers --routines=0 --skip-tz-utc \
  database_name > mysql_data.sql
```
117+
118+
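2. **Convert the SQL** to PostgreSQL format using custom scripts. Alternatively, skip the dump file and use [pgloader](https://github.com/dimitri/pgloader), which reads from a live MySQL connection and converts schema and data in one pass.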
119+
120+
3. **Import into PostgreSQL** using psql:
121+
```bash
psql -d database_name -f converted_data.sql
```
124+
125+
### ETL process
126+
127+
For more complex migrations, an Extract-Transform-Load (ETL) process offers greater control:
128+
129+
1. **Extract** data from MySQL into an intermediate format (CSV, JSON, etc.).
130+
2. **Transform** the data to match PostgreSQL's requirements (data types, constraints, etc.).
131+
3. **Load** the transformed data into PostgreSQL.
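
The transform step is where MySQL-specific values get normalized. As a minimal sketch, the function below handles two common cases: `TINYINT(1)` flags that should become booleans, and MySQL "zero dates", which PostgreSQL rejects. `transformRow` and its options are hypothetical names, and rows are assumed to be plain objects with dates as strings.

```javascript
// Returns a copy of the row with MySQL-specific values normalized.
function transformRow(row, { booleanColumns = [], dateColumns = [] } = {}) {
  const out = { ...row };
  for (const col of booleanColumns) {
    // Leave NULLs alone; map 0/1 (numbers or strings) to false/true.
    if (out[col] !== null && out[col] !== undefined) {
      out[col] = Number(out[col]) !== 0;
    }
  }
  for (const col of dateColumns) {
    // '0000-00-00' is not a valid PostgreSQL date; store NULL instead.
    if (typeof out[col] === 'string' && out[col].startsWith('0000-00-00')) {
      out[col] = null;
    }
  }
  return out;
}
```

Mapping zero dates to NULL only works if the target column is nullable, so check the schema before applying this rule.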
132+
133+
### Replication-based migration
134+
135+
For minimal downtime, consider a replication-based approach:
136+
137+
1. **Set up initial data copy** using tools like pgloader or AWS DMS.
138+
2. **Establish ongoing replication** from MySQL to PostgreSQL using tools like [Debezium](https://debezium.io/).
139+
3. **Validate data consistency** between the systems.
140+
4. **Cut over** to PostgreSQL when ready.
141+
142+
### Cloud migration services
143+
144+
Cloud providers offer specialized services for database migration:
145+
146+
- **AWS Database Migration Service**
147+
- **Google Cloud Database Migration Service**
148+
- **Azure Database Migration Service**
149+
150+
### Handling large datasets
151+
152+
For very large databases, consider these additional strategies:
153+
154+
- **Partitioned Migration**: Migrate data in chunks based on logical partitions.
155+
- **Parallel Processing**: Use multiple threads or processes for data extraction and loading.
156+
- **Incremental Migration**: Migrate historical data first, then recent data during cutover.
157+
- **Data Validation**: Implement checksums or row counts to verify migration completeness.
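
Partitioned migration and row-count validation can share the same chunk boundaries. The sketch below assumes a numeric primary key. `buildChunks` and `findMismatches` are hypothetical helpers, and the per-chunk counts are assumed to come from `COUNT(*)` queries run against each database.

```javascript
// Splits an inclusive primary-key range into chunks that can be copied
// independently, and therefore in parallel.
function buildChunks(minId, maxId, chunkSize) {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const chunks = [];
  for (let start = minId; start <= maxId; start += chunkSize) {
    chunks.push({ start, end: Math.min(start + chunkSize - 1, maxId) });
  }
  return chunks;
}

// Returns the chunks whose row counts differ between source and target.
function findMismatches(chunks, sourceCounts, targetCounts) {
  return chunks.filter((_, i) => sourceCounts[i] !== targetCounts[i]);
}
```

Matching counts do not prove the rows are identical. A per-chunk checksum over the column values is a stronger check when the extra cost is acceptable.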
158+
159+
## Application code changes
160+
161+
### SQL syntax differences
162+
163+
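- **String Concatenation**: MySQL uses the `CONCAT()` function and treats `||` as logical OR by default. PostgreSQL uses `||` for concatenation and also supports `CONCAT()`.
- **Date Functions**: Functions like `DATE_ADD()` in MySQL become `date + interval` arithmetic in PostgreSQL; for example, `DATE_ADD(d, INTERVAL 1 DAY)` becomes `d + INTERVAL '1 day'`.
- **LIMIT/OFFSET**: MySQL accepts `LIMIT x,y` (skip `x` rows, return `y`), while PostgreSQL requires `LIMIT y OFFSET x`. MySQL also accepts the latter form.
- **Group By Handling**: PostgreSQL requires every non-aggregated column in the SELECT list to appear in the GROUP BY clause or to be functionally dependent on a grouped primary key. MySQL applies a similar rule only when `ONLY_FULL_GROUP_BY` is enabled (the default since 5.7.5).
- **Boolean Values**: MySQL's `BOOLEAN` is an alias for `TINYINT(1)` and stores 0/1, while PostgreSQL has a native `boolean` type with `true`/`false`.
- **REPLACE INTO**: PostgreSQL doesn't support this MySQL shorthand; use DELETE + INSERT or upsert with ON CONFLICT.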
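
Some of these rewrites are mechanical enough to script when auditing a codebase. The sketch below handles only the `LIMIT x,y` form with literal numbers. `rewriteLimit` is a hypothetical helper based on a regular expression, so it does not understand placeholders, and it would also rewrite matching text inside string literals.

```javascript
// Rewrites MySQL's "LIMIT offset, count" into "LIMIT count OFFSET offset".
function rewriteLimit(sql) {
  return sql.replace(/\bLIMIT\s+(\d+)\s*,\s*(\d+)/gi, 'LIMIT $2 OFFSET $1');
}
```

For example, `rewriteLimit('SELECT * FROM t ORDER BY id LIMIT 20, 10')` yields a query ending in `LIMIT 10 OFFSET 20`. Treat the output as a candidate for review rather than a guaranteed-correct translation.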
169+
170+
### Connection string
171+
172+
- **Connection Libraries**: Some libraries are database-specific and need replacement.
173+
- **Connection Strings**: Format differs between MySQL and PostgreSQL.
174+
- **Connection Pooling**: Configuration parameters differ between systems.
175+
176+
Example MySQL connection string:
177+
178+
```text
mysql://user:password@host:3306/database
```
181+
182+
Equivalent PostgreSQL connection string:
183+
184+
```text
postgresql://user:password@host:5432/database
```
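
If connection strings live in configuration files, the conversion can be scripted. This is a minimal sketch using the standard `URL` class. `toPostgresUrl` is a hypothetical helper, the PostgreSQL port is passed in rather than derived, and query parameters are dropped because MySQL driver options rarely carry over.

```javascript
// Converts a mysql:// URL into a postgresql:// URL with the same
// credentials, host, and database name.
function toPostgresUrl(mysqlUrl, pgPort = 5432) {
  const u = new URL(mysqlUrl);
  if (u.protocol !== 'mysql:') {
    throw new Error(`Expected a mysql:// URL, got ${u.protocol}`);
  }
  // username and password stay percent-encoded exactly as given.
  const password = u.password ? `:${u.password}` : '';
  const auth = u.username ? `${u.username}${password}@` : '';
  return `postgresql://${auth}${u.hostname}:${pgPort}${u.pathname}`;
}
```

JDBC-style strings (`jdbc:mysql://...`) use a different prefix and would need their own handling.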
187+
188+
### ORM configurations
189+
190+
- **Dialect Configuration**: Change the database dialect to PostgreSQL.
191+
- **Type Mappings**: Update custom type mappings to match PostgreSQL types.
192+
- **Query Generation**: Some ORMs generate different SQL for different databases.
193+
194+
Example changes for popular ORMs:
195+
196+
**Hibernate (Java)**:
197+
198+
```java
// MySQL
properties.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
// PostgreSQL
properties.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
```
204+
205+
**Sequelize (Node.js)**:
206+
207+
```javascript
const { Sequelize } = require('sequelize');

// MySQL
const mysqlSequelize = new Sequelize('database', 'username', 'password', {
  dialect: 'mysql',
});
// PostgreSQL (requires the `pg` and `pg-hstore` packages)
const pgSequelize = new Sequelize('database', 'username', 'password', {
  dialect: 'postgres',
});
```
217+
218+
### Transaction management
219+
220+
- **Default Isolation Level**: PostgreSQL uses `Read Committed` by default, while MySQL's InnoDB defaults to `Repeatable Read`.
- **Locking Behavior**: Both engines use multi-version concurrency control (MVCC), but InnoDB adds next-key (gap) locking under `Repeatable Read` to prevent phantom reads, and PostgreSQL does not. This can lead to different blocking and deadlock behavior in highly concurrent applications.
- **Serialization Failures**: At `Repeatable Read` or `Serializable`, PostgreSQL may abort a conflicting transaction with a serialization failure (SQLSTATE `40001`) that MySQL wouldn't raise, so the application must be prepared to retry.
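
Retrying is the usual response to a serialization failure. The sketch below wraps a transaction function in a bounded retry loop. `withRetry` is a hypothetical helper, and it assumes the driver exposes the SQLSTATE on the error's `code` property, as node-postgres does.

```javascript
// SQLSTATE codes for transient concurrency failures:
// 40001 = serialization_failure, 40P01 = deadlock_detected.
const RETRYABLE_SQLSTATES = new Set(['40001', '40P01']);

// Runs runTransaction, retrying only on transient concurrency errors.
async function withRetry(runTransaction, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runTransaction();
    } catch (err) {
      const retryable = err != null && RETRYABLE_SQLSTATES.has(err.code);
      if (!retryable || attempt >= maxAttempts) throw err;
    }
  }
}
```

The function passed in must restart the whole transaction from `BEGIN`, since the failed one has already been rolled back. A production version would typically add a short randomized delay between attempts.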
223+
224+
### Error handling
225+
226+
- **Error Codes**: Different numeric codes for similar errors.
227+
- **Constraint Violations**: Different formats for constraint violation messages.
228+
- **Connection Errors**: Different error handling for connection issues.
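
Application code that branches on error numbers needs a translation table. The sketch below pairs a few common MySQL error numbers with the corresponding PostgreSQL SQLSTATE codes. The `errno` and `code` property names are assumptions based on what the common Node.js drivers for MySQL and PostgreSQL expose, and the list is far from complete.

```javascript
// MySQL error number -> PostgreSQL SQLSTATE for the equivalent condition.
const MYSQL_ERRNO_TO_SQLSTATE = new Map([
  [1062, '23505'], // ER_DUP_ENTRY           -> unique_violation
  [1048, '23502'], // ER_BAD_NULL_ERROR      -> not_null_violation
  [1451, '23503'], // ER_ROW_IS_REFERENCED_2 -> foreign_key_violation
  [1452, '23503'], // ER_NO_REFERENCED_ROW_2 -> foreign_key_violation
  [1213, '40P01'], // ER_LOCK_DEADLOCK       -> deadlock_detected
]);

// Works against either driver while both databases are in use.
function isUniqueViolation(err) {
  return err.errno === 1062 || err.code === '23505';
}
```

Centralizing checks like `isUniqueViolation` in one module keeps the rest of the application independent of which database raised the error.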
229+
230+
### Database-specific features
231+
232+
If your application uses MySQL-specific features, alternatives must be implemented:
233+
234+
- **Full-Text Search**: Replace MySQL's full-text search with PostgreSQL's text search capabilities.
235+
- **Stored Procedures**: Rewrite in PL/pgSQL syntax.
236+
- **User-Defined Functions**: Convert to PostgreSQL's function syntax.
237+
- **Triggers**: Update to PostgreSQL's trigger syntax.
238+
239+
## Other notable compatibility issues
240+
241+
Beyond schema and code changes, several other compatibility issues require attention:
242+
243+
### Case sensitivity
244+
245+
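- In MySQL, database and table names are typically case-insensitive on Windows and macOS but case-sensitive on Linux (controlled by `lower_case_table_names`); column names are case-insensitive on every platform.
- PostgreSQL folds unquoted identifiers to lowercase, so they behave case-insensitively; quoted identifiers are case-sensitive and must match exactly.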
247+
- This can cause unexpected behavior if your application relies on case-insensitive identifiers.
248+
249+
### NULL handling
250+
251+
NULL value handling differs between the systems:
252+
253+
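- In both MySQL and PostgreSQL, `NULL = NULL` evaluates to NULL rather than true. For a NULL-safe comparison, MySQL's `<=>` operator corresponds to `IS NOT DISTINCT FROM` in PostgreSQL.
- Both systems distinguish empty strings from NULL. However, MySQL in non-strict SQL mode may silently substitute implicit defaults such as `''` or `0` where PostgreSQL raises an error, and the two sort NULLs differently by default (first in MySQL, last in PostgreSQL for ascending order).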
255+
- These differences can affect query results and application logic.
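
Under standard SQL comparison semantics, a comparison involving NULL yields NULL ("unknown"), not false. The sketch below models that behavior alongside a NULL-safe comparison, using JavaScript's `null` to stand in for SQL NULL. Both function names are hypothetical.

```javascript
// SQL "a = b": the result is unknown (null) if either side is NULL.
function sqlEquals(a, b) {
  if (a === null || b === null) return null;
  return a === b;
}

// NULL-safe comparison, like MySQL's `a <=> b` or PostgreSQL's
// `a IS NOT DISTINCT FROM b`: two NULLs compare as equal.
function nullSafeEquals(a, b) {
  return a === b;
}
```

Queries that rely on MySQL's `<=>` are the ones to look for during migration, since that operator does not exist in PostgreSQL.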
256+
257+
### Character sets and collations
258+
259+
- MySQL uses character sets and collations at the server, database, table, and column levels.
260+
- PostgreSQL sets the encoding per database, and collations can be set per database, per column, or per expression.
- Default character sets differ: MySQL defaulted to `latin1` before 8.0 and to `utf8mb4` since, while PostgreSQL typically uses `UTF8`. MySQL's `utf8` is a 3-byte alias (`utf8mb3`), so prefer `utf8mb4` when exporting.
262+
263+
## Cutover process
264+
265+
The final phase of migration is the cutover—transitioning production traffic from MySQL to PostgreSQL. Here's a structured approach:
266+
267+
### Cutover strategies
268+
269+
| Strategy | Description | Steps | Pros | Cons |
| -------------------- | ---------------------------------- | ----- | ---- | ---- |
| **Big Bang Cutover** | Switch all traffic at once | 1. Stop all write traffic to MySQL<br/>2. Perform final data synchronization<br/>3. Verify data consistency<br/>4. Update application configuration<br/>5. Restart application services<br/>6. Resume traffic | Simpler to implement;<br/>no need to maintain both systems simultaneously | Higher risk;<br/>longer downtime;<br/>all-or-nothing approach |
| **Phased Cutover** | Transition traffic gradually | 1. Identify components for independent migration<br/>2. Migrate one component at a time<br/>3. Maintain data synchronization during transition<br/>4. Monitor each component before proceeding<br/>5. Complete when all components are transitioned | Lower risk;<br/>issues affect only part of the system;<br/>easier rollback | More complex;<br/>requires maintaining both systems;<br/>potential data consistency challenges |
| **Read/Write Split** | Separate read and write operations | 1. Direct reads to PostgreSQL, writes to MySQL<br/>2. Maintain real-time replication<br/>3. Migrate write operations when confident<br/>4. Decommission MySQL after transition | Gradual transition;<br/>reduced risk for read-heavy applications;<br/>easier performance validation | Requires robust replication;<br/>potential replication lag;<br/>complex application changes |
274+
275+
### Zero-downtime approaches
276+
277+
For systems that cannot tolerate downtime:
278+
279+
1. **Dual-Write Pattern**:
280+
281+
- Modify application to write to both MySQL and PostgreSQL
282+
- Read from MySQL initially
283+
- Gradually shift reads to PostgreSQL
284+
- Once confident, stop writing to MySQL
285+
286+
2. **Change Data Capture (CDC)**:
287+
288+
- Use tools like Debezium to capture changes from MySQL
289+
- Apply changes to PostgreSQL in real-time
290+
- Switch application connection to PostgreSQL when ready
291+
292+
3. **Proxy-Based Approach**:
293+
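- Put a connection proxy in front of the database (for example ProxySQL on the MySQL side or PgBouncer on the PostgreSQL side). These proxies do not translate between the MySQL and PostgreSQL wire protocols, so the application must already use the right driver.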
294+
- Configure the proxy to route traffic appropriately during migration
295+
- Switch routing rules to complete migration
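
The dual-write pattern from the list above can be sketched as a thin wrapper around two data stores. `DualWriter` is a hypothetical class, and the stores are assumed to expose synchronous `set` and `get` methods (a `Map` works for experimenting). Real database clients are asynchronous and need transaction handling on top.

```javascript
class DualWriter {
  constructor(mysqlStore, postgresStore) {
    this.mysql = mysqlStore;
    this.postgres = postgresStore;
    this.readFromPostgres = false; // flip once PostgreSQL is trusted
    this.failedSecondaryWrites = []; // to be replayed or reconciled later
  }

  write(key, value) {
    // MySQL remains the source of truth: its failures propagate.
    this.mysql.set(key, value);
    try {
      this.postgres.set(key, value);
    } catch (err) {
      // A PostgreSQL failure must not break production traffic.
      this.failedSecondaryWrites.push({ key, value, error: err.message });
    }
  }

  read(key) {
    const store = this.readFromPostgres ? this.postgres : this.mysql;
    return store.get(key);
  }
}
```

Recording failed secondary writes matters because the two databases drift apart silently otherwise. The recorded entries give a reconciliation job something to work from before reads are shifted.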
296+
297+
## Common tools
298+
299+
- [pgloader](https://github.com/dimitri/pgloader). A powerful and flexible PostgreSQL migration tool that excels at rapidly loading data into PostgreSQL databases.
300+
- [Ora2Pg](https://github.com/darold/ora2pg). While primarily designed for Oracle to PostgreSQL migrations, Ora2Pg can also be used to migrate from MySQL to PostgreSQL.
301+
- pg_dump and pg_restore. These core PostgreSQL utilities work only with PostgreSQL, but they are useful alongside other tools for backing up or cloning the target database between migration rehearsals.
302+
- Cloud migration services like [AWS Database Migration Service (DMS)](https://aws.amazon.com/dms/).
