Commit d8e8bd1

Adding Relational Database approaches to docs
1 parent 220f35c commit d8e8bd1

2 files changed: +32 -0 lines changed

README.md

Lines changed: 15 additions & 0 deletions
@@ -422,6 +422,21 @@ outputs = wr.sagemaker.get_job_outputs("JOB_NAME")
 
 ## Diving Deep
 
+### Relational Databases (SQL) - (Oracle, PostgreSQL, MySQL, Microsoft SQL Server, etc)
+
+Pandas and PySpark already have great interfaces to handle integrations with relational databases:
+
+1. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
+2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
+3. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameReader.jdbc
+4. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameWriter.jdbc
+
+AWS Data Wrangler does not want to reinvent the wheel, and will only implement the integrations not covered by the native Pandas and PySpark APIs.
+
+E.g.:
+* MySQL Aurora [LOAD](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html) and [UNLOAD](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html) through S3
+* PostgreSQL Aurora [COPY through aws_s3 extension and the S3 service itself](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Migrating.html#USER_PostgreSQL.S3Import)
+
 ### Parallelism, Non-picklable objects and GeoPandas
 
 AWS Data Wrangler tries to parallelize everything that is possible (I/O-bound and CPU-bound tasks).

docs/source/divingdeep.rst

Lines changed: 17 additions & 0 deletions
@@ -3,6 +3,23 @@
 Diving Deep
 ===========
 
+Relational Databases (SQL) - (Oracle, PostgreSQL, MySQL, Microsoft SQL Server, etc)
+-----------------------------------------------------------------------------------
+
+Pandas and PySpark already have great interfaces to handle integrations with relational databases:
+
+1. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
+2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
+3. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameReader.jdbc
+4. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=jdbc#pyspark.sql.DataFrameWriter.jdbc
+
+AWS Data Wrangler does not want to reinvent the wheel, and will only implement the integrations not covered by the native Pandas and PySpark APIs.
+
+E.g.:
+
+- MySQL Aurora `LOAD <https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html>`_ and `UNLOAD <https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html>`_ through S3
+- PostgreSQL Aurora `COPY through aws_s3 extension and the S3 service itself <https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Migrating.html#USER_PostgreSQL.S3Import>`_
+
 Parallelism, Non-picklable objects and GeoPandas
 ------------------------------------------------

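The section added in this commit points readers to pandas' `DataFrame.to_sql` and `read_sql` as the native route for relational databases. A minimal sketch of that route follows. It is not part of the commit and makes some assumptions: an in-memory SQLite database stands in for a real server, and the table name `events` and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; not taken from the commit.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Assumption: in-memory SQLite stands in for Oracle/PostgreSQL/MySQL/SQL Server.
# For those engines pandas expects a SQLAlchemy engine or connection instead.
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing the table if it already exists.
df.to_sql("events", conn, index=False, if_exists="replace")

# Read a filtered subset back using a parameterized query.
result = pd.read_sql(
    "SELECT id, name FROM events WHERE id >= ? ORDER BY id",
    conn,
    params=(2,),
)

conn.close()
```

Against a server database the same two calls should apply once `conn` is swapped for a SQLAlchemy engine. The Aurora bulk LOAD/UNLOAD and COPY paths listed in the commit go through S3, which is the gap the commit says AWS Data Wrangler intends to fill.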