Commit fa3526e

Merge pull request #1 from cid-harvard/Readme
Update README.md
2 parents 09f8fe4 + 73b53bb commit fa3526e

1 file changed: 31 additions, 1 deletion

README.md

Lines changed: 31 additions & 1 deletion
@@ -1 +1,31 @@
-# pandas-to-postgres
+# Pandas-to-postgres

Pandas-to-postgres lets you bulk-load the contents of large DataFrames into Postgres as quickly as possible. The main differences from pandas' `to_sql` function are:

- Uses `COPY` combined with `to_csv` instead of `execute / executemany`, which runs much faster for large volumes of data.
- Uses `COPY FROM STDIN` with `StringIO` to avoid the I/O overhead of intermediate files. This matters especially for data stored in formats such as HDF, Stata, and Parquet, which are common in the scientific world.
- Provides chunked loading methods for larger-than-memory tables. In particular, the HDF5 functions load data in chunks directly from the file, and the approach extends easily to other formats that support random access by row range.
- Removes indexing overhead by automatically detecting and dropping indexes before the load, then re-creating them afterwards.
- Loads separate tables in parallel using multiprocessing.
- Provides hooks to modify data as it is loaded.
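
To make the first two points concrete, here is a minimal sketch of the `to_csv` plus `COPY FROM STDIN` technique. It is not this library's API: `dataframe_to_csv_buffer` and `copy_dataframe` are hypothetical helper names, `conn` is assumed to be an open psycopg2 connection, and the target table is assumed to exist already with columns matching the DataFrame.

```python
import io

import pandas as pd


def dataframe_to_csv_buffer(df: pd.DataFrame) -> io.StringIO:
    """Serialize a DataFrame to an in-memory CSV buffer (no temporary file)."""
    buffer = io.StringIO()
    # No header or index: the column list is passed to COPY explicitly.
    df.to_csv(buffer, index=False, header=False)
    buffer.seek(0)  # rewind so COPY reads from the start
    return buffer


def copy_dataframe(conn, df: pd.DataFrame, table_name: str) -> None:
    """Bulk-load df into table_name with COPY ... FROM STDIN.

    Assumption: conn is a psycopg2 connection, and table_name and the
    column names are trusted identifiers (they are interpolated into SQL).
    """
    columns = ", ".join(f'"{name}"' for name in df.columns)
    sql = f'COPY "{table_name}" ({columns}) FROM STDIN WITH (FORMAT csv)'
    with conn.cursor() as cur:
        # copy_expert streams the file-like object to the server.
        cur.copy_expert(sql, dataframe_to_csv_buffer(df))
    conn.commit()
```

With a real connection this would be called as `copy_dataframe(conn, df, "my_table")`, where `my_table` is a placeholder name. The whole DataFrame is serialized in memory here, so for very large frames the same idea would be applied chunk by chunk.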
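
The index handling described in the feature list can be sketched with SQLAlchemy reflection. This is an illustration under stated assumptions, not the library's implementation: `load_without_indexes` and `bulk_load` are hypothetical names, only indexes that SQLAlchemy reflects on the table are handled, and the demo runs against in-memory SQLite as a stand-in for Postgres.

```python
from sqlalchemy import MetaData, Table, create_engine, inspect, text


def load_without_indexes(engine, table_name, load_fn):
    """Drop the table's reflected indexes, run load_fn(), then re-create them."""
    table = Table(table_name, MetaData(), autoload_with=engine)
    indexes = list(table.indexes)  # keep the Index objects for re-creation
    for index in indexes:
        index.drop(engine)
    try:
        load_fn()
    finally:
        # Re-create even if the load fails, so the schema ends up as it started.
        for index in indexes:
            index.create(engine)
    return [index.name for index in indexes]


# Demo on in-memory SQLite (stand-in only; table and index names are made up).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE measurements (id INTEGER, value REAL)"))
    conn.execute(text("CREATE INDEX ix_measurements_id ON measurements (id)"))

names_during_load = []


def bulk_load():
    # Record which indexes exist while the load is running.
    found = inspect(engine).get_indexes("measurements")
    names_during_load.extend(ix["name"] for ix in found)
    with engine.begin() as conn:
        conn.execute(text("INSERT INTO measurements VALUES (1, 0.5), (2, 1.5)"))


rebuilt = load_without_indexes(engine, "measurements", bulk_load)
names_after_load = [ix["name"] for ix in inspect(engine).get_indexes("measurements")]
```

Building an index once over the finished table is generally cheaper than updating it for every inserted row, which is the motivation for the drop-and-re-create pattern.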
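
Loading separate tables in parallel might look roughly like the sketch below. The names `load_one_table` and `load_tables_in_parallel` are hypothetical, and the worker body is a placeholder; in a real worker each process would open its own database connection, since connections should not be shared across processes.

```python
import multiprocessing


def load_one_table(table_name: str) -> str:
    """Worker: load a single table and return its name when finished.

    Placeholder body. A real worker would connect to the database here
    and run the bulk load for this one table.
    """
    return table_name


def load_tables_in_parallel(table_names, processes=2,
                            pool_factory=multiprocessing.Pool):
    """Run load_one_table for each table name in a pool of workers.

    pool_factory is injectable (an assumption of this sketch) so the same
    code can be exercised with a thread pool in tests.
    """
    with pool_factory(processes) as pool:
        # map preserves the input order of table_names in its result.
        return pool.map(load_one_table, table_names)
```

Parallelism is per table here, matching the README's wording: each worker handles one whole table, so no two processes write to the same table at once.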
# Dependencies

- Python 3
- psycopg2 (for the low-level `COPY FROM STDIN`)
- sqlalchemy (for reflecting indexes)
- pandas

# Usage Example

```python3
from pandas_to_postgres import ...

# already loaded dataframe
...

# HDF from file
...
```
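
The usage example above is elided in the source, so the following does not show the library's API. It is a library-independent sketch of chunked loading with hooks, as described in the feature list. All names (`iter_row_ranges`, `load_in_chunks`, `read_chunk`, `write_chunk`) are hypothetical. With pandas, `read_chunk` could plausibly wrap `pandas.read_hdf(path, key, start=start, stop=stop)` for HDF5 files written in table format, and `write_chunk` would perform the `COPY`.

```python
from typing import Iterator, Tuple


def iter_row_ranges(total_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges that cover total_rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_rows, chunk_size):
        yield start, min(start + chunk_size, total_rows)


def load_in_chunks(total_rows, chunk_size, read_chunk, write_chunk, hooks=()):
    """Read, transform, and write one row range at a time.

    Only one chunk is held in memory at once. Returns the number of chunks.
    """
    chunks = 0
    for start, stop in iter_row_ranges(total_rows, chunk_size):
        chunk = read_chunk(start, stop)
        for hook in hooks:  # each hook takes a chunk and returns a chunk
            chunk = hook(chunk)
        write_chunk(chunk)
        chunks += 1
    return chunks


# Demo: a list stands in for the file, another list for the database.
source = list(range(10))
written = []
chunk_count = load_in_chunks(
    total_rows=len(source),
    chunk_size=4,
    read_chunk=lambda start, stop: source[start:stop],
    write_chunk=written.append,
    hooks=[lambda rows: [value * 2 for value in rows]],
)
```

Because the reader is addressed by row range, any format that supports random access by row range can be plugged in through `read_chunk` without changing the loop.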
