Commit 17f9467

Bumping version to 0.0.12
1 parent 86c9a9e commit 17f9467

14 files changed: 49 additions and 13 deletions

README.md

Lines changed: 17 additions & 0 deletions
@@ -285,6 +285,23 @@ print(cluster_id)
 ## Diving Deep


+### Parallelism, Non-picklable objects and GeoPandas
+
+AWS Data Wrangler tries to parallelize everything it can (both I/O-bound and CPU-bound tasks).
+You can control the parallelism level with these parameters:
+
+- **procs_cpu_bound**: number of processes that can be used in single-node applications for CPU-bound cases (Default: os.cpu_count())
+- **procs_io_bound**: number of processes that can be used in single-node applications for I/O-bound cases (Default: os.cpu_count() * PROCS_IO_BOUND_FACTOR)
+
+Both can be set at the Session level or passed directly to the functions.
+
+Some special cases do not work with parallelism:
+
+- GeoPandas
+- Columns with non-picklable objects
+
+To handle these cases, use `procs_cpu_bound=1` to avoid distributing the DataFrame.
+
 ### Pandas with null object columns (UndetectedType exception)

 Pandas has a too generic "data type" named object. Pandas object columns can be string, dates, etc, etc, etc.
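
As a rough illustration of the two defaults described in the README hunk above, the sketch below computes them with the standard library only. The constant `ASSUMED_PROCS_IO_BOUND_FACTOR` and the helper `default_procs` are hypothetical names for this example; the diff does not show the library's actual `PROCS_IO_BOUND_FACTOR`, so the value 2 is an assumption.

```python
import os

# Assumption for illustration only: the real PROCS_IO_BOUND_FACTOR lives
# inside the library and its value is not shown in this commit.
ASSUMED_PROCS_IO_BOUND_FACTOR = 2


def default_procs(io_bound_factor: int = ASSUMED_PROCS_IO_BOUND_FACTOR) -> dict:
    """Return the default process counts as the README text describes them."""
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    cpu_count = os.cpu_count() or 1
    return {
        "procs_cpu_bound": cpu_count,
        "procs_io_bound": cpu_count * io_bound_factor,
    }


defaults = default_procs()
print(defaults)
```

I/O-bound work spends most of its time waiting, which is presumably why its default is a multiple of the CPU count rather than the CPU count itself.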

building/build-docs.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cd ..

building/build-glue-egg.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cd ..

building/build-glue-wheel.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cd ..

building/build-image.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cp ../requirements.txt .

building/build-lambda-layer.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 # Go to home

building/deploy-source.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cd ..

building/open-image.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash

 AWS_ACCESS_KEY_ID=$(aws --profile default configure get aws_access_key_id)
 AWS_SECRET_ACCESS_KEY=$(aws --profile default configure get aws_secret_access_key)

building/publish.sh

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 set -e

 cd ..

docs/source/divingdeep.rst

Lines changed: 18 additions & 0 deletions
@@ -3,6 +3,24 @@
 Diving Deep
 ===========

+Parallelism, Non-picklable objects and GeoPandas
+------------------------------------------------
+
+AWS Data Wrangler tries to parallelize everything it can (both I/O-bound and CPU-bound tasks).
+You can control the parallelism level with these parameters:
+
+- procs_cpu_bound: number of processes that can be used in single-node applications for CPU-bound cases (Default: os.cpu_count())
+- procs_io_bound: number of processes that can be used in single-node applications for I/O-bound cases (Default: os.cpu_count() * PROCS_IO_BOUND_FACTOR)
+
+Both can be set at the Session level or passed directly to the functions.
+
+Some special cases do not work with parallelism:
+
+- GeoPandas
+- Columns with non-picklable objects
+
+To handle these cases, use ``procs_cpu_bound=1`` to avoid distributing the DataFrame.
+
 Pandas with null object columns (UndetectedType exception)
 ----------------------------------------------------------
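
The restriction on non-picklable objects noted in the hunk above follows from how Python process-based parallelism generally works: data sent to worker processes is serialized with `pickle`. The sketch below shows one way to check values before deciding on a process count. It uses only the standard library; `is_picklable` and `safe_procs_cpu_bound` are hypothetical helpers written for this example and are not part of AWS Data Wrangler.

```python
import os
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def safe_procs_cpu_bound(column_values) -> int:
    """Hypothetical helper: fall back to one process if any value cannot be pickled."""
    if all(is_picklable(value) for value in column_values):
        return os.cpu_count() or 1
    return 1


plain_ok = is_picklable({"a": 1})          # plain data serializes fine
lambda_ok = is_picklable(lambda x: x)      # lambdas cannot be pickled
lock_ok = is_picklable(threading.Lock())   # neither can thread locks

print(plain_ok, lambda_ok, lock_ok)
print(safe_procs_cpu_bound([1, "text", threading.Lock()]))
```

The value returned by `safe_procs_cpu_bound` could then be passed as `procs_cpu_bound`, either at the Session level or in the function call, as the text above describes.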