
Commit 09b1a32

Minor change in dask docs from dask maintainer (#1568)
* patrick feedback on dask docs
* Update datasets-dask.md
1 parent 379eaf6 commit 09b1a32

1 file changed, 33 additions and 0 deletions

docs/hub/datasets-dask.md

Lines changed: 33 additions & 0 deletions
@@ -71,6 +71,13 @@ def dummy_count_words(texts):
    return pd.Series([len(text.split(" ")) for text in texts])
```

or a similar function using pandas string methods (faster):

```python
def dummy_count_words(texts):
    return texts.str.count(" ")
```

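The two versions are similar but not identical: `len(text.split(" "))` returns the number of space-separated pieces, while `str.count(" ")` returns the number of space characters, which is one less. A quick check on a small sample (the helper names and the sample Series are hypothetical, chosen only for this comparison):

```python
import pandas as pd


def count_words_split(texts):
    # Python-level loop: number of pieces after splitting on single spaces
    return pd.Series([len(text.split(" ")) for text in texts])


def count_words_str(texts):
    # Vectorized pandas string method: number of space characters
    return texts.str.count(" ")


texts = pd.Series(["hello world", "dask", "a b c"])  # hypothetical sample
split_counts = count_words_split(texts)
space_counts = count_words_str(texts)

# Per-row difference between the split-based and the vectorized result
print((split_counts - space_counts).tolist())  # [1, 1, 1]
```

If exact agreement with the split-based version matters, adding 1 to the `str.count(" ")` result gives the same numbers.
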
In pandas you can use this function on a text column:

```python
@@ -116,3 +123,29 @@ This is useful when you want to manipulate a subset of the columns or for analyt
# for the filtering and computation and skip the other columns.
df.token_count.mean().compute()
```

## Client

Most features in `dask` are optimized for a cluster or a local `Client` to launch the parallel computations:

```python
import dask.dataframe as dd
from distributed import Client

if __name__ == "__main__":  # needed for creating new processes
    client = Client()
    df = dd.read_parquet(...)
    ...
```

For local usage, the `Client` uses a Dask `LocalCluster` with multiprocessing by default. You can manually configure the multiprocessing of `LocalCluster` with:

```python
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=8, threads_per_worker=8)
client = Client(cluster)
```

Note that if you use the default threaded scheduler locally without `Client`, a DataFrame can become slower after certain operations (more details [here](https://github.com/dask/dask-expr/issues/1181)).

Find more information on setting up a local or cloud cluster in the [Deploying Dask documentation](https://docs.dask.org/en/latest/deploying.html).
