Commit 6086cd9

Merge pull request #194 from rstudio/getting-started-update
Update "Getting Started"
2 parents 486c776 + 0c394b4 commit 6086cd9

1 file changed: +7 -10 lines changed

docs/getting_started.Rmd

Lines changed: 7 additions & 10 deletions
@@ -65,13 +65,13 @@ The name is basically equivalent to a file name; you'll use it when you later wa
 The only rule for a pin name is that it can't contain slashes.


-Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
-But you can choose another option depending on your goals:
+Above, we saved the data as a CSV, but you can choose another option depending on your goals:

-- `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
-- `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
-- `type = "arrow"` uses `pyarrow` to create an arrow/feather file. [Arrow](https://arrow.apache.org) is a modern, language-independent, high-performance file format designed for data science. Not every tool can read arrow files, but support is growing rapidly.
-- `type = "json"` uses `json.dump()` to create a `.json` file. Pretty much every programming language can read json files, but they only work well for nested lists.
+- `type = "csv"` uses `to_csv()` from pandas to create a CSV file. CSVs are plain text and can be read easily by many applications, but they only support simple columns (e.g. numbers, strings), can take up a lot of disk space, and can be slow to read.
+- `type = "parquet"` uses `to_parquet()` from pandas to create a Parquet file. [Parquet](https://parquet.apache.org/) is a modern, language-independent, column-oriented file format for efficient data storage and retrieval. Parquet is an excellent choice for storing tabular data.
+- `type = "arrow"` uses `to_feather()` from pandas to create an Arrow/Feather file.
+- `type = "joblib"` uses `joblib.dump()` to create a binary Python data file, such as for storing a trained model. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
+- `type = "json"` uses `json.dump()` to create a JSON file. Pretty much every programming language can read JSON files, but they only work well for nested lists.

 After you've pinned an object, you can read it back with `pin_read()`:
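
Note on the hunk above: the CSV versus JSON trade-off described in the updated list can be illustrated with a rough sketch. It does not call pins or pandas; as a substitution, it uses only Python's standard `csv` and `json` modules so that it runs without extra packages. The sample rows and the `tags` column are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample rows; the "tags" column holds a nested list.
rows = [
    {"species": "Adelie", "mass_g": 3750, "tags": ["a", "b"]},
    {"species": "Gentoo", "mass_g": 5000, "tags": []},
]

# CSV round trip: every value is written out as flat text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["species", "mass_g", "tags"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
csv_rows = list(csv.DictReader(buf))

# JSON round trip: numbers and nested lists survive unchanged.
json_rows = json.loads(json.dumps(rows))

print(type(csv_rows[0]["mass_g"]), type(csv_rows[0]["tags"]))
print(json_rows == rows)
```

With the standard `csv` reader every field comes back as a string, including the list. pandas infers numeric types when reading a CSV, but a nested value would still come back as text, which is the limitation the "simple columns" wording points at.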

@@ -201,10 +201,7 @@ my_data = board_urls("", {
 })
 ```

-You can read this data by combining `pin_download()` with `read.csv()`[^1]:
-
-[^1]: Here I'm using `read.csv()` to the reduce the dependencies of the pins package.
-For real code I'd recommend using `data.table::fread()` or `readr::read_csv().`
+You can read this data by combining `pin_download()` with `read_csv()` from pandas:

 ```{python}
 fname = my_data.pin_download("penguins")
