Merge pull request #194 from rstudio/getting-started-update

machow · web-flow · commit 6086cd9a389a · 2023-04-10T15:18:26.000-04:00
Update "Getting Started"
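
Illustrative note (not part of the commit): the reworded list in the diff below says that CSV only supports simple columns, while JSON can be read almost anywhere. A minimal, stdlib-only sketch of that trade-off follows. It deliberately avoids pins and pandas, so the `records` data and both helper functions are hypothetical names chosen for illustration, not anything from the pins API.

```python
import csv
import io
import json

# Hypothetical sample rows, loosely modelled on the penguins data.
records = [
    {"species": "Adelie", "bill_length_mm": 39.1},
    {"species": "Gentoo", "bill_length_mm": 47.5},
]


def csv_round_trip(rows):
    """Write rows to an in-memory CSV and read them back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    # csv stores plain text, so every value comes back as a string.
    return [dict(row) for row in csv.DictReader(buf)]


def json_round_trip(obj):
    """Serialize obj to a JSON string and parse it back."""
    return json.loads(json.dumps(obj))


csv_rows = csv_round_trip(records)
json_rows = json_round_trip(records)

print(type(csv_rows[0]["bill_length_mm"]).__name__)
print(type(json_rows[0]["bill_length_mm"]).__name__)
```

In this sketch the CSV round trip loses the numeric type (a reader such as pandas has to re-infer it), whereas the JSON round trip keeps floats and list/dict nesting intact.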
diff --git a/docs/getting_started.Rmd b/docs/getting_started.Rmd
@@ -65,13 +65,13 @@ The name is basically equivalent to a file name; you'll use it when you later wa
 The only rule for a pin name is that it can't contain slashes.
 
 
-Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
-But you can choose another option depending on your goals:
+Above, we saved the data as a CSV, but you can choose another option depending on your goals:
 
--   `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
--   `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
--   `type = "arrow"` uses `pyarrow` to create an arrow/feather file. [Arrow](https://arrow.apache.org) is a modern, language-independent, high-performance file format designed for data science. Not every tool can read arrow files, but support is growing rapidly.
--   `type = "json"` uses `json.dump()` to create a `.json` file. Pretty much every programming language can read json files, but they only work well for nested lists.
+-   `type = "csv"` uses `to_csv()` from pandas to create a CSV file. CSVs are plain text and can be read easily by many applications, but they only support simple columns (e.g. numbers, strings), can take up a lot of disk space, and can be slow to read.
+-   `type = "parquet"` uses `to_parquet()` from pandas to create a Parquet file. [Parquet](https://parquet.apache.org/) is a modern, language-independent, column-oriented file format for efficient data storage and retrieval. Parquet is an excellent choice for storing tabular data.
+-   `type = "arrow"` uses `to_feather()` from pandas to create an Arrow/Feather file.
+-   `type = "joblib"` uses `joblib.dump()` to create a binary Python data file, such as for storing a trained model. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
+-   `type = "json"` uses `json.dump()` to create a JSON file. Pretty much every programming language can read JSON files, but they only work well for nested lists.
 
 After you've pinned an object, you can read it back with `pin_read()`:
 
@@ -201,10 +201,7 @@ my_data = board_urls("", {
 })
 ```
 
-You can read this data by combining `pin_download()` with `read.csv()`[^1]:
-
-[^1]: Here I'm using `read.csv()` to the reduce the dependencies of the pins package.
-    For real code I'd recommend using `data.table::fread()` or `readr::read_csv().`
+You can read this data by combining `pin_download()` with `read_csv()` from pandas:
 
 ```{python}
 fname = my_data.pin_download("penguins")