-The pins package helps you publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues.
-You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with dropbox), RStudio connect, Amazon S3, and more.
+The pins package helps you publish data sets, models, and other Python objects, making it easy to share them across projects and with your colleagues.
+You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with DropBox), RStudio connect, Amazon S3, and more.
 This vignette will introduce you to the basics of pins.
 Every pin lives in a pin *board*, so you must start by creating a pin board.
-In this vignette I'll use a temporary board which is automatically deleted when your python session is over:
+In this vignette I'll use a temporary board which is automatically deleted when your Python session is over:

 ```{python}
 board = board_temp()
 ```

-In real-life, you'd pick a board depending on how you want to share the data.
+In real life, you'd pick a board depending on how you want to share the data.
 Here are a few options:

@@ -51,23 +51,23 @@ board = board_rsconnect() # share data with RStudio Connect

 ## Reading and writing data

-Once you have a pin board, you can write data to it with `pin_write()`:
+Once you have a pin board, you can write data to it with the `.pin_write()` method:

 ```{python}
 from pins.data import mtcars

 meta = board.pin_write(mtcars, "mtcars", type="csv")
 ```

-The first argument is the object to save (usually a data frame, but it can be any R object), and the second argument gives the "name" of the pin.
-The name is basically equivalent to a file name: you'll use it when you later want to read the data from the pin.
+The first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the "name" of the pin.
+The name is basically equivalent to a file name; you'll use it when you later want to read the data from the pin.
 The only rule for a pin name is that it can't contain slashes.


-As you can see from the output, pins has chosen to save this data to an `.rds` file.
+Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
 But you can choose another option depending on your goals:

-- `type = "csv"` uses `write.csv()` to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
+- `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can be read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
 - `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.

 🚧 Data formats TODO 🚧
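
The trade-off between the two `type` options listed above can be seen without pins at all. The following is a minimal sketch using only the standard library's `csv` and `pickle` modules (joblib builds on pickle); the `rows` data and the `roundtrip_csv()` / `roundtrip_pickle()` helpers are made up for illustration and are not part of the pins API.

```python
import csv
import io
import pickle

# Hypothetical sample data: one integer column and one nested (list) column.
rows = [{"cyl": 6, "tags": ["sedan", "manual"]}]


def roundtrip_csv(records):
    """Write records to CSV text in memory and read them back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buf.seek(0)
    return list(csv.DictReader(buf))


def roundtrip_pickle(records):
    """Serialize records with pickle and load them back."""
    return pickle.loads(pickle.dumps(records))


csv_rows = roundtrip_csv(rows)
pickle_rows = roundtrip_pickle(rows)

# CSV stores everything as text, so the integer and the list come back as strings.
print(type(csv_rows[0]["cyl"]).__name__, type(csv_rows[0]["tags"]).__name__)
# pickle keeps the original Python types.
print(type(pickle_rows[0]["cyl"]).__name__, type(pickle_rows[0]["tags"]).__name__)
```

In this sketch the CSV round trip is readable by any tool but loses type information, while the binary round trip preserves it at the cost of only being safe to read from a trusted source.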
@@ -88,17 +88,18 @@ That said, most boards transmit pins over HTTP, and this is going to be slow and
 As a general rule of thumb, we don't recommend using pins with files over 500 MB.
 If you find yourself routinely pinning data larger than this, you might need to reconsider your data engineering pipeline.

-
+<!-- #region -->
 ```{note}
 If you are using the RStudio Connect board (`board_rsconnect`), then you must specify your pin name as
-`<user_name>/<content_name>`. For example, `hadely/sales-report`.
+`<user_name>/<content_name>`. For example, `hadley/sales-report`.
 ```
+<!-- #endregion -->
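
To make the `<user_name>/<content_name>` form concrete, here is a small sketch of how such a name could be checked and split. `split_connect_name()` is a hypothetical helper written for this illustration; it is not part of the pins API, and pins may validate names differently.

```python
def split_connect_name(pin_name):
    """Split '<user_name>/<content_name>' into (user_name, content_name).

    Hypothetical helper for illustration only; not part of the pins API.
    """
    user, sep, content = pin_name.partition("/")
    # Require exactly one slash with non-empty text on both sides.
    if not sep or not user or not content or "/" in content:
        raise ValueError(
            f"expected '<user_name>/<content_name>', got {pin_name!r}"
        )
    return user, content


print(split_connect_name("hadley/sales-report"))
```

A bare name such as `"sales-report"` would raise `ValueError` in this sketch, because the user name part is missing.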


 ## Metadata


-Every pin is accompanied by some metadata that you can access with pin_meta():
+Every pin is accompanied by some metadata that you can access with `pin_meta()`:

 ```{python}
 board.pin_meta("mtcars")
@@ -139,7 +140,7 @@ While we’ll do our best to keep the automatically generated metadata consisten
 > ⚠️: Warning: the examples in this section use joblib to read and write data. Joblib uses the pickle format, and **pickle files are not secure**. Only read pickle files you trust. In order to read pickle files, set the `allow_pickle_read=True` argument. See: https://docs.python.org/3/library/pickle.html.


-> ⚠️: versioning is not yet implemented. These docs are copied from R pins.
+> ⚠️: Turning off versioning is not yet implemented; all Python pins are versioned. These docs are copied from R pins.

 In many situations it's useful to version pins, so that writing to an existing pin does not replace the existing data, but instead adds a new copy.
 There are two ways to turn versioning on:
@@ -186,6 +187,7 @@ board2.pin_read("x", version = version)

 ## 🚧 Reading and writing files

+> ⚠️: `pin_upload()` and `pin_download()` are not yet implemented in Python. These docs are copied from R pins.

 So far we've focussed on `pin_write()` and `pin_read()` which work with R objects.
 pins also provides the lower-level `pin_upload()` and `pin_download()` which work with files on disk.
@@ -231,6 +233,7 @@ But you can `pin_download()` something that you've pinned with `pin_write()`:

 ## Caching

+> ⚠️: `board_url` is not yet implemented in Python. These docs are copied from R pins.

 The primary purpose of pins is to make it easy to share data.
 But pins is also designed to help you spend as little time as possible downloading data.
docs/intro.md: 12 additions & 10 deletions
@@ -21,12 +21,12 @@ kernelspec:
 ```

 The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.
-You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~Microsoft 365 (OneDrive and SharePoint)~.
+You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~~Microsoft 365 (OneDrive and SharePoint)~~.
 Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.

 ## Installation

-To try out the development version of pins you'll need to install from GitHub:
+To install the released version from PyPI:

 ```shell
 python -m pip install pins
@@ -36,7 +36,7 @@ python -m pip install pins

 To use the pins package, you must first create a pin board.
 A good place to start is `board_folder()`, which stores pins in a directory you specify.
-Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your R session ends.
+Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your Python session ends.
 This is great for examples, but obviously you shouldn't use it for real work!

 ```{code-cell} ipython3
@@ -47,23 +47,25 @@ board = board_temp()
 board
 ```

-You can "pin" (save) data to a board with `pin_write()`.
-It takes three arguments: the board to pin to, an object, and a name:
+You can "pin" (save) data to a board with the `.pin_write()` method.
+It requires three arguments: an object, a name, and a pin type:
-~As you can see, the data saved as an `.rds` by default~, but depending on what you're saving and who else you want to read it, you might use the `type` argument to instead save it as a `csv`, ~`json`, or `arrow`~ file.
+Above, we saved the data as a CSV, but depending on
+what you’re saving and who else you want to read it, you might use the
+`type` argument to instead save it as a `joblib` or `arrow` file (NOTE: arrow is not yet supported).

-You can later retrieve the pinned data with `pin_read()`:
+You can later retrieve the pinned data with `.pin_read()`:

 ```{code-cell} ipython3
 board.pin_read("mtcars")
 ```

 A board on your computer is a good place to start, but the real power of pins comes when you use a board that's shared with multiple people.
-To get started, you can use `board_folder()` with a directory on a shared drive or in dropbox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:
+To get started, you can use `board_folder()` with a directory on a shared drive or in DropBox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:

 🚧 TODO: add informational messages shown in display below

@@ -81,7 +83,7 @@ board.pin_write(tidy_sales_data, "hadley/sales-summary", type = "csv")

 +++

-Then, someone else (or an automated Rmd report) can read and use your pin:
+Then, someone else (or an automated report) can read and use your pin:
 You can easily control who gets to access the data using the RStudio Connect permissions pane.

-The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`), Azure's blob storage (`board_azure()`), and Microsoft SharePoint (`board_ms365()`).
+The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`) and Azure's blob storage (`board_azure()`).
 Learn more in [getting started](getting_started.Rmd).