Commit f5cd3a3

Merge pull request #128 from juliasilge/refine-docs
Refine docs + README
2 parents 5828977 + 34cddc1 commit f5cd3a3

6 files changed: +40 -31 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -133,3 +133,7 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# RStudio
+.Rproj.user
+*.Rproj

README.Rmd

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@ pd.set_option("display.notebook_repr_html", False)
 
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/pins-python/HEAD)
 
-The pins package publishes data, models, and other python objects, making it
+The pins package publishes data, models, and other Python objects, making it
 easy to share them across projects and with your colleagues. You can pin
 objects to a variety of pin *boards*, including folders (to share on a
 networked drive or with services like DropBox), RStudio Connect, and Amazon
@@ -41,7 +41,7 @@ from pins.data import mtcars
 board = pins.board_temp()
 ```
 
-You can pin (save) data to a board with the `.pin_write()` method. It requires three
+You can "pin" (save) data to a board with the `.pin_write()` method. It requires three
 arguments: an object, a name, and a pin type:
 
 ```{python}
@@ -61,7 +61,7 @@ board.pin_read("mtcars")
 A board on your computer is good place to start, but the real power of
 pins comes when you use a board that’s shared with multiple people. To
 get started, you can use `board_folder()` with a directory on a shared
-drive or in dropbox, or if you use [RStudio
+drive or in DropBox, or if you use [RStudio
 Connect](https://www.rstudio.com/products/connect/) you can use
 `board_rsconnect()`:
 

README.md

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/machow/pins-python/HEAD)
 
-The pins package publishes data, models, and other python objects, making it
+The pins package publishes data, models, and other Python objects, making it
 easy to share them across projects and with your colleagues. You can pin
 objects to a variety of pin *boards*, including folders (to share on a
 networked drive or with services like DropBox), RStudio Connect, and Amazon
@@ -35,7 +35,7 @@ from pins.data import mtcars
 board = pins.board_temp()
 ```
 
-You can pin (save) data to a board with the `.pin_write()` method. It requires three
+You can "pin" (save) data to a board with the `.pin_write()` method. It requires three
 arguments: an object, a name, and a pin type:
 
 
@@ -49,7 +49,7 @@ board.pin_write(mtcars.head(), "mtcars", type="csv")
 
 
 
-Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220518T150837Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 5, 18, 15, 8, 37, 413288), hash='120a54f7e0818041'), name='mtcars', user={})
+Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220526T165625Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 5, 26, 16, 56, 25, 738735), hash='120a54f7e0818041'), name='mtcars', user={})
 
 
 
@@ -79,7 +79,7 @@ board.pin_read("mtcars")
 A board on your computer is good place to start, but the real power of
 pins comes when you use a board that’s shared with multiple people. To
 get started, you can use `board_folder()` with a directory on a shared
-drive or in dropbox, or if you use [RStudio
+drive or in DropBox, or if you use [RStudio
 Connect](https://www.rstudio.com/products/connect/) you can use
 `board_rsconnect()`:
 
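
Note on the `Meta(...)` line in the README.md diff: only the `created` string and the `Version.created` datetime differ between the removed and added lines, while `pin_hash` stays the same. The two values look like the same instant at different precision. The sketch below checks that reading with the standard library. It assumes the compact string is the datetime formatted as `%Y%m%dT%H%M%SZ` with microseconds dropped; the diff is consistent with that but does not state it, and `to_compact` is a hypothetical helper, not a pins function.

```python
from datetime import datetime

# Assumption: the compact `created` string is the Version datetime
# rendered at second resolution in this format.
COMPACT_FORMAT = "%Y%m%dT%H%M%SZ"


def to_compact(ts: datetime) -> str:
    """Hypothetical helper: format a datetime as a compact timestamp string."""
    return ts.strftime(COMPACT_FORMAT)


# Values copied from the added Meta(...) line in the README.md diff.
version_created = datetime(2022, 5, 26, 16, 56, 25, 738735)
meta_created = "20220526T165625Z"

compact = to_compact(version_created)
print(compact == meta_created)
```

If the assumption holds, the change to this line is just the README being re-rendered at a later time, not a change in the pinned data.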

docs/api/boards.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Board Methods
 =============
 
-.. currentmodule:: pins
+.. currentmodule:: pins.boards
 
 Constructor
 -----------

docs/getting_started.Rmd

Lines changed: 16 additions & 13 deletions
@@ -20,8 +20,8 @@ pd.options.display.max_rows = 25
 Getting Started
 ===============
 
-The pins package helps you publish data sets, models, and other R objects, making it easy to share them across projects and with your colleagues.
-You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with dropbox), RStudio connect, Amazon S3, and more.
+The pins package helps you publish data sets, models, and other Python objects, making it easy to share them across projects and with your colleagues.
+You can pin objects to a variety of "boards", including local folders (to share on a networked drive or with DropBox), RStudio connect, Amazon S3, and more.
 This vignette will introduce you to the basics of pins.
 
 ```{python}
@@ -31,13 +31,13 @@ from pins import board_local, board_folder, board_temp, board_urls
 ## Getting started
 
 Every pin lives in a pin *board*, so you must start by creating a pin board.
-In this vignette I'll use a temporary board which is automatically deleted when your python session is over:
+In this vignette I'll use a temporary board which is automatically deleted when your Python session is over:
 
 ```{python}
 board = board_temp()
 ```
 
-In real-life, you'd pick a board depending on how you want to share the data.
+In real life, you'd pick a board depending on how you want to share the data.
 Here are a few options:
 
 
@@ -51,23 +51,23 @@ board = board_rsconnect() # share data with RStudio Connect
 
 ## Reading and writing data
 
-Once you have a pin board, you can write data to it with `pin_write()`:
+Once you have a pin board, you can write data to it with the `.pin_write()` method:
 
 ```{python}
 from pins.data import mtcars
 
 meta = board.pin_write(mtcars, "mtcars", type="csv")
 ```
 
-The first argument is the object to save (usually a data frame, but it can be any R object), and the second argument gives the "name" of the pin.
-The name is basically equivalent to a file name: you'll use it when you later want to read the data from the pin.
+The first argument is the object to save (usually a data frame, but it can be any Python object), and the second argument gives the "name" of the pin.
+The name is basically equivalent to a file name; you'll use it when you later want to read the data from the pin.
 The only rule for a pin name is that it can't contain slashes.
 
 
-As you can see from the output, pins has chosen to save this data to an `.rds` file.
+Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the
 But you can choose another option depending on your goals:
 
-- `type = "csv"` uses `write.csv()` to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
+- `type = "csv"` uses `to_csv()` from pandas to create a `.csv` file. CSVs can read by any application, but only support simple columns (e.g. numbers, strings, dates), can take up a lot of disk space, and can be slow to read.
 - `type = "joblib"` uses `joblib.dump()` to create a binary python data file. See the [joblib docs](https://joblib.readthedocs.io/en/latest/) for more information.
 
 🚧 Data formats TODO 🚧
@@ -88,17 +88,18 @@ That said, most boards transmit pins over HTTP, and this is going to be slow and
 As a general rule of thumb, we don't recommend using pins with files over 500 MB.
 If you find yourself routinely pinning data larger that this, you might need to reconsider your data engineering pipeline.
 
-
+<!-- #region -->
 ```{note}
 If you are using the RStudio Connect board (`board_rsconnect`), then you must specify your pin name as
-`<user_name>/<content_name>`. For example, `hadely/sales-report`.
+`<user_name>/<content_name>`. For example, `hadley/sales-report`.
 ```
+<!-- #endregion -->
 
 
 ## Metadata
 
 
-Every pin is accompanied by some metadata that you can access with pin_meta():
+Every pin is accompanied by some metadata that you can access with `pin_meta()`:
 
 ```{python}
 board.pin_meta("mtcars")
@@ -139,7 +140,7 @@ While we’ll do our best to keep the automatically generated metadata consisten
 > ⚠️: Warning the examples in this section use joblib to read and write data. Joblib uses the pickle format, and **pickle files are not secure**. Only read pickle files you trust. In order to read pickle files, set the `allow_pickle_read=True` argument. See: https://docs.python.org/3/library/pickle.html.
 
 
-> ⚠️: versioning is not yet implemented. These docs are copied from R pins.
+> ⚠️: Turning off versioning is not yet implemented; all Python pins are versioned. These docs are copied from R pins.
 
 In many situations it's useful to version pins, so that writing to an existing pin does not replace the existing data, but instead adds a new copy.
 There are two ways to turn versioning on:
@@ -186,6 +187,7 @@ board2.pin_read("x", version = version)
 
 ## 🚧 Reading and writing files
 
+> ⚠️: `pin_upload()` and `pin_download()` are not yet implemented in Python. These docs are copied from R pins.
 
 So far we've focussed on `pin_write()` and `pin_read()` which work with R objects.
 pins also provides the lower-level `pin_upload()` and `pin_download()` which work with files on disk.
@@ -231,6 +233,7 @@ But you can `pin_download()` something that you've pinned with `pin_write()`:
 
 ## Caching
 
+> ⚠️: `board_url` is not yet implemented in Python. These docs are copied from R pins.
 
 The primary purpose of pins is to make it easy to share data.
 But pins is also designed to help you spend as little time as possible downloading data.
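
Note on pin names in docs/getting_started.Rmd: the text states two naming rules, that a pin name "can't contain slashes", and that RStudio Connect boards require `<user_name>/<content_name>`. The sketch below shows one way those two rules fit together. `check_pin_name` is a hypothetical helper written for illustration, not part of the pins API, and rejecting empty names is an added assumption that the docs do not state.

```python
def check_pin_name(name: str, rsconnect: bool = False) -> bool:
    """Hypothetical helper (not part of pins): apply the documented naming rules."""
    parts = name.split("/")
    if rsconnect:
        # RStudio Connect boards: exactly "<user_name>/<content_name>".
        # Assumption: both parts must be non-empty.
        return len(parts) == 2 and all(parts)
    # Other boards: the only documented rule is "no slashes".
    # Assumption: an empty name is also rejected.
    return len(parts) == 1 and bool(name)


print(check_pin_name("mtcars"))
print(check_pin_name("hadley/sales-report", rsconnect=True))
```

Under these assumptions `"hadley/sales-report"` is acceptable only for an RStudio Connect board, and `"mtcars"` only for the other board types.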

docs/intro.md

Lines changed: 12 additions & 10 deletions
@@ -21,12 +21,12 @@ kernelspec:
 ```
 
 The pins package publishes data, models, and other R objects, making it easy to share them across projects and with your colleagues.
-You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~Microsoft 365 (OneDrive and SharePoint)~.
+You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like DropBox), RStudio Connect, Amazon S3, Azure storage and ~~Microsoft 365 (OneDrive and SharePoint)~~.
 Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.
 
 ## Installation
 
-To try out the development version of pins you'll need to install from GitHub:
+To install the released version from PyPI:
 
 ```shell
 python -m pip install pins
@@ -36,7 +36,7 @@ python -m pip install pins
 
 To use the pins package, you must first create a pin board.
 A good place to start is `board_folder()`, which stores pins in a directory you specify.
-Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your R session ends.
+Here I'll use a special version of `board_folder()` called `board_temp()` which creates a temporary board that's automatically deleted when your Python session ends.
 This is great for examples, but obviously you shouldn't use it for real work!
 
 ```{code-cell} ipython3
@@ -47,23 +47,25 @@ board = board_temp()
 board
 ```
 
-You can "pin" (save) data to a board with `pin_write()`.
-It takes three arguments: the board to pin to, an object, and a name:
+You can "pin" (save) data to a board with the `.pin_write()` method.
+It requires three arguments: an object, a name, and a pin type:
 
 ```{code-cell} ipython3
 board.pin_write(mtcars.head(), "mtcars", type="csv")
 ```
 
-~As you can see, the data saved as an `.rds` by default~, but depending on what you're saving and who else you want to read it, you might use the `type` argument to instead save it as a `csv`, ~`json`, or `arrow`~ file.
+Above, we saved the data as a CSV, but depending on
+what you’re saving and who else you want to read it, you might use the
+`type` argument to instead save it as a `joblib` or `arrow` file (NOTE: arrow is not yet supported).
 
-You can later retrieve the pinned data with `pin_read()`:
+You can later retrieve the pinned data with `.pin_read()`:
 
 ```{code-cell} ipython3
 board.pin_read("mtcars")
 ```
 
 A board on your computer is good place to start, but the real power of pins comes when you use a board that's shared with multiple people.
-To get started, you can use `board_folder()` with a directory on a shared drive or in dropbox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:
+To get started, you can use `board_folder()` with a directory on a shared drive or in DropBox, or if you use [RStudio Connect](https://www.rstudio.com/products/connect/) you can use `board_rsconnect()`:
 
 🚧 TODO: add informational messages shown in display below
 
@@ -81,7 +83,7 @@ board.pin_write(tidy_sales_data, "hadley/sales-summary", type = "csv")
 
 +++
 
-Then, someone else (or an automated Rmd report) can read and use your pin:
+Then, someone else (or an automated report) can read and use your pin:
 
 +++
 
@@ -94,5 +96,5 @@ board.pin_read("hadley/sales-summary")
 
 You can easily control who gets to access the data using the RStudio Connect permissions pane.
 
-The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`), Azure's blob storage (`board_azure()`), and Microsoft SharePoint (`board_ms365()`).
+The pins package also includes boards that allow you to share data on services like Amazon's S3 (`board_s3()`) and Azure's blob storage (`board_azure()`).
 Learn more in [getting started](getting_started.Rmd).