|
1 | | -# TFRecord Conversion Utilities |
| 1 | +# TFRecord Utilities (TFRUtil) |
2 | 2 |
|
3 | | -## Installing |
| 3 | +TFRUtil makes it easy to create TFRecords from images and labels in |
| 4 | +Pandas DataFrames or CSV files. |
| 5 | +Today, TFRUtil supports data stored in 'image csv format' similar to |
| 6 | +GCP AutoML Vision. |
| 7 | +In the future, TFRUtil will support converting any Pandas DataFrame or CSV |
| 8 | +file into TFRecords. |
4 | 9 |
|
5 | | -1. Clone this repo. |
| 10 | +## Installation |
6 | 11 |
|
7 | | -2. Run the command `python3 setup.py` |
| 12 | +From the top directory of the repo, run the following command: |
8 | 13 |
|
9 | | -## What is TFRUtil |
10 | | -TFRUtil makes it easy to create TFRecords from images and labels using Pandas DataFrames or CSVs. |
11 | | -Today, TFRUtil supports data stored in 'image csv format' similar to GCP AutoML Vision. In the |
12 | | -future TFRUtil will support converting any dataframe or CSV file into TFRecords. |
| 14 | +```bash |
| 15 | +pip install . |
| 16 | +``` |
13 | 17 |
|
14 | | -## Using TFRUtil to create TFRecords |
| 18 | +## Usage |
15 | 19 |
|
16 | | -### Image CSV Format |
17 | | -TFRUtil currently expects data to be in the same format as [AutoML Vision](https://cloud.google.com/vision/automl/docs/prepare). This format looks like a pandas dataframe or CSV formatted as: |
| 20 | +### IPython/Jupyter |
18 | 21 |
|
19 | | -| split | image_uri | label | |
20 | | -|-------|-------------------------|-------| |
21 | | -| TRAIN | gs://foo/bar/image1.jpg | cat | |
| 22 | +#### Pandas DataFrame Conversion |
22 | 23 |
|
23 | | -Where: |
24 | | -* split can take on the values TRAIN, VALIDATION, and TEST |
25 | | -* image_uri specifies a local or google cloud storage location for the image file. |
26 | | -* label can be either a text based label that will be integerized or integer |
| 24 | +```python |
| 25 | +import pandas as pd |
| 26 | +import tfrutil |
| 27 | +df = pd.read_csv(...) |
| 28 | +df.tensorflow.to_tfrecord(output_dir="gs://my/bucket") |
| 29 | +``` |
| 30 | + |
| 31 | +#### Using Cloud Dataflow |
| 32 | + |
| 33 | +```python |
| 34 | +df.tensorflow.to_tfrecord( |
| 35 | + output_dir="gs://my/bucket", |
| 36 | + runner="DataFlowRunner", |
| 37 | + project="my-project", |
| 38 | + region="us-central1") |
| 39 | +``` |
| 40 | +
|
| 41 | +### Command-line interface |
27 | 42 |
|
28 | | -### Pandas API |
29 | | -TODO |
| 43 | +```bash |
| 44 | +tfrutil create-tfrecords --output_dir="gs://my/bucket" data.csv |
| 45 | +``` |
30 | 46 |
|
31 | | -### Python API |
32 | | -TODO |
| 47 | +## Input format |
| 48 | +
|
| 49 | +TFRUtil currently expects data to be in the same format as [AutoML Vision](https://cloud.google.com/vision/automl/docs/prepare). The data is a Pandas DataFrame or CSV file with the following columns: |
| 50 | +
|
| 51 | +| split | image_uri | label | |
| 52 | +|-------|---------------------------|-------| |
| 53 | +| TRAIN | gs://my/bucket/image1.jpg | cat | |
| 54 | +
|
| 55 | +Where: |
| 56 | +* `split` is one of TRAIN, VALIDATION, or TEST. |
| 57 | +* `image_uri` specifies a local or Google Cloud Storage location for the image file. |
| 58 | +* `label` is either an integer or a text label, which is converted to an integer. |
33 | 59 |
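The column rules above can be checked before conversion. The following is a minimal sketch using only the Python standard library; it is not part of TFRUtil's API. The helper names `read_image_csv` and `integerize_labels`, the headerless CSV layout, the `image2.jpg` and `image3.jpg` paths, and the sorted-order label mapping are all assumptions made for illustration, and TFRUtil's own label encoding may differ.

```python
import csv
import io

# Allowed values for the `split` column, per the input format description.
VALID_SPLITS = {"TRAIN", "VALIDATION", "TEST"}


def read_image_csv(text):
    """Parse image CSV text (assumed headerless) into (split, image_uri, label) tuples."""
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        split, image_uri, label = (field.strip() for field in row)
        if split not in VALID_SPLITS:
            raise ValueError(f"line {line_no}: unknown split {split!r}")
        rows.append((split, image_uri, label))
    return rows


def integerize_labels(rows):
    """Map text labels to integers by sorted order (an assumed scheme)."""
    vocab = {label: i for i, label in enumerate(sorted({r[2] for r in rows}))}
    encoded = [(split, uri, vocab[label]) for split, uri, label in rows]
    return encoded, vocab


# Hypothetical sample data in the three-column layout.
sample = """TRAIN,gs://my/bucket/image1.jpg,cat
VALIDATION,gs://my/bucket/image2.jpg,dog
TEST,gs://my/bucket/image3.jpg,cat
"""

rows = read_image_csv(sample)
encoded, vocab = integerize_labels(rows)
```

Running a check like this first surfaces malformed rows or unexpected split names locally, before a conversion job is launched.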
|
34 | | -### CSV File |
35 | | -TODO |
| 60 | +## Contributing |
36 | 61 |
|
37 | | -## Using TFRutil to inspect TFRecords |
38 | | -TODO |
| 62 | +Pull requests are welcome. |