This repository was archived by the owner on Jul 31, 2023. It is now read-only.

Commit 2e68669

Update readme for soft launch.
Change-Id: Ia6546f74ea2877b4659fc49fcc68002664d0de16
1 parent 07d8531 commit 2e68669

1 file changed: 50 additions and 26 deletions.

README.md

Lines changed: 50 additions & 26 deletions
@@ -1,38 +1,62 @@
-# TFRecord Conversion Utilities
+# TFRecord Utilities (TFRUtil)
 
-## Installing
+TFRUtil makes it easy to create TFRecords from images and labels in
+Pandas DataFrames or CSV files.
+Today, TFRUtil supports data stored in 'image csv format' similar to
+GCP AutoML Vision.
+In the future TFRUtil will support converting any Pandas DataFrame or CSV
+file into TFRecords.
 
-1. Clone this repo.
+## Installation
 
-2. Run the command `python3 setup.py`
+From the top directory of the repo, run the following command:
 
-## What is TFRUtil
-TFRUtil makes it easy to create TFRecords from images and labels using Pandas DataFrames or CSVs.
-Today, TFRUtil supports data stored in 'image csv format' similar to GCP AutoML Vision. In the
-future TFRUtil will support converting any dataframe or CSV file into TFRecords.
+```bash
+pip install .
+```
 
-## Using TFRUtil to create TFRecords
+## Usage
 
-### Image CSV Format
-TFRUtil currently expects data to be in the same format as [AutoML Vision](https://cloud.google.com/vision/automl/docs/prepare). This format looks like a pandas dataframe or CSV formatted as:
+### IPython/Jupyter
 
-| split | image_uri | label |
-|-------|-------------------------|-------|
-| TRAIN | gs://foo/bar/image1.jpg | cat |
+#### Pandas DataFrame Conversion
 
-Where:
-* split can take on the values TRAIN, VALIDATION, and TEST
-* image_uri specifies a local or google cloud storage location for the image file.
-* label can be either a text based label that will be integerized or integer
+```python
+import pandas as pd
+import tfrutil
+df = pd.read_csv(...)
+df.tensorflow.to_tfrecord(output_dir="gs://my/bucket")
+```
+
+#### Using Cloud Dataflow
+
+```python
+df.tensorflow.to_tfrecord(
+    output_dir="gs://my/bucket",
+    runner="DataFlowRunner",
+    project="my-project",
+    region="us-central1")
+```
+
+### Command-line interface
 
-### Pandas API
-TODO
+```bash
+tfrutil create-tfrecords --output_dir="gs://my/bucket" data.csv
+```
 
-### Python API
-TODO
+## Input format
+
+TFRUtil currently expects data to be in the same format as [AutoML Vision](https://cloud.google.com/vision/automl/docs/prepare). As a pandas DataFrame or CSV file, this format looks like:
+
+| split | image_uri                 | label |
+|-------|---------------------------|-------|
+| TRAIN | gs://my/bucket/image1.jpg | cat   |
+
+Where:
+* `split` can take on the values TRAIN, VALIDATION, and TEST.
+* `image_uri` specifies a local or Google Cloud Storage location for the image file.
+* `label` can be either a text label, which will be converted to an integer, or an integer.
 
-### CSV File
-TODO
+## Contributing
 
-## Using TFRutil to inspect TFRecords
-TODO
+Pull requests are welcome.
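
For readers of this diff, the "Input format" section in the new README can be illustrated with a small standalone check. This is only a sketch under stated assumptions: it uses the Python standard library rather than TFRUtil itself, `validate_image_csv` and `VALID_SPLITS` are hypothetical names that do not come from the repository, the CSV is assumed to have no header row, and the second sample row (`image2.jpg`, `dog`) is made up for illustration.

```python
import csv
import io

# Split names listed in the README's "Input format" section.
VALID_SPLITS = {"TRAIN", "VALIDATION", "TEST"}


def validate_image_csv(text):
    """Parse headerless image CSV text into (split, image_uri, label) rows.

    Hypothetical helper, not part of TFRUtil. Raises ValueError on a bad row.
    """
    rows = []
    for line_no, record in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not record:
            continue  # csv.reader yields an empty list for blank lines
        if len(record) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(record)}")
        split, image_uri, label = (field.strip() for field in record)
        if split not in VALID_SPLITS:
            raise ValueError(f"line {line_no}: unknown split {split!r}")
        if not image_uri:
            raise ValueError(f"line {line_no}: empty image_uri")
        rows.append((split, image_uri, label))
    return rows


# Illustrative sample data; the URIs are placeholders, not real objects.
sample = (
    "TRAIN,gs://my/bucket/image1.jpg,cat\n"
    "VALIDATION,gs://my/bucket/image2.jpg,dog\n"
)
rows = validate_image_csv(sample)
print(rows)
```

Running a check like this before conversion would surface a misspelled split name or a missing column early. Whether TFRUtil performs similar validation internally is not stated in this commit.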
