
Commit 4bea81b

alamb and viirya authored
Document ability to select directly from files in datafusion-cli (apache#4851)
* Document ability to select directly from files in datafusion-cli
* prettier
* Update docs/source/user-guide/cli.md

Co-authored-by: Liang-Chi Hsieh <[email protected]>
1 parent 664edea commit 4bea81b

1 file changed, +46 -17 lines changed


docs/source/user-guide/cli.md

Lines changed: 46 additions & 17 deletions
@@ -19,30 +19,51 @@
 
 # DataFusion Command-line SQL Utility
 
-The DataFusion CLI is a command-line interactive SQL utility that allows
-queries to be executed against any supported data files. It is a convenient way to
+The DataFusion CLI is a command-line interactive SQL utility for executing
+queries against any supported data files. It is a convenient way to
 try DataFusion out with your own data sources, and test out its SQL support.
 
 ## Example
 
 Create a CSV file to query.
 
-```bash
-$ echo "1,2" > data.csv
+```shell
+$ echo "a,b" > data.csv
+$ echo "1,2" >> data.csv
 ```
 
-```bash
+Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
+
+```shell
 $ datafusion-cli
-DataFusion CLI v12.0.0
-❯ CREATE EXTERNAL TABLE foo STORED AS CSV LOCATION 'data.csv';
-0 rows in set. Query took 0.017 seconds.
-select * from foo;
-+----------+----------+
-| column_1 | column_2 |
-+----------+----------+
-| 1        | 2        |
-+----------+----------+
-1 row in set. Query took 0.012 seconds.
+DataFusion CLI v17.0.0
+select * from 'data.csv';
++---+---+
+| a | b |
++---+---+
+| 1 | 2 |
++---+---+
+1 row in set. Query took 0.007 seconds.
+```
+
+You can also query directories of files with compatible schemas:
+
+```shell
+$ ls data_dir/
+data.csv data2.csv
+```
+
+```shell
+$ datafusion-cli
+DataFusion CLI v16.0.0
+select * from 'data_dir';
++---+---+
+| a | b |
++---+---+
+| 3 | 4 |
+| 1 | 2 |
++---+---+
+2 rows in set. Query took 0.007 seconds.
 ```
 
 ## Installation
@@ -87,6 +108,8 @@ docker run -it -v $(your_data_location):/data datafusion-cli
 
 ## Usage
 
+See the current usage using `datafusion-cli --help`:
+
 ```bash
 Apache Arrow <[email protected]>
 Command Line Client for DataFusion query engine.
@@ -104,10 +127,16 @@ OPTIONS:
     -q, --quiet         Reduce printing other than the results and work quietly
     -r, --rc <RC>...    Run the provided files on startup instead of ~/.datafusionrc
     -V, --version       Print version information
-
-Type `exit` or `quit` to exit the CLI.
 ```
 
+## Selecting files directly
+
+Files can be queried directly by enclosing the file or
+directory name in single `'` quotes as shown in the example.
+
+It is also possible to create a table backed by files explicitly
+via `CREATE EXTERNAL TABLE` as shown below.
+
 ## Registering Parquet Data Sources
 
 Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
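
To try the directory query documented in this change, the two input files can be set up roughly as follows. This is a minimal sketch, not part of the commit: the split of rows between `data.csv` and `data2.csv` is an assumption (the docs only show the file names and the combined result), `query.sql` is a hypothetical file name, and the `-f` flag is assumed to be available in your build, so check `datafusion-cli --help` first.

```shell
# Create a directory holding two CSV files with the same schema (a, b).
# Assumption: which row lives in which file is illustrative only.
mkdir -p data_dir
printf 'a,b\n1,2\n' > data_dir/data.csv
printf 'a,b\n3,4\n' > data_dir/data2.csv

# Save the query from the docs so it can be run non-interactively.
# query.sql is a hypothetical file name chosen for this sketch.
printf "select * from 'data_dir';\n" > query.sql

# Run the query only if datafusion-cli is on PATH.
# Assumption: -f/--file executes the statements in the given file.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -f query.sql
else
  echo "datafusion-cli not found; start it manually and run the query in query.sql"
fi
```

Row order in the result is not guaranteed, so the two rows may appear in either order.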
