Commit eba8f6c

Move downloading of the example data files into the build scripts and point users to where these files are located, instead of adding urllib requests to the Python examples. This keeps the examples focused on what is most important to the user.
1 parent cdfb5a8 commit eba8f6c

10 files changed, +26 -52 lines changed

.github/workflows/docs.yaml

Lines changed: 2 additions & 0 deletions
@@ -75,6 +75,8 @@ jobs:
 set -x
 source venv/bin/activate
 cd docs
+curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
+curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
 make html

 - name: Copy & push the generated HTML

docs/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,2 +1,4 @@
 pokemon.csv
 yellow_trip_data.parquet
+yellow_tripdata_2021-01.parquet
+

docs/build.sh

Lines changed: 10 additions & 1 deletion
@@ -19,8 +19,17 @@
 #

 set -e
+
+if [ ! -f pokemon.csv ]; then
+curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
+fi
+
+if [ ! -f yellow_tripdata_2021-01.parquet ]; then
+curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
+fi
+
 rm -rf build 2> /dev/null
 rm -rf temp 2> /dev/null
 mkdir temp
 cp -rf source/* temp/
-make SOURCEDIR=`pwd`/temp html
+make SOURCEDIR=`pwd`/temp html
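For readers who build the docs on a machine without `curl`, the "download only if the file is missing" guard in `docs/build.sh` could be approximated in Python. This is only a sketch and not part of the commit: the helper names `missing_files` and `fetch_missing` and the `DATA_FILES` mapping are hypothetical, and only the two URLs are taken from the diff above.

```python
import urllib.request
from pathlib import Path

# File names and URLs copied from the build script in this commit.
# The mapping itself is a hypothetical helper structure, not project code.
DATA_FILES = {
    "pokemon.csv": (
        "https://gist.githubusercontent.com/ritchie46/"
        "cac6b337ea52281aa23c049250a4ff03/raw/"
        "89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
    ),
    "yellow_tripdata_2021-01.parquet": (
        "https://d37ci6vzurychx.cloudfront.net/trip-data/"
        "yellow_tripdata_2021-01.parquet"
    ),
}


def missing_files(directory, names=DATA_FILES):
    """Return the data file names that are not yet present in `directory`."""
    directory = Path(directory)
    return [name for name in names if not (directory / name).is_file()]


def fetch_missing(directory="."):
    """Download each missing file, mirroring the `[ ! -f ... ]` guards."""
    for name in missing_files(directory):
        target = Path(directory) / name
        urllib.request.urlretrieve(DATA_FILES[name], str(target))
```

Calling `fetch_missing()` from the `docs` directory before a local build would leave files that already exist untouched, so repeated builds would not download the data again. It stays outside the documentation examples themselves, in keeping with the intent of the commit.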

docs/source/index.rst

Lines changed: 7 additions & 21 deletions
@@ -43,27 +43,13 @@ Example

 .. ipython:: python

-import datafusion
-from datafusion import col
-import pyarrow
-
-# create a context
-ctx = datafusion.SessionContext()
-
-# create a RecordBatch and a new DataFrame from it
-batch = pyarrow.RecordBatch.from_arrays(
-[pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
-names=["a", "b"],
-)
-df = ctx.create_dataframe([[batch]], name="batch_array")
-
-# create a new statement
-df = df.select(
-col("a") + col("b"),
-col("a") - col("b"),
-)
-
-df
+from datafusion import SessionContext
+
+ctx = SessionContext()
+
+df = ctx.read_csv("pokemon.csv")
+
+df.show()


 .. _toc.links:

docs/source/user-guide/basics.rst

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ source file as described in the :ref:`Introduction <guide>`, the Pokemon data se

 .. ipython:: python

-from datafusion import SessionContext, functions as F
+from datafusion import SessionContext, col, functions as F

 ctx = SessionContext()

docs/source/user-guide/common-operations/aggregations.rst

Lines changed: 0 additions & 6 deletions
@@ -26,16 +26,10 @@ to form a single summary value. For performing an aggregation, DataFusion provid

 .. ipython:: python

-import urllib.request
 from datafusion import SessionContext
 from datafusion import col, lit
 from datafusion import functions as f

-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
-
 ctx = SessionContext()
 df = ctx.read_csv("pokemon.csv")

docs/source/user-guide/common-operations/functions.rst

Lines changed: 0 additions & 6 deletions
@@ -25,14 +25,8 @@ We'll use the pokemon dataset in the following examples.

 .. ipython:: python

-import urllib.request
 from datafusion import SessionContext

-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
-
 ctx = SessionContext()
 ctx.register_csv("pokemon", "pokemon.csv")
 df = ctx.table("pokemon")

docs/source/user-guide/common-operations/select-and-filter.rst

Lines changed: 4 additions & 7 deletions
@@ -21,18 +21,15 @@ Column Selections
 Use :py:func:`~datafusion.dataframe.DataFrame.select` for basic column selection.

 DataFusion can work with several file types, to start simple we can use a subset of the
-`TLC Trip Record Data <https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page>`_
+`TLC Trip Record Data <https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page>`_,
+which you can download `here <https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet>`_.

 .. ipython:: python
-
-import urllib.request
-from datafusion import SessionContext

-urllib.request.urlretrieve("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
-"yellow_trip_data.parquet")
+from datafusion import SessionContext

 ctx = SessionContext()
-df = ctx.read_parquet("yellow_trip_data.parquet")
+df = ctx.read_parquet("yellow_tripdata_2021-01.parquet")
 df.select("trip_distance", "passenger_count")

 For mathematical or logical operations use :py:func:`~datafusion.col` to select columns, and give meaningful names to the resulting

docs/source/user-guide/common-operations/windows.rst

Lines changed: 0 additions & 6 deletions
@@ -30,16 +30,10 @@ We'll use the pokemon dataset (from Ritchie Vink) in the following examples.

 .. ipython:: python

-import urllib.request
 from datafusion import SessionContext
 from datafusion import col
 from datafusion import functions as f

-urllib.request.urlretrieve(
-"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv",
-"pokemon.csv",
-)
-
 ctx = SessionContext()
 df = ctx.read_csv("pokemon.csv")

docs/source/user-guide/introduction.rst

Lines changed: 0 additions & 4 deletions
@@ -52,10 +52,6 @@ options for data sources. For our first example, we demonstrate using a Pokemon
 can download
 `here <https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv>`_.

-.. code-block:: shell
-
-curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
-
 With that file in place you can use the following python example to view the DataFrame in
 DataFusion.
