
Commit d949e5f

tbar4 and Trevor Barnes authored
added a BallistaContext to ballista to allow for Remote or standalone (#1100)
* added a pycontext to ballista
* updated python to have two static methods for creating a ballista context
* updating the pyballista package to ballista
* changing the packaging naming convention from pyballista to ballista
* Updating BallistaContext and Config
* Updating BallistaContext and Config, calling it for the night, will complete tomorrow
* Adding config to ballista context
* Updated Builder and Docs

---------

Co-authored-by: Trevor Barnes <[email protected]>
1 parent 5b6b50b commit d949e5f


11 files changed, +150 -389 lines changed


docs/source/user-guide/python.md

Lines changed: 16 additions & 4 deletions
@@ -28,9 +28,20 @@ popular file formats files, run it in a distributed environment, and obtain the

 The following code demonstrates how to create a Ballista context and connect to a scheduler.

+If you are running a standalone cluster (runs locally), all you need to do is call the stand alone cluster method `standalone()` or your BallistaContext. If you are running a cluster in remote mode, you need to provide the URL `Ballista.remote("http://my-remote-ip:50050")`.
+
 ```text
->>> import ballista
->>> ctx = ballista.BallistaContext("localhost", 50050)
+>>> from ballista import BallistaBuilder
+>>> # for a standalone instance
+>>> # Ballista will initiate with an empty config
+>>> # set config variables with `config()`
+>>> ballista = BallistaBuilder()\
+>>> .config("ballista.job.name", "example ballista")
+>>>
+>>> ctx = ballista.standalone()
+>>>
+>>> # for a remote instance provide the URL
+>>> ctx = ballista.remote("df://url-path-to-scheduler:50050")
 ```

 ## SQL
@@ -103,14 +114,15 @@ The `explain` method can be used to show the logical and physical query plans fo
 The following example demonstrates creating arrays with PyArrow and then creating a Ballista DataFrame.

 ```python
-import ballista
+from ballista import BallistaBuilder
 import pyarrow

 # an alias
+# TODO implement Functions
 f = ballista.functions

 # create a context
-ctx = ballista.BallistaContext("localhost", 50050)
+ctx = Ballista().standalone()

 # create a RecordBatch and a new DataFrame from it
 batch = pyarrow.RecordBatch.from_arrays(
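
The updated user guide drives `BallistaBuilder` through chained `.config()` calls and then chooses `standalone()` or `remote()`. Below is a minimal pure-Python sketch of that builder flow that runs without the compiled extension. `BuilderSketch` and the dicts it returns are hypothetical stand-ins and not the real binding, which returns a session context. Wrapping the chain in parentheses avoids the trailing backslashes used in the docs.

```python
from typing import Dict


class BuilderSketch:
    """Hypothetical stand-in mimicking the chained-config style of BallistaBuilder."""

    def __init__(self) -> None:
        # Start from an empty config, as the docs describe for BallistaBuilder.
        self._settings: Dict[str, str] = {}

    def config(self, key: str, value: str) -> "BuilderSketch":
        # Record one setting and return self so calls can be chained.
        self._settings[key] = value
        return self

    def standalone(self) -> Dict[str, object]:
        # Describe a local cluster; the real method returns a context instead.
        return {"mode": "standalone", "settings": dict(self._settings)}

    def remote(self, url: str) -> Dict[str, object]:
        # Describe a connection to an external scheduler at `url`.
        return {"mode": "remote", "url": url, "settings": dict(self._settings)}


# Parentheses let the chain span several lines without backslashes.
local = (
    BuilderSketch()
    .config("ballista.job.name", "example ballista")
    .config("ballista.shuffle.partitions", "16")
    .standalone()
)
remote = BuilderSketch().remote("df://url-path-to-scheduler:50050")

print(local["mode"], local["settings"]["ballista.job.name"])
print(remote["mode"], remote["url"])
```

Because each `config()` call returns the builder itself, settings accumulate in order until a terminal method such as `standalone()` or `remote()` consumes them.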

python/Cargo.toml

Lines changed: 4 additions & 6 deletions
@@ -26,14 +26,14 @@ readme = "README.md"
 license = "Apache-2.0"
 edition = "2021"
 rust-version = "1.72"
-include = ["/src", "/pyballista", "/LICENSE.txt", "pyproject.toml", "Cargo.toml", "Cargo.lock"]
+include = ["/src", "/ballista", "/LICENSE.txt", "pyproject.toml", "Cargo.toml", "Cargo.lock"]
 publish = false

 [dependencies]
 async-trait = "0.1.77"
-ballista = { path = "../ballista/client", version = "0.12.0" }
+ballista = { path = "../ballista/client", version = "0.12.0", features = ["standalone"] }
 ballista-core = { path = "../ballista/core", version = "0.12.0" }
-datafusion = { version = "42" }
+datafusion = { version = "42", features = ["pyarrow", "avro"] }
 datafusion-proto = { version = "42" }
 datafusion-python = { version = "42" }

@@ -43,6 +43,4 @@ tokio = { version = "1.35", features = ["macros", "rt", "rt-multi-thread", "sync

 [lib]
 crate-type = ["cdylib"]
-name = "pyballista"
-
-
+name = "ballista"

python/README.md

Lines changed: 2 additions & 2 deletions
@@ -29,8 +29,8 @@ part of the default Cargo workspace so that it doesn't cause overhead for mainta
 Creates a new context and connects to a Ballista scheduler process.

 ```python
-from pyballista import SessionContext
->>> ctx = SessionContext("localhost", 50050)
+from ballista import BallistaBuilder
+>>> ctx = BallistaBuilder().standalone()
 ```

 ## Example SQL Usage
Lines changed: 4 additions & 4 deletions
@@ -25,12 +25,12 @@

 import pyarrow as pa

-from .pyballista_internal import (
-    SessionContext,
+from .ballista_internal import (
+    BallistaBuilder,
 )

 __version__ = importlib_metadata.version(__name__)

 __all__ = [
-    "SessionContext",
-]
+    "BallistaBuilder",
+]
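
After the rename, `importlib_metadata.version(__name__)` looks up an installed distribution named `ballista`, which matches the new `[project] name` in pyproject.toml. If the package is imported without being installed, that lookup raises `PackageNotFoundError`. Below is a minimal sketch of a guarded lookup that uses the standard-library `importlib.metadata`. The helper name `package_version` and the `"unknown"` fallback are hypothetical choices, not what this commit does.

```python
from importlib import metadata


def package_version(dist_name: str, default: str = "unknown") -> str:
    """Return the installed version of `dist_name`, or `default` if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Raised when no installed distribution has this name.
        return default


# Inside a package's __init__.py this would be called as package_version(__name__).
print(package_version("ballista"))
```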
Lines changed: 11 additions & 10 deletions
@@ -15,50 +15,50 @@
 # specific language governing permissions and limitations
 # under the License.

-from pyballista import SessionContext
+from ballista import BallistaBuilder
 import pytest

 def test_create_context():
-    SessionContext("localhost", 50050)
+    BallistaBuilder().standalone()

 def test_select_one():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     df = ctx.sql("SELECT 1")
     batches = df.collect()
     assert len(batches) == 1

 def test_read_csv():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     df = ctx.read_csv("testdata/test.csv", has_header=True)
     batches = df.collect()
     assert len(batches) == 1
     assert len(batches[0]) == 1

 def test_register_csv():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     ctx.register_csv("test", "testdata/test.csv", has_header=True)
     df = ctx.sql("SELECT * FROM test")
     batches = df.collect()
     assert len(batches) == 1
     assert len(batches[0]) == 1

 def test_read_parquet():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     df = ctx.read_parquet("testdata/test.parquet")
     batches = df.collect()
     assert len(batches) == 1
     assert len(batches[0]) == 8

 def test_register_parquet():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     ctx.register_parquet("test", "testdata/test.parquet")
     df = ctx.sql("SELECT * FROM test")
     batches = df.collect()
     assert len(batches) == 1
     assert len(batches[0]) == 8

 def test_read_dataframe_api():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     df = ctx.read_csv("testdata/test.csv", has_header=True) \
         .select_columns('a', 'b') \
         .limit(1)
@@ -67,11 +67,12 @@ def test_read_dataframe_api():
     assert len(batches[0]) == 1

 def test_execute_plan():
-    ctx = SessionContext("localhost", 50050)
+    ctx = BallistaBuilder().standalone()
     df = ctx.read_csv("testdata/test.csv", has_header=True) \
         .select_columns('a', 'b') \
         .limit(1)
-    df = ctx.execute_logical_plan(df.logical_plan())
+    # TODO research SessionContext Logical Plan for DataFusionPython
+    #df = ctx.execute_logical_plan(df.logical_plan())
     batches = df.collect()
     assert len(batches) == 1
     assert len(batches[0]) == 1

python/examples/example.py

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from ballista import BallistaBuilder
+from datafusion.context import SessionContext
+
+# Ballista will initiate with an empty config
+# set config variables with `config`
+ctx: SessionContext = BallistaBuilder()\
+    .config("ballista.job.name", "example ballista")\
+    .config("ballista.shuffle.partitions", "16")\
+    .standalone()
+
+#ctx_remote: SessionContext = ballista.remote("remote_ip", 50050)
+
+# Select 1 to verify its working
+ctx.sql("SELECT 1").show()
+#ctx_remote.sql("SELECT 2").show()
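
The new example leaves remote mode commented out and calls `ballista.remote("remote_ip", 50050)` with a separate host and port. The user guide instead passes a single URL, `df://url-path-to-scheduler:50050`, and its prose mentions an `http://` URL. The diff does not settle which form the binding accepts. The following is therefore only a hedged sketch of a hypothetical helper, `scheduler_url`, which normalizes either input to the `df://host:port` form. It assumes that scheme and the port 50050 used in every example in this commit.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORT = 50050  # port used in every scheduler example in this commit


def scheduler_url(host_or_url: str, port: Optional[int] = None) -> str:
    """Normalize a host name or a df:// URL to 'df://host:port' (hypothetical helper)."""
    if "://" not in host_or_url:
        # Bare host, as in the commented-out remote("remote_ip", 50050) call.
        return f"df://{host_or_url}:{port or DEFAULT_PORT}"
    parts = urlsplit(host_or_url)
    # Assumption: the scheduler expects the df:// scheme used in the docs example.
    if parts.scheme != "df":
        raise ValueError(f"expected a df:// URL, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"no scheduler host in {host_or_url!r}")
    # An explicit port in the URL wins over the `port` argument.
    return f"df://{parts.hostname}:{parts.port or port or DEFAULT_PORT}"


print(scheduler_url("remote_ip", 50050))
print(scheduler_url("df://url-path-to-scheduler:50050"))
```

Rejecting other schemes, rather than silently rewriting them, makes a mismatch like `http://` visible at the call site.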

python/pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -20,7 +20,7 @@ requires = ["maturin>=0.15,<0.16"]
 build-backend = "maturin"

 [project]
-name = "pyballista"
+name = "ballista"
 description = "Python client for Apache Arrow Ballista Distributed SQL Query Engine"
 readme = "README.md"
 license = {file = "LICENSE.txt"}
@@ -55,10 +55,10 @@ repository = "https://github.com/apache/arrow-ballista"
 profile = "black"

 [tool.maturin]
-module-name = "pyballista.pyballista_internal"
+module-name = "ballista.ballista_internal"
 include = [
     { path = "Cargo.lock", format = "sdist" }
 ]
 exclude = [".github/**", "ci/**", ".asf.yaml"]
 # Require Cargo.lock is up to date
-locked = true
+locked = true
