3 changes: 1 addition & 2 deletions dev/docker/ballista-builder.Dockerfile
@@ -31,8 +31,7 @@ RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && \
apt-get install -y nodejs && \
npm install -g yarn

# create build user with same UID as
RUN adduser -q -u $EXT_UID builder --home /home/builder && \
RUN adduser -q builder --home /home/builder && \
Contributor comment: looks like this is failing the GitHub Action.

mkdir -p /home/builder/workspace
USER builder

2 changes: 1 addition & 1 deletion docs/source/user-guide/deployment/cargo-install.md
@@ -19,7 +19,7 @@

# Deploying a standalone Ballista cluster using cargo install

A simple way to start a local cluster for testing purposes is to use cargo to install
Another simple way to start a local cluster for testing purposes is to use cargo to install
the scheduler and executor crates.

```bash
22 changes: 1 addition & 21 deletions docs/source/user-guide/deployment/docker-compose.md
@@ -23,27 +23,7 @@ Docker Compose is a convenient way to launch a cluster when testing locally.

## Build Docker Images

Run the following commands to download the [official Docker image](https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone):

```bash
docker pull ghcr.io/apache/datafusion-ballista-standalone:latest
```

Altenatively run the following commands to clone the source repository and build the Docker images from source:

```bash
git clone git@github.com:apache/datafusion-ballista.git -b latest
cd datafusion-ballista
./dev/build-ballista-docker.sh
```

This will create the following images:

- `apache/datafusion-ballista-benchmarks:latest`
- `apache/datafusion-ballista-cli:latest`
- `apache/datafusion-ballista-executor:latest`
- `apache/datafusion-ballista-scheduler:latest`
- `apache/datafusion-ballista-standalone:latest`
To create the required Docker images please refer to the [docker deployment page](docker.md).

## Start a Cluster

2 changes: 1 addition & 1 deletion docs/source/user-guide/deployment/docker.md
@@ -27,7 +27,7 @@ Run the following commands to download the [official Docker image](https://githu
docker pull ghcr.io/apache/datafusion-ballista-standalone:latest
```

Altenatively run the following commands to clone the source repository and build the Docker images from source:
Alternatively run the following commands to clone the source repository and build the Docker images from source:

```bash
git clone git@github.com:apache/datafusion-ballista.git
28 changes: 4 additions & 24 deletions docs/source/user-guide/deployment/kubernetes.md
@@ -41,27 +41,7 @@ microk8s enable dns

## Build Docker Images

Run the following commands to download the [official Docker image](https://github.com/apache/datafusion-ballista/pkgs/container/datafusion-ballista-standalone):

```bash
docker pull ghcr.io/apache/datafusion-ballista-standalone:0.12.0-rc4
```

Altenatively run the following commands to clone the source repository and build the Docker images from source:

```bash
git clone git@github.com:apache/datafusion-ballista.git -b 0.12.0
cd datafusion-ballista
./dev/build-ballista-docker.sh
```

This will create the following images:

- `apache/datafusion-ballista-benchmarks:0.12.0`
- `apache/datafusion-ballista-cli:0.12.0`
- `apache/datafusion-ballista-executor:0.12.0`
- `apache/datafusion-ballista-scheduler:0.12.0`
- `apache/datafusion-ballista-standalone:0.12.0`
To create the required Docker images please refer to the [docker deployment page](docker.md).

## Publishing Docker Images

@@ -267,9 +247,9 @@ kubectl delete -f cluster.yaml

## Autoscaling Executors

Ballista supports autoscaling for executors through [Keda](http://keda.sh). Keda allows scaling a deployment
through custom metrics which are exposed through the Ballista scheduler, and it can even scale the number of
executors down to 0 if there is no activity in the cluster.
Ballista supports autoscaling for executors through [Keda](http://keda.sh). Keda allows for the scaling of a
deployment through custom metrics which are exposed through the Ballista scheduler, and it
can even scale the number of executors down to 0 if there is no activity in the cluster.

Keda can be installed in your Kubernetes cluster with a single command:

9 changes: 5 additions & 4 deletions docs/source/user-guide/introduction.md
@@ -21,17 +21,18 @@

Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache DataFusion.

Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but
Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and
Kubernetes. See the [deployment guide](deployment.md) for more information
Ballista has both scheduler and executor processes that are standard Rust executables.

Dockerfiles are also provided to build images for use in containerized environments, such as Docker, Docker Compose,
and Kubernetes. See the [deployment guide](deployment.md) for more information.

SQL and DataFrame queries can be submitted from Python and Rust, and SQL queries can be submitted via the Arrow
Flight SQL JDBC driver, supporting your favorite JDBC-compliant tools such as [DataGrip](datagrip)
or [Tableau](tableau). For setup instructions, please see the [FlightSQL guide](flightsql.md).

## How does this compare to Apache Spark?

Although Ballista is largely inspired by Apache Spark, there are some key differences.
Although Ballista is largely inspired by Apache Spark, there are some key differences:

- The choice of Rust as the main execution language means that memory usage is deterministic and avoids the overhead
of GC pauses.
2 changes: 1 addition & 1 deletion docs/source/user-guide/scheduler.md
@@ -23,7 +23,7 @@

The scheduler also provides a REST API that allows jobs to be monitored.

> This is optional scheduler feature which should be enabled with `rest-api` feature
> This is an optional scheduler feature, which must be enabled with the `rest-api` feature.

| API | Method | Description |
| ------------------------------------ | ------ | ----------------------------------------------------------------- |