6 changes: 3 additions & 3 deletions book/src/SUMMARY.md
@@ -195,9 +195,8 @@
- [Super-structured Data](tutorials/super-structured.md)
- [Pipe Joins](tutorials/join.md)
- [Shaping](tutorials/shaping.md)
- [Database](tutorials/database.md)
- [Performance](tutorials/performance.md)
- [Comparing with JQ](tutorials/jq.md)
- [For jq Users](tutorials/jq.md)
- [Formats](formats/intro.md)
- [Data Model](formats/model.md)
- [Super (SUP)](formats/sup.md)
@@ -214,7 +213,8 @@
- [Python](dev/libraries/python.md)
- [Integrations](dev/integrations/intro.md)
- [Amazon S3](dev/integrations/s3.md)
- [Fluentd](dev/integrations/fluentd.md)
- [Authentication](dev/integrations/auth.md)
- [Fluentd](dev/integrations/fluentd.md)
- [Grafana](dev/integrations/grafana.md)
- [Zeek](dev/integrations/zeek/intro.md)
- [Logs](dev/integrations/zeek/logs.md)
42 changes: 41 additions & 1 deletion book/src/command/compile.md
@@ -1 +1,41 @@
# compile
### Command

  **compile** — compile a SuperSQL query for inspection and debugging

### Synopsis

```
super compile [ options ] query
```

### Options

* `-C` display DAG or AST as query text (default "false")
* `-dag` display output as DAG (implied by -O or -P) (default "false")
* `-files` compile query as if command-line input files are present (default "false")
* `-I` source file containing query text (may be repeated)
* `-O` display optimized DAG (default "false")
* `-P` display parallelized DAG (default "0")

Additional options of the [super top-level command](super.md#options)

### Description

This command parses a [SuperSQL](../super-sql/intro.md) query
and emits the resulting abstract syntax tree (AST) or
runtime directed acyclic graph (DAG) in the output
format desired. Use `-dag` to specify the DAG form; otherwise, the
AST form is assumed.

The `-C` option causes the output to be shown as query language
source instead of the AST. This is particularly helpful to
see how SQP queries in their abbreviated form are translated
into the expanded, pedantic form of piped SQL. The DAG can
also be formatted as query-style text but the resulting text is
informational only and does not conform to any query syntax. When
`-C` is specified, the result is sent to stdout and the `-f` and
`-o` options have no effect.

This command is often used for dev and test but is also useful to
advanced users for understanding how SuperSQL syntax is parsed
into an AST or compiled into a runtime DAG.
26 changes: 25 additions & 1 deletion book/src/command/db-auth.md
@@ -1 +1,25 @@
# auth
### Command

  **auth** — connect to a database and authenticate

### Synopsis

```
super db auth login|logout|method|verify
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

> **TODO: rename this command. it's really about connecting to a database.
> authenticating is something you do to connect.**

Access to a lake can be secured with [Auth0 authentication](https://auth0.com/).
A [guide](../dev/integrations/auth.md) is available with example configurations.
Please reach out to us on our [community Slack](https://www.brimdata.io/join-slack/)
if you have feedback on your experience or need additional help.
46 changes: 45 additions & 1 deletion book/src/command/db-branch.md
Original file line number Diff line number Diff line change
@@ -1 +1,45 @@
# branch
### Command

  **branch** — create a new branch on a pool

### Synopsis

```
super db branch [options] [name]
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

The `branch` command creates a branch with the name `name` that points
to the tip of the working branch or, if the `name` argument is not provided,
lists the existing branches of the selected pool.

For example, this branch command
```
super db branch -use logs@main staging
```
creates a new branch called "staging" in pool "logs", which points to
the same commit object as the "main" branch. Once created, commits
to the "staging" branch will be added to the commit history without
affecting the "main" branch and each branch can be queried independently
at any time.

Supposing the `main` branch of `logs` was already the working branch,
then you could create the new branch called "staging" by simply saying
```
super db branch staging
```
Likewise, you can delete a branch with `-d`:
```
super db branch -d staging
```
and list the branches as follows:
```
super db branch
```
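
The branching behavior described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not how the database is implemented: the `Pool` class, its methods, and the string payloads are all hypothetical, and a branch is reduced to a name that maps to a commit ID.

```python
import itertools


class Pool:
    """Toy model: a branch is just a name that points at a commit ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.commits = {}                # commit ID -> (parent ID, payload)
        self.branches = {"main": None}   # a new pool starts with "main"

    def commit(self, branch, payload):
        cid = next(self._ids)
        self.commits[cid] = (self.branches[branch], payload)
        self.branches[branch] = cid      # only this branch's tip advances
        return cid

    def branch(self, name, from_branch="main"):
        # The new branch points at the same commit as the source branch.
        self.branches[name] = self.branches[from_branch]

    def history(self, branch):
        # Walk parent links from the tip back to the first commit.
        out, cid = [], self.branches[branch]
        while cid is not None:
            parent, payload = self.commits[cid]
            out.append(payload)
            cid = parent
        return out[::-1]


logs = Pool()
logs.commit("main", "load-1")
logs.branch("staging")            # like: super db branch -use logs@main staging
logs.commit("staging", "load-2")  # lands on "staging" only

main_history = logs.history("main")
staging_history = logs.history("staging")
print(main_history)     # ['load-1']
print(staging_history)  # ['load-1', 'load-2']
```

Both branches share the first commit, and the later commit to "staging" leaves the history of "main" unchanged.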
35 changes: 34 additions & 1 deletion book/src/command/db-create.md
Original file line number Diff line number Diff line change
@@ -1 +1,34 @@
# create
### Command

  **create** — create a new pool in a database

### Synopsis

```
super db create [-orderby key[,key...][:asc|:desc]] <name>
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

The `create` command creates a new data pool with the given name,
which may be any valid UTF-8 string.

The `-orderby` option indicates the [sort key](db.md#sort-key) that is used to sort
the data in the pool, which may be in ascending or descending order.

If a sort key is not specified, then it defaults to
the [special value `this`](../super-sql/intro.md#pipe-scoping).
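
To make the `-orderby` syntax from the synopsis concrete, here is a minimal sketch of how a `key[,key...][:asc|:desc]` value could be split into keys and a direction. The `parse_orderby` function is hypothetical and is not part of `super`. Falling back to `this` follows the text above, while treating ascending as the order when no suffix is given is an assumption made only for this example.

```python
def parse_orderby(spec=None):
    """Split a spec such as 'ts,id:desc' into (keys, order)."""
    if not spec:
        # Per the text, an unspecified sort key defaults to `this`.
        # The "asc" direction here is an assumption for illustration.
        return (["this"], "asc")
    keys_part, sep, order = spec.rpartition(":")
    if not sep:
        # No ':' suffix present, so the whole spec is the key list.
        keys_part, order = spec, "asc"   # assumed default direction
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be asc or desc, got {order!r}")
    keys = [k.strip() for k in keys_part.split(",")]
    if not all(keys):
        raise ValueError(f"empty key in {spec!r}")
    return (keys, order)


print(parse_orderby("ts,id:desc"))  # (['ts', 'id'], 'desc')
print(parse_orderby("ts"))          # (['ts'], 'asc')
print(parse_orderby())              # (['this'], 'asc')
```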

> **TODO: if we have no sort key, then there should be no sort key**

A newly created pool is initialized with a branch called `main`.

> Pools can be used without thinking about branches. When referencing a pool without
> a branch, the tooling presumes the "main" branch as the default, and everything
> can be done on main without having to think about branching.
34 changes: 33 additions & 1 deletion book/src/command/db-delete.md
Original file line number Diff line number Diff line change
@@ -1 +1,33 @@
# delete
### Command

&emsp; **delete** &mdash; delete data from a pool

### Synopsis

```
super db delete [options] <id> [<id>...]
super db delete [options] -where <filter>
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

The `delete` command removes one or more data objects indicated by their ID from a pool.
This command
simply removes the data from the branch without actually deleting the
underlying data objects, thereby allowing time travel to work in the face
of deletes. Permanent deletion of underlying data objects is handled by the
separate [`vacuum`](db-vacuum.md) command.

If the `-where` flag is specified, delete will remove all values for which the
provided filter expression is true. The value provided to `-where` must be a
single filter expression, e.g.:

```
super db delete -where 'ts > 2022-10-05T17:20:00Z and ts < 2022-10-05T17:21:00Z'
```
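
The distinction between removing data from a branch and deleting the underlying data objects can be sketched with a toy model. Everything here is hypothetical (the `Branch` class, the object IDs, and the representation of a commit as a set of visible object IDs), so treat it as an illustration of the idea rather than the actual storage design.

```python
class Branch:
    """Toy model: each commit records which object IDs are visible."""

    def __init__(self):
        self.store = {}                 # object ID -> data (underlying objects)
        self.commits = [frozenset()]    # commit N = set of visible object IDs

    def load(self, oid, data):
        self.store[oid] = data
        self.commits.append(self.commits[-1] | {oid})

    def delete(self, *oids):
        # A delete is a new commit that hides the objects.
        # The underlying store is left untouched.
        self.commits.append(self.commits[-1] - set(oids))

    def visible(self, at=-1):
        return sorted(self.commits[at])


b = Branch()
b.load("obj1", [1, 2])
b.load("obj2", [3])
b.delete("obj1")

now = b.visible()                  # state at the tip of the branch
before = b.visible(at=2)           # state just before the delete
still_stored = "obj1" in b.store   # the data object itself survives

print(now)           # ['obj2']
print(before)        # ['obj1', 'obj2']
print(still_stored)  # True
```

Because the earlier commit still lists `obj1` and the object is still stored, a query against the older state can see the data, which is the property that time travel relies on.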
23 changes: 22 additions & 1 deletion book/src/command/db-drop.md
Original file line number Diff line number Diff line change
@@ -1 +1,22 @@
# drop
### Command

&emsp; **drop** &mdash; remove a pool from a database

### Synopsis

```
super db drop [options] <name>|<id>
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

The `drop` command deletes a pool and all of its constituent data.
As this is a DANGER ZONE command, you must confirm that you want to delete
the pool to proceed. The `-f` option can be used to force the deletion
without confirmation.
28 changes: 27 additions & 1 deletion book/src/command/db-init.md
Original file line number Diff line number Diff line change
@@ -1 +1,27 @@
# init
### Command

&emsp; **init** &mdash; create and initialize a new database

### Synopsis

```
super db init [path]
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

A new database is created and initialized with the `init` command. The `path` argument
is a [storage path](../database/intro.md#storage-layer)
and is optional. If not present, the path
is [determined automatically](db.md#database-connection).

If the database already exists, `init` reports an error and does nothing.

Otherwise, the `init` command writes the initial cloud objects to the
storage path to create a new, empty database at the specified path.
115 changes: 114 additions & 1 deletion book/src/command/db-load.md
Original file line number Diff line number Diff line change
@@ -1 +1,114 @@
# load
### Command

&emsp; **load** &mdash; load data into a database

### Synopsis

```
super db load [options] input [input ...]
```

### Options

TODO

Additional options of the [db sub-command](db.md#options)

### Description

The `load` command commits new data to a branch of a pool.

Run `super db load -h` for a list of command-line options.

Note that there is no need to define a schema or insert data into
a "table" as all super-structured data is _self describing_ and can be queried in a
schema-agnostic fashion. Data of any _shape_ can be stored in any pool
and arbitrary data _shapes_ can coexist side by side.

As with [`super`](super.md),
the [input arguments](super.md#options) can be in
any [supported format](super.md#supported-formats) and
the input format is auto-detected if `-i` is not provided. Likewise,
the inputs may be URLs, in which case, the `load` command streams
the data from a Web server or [S3](../dev/integrations/s3.md)
and into the database.

When data is loaded, it is broken up into objects of a target size determined
by the pool's `threshold` parameter (which defaults to 500MiB but can be configured
when the pool is created). Each object is sorted by the [sort key](db.md#sort-key) but
a sequence of objects is not guaranteed to be globally sorted. When lots
of small or unsorted commits occur, data can be fragmented. The performance
impact of fragmentation can be eliminated by regularly [compacting](db-manage.md)
pools.

For example, this command
```
super db load sample1.json sample2.bsup sample3.sup
```
loads files of varying formats in a single commit to the working branch.

An alternative branch may be specified with a branch reference with the
`-use` option, i.e., `<pool>@<branch>`. Supposing a branch
called `live` existed, data can be committed into this branch as follows:
```
super db load -use logs@live sample.bsup
```
Or, as mentioned above, you can set the default branch for the load command
via [`use`](db-use.md):
```
super db use logs@live
super db load sample.bsup
```
During a `load` operation, a commit is broken out into units called _data objects_
where a target object size is configured into the pool,
typically 100MB-1GB. The records within each object are sorted by the sort key.
A data object is presumed by the implementation
to fit into the memory of an intake worker node
so that such a sort can be trivially accomplished.

Data added to a pool can arrive in any order with respect to its sort key.
While each object is sorted before it is written,
the collection of objects is generally not sorted.
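
The effect of sorting each object independently can be seen in a short sketch. The `split_into_objects` function is hypothetical, and its `threshold` counts records rather than bytes purely to keep the example small.

```python
def split_into_objects(records, key, threshold):
    """Chunk records in arrival order and sort each chunk by `key`.

    `threshold` is a record count here; a real pool threshold is a
    target size in bytes.
    """
    objects = []
    for start in range(0, len(records), threshold):
        chunk = records[start:start + threshold]
        objects.append(sorted(chunk, key=lambda r: r[key]))
    return objects


# Records arrive in no particular order with respect to the sort key.
records = [{"ts": t} for t in (5, 1, 9, 3, 7, 2)]
objects = split_into_objects(records, "ts", threshold=3)

per_object = [[r["ts"] for r in obj] for obj in objects]
flattened = [ts for obj in per_object for ts in obj]
globally_sorted = flattened == sorted(flattened)

print(per_object)       # [[1, 5, 9], [2, 3, 7]]
print(globally_sorted)  # False
```

Each object is sorted internally, yet the key ranges of the two objects overlap, so reading them in sequence does not yield globally sorted data.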

Each load operation creates a single [commit](../database/intro.md#commit-objects),
which includes:
* an author and message string,
* a timestamp computed by the server, and
* an optional metadata field of any type expressed as a Super (SUP) value.

This data has the type signature:
```
{
author: string,
date: time,
message: string,
meta: <any>
}
```
where `<any>` is the type of any optionally attached metadata.
For example, this command sets the `author` and `message` fields:
```
super db load -user [email protected] -message "new version of prod dataset" ...
```
If these fields are not specified, then the system will fill them in
with the user obtained from the session and a message that is descriptive
of the action.
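
As a rough illustration of the commit fields and the fallback behavior just described, the sketch below models a commit as a Python dataclass. The `Commit` class, the `make_commit` helper, and the default message text are hypothetical, and the local clock stands in for the timestamp that the server computes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Commit:
    author: str
    date: datetime
    message: str
    meta: Optional[Any] = None   # optional metadata of any type


def make_commit(session_user, author=None, message=None, meta=None,
                action="loaded data"):
    """Fill in author and message when the caller leaves them out."""
    return Commit(
        author=author or session_user,     # fall back to the session user
        date=datetime.now(timezone.utc),   # stand-in for the server timestamp
        message=message or action,         # hypothetical descriptive default
        meta=meta,
    )


implicit = make_commit("alice")
explicit = make_commit("alice", author="bob",
                       message="new version of prod dataset",
                       meta={"ticket": 42})

print(implicit.author, "|", implicit.message)  # alice | loaded data
print(explicit.author, "|", explicit.meta)     # bob | {'ticket': 42}
```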

The `date` field here is used by the database for
[time travel](../database/intro.md#time-travel)
through the branch and pool history, allowing you to see the state of
branches at any time in their commit history.

Arbitrary metadata expressed as any [SUP value](../formats/sup.md)
may be attached to a commit via the `-meta` flag. This allows an application
or user to transactionally commit metadata alongside committed data for any
purpose. This approach allows external applications to implement arbitrary
data provenance and audit capabilities by embedding custom metadata in the
commit history.

Since commit objects are stored as super-structured data, the metadata can easily be
queried by running `log -f bsup` to retrieve the log in BSUP format,
for example, and using [`super`](super.md) to pull the metadata out
as in:
```
super db log -f bsup | super -c 'has(meta) | values {id,meta}' -
```
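
For readers who think in terms of a general-purpose language, the selection performed by that pipeline is roughly equivalent to the following Python sketch. The commit records and their IDs are invented for illustration, and the real log contains more fields than are shown here.

```python
# Hypothetical commit log; only the second entry carries metadata.
log = [
    {"id": "c1", "author": "alice", "message": "first load"},
    {"id": "c2", "author": "bob", "message": "second load",
     "meta": {"source": "etl"}},
]

# Keep commits that have a `meta` field and project out `id` and `meta`,
# mirroring `has(meta) | values {id,meta}`.
with_meta = [{"id": c["id"], "meta": c["meta"]} for c in log if "meta" in c]

print(with_meta)  # [{'id': 'c2', 'meta': {'source': 'etl'}}]
```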