@mccanne (Collaborator) commented Nov 14, 2024

No description provided.

@mccanne assigned philrz and unassigned philrz Nov 14, 2024
@mccanne requested review from a team and philrz November 14, 2024 23:43
super -f arrows file1.json file2.parquet file3.csv > file-combined.arrows
```
When `super` is run with a query that has no "from" operator and no input arguments,
the SuperSQL query is fed a single `null` value analagous to SQL's default
Contributor

Suggested change
the SuperSQL query is fed a single `null` value analagous to SQL's default
the SuperSQL query is fed a single `null` value analogous to SQL's default

select value 1+1
```
To learn more about shortcuts, refer to the SuperSQL
[documenation on shortcuts](../language/pipeline-model.md#implied-operators).
Contributor

Suggested change
[documenation on shortcuts](../language/pipeline-model.md#implied-operators).
[documentation on shortcuts](../language/pipeline-model.md#implied-operators).

`super` supports a number of [input](#input-formats) and [output](#output-formats) formats, but the super formats
([Super Binary](../formats/bsup.md),
[Super Columnar](../formats/csup.md),
and [Super JSON](../formats/jsup.md)) tend to the most versatile and
Contributor

Suggested change
and [Super JSON](../formats/jsup.md)) tend to the most versatile and
and [Super JSON](../formats/jsup.md)) tend to be the most versatile and

...
wget https://data.gharchive.org/2023-02-08-23.json.gz
```
We downloadied these files into a directory called `gharchive_gz`
Contributor

Suggested change
We downloadied these files into a directory called `gharchive_gz`
We downloaded these files into a directory called `gharchive_gz`

`super` with Super Binary is substantially faster than the relational systems for
the search use cases and performs on par with the others for traditional OLAP queries,
except for the union query, where the super-structured data model trounces the relational
model (by over 100X!) for stiching together disparate data types for analysis in an aggregation.
Contributor

Suggested change
model (by over 100X!) for stiching together disparate data types for analysis in an aggregation.
model (by over 100X!) for stitching together disparate data types for analysis in an aggregation.

## Appendix 1: Preparing the Test Data

We used the Bash `time` command to measure elapsed time.
For our tests, We diverged a bit from the methodology in the DuckDB blog and wanted
Contributor

Suggested change
For our tests, We diverged a bit from the methodology in the DuckDB blog and wanted
For our tests, we diverged a bit from the methodology in the DuckDB blog and wanted

```
duckdb gha.db -c "CREATE TABLE gha AS FROM read_json('gharchive_gz/*.json.gz', union_by_name=true)"
```
We now have the `duckdb` database file for out GitHub Archive data called `gha.db`
Contributor

Suggested change
We now have the `duckdb` database file for out GitHub Archive data called `gha.db`
We now have the `duckdb` database file for our GitHub Archive data called `gha.db`

@philrz (Contributor) left a comment

I put up suggestions to fix some obvious typos and such. There are more changes I'd propose, but I'm fine with seeing this merged, and I could put up my proposals in a follow-on PR.

@mccanne merged commit 5f18349 into main Nov 15, 2024
4 checks passed
@mccanne deleted the super-doc-updates branch November 15, 2024 19:38