# Update README.md #10
````diff
@@ -12,22 +12,22 @@ To see the generated code, look in [tests/expand](tests/expand) or run `cargo ex
 ## Supported data types

 - pco supports `u16`, `u32`, `u64`, `i16`, `i32`, `i64`, `f16`, `f32`, `f64`
-- pco_store adds support for `SystemTime`, `bool`
+- pco_store adds support for `SystemTime` (mapped to ???), `bool` (mapped to ???)

 ## Performance

-Numeric compression algorithms take advantage of the mathematic relationships between a series of numbers to compress them to a higher degree than binary compression can. Of the numeric compression algorithms available in Rust, pco achieves both the best compression ratio and the best round-trip read and write time.
+Numeric compression algorithms take advantage of the mathematical relationships between a series of numbers to compress them to a higher degree than binary compression can. Of the numeric compression algorithms available in Rust, in our tests pco achieves both the best compression ratio and the best round-trip read and write time.
````
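Elsewhere the README states that timestamps are stored internally as an `i64` microsecond offset from the Unix epoch. As a hypothetical sketch only (not pco_store's actual code, and the `to_micros`/`from_micros` names are invented for illustration), that mapping for `SystemTime` could look like:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Hypothetical sketch: map a SystemTime to an i64 microsecond offset
// from the Unix epoch, as the README describes for timestamp storage.
fn to_micros(t: SystemTime) -> i64 {
    t.duration_since(UNIX_EPOCH)
        .expect("time before Unix epoch")
        .as_micros() as i64
}

// Inverse mapping: rebuild a SystemTime from the stored offset.
fn from_micros(micros: i64) -> SystemTime {
    UNIX_EPOCH + Duration::from_micros(micros as u64)
}

fn main() {
    let t = UNIX_EPOCH + Duration::from_secs(1_700_000_000);
    // Round-trips exactly because the value is a whole number of microseconds.
    assert_eq!(from_micros(to_micros(t)), t);
}
```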
> **Member:** This might read better:
````diff
-Compared to Postgres array data types, pco_store improves the compression ratio by 2x and improves read and write time by 5x in the included [benchmarks](benches). Better compression ratios can be expected with larger datasets.
+Compared to Postgres array data types compressed with pglz, pco_store improves the compression ratio by 2x and improves read and write time by 5x in the included [benchmarks](benches). Better compression ratios can be expected with larger datasets.

 ## Usage

 The `pco_store::store` procedural macro accepts these arguments:

-- `timestamp` accepts the field name for a timestamp in the struct. Timestamps are internally stored as an `i64` microsecond offset from the Unix epoch. This adds `start_at` and `end_at` timestamp columns to the resulting table. A composite index should cover `start_at` and `end_at`.
-- `group_by` accepts one or more field names that are stored as uncompressed fields on the Postgres table that all other fields are grouped by. The fields are added as `load` filters, and `store` automatically groups the input data by them. A composite index should cover these fields.
-- `float_round` sets the number of fractional decimal points to retain for float values. This helps improve the compression ratio when you don't need the full precision of the source data. Internally this stores the values as `i64`, with the fractional precision retained by multiplying by 10^N at write time, and then at read time casting to float and dividing by 10^N. Users should confirm that the generated integer values won't overflow past `i64::MAX`.
-- `table_name` overrides the Postgres table name. By default it underscores and pluralizes the struct name, so `QueryStat` becomes `query_stats`.
+- `timestamp` accepts the field name for a timestamp in the struct. Timestamps are internally stored as an `i64` microsecond offset from the Unix epoch. This requires `start_at` and `end_at` timestamp columns on the underlying table. A composite index should cover `start_at` and `end_at`.
+- `group_by` accepts one or more field names that are stored as uncompressed fields on the Postgres table that all other fields are grouped by. The fields are required for `load`, and `store` automatically groups the input data by them. A composite index should cover these fields.
+- `float_round` sets the number of fractional decimal points to retain for float values. This helps improve the compression ratio when you don't need the full precision of the source data. Internally this stores the values as `i64`, with the fractional precision retained by multiplying by 10^N at write time, and then at read time casting to float and dividing by 10^N. Users should confirm that the generated integer values won't overflow past `i64::MAX` (larger values will wrap around and become negative).
+- `table_name` overrides the Postgres table name that is used. By default it underscores and pluralizes the struct name, so `QueryStat` becomes `query_stats`.
````
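The `float_round` transform described above (multiply by 10^N at write time, divide by 10^N at read time) can be sketched in plain Rust. This is a hypothetical illustration, not pco_store's actual implementation; the `encode`/`decode` names and the overflow check are invented for the sketch:

```rust
// Hypothetical sketch of the float_round transform with N fractional digits.
fn encode(value: f64, n: u32) -> i64 {
    let scale = 10i64.pow(n) as f64;
    let scaled = (value * scale).round();
    // Values scaled past i64::MAX would be out of range, so check first.
    assert!(scaled.abs() <= i64::MAX as f64, "float_round overflow");
    scaled as i64
}

// Read path: cast back to float and divide by 10^N.
fn decode(stored: i64, n: u32) -> f64 {
    stored as f64 / 10i64.pow(n) as f64
}

fn main() {
    let stored = encode(1.23456, 2); // retained to 2 decimal places
    assert_eq!(stored, 123);
    assert_eq!(decode(stored, 2), 1.23);
}
```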
````diff
 Additional notes:
````
````diff
@@ -67,9 +67,9 @@ CREATE INDEX ON query_stats USING btree (database_id);
 CREATE INDEX ON query_stats USING btree (end_at, start_at);
 ```

-`STORAGE EXTERNAL` is set so that Postgres doesn't try to compress the already-compressed fields
+The pco-compressed columns are expected to typically be in Postgres [TOAST](https://www.postgresql.org/docs/current/storage-toast.html). Using `STORAGE EXTERNAL` is recommended so that Postgres doesn't try to compress the already-compressed fields, speeding up writes.
````
> **Member:** I think the TOAST sentence should be removed because it's just explaining how Postgres works, and pco_store would work whether or not TOAST existed in its current form. I'd also reword the existing sentence:
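For reference, `STORAGE EXTERNAL` can also be applied to an existing table with `ALTER TABLE`. The column names below are assumed from the `query_stats` example in this README and are illustrative only:

```sql
-- Illustrative: disable Postgres-side compression for already-compressed
-- pco columns (column names assumed from the query_stats example).
ALTER TABLE query_stats ALTER COLUMN calls SET STORAGE EXTERNAL;
ALTER TABLE query_stats ALTER COLUMN total_time SET STORAGE EXTERNAL;
```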
````diff
-This uses a `(end_at, start_at)` index because it's more selective than `(start_at, end_at)` for common use cases. For example when loading the last week of stats, the `end_at` filter is what's doing the work to filter out rows.
+Its recommended to index `(end_at, start_at)`, because it's more selective than `(start_at, end_at)` for common use cases. For example when loading the last week of stats, the `end_at` filter is what's doing the work to filter out rows.
````
> **Member:** Typo:
````diff
 ```sql
 end_at >= now() - interval '7 days' AND start_at <= now()
 ```
````
````diff
@@ -85,13 +85,13 @@ async fn example() -> anyhow::Result<()> {

     // Write
     let stats = vec![QueryStat { database_id, collected_at: end - Duration::from_secs(120), fingerprint: 1, calls: 1, total_time: 1.0 }];
-    QueryStats::store(db, stats).await?;
+    CompressedQueryStats::store(db, stats).await?;
     let stats = vec![QueryStat { database_id, collected_at: end - Duration::from_secs(60), fingerprint: 1, calls: 1, total_time: 1.0 }];
-    QueryStats::store(db, stats).await?;
+    CompressedQueryStats::store(db, stats).await?;

     // Read
     let mut calls = 0;
-    for group in QueryStats::load(db, &[database_id], start, end).await? {
+    for group in CompressedQueryStats::load(db, &[database_id], start, end).await? {
         for stat in group.decompress()? {
             calls += stat.calls;
         }
````
````diff
@@ -106,16 +106,16 @@ async fn example() -> anyhow::Result<()> {
     assert_eq!(2, db.query_one("SELECT count(*) FROM query_stats", &[]).await?.get::<_, i64>(0));
     transaction!(db, {
         let mut stats = Vec::new();
-        for group in QueryStats::delete(db, &[database_id], start, end).await? {
+        for group in CompressedQueryStats::delete(db, &[database_id], start, end).await? {
             for stat in group.decompress()? {
                 stats.push(stat);
             }
         }
         assert_eq!(0, db.query_one("SELECT count(*) FROM query_stats", &[]).await?.get::<_, i64>(0));
-        QueryStats::store(db, stats).await?;
+        CompressedQueryStats::store(db, stats).await?;
     });
     assert_eq!(1, db.query_one("SELECT count(*) FROM query_stats", &[]).await?.get::<_, i64>(0));
-    let group = QueryStats::load(db, &[database_id], start, end).await?.remove(0);
+    let group = CompressedQueryStats::load(db, &[database_id], start, end).await?.remove(0);
     assert_eq!(group.start_at, end - Duration::from_secs(120));
     assert_eq!(group.end_at, end - Duration::from_secs(60));
     let stats = group.decompress()?;
````
````diff
@@ -181,3 +181,8 @@ These crates also implement numeric compression:
 [stream-vbyte]: https://crates.io/crates/stream-vbyte
 [bitpacking]: https://crates.io/crates/bitpacking
 [tsz-compress]: https://crates.io/crates/tsz-compress
+
+## License
+
+Licensed under the MIT license, see LICENSE file for details.
+Copyright (c) 2025, Duboce Labs, Inc. (pganalyze) <team@pganalyze.com>
````