
Commit 323bbc5

stayseesong and markzegarelli authored

Apply suggestions from code review

Co-authored-by: markzegarelli <[email protected]>
1 parent 3699e7e commit 323bbc5

File tree

2 files changed: +4 -4 lines changed


src/connections/storage/warehouses/choose-warehouse.md

Lines changed: 2 additions & 2 deletions

@@ -17,13 +17,13 @@ Both Redshift and BigQuery are attractive cloud-hosted, affordable, and performa
 ## Architecture

-When you provision a Redshift cluster, you're renting a server from Amazon Web Services. Your cluster comprises of [nodes](http://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html), each with dedicated memory, CPU, and disk storage. These nodes handle data storage, query execution, and - if your cluster contains multiple nodes - a leader node will handle coordination across the cluster.
+When you provision a Redshift cluster, you're renting a server from Amazon Web Services. Your cluster consists of [nodes](http://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html), each with dedicated memory, CPU, and disk storage. These nodes handle data storage, query execution, and - if your cluster contains multiple nodes - a leader node will handle coordination across the cluster.

 Redshift performance and storage capacity is a function of cluster size and cluster type. As your storage or performance requirements change, you can scale up or down your cluster as needed.

 With BigQuery, you're not constrained by the storage capacity or compute resources of a given cluster. Instead, you can load large amounts of data into BigQuery without running out of memory, and execute complex queries without maxing out CPU.

-This is possible because BigQuery takes advantage of distributed storage and networking to separate data storage from compute power. Data distributes across many servers in the Google cloud using their [Colossus distributed file system](https://cloud.google.com/blog/big-data/2016/01/bigquery-under-the-hood). When you execute a query, the [Dremel query engine](https://cloud.google.com/blog/big-data/2016/01/bigquery-under-the-hood) splits the query into smaller sub-tasks, distributes the sub-tasks to computers across Google data centers, and then re-assembles them into your results.
+This is possible because BigQuery takes advantage of distributed storage and networking to separate data storage from compute power. Google's [Colossus distributed file system](https://cloud.google.com/blog/big-data/2016/01/bigquery-under-the-hood) distributes data across many servers in the Google cloud. When you execute a query, the [Dremel query engine](https://cloud.google.com/blog/big-data/2016/01/bigquery-under-the-hood) splits the query into smaller sub-tasks, distributes the sub-tasks to computers across Google data centers, and then re-assembles them into your results.

 ## Pricing

src/connections/storage/warehouses/redshift-tuning.md

Lines changed: 2 additions & 2 deletions

@@ -13,7 +13,7 @@ To help you improve your query performance, this guide takes you through common
 As your data volume grows and your team writes more queries, you might be running out of space in your cluster.

-To check if you're getting close to your max, run this query. It will tell you the percentage of storage used in your cluster. Segment recommends never exceeding 75-80% of your storage capacity. If you're nearing capacity, consider adding some more nodes.
+To check if you're getting close to your max, run this query. It will tell you the percentage of storage used in your cluster. Segment recommends that you don't exceed 75-80% of your storage capacity. If you approach that limit, consider adding more nodes to your cluster.

 ![](images/asset_HvZs8FpE.png)
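The storage-check query itself lives in the screenshot above. For reference, a common way to compute this from Redshift's `stv_partitions` system view looks roughly like the sketch below; it may differ from the exact query in the image:

```sql
-- Sketch: approximate percentage of disk storage used across the cluster.
-- stv_partitions reports per-partition capacity and used space in 1 MB blocks.
select sum(used)::float / sum(capacity) * 100 as pct_storage_used
from stv_partitions;
```

If the result approaches the 75-80% threshold mentioned above, that's the signal to consider adding nodes.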

@@ -61,7 +61,7 @@ As mentioned before, Redshift schedules and prioritizes queries using [Workload
6161

6262
The default configuration is a single queue with only 5 queries running concurrently, but Segment discovered that the default only works well for low-volume warehouses. More often than not, adjusting this configuration can improve your sync times.
6363

64-
Before Segment's SQL statements, Segment uses `set query_group to "segment";` to group all the queries together. This allows you to create a queue just for Segment that isolates from your own queries. The maximum concurrency that Redshift supports is 50 across _all_ query groups, and resources like memory distribute evenly across all those queries.
64+
Before Segment's SQL statements, Segment uses `set query_group to "segment";` to group all the queries together. This allows you to create a queue that isolates Segment's queries from your own. The maximum concurrency that Redshift supports is 50 across _all_ query groups, and resources like memory distribute evenly across all those queries.
6565

6666
Segment's initial recommendation is for 2 WLM queues:
6767
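In session terms, the `query_group` isolation described above works roughly like this sketch (it assumes your WLM configuration defines a queue that matches the `segment` query group):

```sql
-- Sketch: route this session's queries to the WLM queue that matches
-- the "segment" query group (the queue must exist in your WLM config).
set query_group to "segment";

-- ... queries issued here are scheduled in the matching queue ...

-- Return subsequent queries to the default queue.
reset query_group;
```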
