Commit a171b75: OLAP DB documentation (bulk of it)
1 parent 5f1a1cc

7 files changed: +203 -42 lines changed
Lines changed: 43 additions & 3 deletions
@@ -1,7 +1,47 @@

# WideWorldImportersDW OLAP Database Catalog

*Removed:* "This folder contains documentation for the sample.", "Start with [root.md](root.md)", "Note that these contents will most likely be migrated to MSDN."

The WideWorldImportersDW database is used for data warehousing and analytical processing. The transactional data about sales and purchases is generated in the WideWorldImporters database, and loaded into the WideWorldImportersDW database using a [daily ETL process](wwi-etl.md).

The data in WideWorldImportersDW thus mirrors the data in WideWorldImporters, but the tables are organized differently. WideWorldImportersDW uses the [star schema](https://wikipedia.org/wiki/Star_schema) approach for its table design. Besides the fact and dimension tables, the database includes a number of staging tables that are used in the ETL process.

## Schemas
The different types of tables are organized in three schemas.

|Schema|Description|
|------|-----------|
|Dimension|Dimension tables.|
|Fact|Fact tables.|
|Integration|Staging tables and other objects needed for ETL.|

## Tables
The dimension and fact tables are listed below. The tables in the Integration schema are used only for the ETL process, and are not listed.

### Dimension tables

WideWorldImportersDW has the following dimension tables. The description includes the relationship with the source tables in the WideWorldImporters database.

|Table|Source tables|
|-----|-------------|
|City|`Application.Cities`, `Application.StateProvinces`, `Application.Countries`.|
|Customer|`Sales.Customers`, `Sales.BuyingGroups`, `Sales.CustomerCategories`.|
|Date|New table with information about dates, including the financial year (based on a November 1st start for the financial year).|
|Employee|`Application.People`.|
|StockItem|`Warehouse.StockItems`, `Warehouse.Colors`, `Warehouse.PackageTypes`.|
|Supplier|`Purchasing.Suppliers`, `Purchasing.SupplierCategories`.|
|PaymentMethod|`Application.PaymentMethods`.|
|TransactionType|`Application.TransactionTypes`.|

### Fact tables

WideWorldImportersDW has the following fact tables. The description includes the relationship with the source tables in the WideWorldImporters database, as well as the classes of analytics/reporting queries each fact table is typically used with.

|Table|Source tables|Sample Analytics|
|-----|-------------|----------------|
|Order|`Sales.Orders` and `Sales.OrderLines`|Salesperson and picker/packer productivity, on-time order picking, and low-stock situations leading to back orders.|
|Sale|`Sales.Invoices` and `Sales.InvoiceLines`|Sales dates, delivery dates, profitability over time, profitability by salesperson.|
|Purchase|`Purchasing.PurchaseOrderLines`|Expected vs. actual lead times.|
|Transaction|`Sales.CustomerTransactions` and `Purchasing.SupplierTransactions`|Issue dates vs. finalization dates, and amounts.|
|Movement|`Warehouse.StockItemTransactions`|Movements over time.|
|Stock Holding|`Warehouse.StockItemHoldings`|On-hand stock levels and value.|
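
Queries against this star schema typically join a fact table to one or more dimension tables and aggregate measures by dimension attributes. A minimal sketch of such a star-join query follows; the `[City Key]` join column appears elsewhere in this documentation, but the `Profit`, `[State Province]`, `[Invoice Date Key]`, and `[Calendar Year]` column names are assumptions about the full sample, so verify them against your copy of the database:

```sql
-- Hypothetical illustration: total profit by state/province and calendar year,
-- joining the Sale fact table to its City and Date dimensions.
SELECT c.[State Province],
       d.[Calendar Year],
       SUM(s.Profit) AS TotalProfit
FROM Fact.Sale AS s
INNER JOIN Dimension.City AS c
    ON s.[City Key] = c.[City Key]
INNER JOIN Dimension.[Date] AS d
    ON s.[Invoice Date Key] = d.[Date]
GROUP BY c.[State Province], d.[Calendar Year]
ORDER BY c.[State Province], d.[Calendar Year];
```

Because every fact row carries surrogate keys into the dimensions, queries of this shape need no joins back to the OLTP schema.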
Lines changed: 46 additions & 3 deletions
@@ -1,7 +1,50 @@

# WideWorldImportersDW Installation and Configuration

*Removed:* "This folder contains documentation for the sample.", "Start with [root.md](root.md)", "Note that these contents will most likely be migrated to MSDN."

Prerequisites:

- [SQL Server 2016](https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016) (or higher) or [Azure SQL Database](https://azure.microsoft.com/services/sql-database/). To use the Full version of the sample, use SQL Server Evaluation, Developer, or Enterprise Edition.
- [SQL Server Management Studio](https://msdn.microsoft.com/library/mt238290.aspx). For the best results, use the April 2016 preview or later.

## Download
The latest release of the sample:

[wide-world-importers-v0.1](https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v0.1)

Download the sample WideWorldImportersDW database backup/bacpac that corresponds to your edition of SQL Server or Azure SQL Database.

Source code to recreate the sample database is available from the following location. Note that data population is based on ETL from the OLTP database (WideWorldImporters):

[wide-world-importers](https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/wide-world-importers/wwi-dw-database-scripts)

## Install
### SQL Server

To restore the backup to a SQL Server instance, you can use Management Studio:

1. Open SQL Server Management Studio and connect to the target SQL Server instance.
2. Right-click the **Databases** node, and select **Restore Database**.
3. Select **Device** and click the **...** button.
4. In the **Select backup devices** dialog, click **Add**, navigate to the database backup in the filesystem of the server, and select the backup. Click **OK**.
5. If needed, change the target location for the data and log files in the **Files** pane. Note that it is best practice to place data and log files on different drives.
6. Click **OK**. This initiates the database restore. After it completes, the WideWorldImportersDW database will be installed on your SQL Server instance.
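
The same restore can also be scripted in T-SQL. The sketch below assumes hypothetical backup and data/log file paths and logical file names; check the actual logical names with `RESTORE FILELISTONLY` before running it:

```sql
-- Placeholder paths and logical file names: adjust to your environment.
-- Inspect the backup's logical names first:
-- RESTORE FILELISTONLY FROM DISK = N'C:\Backups\WideWorldImportersDW-Full.bak';
RESTORE DATABASE WideWorldImportersDW
FROM DISK = N'C:\Backups\WideWorldImportersDW-Full.bak'
WITH MOVE N'WWI_Primary'  TO N'D:\Data\WideWorldImportersDW.mdf',
     MOVE N'WWI_UserData' TO N'D:\Data\WideWorldImportersDW_UserData.ndf',
     MOVE N'WWI_Log'      TO N'E:\Log\WideWorldImportersDW.ldf',
     STATS = 10;  -- report progress every 10 percent
```

The `MOVE` clauses implement the best practice from step 5 of placing data and log files on separate drives.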
### Azure SQL Database

To import the bacpac into a new SQL Database, you can use Management Studio:

1. (Optional) If you do not yet have a SQL Server in Azure, navigate to the [Azure portal](https://portal.azure.com/) and create a new SQL Database. In the process of creating the database, you will create a server. Make note of the server.
   - See [this tutorial](https://azure.microsoft.com/documentation/articles/sql-database-get-started/) to create a database in minutes.
2. Open SQL Server Management Studio and connect to your server in Azure.
3. Right-click the **Databases** node, and select **Import Data-Tier Application**.
4. In **Import Settings**, select **Import from local disk** and select the bacpac of the sample database from your file system.
5. Under **Database Settings**, change the database name to *WideWorldImportersDW* and select the target edition and service objective to use.
6. Click **Next** and **Finish** to kick off the deployment. It will take a few minutes to complete. When specifying a service objective lower than S2, it may take longer.

## Configuration
The sample database can make use of PolyBase to query files in Hadoop or Azure blob storage. However, that feature is not installed by default with SQL Server; you need to select it during SQL Server setup. Therefore, a post-installation step is required:

1. In SQL Server Management Studio, connect to the WideWorldImportersDW database and open a new query window.
2. Run the following T-SQL command to enable the use of PolyBase in the database:

```sql
EXECUTE [Application].[Configuration_ApplyPolyBase];
```
Lines changed: 83 additions & 3 deletions
@@ -1,7 +1,87 @@

# WideWorldImportersDW Use of SQL Server Features and Capabilities

*Removed:* "This folder contains documentation for the sample.", "Start with [root.md](root.md)", "Note that these contents will most likely be migrated to MSDN."

WideWorldImportersDW is designed to showcase many of the key features of SQL Server that are suitable for data warehousing and analytics. The following is a list of SQL Server features and capabilities, with a description of how each is used in WideWorldImportersDW.

## PolyBase

[Applies to SQL Server (2016 and later)]

PolyBase is used to combine sales information from WideWorldImportersDW with a public data set about demographics, to understand which cities might be of interest for further expansion of sales.

To enable the use of PolyBase in the sample database, make sure the feature is installed, and run the following statement in the database:

```sql
EXEC [Application].[Configuration_ApplyPolyBase];
```

This creates an external table `dbo.CityPopulationStatistics` that references a public data set, hosted in Azure blob storage, containing population data for cities in the United States. The following query returns the data from that external data set:

```sql
SELECT CityID, StateProvinceCode, CityName, YearNumber, LatestRecordedPopulation
FROM dbo.CityPopulationStatistics;
```

To understand which cities might be of interest for further expansion, the following query looks at the growth rate of cities, and returns the top 100 largest cities with significant growth where Wide World Importers does not have a sales presence. The query involves a join between the remote table `dbo.CityPopulationStatistics` and the local table `Dimension.City`, and a filter involving the local table `Fact.Sale`.

```sql
WITH PotentialCities
AS
(
    SELECT cps.CityName,
           cps.StateProvinceCode,
           MAX(cps.LatestRecordedPopulation) AS PopulationIn2016,
           (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0
               / MIN(cps.LatestRecordedPopulation) AS GrowthRate
    FROM dbo.CityPopulationStatistics AS cps
    WHERE cps.LatestRecordedPopulation IS NOT NULL
    AND cps.LatestRecordedPopulation <> 0
    GROUP BY cps.CityName, cps.StateProvinceCode
),
InterestingCities
AS
(
    SELECT DISTINCT pc.CityName,
                    pc.StateProvinceCode,
                    pc.PopulationIn2016,
                    FLOOR(pc.GrowthRate) AS GrowthRate
    FROM PotentialCities AS pc
    INNER JOIN Dimension.City AS c
        ON pc.CityName = c.City
    WHERE GrowthRate > 2.0
    AND NOT EXISTS (SELECT 1 FROM Fact.Sale AS s WHERE s.[City Key] = c.[City Key])
)
SELECT TOP(100) CityName, StateProvinceCode, PopulationIn2016, GrowthRate
FROM InterestingCities
ORDER BY PopulationIn2016 DESC;
```

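As a quick sanity check of the growth-rate expression `(MAX - MIN) * 100.0 / MIN`: for a city whose smallest recorded population is 500,000 and whose largest is 560,000, the growth rate is (560,000 - 500,000) × 100.0 / 500,000 = 12.0, i.e. 12 percent. The values here are made-up illustration numbers, not data from the sample:

```sql
-- Worked example of the growth-rate arithmetic with illustrative values.
SELECT (560000 - 500000) * 100.0 / 500000 AS GrowthRate;  -- 12.0
```

The `* 100.0` factor is also why the filter threshold in the query is written as `GrowthRate > 2.0`, meaning a growth rate above 2 percent.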
## Clustered Columnstore Indexes
(Full version of the sample)

Clustered columnstore indexes (CCI) are used on all the fact tables, to reduce storage footprint and improve query performance. With a CCI, the base storage for a fact table uses columnar compression.

Nonclustered indexes are layered on top of the clustered columnstore index to facilitate primary key and foreign key constraints. These constraints were added out of an abundance of caution: the ETL process sources the data from the WideWorldImporters database, which already has constraints to enforce integrity. Removing the primary and foreign key constraints, and their supporting indexes, would reduce the storage footprint of the fact tables.

**Data size**

The sample database has a limited data size, to make it easy to download and install. However, to see the real performance benefits of columnstore indexes, you would want to use a larger data set.

You can run the following statement to increase the size of the `Fact.Sale` table by inserting another 12 million rows of sample data. These rows are all inserted for the year 2012, so that there is no interference with the ETL process:

```sql
EXECUTE [Application].[Configuration_PopulateLargeSaleTable];
```

This statement takes around 5 minutes to run. To insert more than 12 million rows, pass the desired number of rows to insert as a parameter to this stored procedure.

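After populating the larger table, one way to observe the compressed row groups that the columnstore index created is the `sys.column_store_row_groups` catalog view. This query is a sketch, not part of the sample; column availability may vary by SQL Server version, so treat the column list as an assumption to verify:

```sql
-- Inspect columnstore row groups for the Sale fact table.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_description,   -- e.g. COMPRESSED or OPEN (delta store)
       total_rows,
       size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'Fact.Sale')
ORDER BY row_group_id;
```

Fully compressed row groups hold up to about one million rows each, so a 12-million-row insert should produce on the order of a dozen row groups.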
To compare query performance with and without columnstore, you can drop and recreate the clustered columnstore index.

To drop the index:

```sql
DROP INDEX [CCX_Fact_Order] ON [Fact].[Order];
```

To recreate it:

```sql
CREATE CLUSTERED COLUMNSTORE INDEX [CCX_Fact_Order] ON [Fact].[Order];
```

## Partitioning
(Full version of the sample)

Data size in a data warehouse can grow very large, so it is best practice to use partitioning to manage the storage of the large tables in the database.

All of the larger fact tables are partitioned by year. The only exception is `Fact.[Stock Holding]`, which is not date-based and has a limited data size compared with the other fact tables.

The partition function used for all partitioned tables is `PF_Date`, and the partition scheme is `PS_Date`.
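
As a sketch of how a yearly partition function and scheme like these are declared, the DDL below uses illustrative boundary dates and an assumed `date` input type; the actual `PF_Date`/`PS_Date` definitions in the sample may use different boundaries, data types, and filegroups:

```sql
-- Hypothetical yearly partitioning; boundary values are examples only.
CREATE PARTITION FUNCTION PF_Date (date)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01');

-- Map every partition to the PRIMARY filegroup for simplicity.
CREATE PARTITION SCHEME PS_Date
AS PARTITION PF_Date ALL TO ([PRIMARY]);
```

With `RANGE RIGHT`, each boundary date belongs to the partition on its right, so each partition holds exactly one calendar year of fact rows.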

samples/databases/wide-world-importers/documentation/wwi-oltp-htap-catalog.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@

# WideWorldImporters Database Catalog

- The WideWorldImporters database contains all the transaction information and daily data for sales and purchases.
+ The WideWorldImporters database contains all the transaction information and daily data for sales and purchases, as well as sensor data for vehicles and cold rooms.

## Schemas

samples/databases/wide-world-importers/documentation/wwi-oltp-htap-installation.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ The latest release of the sample:

[wide-world-importers-v0.1](https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v0.1)

- Download the sample database backup/bacpac that corresponds to your edition of SQL Server or Azure SQL Database.
+ Download the sample WideWorldImporters database backup/bacpac that corresponds to your edition of SQL Server or Azure SQL Database.

Source code to recreate the sample database is available from the following location. Note that recreating the sample will result in slight differences in the data, since there is a random factor in the data generation:

Lines changed: 2 additions & 4 deletions
@@ -1,7 +1,5 @@
11
# WideWorldImporters Sample Queries
22

3-
This folder contains documentation for the sample.
3+
Refer to the sample-scripts.zip file that is included with the release of the sample, or refer to the source code:
44

5-
Start with [root.md](root.md)
6-
7-
Note that these contents will most likely be migrated to MSDN.
5+
[wide-world-importers/sample-scripts](https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/wide-world-importers/sample-scripts)

samples/databases/wide-world-importers/sample-scripts/polybase/DemonstratePolybase.sql

Lines changed: 27 additions & 27 deletions
Most of the changes in this file strip trailing whitespace from otherwise unchanged lines (those `-`/`+` pairs below differ only in trailing spaces); the substantive changes replace `SELECT *` with explicit column lists.

```diff
@@ -4,8 +4,8 @@
 USE WideWorldImportersDW;
 GO
 
--- WideWorldImporters have customers in a variety of cities but feel they are likely missing 
--- other important cities. They have decided to try to find other cities have a growth rate of more 
+-- WideWorldImporters have customers in a variety of cities but feel they are likely missing
+-- other important cities. They have decided to try to find other cities have a growth rate of more
 -- than 20% over the last 3 years, and where they do not have existing customers.
 -- They have obtained census data (a CSV file) and have loaded it into an Azure storage account.
 -- They want to combine that data with other data in their main OLTP database to work out where
@@ -22,28 +22,28 @@ GO
 -- Expand the dbo.CityPopulationStatistics table, expand the list of columns and note the
 -- values that are contained. Let's look at the data:
 
-SELECT * FROM dbo.CityPopulationStatistics;
+SELECT CityID, StateProvinceCode, CityName, YearNumber, LatestRecordedPopulation FROM dbo.CityPopulationStatistics;
 GO
 
 -- How did that work? First the procedure created an external data source like this:
 /*
 
-CREATE EXTERNAL DATA SOURCE AzureStorage 
-WITH 
+CREATE EXTERNAL DATA SOURCE AzureStorage
+WITH
 (
     TYPE=HADOOP, LOCATION = 'wasbs://[email protected]'
 );
 
 */
--- This shows how to connect to AzureStorage. Next the procedure created an 
+-- This shows how to connect to AzureStorage. Next the procedure created an
 -- external file format to describe the layout of the CSV file:
 /*
 
-CREATE EXTERNAL FILE FORMAT CommaDelimitedTextFileFormat 
-WITH 
+CREATE EXTERNAL FILE FORMAT CommaDelimitedTextFileFormat
+WITH
 (
-    FORMAT_TYPE = DELIMITEDTEXT, 
-    FORMAT_OPTIONS 
+    FORMAT_TYPE = DELIMITEDTEXT,
+    FORMAT_OPTIONS
     (
         FIELD_TERMINATOR = ','
     )
@@ -61,35 +61,35 @@ CREATE EXTERNAL TABLE dbo.CityPopulationStatistics
     YearNumber int NOT NULL,
     LatestRecordedPopulation bigint NULL
 )
-WITH 
-( 
-    LOCATION = '/', 
+WITH
+(
+    LOCATION = '/',
     DATA_SOURCE = AzureStorage,
     FILE_FORMAT = CommaDelimitedTextFileFormat,
     REJECT_TYPE = VALUE,
     REJECT_VALUE = 4 -- skipping 1 header row per file
 );
 
 */
--- From that point onwards, the external table can be used like a local table. Let's run that 
+-- From that point onwards, the external table can be used like a local table. Let's run that
 -- query that they wanted to use to find out which cities they should be finding new customers
 -- in. We'll start building the query by grouping the cities from the external table
 -- and finding those with more than a 20% growth rate for the period:
 
 WITH PotentialCities
 AS
 (
-SELECT cps.CityName, 
+SELECT cps.CityName,
        cps.StateProvinceCode,
        MAX(cps.LatestRecordedPopulation) AS PopulationIn2016,
-       (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0 
+       (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0
        / MIN(cps.LatestRecordedPopulation) AS GrowthRate
 FROM dbo.CityPopulationStatistics AS cps
 WHERE cps.LatestRecordedPopulation IS NOT NULL
-AND cps.LatestRecordedPopulation <> 0 
+AND cps.LatestRecordedPopulation <> 0
 GROUP BY cps.CityName, cps.StateProvinceCode
 )
-SELECT *
+SELECT CityName, StateProvinceCode, PopulationIn2016, GrowthRate
 FROM PotentialCities
 WHERE GrowthRate > 2.0;
 GO
@@ -100,31 +100,31 @@ GO
 WITH PotentialCities
 AS
 (
-SELECT cps.CityName, 
+SELECT cps.CityName,
        cps.StateProvinceCode,
        MAX(cps.LatestRecordedPopulation) AS PopulationIn2016,
-       (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0 
+       (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0
        / MIN(cps.LatestRecordedPopulation) AS GrowthRate
 FROM dbo.CityPopulationStatistics AS cps
 WHERE cps.LatestRecordedPopulation IS NOT NULL
-AND cps.LatestRecordedPopulation <> 0 
+AND cps.LatestRecordedPopulation <> 0
 GROUP BY cps.CityName, cps.StateProvinceCode
 ),
 InterestingCities
 AS
 (
-SELECT DISTINCT pc.CityName, 
-       pc.StateProvinceCode, 
+SELECT DISTINCT pc.CityName,
+       pc.StateProvinceCode,
        pc.PopulationIn2016,
        FLOOR(pc.GrowthRate) AS GrowthRate
 FROM PotentialCities AS pc
 INNER JOIN Dimension.City AS c
-ON pc.CityName = c.City 
+ON pc.CityName = c.City
 WHERE GrowthRate > 2.0
 AND NOT EXISTS (SELECT 1 FROM Fact.Sale AS s WHERE s.[City Key] = c.[City Key])
 )
-SELECT TOP(100) *
-FROM InterestingCities 
+SELECT TOP(100) CityName, StateProvinceCode, PopulationIn2016, GrowthRate
+FROM InterestingCities
 ORDER BY PopulationIn2016 DESC;
 GO
 
@@ -136,4 +136,4 @@ DROP EXTERNAL FILE FORMAT CommaDelimitedTextFileFormat;
 GO
 DROP EXTERNAL DATA SOURCE AzureStorage;
 GO
-*/ 
+*/
```
