Commit 901f1c4

Merge pull request #1112 from anosov1960/master
Added instructions for running from PC
2 parents 508a6ba + 97fe307 commit 901f1c4

215 files changed

+38108
-0
lines changed

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
-- Demonstrate WideWorldImporters PolyBase connections
-- Requires PolyBase to be installed.

USE WideWorldImportersDW;
GO

-- WideWorldImporters have customers in a variety of cities but feel they are likely missing
-- other important cities. They have decided to try to find other cities that have a growth rate of more
-- than 20% over the last 3 years, and where they do not have existing customers.
-- They have obtained census data (a CSV file) and have loaded it into an Azure storage account.
-- They want to combine that data with other data in their data warehouse to work out where
-- they should try to find new customers.

-- First, let's apply PolyBase connectivity and set up an external table to point to the data
-- in the Azure storage account.

EXEC [Application].Configuration_ApplyPolybase;
GO

-- In Object Explorer, refresh the WideWorldImportersDW database, then expand the Tables node.
-- Note that SQL Server 2016 added a new entry here for External Tables. Expand that node.
-- Expand the dbo.CityPopulationStatistics table, expand the list of columns and note the
-- columns that it contains. Let's look at the data:

SELECT CityID, StateProvinceCode, CityName, YearNumber, LatestRecordedPopulation
FROM dbo.CityPopulationStatistics;
GO

-- How did that work? First the procedure created an external data source like this:
/*

CREATE EXTERNAL DATA SOURCE AzureStorage
WITH
(
    TYPE=HADOOP, LOCATION = 'wasbs://[email protected]'
);

*/
-- This shows how to connect to AzureStorage. Next the procedure created an
-- external file format to describe the layout of the CSV file:
/*

CREATE EXTERNAL FILE FORMAT CommaDelimitedTextFileFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = ','
    )
);

*/
-- Finally the external table was defined like this:
/*

CREATE EXTERNAL TABLE dbo.CityPopulationStatistics
(
    CityID int NOT NULL,
    StateProvinceCode nvarchar(5) NOT NULL,
    CityName nvarchar(50) NOT NULL,
    YearNumber int NOT NULL,
    LatestRecordedPopulation bigint NULL
)
WITH
(
    LOCATION = '/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = CommaDelimitedTextFileFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 4 -- tolerate up to 4 rejected rows (1 header row per file is skipped)
);

*/
-- From that point onwards, the external table can be used like a local table. Let's run the
-- query that they wanted to use to find out which cities they should be finding new customers
-- in. We'll start building the query by grouping the cities from the external table
-- and finding those with more than a 20% growth rate for the period:

WITH PotentialCities
AS
(
    SELECT cps.CityName,
           cps.StateProvinceCode,
           MAX(cps.LatestRecordedPopulation) AS PopulationIn2016,
           (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0
               / MIN(cps.LatestRecordedPopulation) AS GrowthRate
    FROM dbo.CityPopulationStatistics AS cps
    WHERE cps.LatestRecordedPopulation IS NOT NULL
    AND cps.LatestRecordedPopulation <> 0
    GROUP BY cps.CityName, cps.StateProvinceCode
)
SELECT CityName, StateProvinceCode, PopulationIn2016, GrowthRate
FROM PotentialCities
WHERE GrowthRate > 20.0; -- GrowthRate is a percentage, so 20.0 means 20%
GO

-- Now let's combine that with our local city and sales data to exclude those where we already
-- have customers. We'll find the 100 most interesting cities based upon population.

WITH PotentialCities
AS
(
    SELECT cps.CityName,
           cps.StateProvinceCode,
           MAX(cps.LatestRecordedPopulation) AS PopulationIn2016,
           (MAX(cps.LatestRecordedPopulation) - MIN(cps.LatestRecordedPopulation)) * 100.0
               / MIN(cps.LatestRecordedPopulation) AS GrowthRate
    FROM dbo.CityPopulationStatistics AS cps
    WHERE cps.LatestRecordedPopulation IS NOT NULL
    AND cps.LatestRecordedPopulation <> 0
    GROUP BY cps.CityName, cps.StateProvinceCode
),
InterestingCities
AS
(
    SELECT DISTINCT pc.CityName,
                    pc.StateProvinceCode,
                    pc.PopulationIn2016,
                    FLOOR(pc.GrowthRate) AS GrowthRate
    FROM PotentialCities AS pc
    INNER JOIN Dimension.City AS c
    ON pc.CityName = c.City
    WHERE pc.GrowthRate > 20.0 -- GrowthRate is a percentage, so 20.0 means 20%
    AND NOT EXISTS (SELECT 1 FROM Fact.Sale AS s WHERE s.[City Key] = c.[City Key])
)
SELECT TOP(100) CityName, StateProvinceCode, PopulationIn2016, GrowthRate
FROM InterestingCities
ORDER BY PopulationIn2016 DESC;
GO

-- Clean up if required
/*
DROP EXTERNAL TABLE dbo.CityPopulationStatistics;
GO
DROP EXTERNAL FILE FORMAT CommaDelimitedTextFileFormat;
GO
DROP EXTERNAL DATA SOURCE AzureStorage;
GO
*/
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Sample Querying of External Data Source in WideWorldImportersDW

This script demonstrates the use of PolyBase to query an external data source.

Demographics data is available in Azure blob storage. This data is joined with sales data recorded in the local database to determine which cities would be good candidates for future expansion of the business.

### Contents

[About this sample](#about-this-sample)<br/>
[Before you begin](#before-you-begin)<br/>
[Running the sample](#run-this-sample)<br/>
[Sample details](#sample-details)<br/>
[Disclaimers](#disclaimers)<br/>
[Related links](#related-links)<br/>


<a name=about-this-sample></a>

## About this sample

<!-- Delete the ones that don't apply -->
1. **Applies to:** SQL Server 2016 (or higher), Azure SQL Database
1. **Key features:** PolyBase
1. **Workload:** Analytics
1. **Programming Language:** T-SQL
1. **Authors:** Greg Low, Jos de Bruijn
1. **Update history:** 26 May 2016 - initial revision

<a name=before-you-begin></a>

## Before you begin

To run this sample, you need the following prerequisites.

**Software prerequisites:**

<!-- Examples -->
1. SQL Server 2016 (or higher) with PolyBase, connected to the internet.
2. SQL Server Management Studio
3. The WideWorldImportersDW database (Full version).
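
The prerequisites above can be checked from a query window before running the script. The following is a minimal sketch and is not part of the sample script. It assumes a connection to the target instance with permission to read server properties and `sys.databases`.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Returns one row when the WideWorldImportersDW database is present on this instance.
SELECT name
FROM sys.databases
WHERE name = N'WideWorldImportersDW';
```

If the first query returns 0, or the second returns no rows, the sample script is unlikely to run successfully.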

<a name=run-this-sample></a>

## Running the sample

1. Execute the sample script.

2. Inspect external tables in the database.

3. Review query results.
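
As an alternative to browsing Object Explorer for step 2, the external objects can be listed through catalog views. This is a sketch that assumes the sample script has already run and created its external data source, file format and table. The column lists are a small selection chosen for illustration.

```sql
USE WideWorldImportersDW;
GO

-- External data sources (the sample script describes one named AzureStorage).
SELECT name, location, type_desc
FROM sys.external_data_sources;

-- External file formats (the sample script describes CommaDelimitedTextFileFormat).
SELECT name, format_type, field_terminator
FROM sys.external_file_formats;

-- External tables, together with the settings that control rejected rows.
SELECT name, location, reject_type, reject_value
FROM sys.external_tables;
```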

<a name=sample-details></a>

## Sample details

The sample script performs a configuration step and then runs three queries:

1. An external table `dbo.CityPopulationStatistics` is created in the database, pointing to a data set in Azure blob storage.

2. The data in Azure storage is queried through Transact-SQL, showing all the data in the data source.

3. Cities with a significant growth rate (> 20%) are identified.

4. Top cities for potential expansion are identified based on external data as well as sales data in the local database.
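
The growth-rate calculation in step 3 can be tried without PolyBase or Azure storage. The sketch below uses the same expression as the sample script on a few made-up rows. The city names, codes and populations are hypothetical and do not come from the census data. Because the expression multiplies by `100.0`, the result is a percentage, so a 20% threshold corresponds to the value `20.0`.

```sql
-- Hypothetical rows standing in for dbo.CityPopulationStatistics.
WITH SamplePopulation (CityName, StateProvinceCode, YearNumber, LatestRecordedPopulation)
AS
(
    SELECT v.CityName, v.StateProvinceCode, v.YearNumber, v.LatestRecordedPopulation
    FROM (VALUES
        (N'Alpha', N'AA', 2013, CAST(1000 AS bigint)),
        (N'Alpha', N'AA', 2016, CAST(1300 AS bigint)),
        (N'Beta',  N'BB', 2013, CAST(2000 AS bigint)),
        (N'Beta',  N'BB', 2016, CAST(2100 AS bigint))
    ) AS v (CityName, StateProvinceCode, YearNumber, LatestRecordedPopulation)
),
Growth
AS
(
    -- Growth is measured from the smallest to the largest recorded population per city.
    SELECT sp.CityName,
           sp.StateProvinceCode,
           MAX(sp.LatestRecordedPopulation) AS LargestPopulation,
           (MAX(sp.LatestRecordedPopulation) - MIN(sp.LatestRecordedPopulation)) * 100.0
               / MIN(sp.LatestRecordedPopulation) AS GrowthRate
    FROM SamplePopulation AS sp
    WHERE sp.LatestRecordedPopulation IS NOT NULL
    AND sp.LatestRecordedPopulation <> 0
    GROUP BY sp.CityName, sp.StateProvinceCode
)
SELECT CityName, StateProvinceCode, LargestPopulation, GrowthRate
FROM Growth
WHERE GrowthRate > 20.0;
```

With these rows, Alpha grows from 1000 to 1300 (30%) and is returned, while Beta grows from 2000 to 2100 (5%) and is filtered out.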

<a name=disclaimers></a>

## Disclaimers
The code included in this sample is not intended to be used for production purposes.

<a name=related-links></a>

## Related links
<!-- Links to more articles. Remember to delete "en-us" from the link path. -->
For more information, see these articles:
- [Get started with PolyBase](https://msdn.microsoft.com/library/mt163689.aspx)
- [PolyBase: Gaining insights from HDFS and relational data in SQL Server 2016 (video)](https://channel9.msdn.com/Events/DataDriven/SQLServer2016/PolyBase)