
Commit a712f07

Merge pull request #1 from ananto-msft/tina-private-preview

Tina private preview

2 parents 9f8b0f1 + f1de60a, commit a712f07

File tree: 11 files changed, +863 −0 lines changed
Lines changed: 59 additions & 0 deletions
# Azure Arc Data Controller clusters

Installation instructions for SQL Server 2019 big data clusters can be found [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sql-server-ver15).

## Samples Setup

**Before you begin**, load the sample data into your big data cluster. For instructions, see [Load sample data into a SQL Server 2019 big data cluster](https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data).

## Executing the sample scripts

The scripts should be executed in a specific order to test the various features. Execute the scripts from each folder in the order below (a sample command-line invocation follows the list):

1. __[spark/data-loading/transform-csv-files.ipynb](spark/data-loading/transform-csv-files.ipynb)__
1. __[data-virtualization/generic-odbc](data-virtualization/generic-odbc)__
1. __[data-virtualization/hadoop](data-virtualization/hadoop)__
1. __[data-virtualization/storage-pool](data-virtualization/storage-pool)__
1. __[data-virtualization/oracle](data-virtualization/oracle)__
1. __[data-pool](data-pool/)__
1. __[machine-learning/sql/r](machine-learning/sql/r)__
1. __[machine-learning/sql/python](machine-learning/sql/python)__
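The notebook sample is opened and run in Azure Data Studio; the T-SQL samples can be run from Azure Data Studio or from the command line. As a minimal sketch, assuming `sqlcmd` is installed and `<master-endpoint>`, `<username>`, and `<password>` are placeholders for your SQL Server master instance endpoint and credentials:

``` bash
# Run one of the T-SQL samples against the SQL Server master instance.
# Replace the placeholders with the endpoint and credentials from your deployment.
sqlcmd -S <master-endpoint> -U <username> -P <password> -i data-pool/data-ingestion-sql.sql
```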
## __[data-pool](data-pool/)__

SQL Server 2019 big data cluster contains a data pool, which consists of multiple SQL Server instances used to store and query data in a scale-out manner.

### Data ingestion using Spark

The sample script [data-pool/data-ingestion-spark.sql](data-pool/data-ingestion-spark.sql) shows how to perform data ingestion from Spark into data pool table(s).

### Data ingestion using SQL

The sample script [data-pool/data-ingestion-sql.sql](data-pool/data-ingestion-sql.sql) shows how to perform data ingestion from T-SQL into data pool table(s).

## __[data-virtualization](data-virtualization/)__

SQL Server 2019 or SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.

### External table over Generic ODBC data source

The [data-virtualization/generic-odbc](data-virtualization/generic-odbc) folder contains samples that demonstrate how to query data in MySQL and PostgreSQL using external tables and a generic ODBC data source. The generic ODBC data source can be used only in SQL Server 2019 on Windows.

### External table over Hadoop

The [data-virtualization/hadoop](data-virtualization/hadoop) folder contains samples that demonstrate how to query data in HDFS using external tables. This demonstrates the functionality available since SQL Server 2016 using the HADOOP data source.

### External table over Oracle

The [data-virtualization/oracle](data-virtualization/oracle) folder contains samples that demonstrate how to query data in Oracle using external tables.

### External table over Storage Pool

SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark, and SQL Server instances. The [data-virtualization/storage-pool](data-virtualization/storage-pool) folder contains samples that demonstrate how to query data in HDFS inside a SQL Server 2019 big data cluster.

## __[deployment](deployment/)__

The [deployment](deployment) folder contains the scripts for deploying a Kubernetes cluster for SQL Server 2019 big data cluster.

## __[machine-learning](machine-learning/)__

SQL Server 2016 added support for executing R scripts from T-SQL. SQL Server 2017 added support for executing Python scripts from T-SQL. SQL Server 2019 adds support for executing Java code from T-SQL. SQL Server 2019 big data cluster adds support for executing Spark code inside the big data cluster.

### SQL Server Machine Learning Services

The [machine-learning/sql](machine-learning/sql) folder contains the sample SQL scripts that show how to invoke R, Python, and Java code from T-SQL.

### Spark Machine Learning

The [machine-learning/spark](machine-learning/spark) folder contains the Spark samples.
Lines changed: 16 additions & 0 deletions
# Creating a Kubernetes cluster for SQL Server 2019 big data cluster

SQL Server 2019 big data cluster is deployed as Docker containers on a Kubernetes cluster. These samples provide scripts that can be used to provision a Kubernetes cluster in different environments.

## __[Deploy a Kubernetes cluster using kubeadm](kubeadm/)__

Use the scripts in the **kubeadm** folder to deploy a Kubernetes cluster over one or more Linux machines (physical or virtualized) using the `kubeadm` utility.

## __[Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS)](aks/)__

Using the sample Python script in the **aks** folder, you will deploy a Kubernetes cluster in Azure using AKS and a SQL Server big data cluster on top of it.

## __[Push SQL Server big data cluster images to your own private Docker repository](offline/)__

Using the sample Python script in the **offline** folder, you will push the images required for the deployment to your own repository. A sketch of the underlying Docker workflow follows.
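The script automates a standard pull/tag/push loop. A minimal sketch of that workflow, assuming `<image-name>`, `<tag>`, and `<your-registry>/<your-repository>` are placeholders for the image names, tag, and private registry details used in your environment, and that the images are pulled from the default Microsoft registry path shown (check your deployment configuration for the exact registry and repository):

``` bash
# Pull an image from the Microsoft registry, retag it for your
# private registry, and push it there.
docker pull mcr.microsoft.com/mssql/bdc/<image-name>:<tag>
docker tag mcr.microsoft.com/mssql/bdc/<image-name>:<tag> <your-registry>/<your-repository>/<image-name>:<tag>
docker push <your-registry>/<your-repository>/<image-name>:<tag>
```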
Lines changed: 49 additions & 0 deletions
# Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS)

Using this sample Python script, you will deploy a Kubernetes cluster in Azure using AKS and a SQL Server big data cluster that uses this AKS cluster as its environment. The script can be run from any client OS.

## Pre-requisites

1. Install the latest version of the [az cli](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
1. Running the script requires [Python 3.0 or later](https://www.python.org/downloads)
1. Install the latest version of [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
1. Ensure you have installed the `azdata` CLI (previously named `mssqlctl`) and its prerequisites:
   - Install [pip3](https://pip.pypa.io/en/stable/installing/).
   - Install/update the requests package. Run the commands below using elevated privileges (sudo or an admin cmd window):
   ```
   python -m pip install requests
   python -m pip install requests --upgrade
   ```
   - Install the latest version of the cluster management tool **azdata** using the command below. Run it using elevated privileges (sudo or an admin cmd window):
   ```
   pip3 install -r https://aka.ms/azdata
   ```
1. Log in to your Azure account. Run this command:
   ```
   az login
   ```
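Before running the deployment, it can be useful to confirm the tools above are installed and on your `PATH`. A quick check, assuming a bash shell (the version output format differs between tools and releases):

``` bash
# Verify each prerequisite is installed and report its version.
az --version
python3 --version
kubectl version --client
azdata --version
```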
## Instructions

Run the script using:
```
python deploy-sql-big-data-aks.py
```

>**Note**
>
>If you have both python3 and python2 on your client machine and in the path, you will have to run the command using python3:
>```
>python3 deploy-sql-big-data-aks.py
>```

When prompted, provide your input for the Azure subscription ID, the Azure resource group to create the resources in, and the Docker credentials. Optionally, you can also provide your input for the configurations below or use the defaults provided:
- azure_region
- vm_size - we recommend using a VM size that accommodates your workload. For an optimal experience while you are validating basic scenarios, we recommend at least 8 vCPUs and 64 GB of memory across all agent nodes in the cluster. The script uses **Standard_L8s** as the default. A default size configuration also uses about 24 disks for persistent volume claims across all components.
- aks_node_count - this is the number of worker nodes for the AKS cluster, excluding the master node. The script uses a default of 1 agent node. This is the minimum required for this VM size to have enough resources and disks to provision all the necessary persistent volumes.
- cluster_name - this value is used for both the AKS cluster and the SQL big data cluster created on top of AKS. Note that the name of the SQL big data cluster is going to be a Kubernetes namespace.
- password - the same value is going to be used for all accounts that require a user password: the SQL Server master instance account created for the **username** below, the controller user, and the Knox **root** user.
- username - this is the username for the accounts provisioned during deployment for the controller admin account and the SQL Server master instance account. Note that the **sa** SQL Server account is disabled automatically for you, as a best practice. The username for the Knox gateway account is going to be **root**.
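After the script completes, you can sanity-check the deployment from the same client. A minimal sketch, assuming the default cluster name `sqlbigdata` (the SQL big data cluster runs in a Kubernetes namespace of the same name):

``` bash
# All pods in the big data cluster namespace should eventually be Running.
kubectl get pods -n sqlbigdata

# List the external endpoints (the script also prints these at the end).
azdata login -n sqlbigdata
azdata bdc endpoint list -o table
```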
Lines changed: 102 additions & 0 deletions
#
# Prerequisites:
#
# Azure CLI (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli), python3 (https://www.python.org/downloads), azdata CLI (pip3 install -r https://aka.ms/azdata)
#
# Run `az login` at least once BEFORE running this script
#

from subprocess import check_output, CalledProcessError, STDOUT, Popen, PIPE
import os
import getpass

# Run a command as a subprocess and wait for it to complete.
def executeCmd (cmd):
    if os.name == "nt":
        process = Popen(cmd.split(), stdin=PIPE, shell=True)
    else:
        process = Popen(cmd.split(), stdin=PIPE)
    stdout, stderr = process.communicate()
    if (stderr is not None):
        raise Exception(stderr)

#
# MUST INPUT THESE VALUES!!!!!
#
SUBSCRIPTION_ID = input("Provide your Azure subscription ID:").strip()
GROUP_NAME = input("Provide Azure resource group name to be created:").strip()
# Use these only if you are using a private registry different from the default Microsoft registry (mcr).
#DOCKER_USERNAME = input("Provide your Docker username:").strip()
#DOCKER_PASSWORD = getpass.getpass("Provide your Docker password:").strip()

#
# Optionally change these configuration settings
#
AZURE_REGION = input("Provide Azure region - Press ENTER for using `westus`:").strip() or "westus"
VM_SIZE = input("Provide VM size for the AKS cluster - Press ENTER for using `Standard_L8s`:").strip() or "Standard_L8s"
AKS_NODE_COUNT = input("Provide number of worker nodes for AKS cluster - Press ENTER for using `1`:").strip() or "1"

# This is both the Kubernetes cluster name and the SQL big data cluster name.
CLUSTER_NAME = input("Provide name of AKS cluster and SQL big data cluster - Press ENTER for using `sqlbigdata`:").strip() or "sqlbigdata"

# This password will be used for the controller user, Knox user, and SQL Server master SA accounts.
AZDATA_USERNAME = input("Provide username to be used for Controller and SQL Server master accounts - Press ENTER for using `admin`:").strip() or "admin"
AZDATA_PASSWORD = getpass.getpass("Provide password to be used for Controller user, Knox user (root) and SQL Server Master accounts - Press ENTER for using `MySQLBigData2019`:").strip() or "MySQLBigData2019"

# Docker registry details
# Use these only if you are using a private registry different from mcr. If so, make sure you are also setting the environment variables for DOCKER_USERNAME and DOCKER_PASSWORD.
# DOCKER_REGISTRY="<your private registry>"
# DOCKER_REPOSITORY="<your private repository>"
# DOCKER_IMAGE_TAG="<your Docker image tag>"

print ('Setting environment variables')
os.environ['AZDATA_PASSWORD'] = AZDATA_PASSWORD
os.environ['AZDATA_USERNAME'] = AZDATA_USERNAME
# Use these only if you are using a private registry different from mcr. If so, you must set the environment variables for DOCKER_USERNAME and DOCKER_PASSWORD.
# os.environ['DOCKER_USERNAME']=DOCKER_USERNAME
# os.environ['DOCKER_PASSWORD']=DOCKER_PASSWORD
os.environ['ACCEPT_EULA'] = "Yes"

print ("Set Azure context to subscription: " + SUBSCRIPTION_ID)
command = "az account set -s " + SUBSCRIPTION_ID
executeCmd (command)

print ("Creating Azure resource group: " + GROUP_NAME)
command = "az group create --name " + GROUP_NAME + " --location " + AZURE_REGION
executeCmd (command)

print("Creating AKS cluster: " + CLUSTER_NAME)
command = "az aks create --name " + CLUSTER_NAME + " --resource-group " + GROUP_NAME + " --generate-ssh-keys --node-vm-size " + VM_SIZE + " --node-count " + AKS_NODE_COUNT
executeCmd (command)

# Merge the AKS cluster credentials into the local kubeconfig.
command = "az aks get-credentials --overwrite-existing --name " + CLUSTER_NAME + " --resource-group " + GROUP_NAME + " --admin"
executeCmd (command)

print("Creating SQL big data cluster: " + CLUSTER_NAME)
# Generate a custom deployment configuration based on the aks-dev-test profile.
command = "azdata bdc config init --source aks-dev-test --target custom --force"
executeCmd (command)

# Set the big data cluster name (also the Kubernetes namespace) in the custom configuration.
command = "azdata bdc config replace -c custom/bdc.json -j ""metadata.name=" + CLUSTER_NAME + ""
executeCmd (command)

# Use these only if you are using a private registry different from the default Microsoft registry (mcr).
# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.registry=" + DOCKER_REGISTRY + ""
# executeCmd (command)

# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.repository=" + DOCKER_REPOSITORY + ""
# executeCmd (command)

# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.imageTag=" + DOCKER_IMAGE_TAG + ""
# executeCmd (command)

command = "azdata bdc create -c custom --accept-eula yes"
executeCmd (command)

command = "azdata login -n " + CLUSTER_NAME
executeCmd (command)

print("")
print("SQL Server big data cluster endpoints: ")
command = "azdata bdc endpoint list -o table"
executeCmd(command)
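When you are done with the cluster, the simplest way to tear everything down is to delete the resource group the script created, which removes the AKS cluster and all associated resources. A minimal sketch, assuming `<resource-group-name>` is the group name you provided to the script:

``` bash
# Delete the resource group and everything in it (this is irreversible).
az group delete --name <resource-group-name> --yes
```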
Lines changed: 14 additions & 0 deletions
# Create a Kubernetes cluster using kubeadm on Ubuntu 16.04 LTS or 18.04 LTS

## __[ubuntu](ubuntu/)__

This folder contains scripts that provide a template for deploying a Kubernetes cluster using kubeadm on one or more Linux machines.

## __[ubuntu-single-node-vm](ubuntu-single-node-vm/)__

This folder contains a sample script that can be used to create a single-node Kubernetes cluster on a Linux machine and deploy a SQL Server big data cluster on it.

## __[ubuntu-single-node-vm-ad](ubuntu-single-node-vm-ad/)__

This folder contains a sample script that can be used to create a single-node Kubernetes cluster on a Linux machine and deploy a SQL Server big data cluster with Active Directory integration.
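The scripts in these folders automate the standard kubeadm bootstrap flow. A minimal sketch of that flow, assuming Docker, kubeadm, kubelet, and kubectl are already installed and a Flannel pod network is used (the accompanying cleanup script also references Flannel); this is an outline of the general workflow, not the exact commands the scripts run:

``` bash
# Initialize the control plane with a pod CIDR that matches Flannel's default.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for the current user.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install a pod network add-on (for example Flannel) before scheduling workloads.

# On a single-node cluster, allow workloads to run on the control-plane node.
kubectl taint nodes --all node-role.kubernetes.io/master-

# On additional machines, join the cluster using the command printed by `kubeadm init`:
# sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```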
Lines changed: 53 additions & 0 deletions
# Deploy a SQL Server big data cluster on a single-node Kubernetes cluster (kubeadm)

Using this sample bash script, you will deploy a single-node Kubernetes cluster using kubeadm and a SQL Server big data cluster on top of it. The script must be run from the VM you are planning to use for your kubeadm deployment.

## Pre-requisites

1. A vanilla Ubuntu 16.04 or 18.04 virtual or physical machine. All dependencies will be set up by the script. Using Azure Linux VMs is not yet supported.
1. The machine should have at least 8 CPUs, 64 GB RAM, and 100 GB disk space (a quick way to verify this follows the list). After installing the images you will be left with 50 GB for data/logs across all components.
1. Update existing packages using the commands below to ensure that the OS image is up to date:

   ``` bash
   sudo apt update && sudo apt upgrade -y
   sudo systemctl reboot
   ```
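As a quick sanity check that the machine meets the size requirements above, assuming a bash shell:

``` bash
# CPU count, memory, and free disk space on the root filesystem.
nproc
free -h
df -h /
```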
## Recommended Virtual Machine settings

1. Use static memory configuration for the virtual machine. For example, in Hyper-V installations do not use dynamic memory allocation; instead allocate the recommended 64 GB or higher.

1. Use the checkpoint or snapshot capability in your hypervisor so that you can roll back the virtual machine to a clean state.

## Instructions to deploy SQL Server big data cluster

1. Download the script onto the VM you are planning to use for the deployment

   ``` bash
   curl --output setup-bdc.sh https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu-single-node-vm/setup-bdc.sh
   ```

2. Make the script executable

   ``` bash
   chmod +x setup-bdc.sh
   ```

3. Run the script (make sure you are running it with sudo)

   ``` bash
   sudo ./setup-bdc.sh
   ```

4. Refresh the alias setup for azdata

   ``` bash
   source ~/.bashrc
   ```

When prompted, provide your input for the password that will be used for all external endpoints: controller, SQL Server master, and gateway. The password should be sufficiently complex based on the existing SQL Server password rules. The controller username defaults to *admin*.
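Once the script finishes, you can verify that the cluster came up from the same machine. A minimal sketch, assuming `kubectl` and `azdata` were configured by the setup script (the namespace name depends on the configuration used during setup, so the example lists all namespaces):

``` bash
# All big data cluster pods should eventually reach the Running state.
kubectl get pods --all-namespaces

# List the external endpoints exposed by the cluster.
# (Run `azdata login` first if you are not already logged in.)
azdata bdc endpoint list -o table
```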
## Cleanup

1. The [cleanup-bdc.sh](cleanup-bdc.sh) script is provided as a convenience to reset the environment in case of errors. However, we recommend that you use a virtual machine for testing purposes and use the snapshot capability in your hypervisor to roll back the virtual machine to a clean state.
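Like the setup script, the cleanup script needs to run as root (it checks for this and exits otherwise). A minimal invocation:

``` bash
chmod +x cleanup-bdc.sh
sudo ./cleanup-bdc.sh
```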
Lines changed: 99 additions & 0 deletions
#!/bin/bash

if [ "$EUID" -ne 0 ]
  then echo "Please run as root"
  exit
fi

DIR_PREFIX=$1

kubeadm reset --force

# Clean up the azdata-cli package.
#
unalias azdata
sudo dpkg --remove --force-all azdata-cli

systemctl stop kubelet
rm -rf /var/lib/cni/
rm -rf /var/lib/etcd/
rm -rf /run/flannel/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/

ip link set cni0 down
#brctl delbr cni0
ip link set flannel.1 down
#brctl delbr flannel.1
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

rm -rf .azdata/
rm -rf bdcdeploy/

# Stop mount services for pod volumes that no longer exist on disk.
#
SERVICE_STOP_FAILED=0

# Read from process substitution (not a pipeline) so that SERVICE_STOP_FAILED
# set inside the loop is still visible after the loop ends.
while read -r line; do

    # Retrieve the mount path
    #
    MOUNT_PATH=`echo "$line" | grep -v echo | egrep -oh -m 1 "(/var/lib/kubelet/pods).+"`

    if [ -z "$MOUNT_PATH" ]; then
        continue
    fi

    if [[ ! -d "$MOUNT_PATH" ]] && [[ ! -f "$MOUNT_PATH" ]]; then

        SERVICE=$(echo $line | cut -f1 -d' ')

        echo "Mount "$MOUNT_PATH" no longer exists."
        echo "Stopping orphaned mount service: '$SERVICE'"

        systemctl stop $SERVICE

        if [ $? -ne 0 ]; then
            SERVICE_STOP_FAILED=1
        fi

        echo ""
    fi
done < <(systemctl | grep "/var/lib/kubelet/pods")

if [ $SERVICE_STOP_FAILED -ne 0 ]; then
    echo "Not all services were stopped successfully. Please check the above output for more information."
else
    echo "All orphaned services successfully stopped."
fi

# Unmount and remove the local-storage volumes.
#
for i in $(seq 1 40); do

    vol="vol$i"

    sudo umount /mnt/local-storage/$vol

    sudo rm -rf /mnt/local-storage/$vol

done

# Remove the kubeadm/kubelet/kubectl packages and the local kube configuration.
#
sudo apt-get purge -y kubeadm --allow-change-held-packages
sudo apt-get purge -y kubectl --allow-change-held-packages
sudo apt-get purge -y kubelet --allow-change-held-packages
sudo apt-get purge -y kubernetes-cni --allow-change-held-packages
sudo apt-get purge -y kube* --allow-change-held-packages
sudo apt -y autoremove
sudo rm -rf ~/.kube

# Clean up working folders.
#
export AZUREARCDATACONTROLLER_DIR=aadatacontroller
if [ -d "$AZUREARCDATACONTROLLER_DIR" ]; then
    echo "Removing working directory $AZUREARCDATACONTROLLER_DIR."
    rm -f -r $AZUREARCDATACONTROLLER_DIR
fi
