
Commit a712f07

Merge pull request #1 from ananto-msft/tina-private-preview

Tina private preview

2 parents 9f8b0f1 + f1de60a, commit a712f07

File tree: 11 files changed, +863 −0 lines changed
Lines changed: 59 additions & 0 deletions
# Azure Arc Data Controller clusters

Installation instructions for SQL Server 2019 big data clusters can be found [here](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-guidance?view=sql-server-ver15).

## Samples Setup

**Before you begin**, load the sample data into your big data cluster. For instructions, see [Load sample data into a SQL Server 2019 big data cluster](https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data).

## Executing the sample scripts

The scripts should be executed in a specific order to test the various features. Execute the scripts from each folder in the order below (a sample command-line invocation follows the list):

1. __[spark/data-loading/transform-csv-files.ipynb](spark/data-loading/transform-csv-files.ipynb)__
1. __[data-virtualization/generic-odbc](data-virtualization/generic-odbc)__
1. __[data-virtualization/hadoop](data-virtualization/hadoop)__
1. __[data-virtualization/storage-pool](data-virtualization/storage-pool)__
1. __[data-virtualization/oracle](data-virtualization/oracle)__
1. __[data-pool](data-pool/)__
1. __[machine-learning/sql/r](machine-learning/sql/r)__
1. __[machine-learning/sql/python](machine-learning/sql/python)__
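The notebook sample is opened and run in Azure Data Studio; the T-SQL samples can be run from Azure Data Studio or from the command line. As a minimal sketch, assuming `sqlcmd` is installed and `<master-endpoint>`, `<username>`, and `<password>` are placeholders for your SQL Server master instance endpoint and credentials:

``` bash
# Run one of the T-SQL samples against the SQL Server master instance.
# Replace the placeholders with the endpoint and credentials from your deployment.
sqlcmd -S <master-endpoint> -U <username> -P <password> -i data-pool/data-ingestion-sql.sql
```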
## __[data-pool](data-pool/)__

SQL Server 2019 big data cluster contains a data pool, which consists of multiple SQL Server instances used to store and query data in a scale-out manner.

### Data ingestion using Spark

The sample script [data-pool/data-ingestion-spark.sql](data-pool/data-ingestion-spark.sql) shows how to perform data ingestion from Spark into data pool table(s).

### Data ingestion using SQL

The sample script [data-pool/data-ingestion-sql.sql](data-pool/data-ingestion-sql.sql) shows how to perform data ingestion from T-SQL into data pool table(s).

## __[data-virtualization](data-virtualization/)__

SQL Server 2019 or SQL Server 2019 big data cluster can use PolyBase external tables to connect to other data sources.

### External table over Generic ODBC data source

The [data-virtualization/generic-odbc](data-virtualization/generic-odbc) folder contains samples that demonstrate how to query data in MySQL and PostgreSQL using external tables and a generic ODBC data source. The generic ODBC data source can be used only in SQL Server 2019 on Windows.

### External table over Hadoop

The [data-virtualization/hadoop](data-virtualization/hadoop) folder contains samples that demonstrate how to query data in HDFS using external tables. This demonstrates the functionality available since SQL Server 2016 using the HADOOP data source.

### External table over Oracle

The [data-virtualization/oracle](data-virtualization/oracle) folder contains samples that demonstrate how to query data in Oracle using external tables.

### External table over Storage Pool

SQL Server 2019 big data cluster contains a storage pool consisting of HDFS, Spark, and SQL Server instances. The [data-virtualization/storage-pool](data-virtualization/storage-pool) folder contains samples that demonstrate how to query data in HDFS inside a SQL Server 2019 big data cluster.

## __[deployment](deployment/)__

The [deployment](deployment) folder contains the scripts for deploying a Kubernetes cluster for SQL Server 2019 big data cluster.

## __[machine-learning](machine-learning/)__

SQL Server 2016 added support for executing R scripts from T-SQL. SQL Server 2017 added support for executing Python scripts from T-SQL. SQL Server 2019 adds support for executing Java code from T-SQL. SQL Server 2019 big data cluster adds support for executing Spark code inside the big data cluster.

### SQL Server Machine Learning Services

The [machine-learning/sql](machine-learning/sql) folder contains the sample SQL scripts that show how to invoke R, Python, and Java code from T-SQL.

### Spark Machine Learning

The [machine-learning/spark](machine-learning/spark) folder contains the Spark samples.
Lines changed: 16 additions & 0 deletions
# Creating a Kubernetes cluster for SQL Server 2019 big data cluster

SQL Server 2019 big data cluster is deployed as Docker containers on a Kubernetes cluster. These samples provide scripts that can be used to provision a Kubernetes cluster in different environments.

## __[Deploy a Kubernetes cluster using kubeadm](kubeadm/)__

Use the scripts in the **kubeadm** folder to deploy a Kubernetes cluster over one or more Linux machines (physical or virtualized) using the `kubeadm` utility.

## __[Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS)](aks/)__

Using the sample Python script in the **aks** folder, you will deploy a Kubernetes cluster in Azure using AKS and a SQL Server big data cluster on top of it.

## __[Push SQL Server big data cluster images to your own private Docker repository](offline/)__

Using the sample Python script in the **offline** folder, you will push the images required for the deployment to your own repository. A sketch of the underlying Docker workflow follows.
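The script automates a standard pull/tag/push loop. A minimal sketch of that workflow, assuming `<image-name>`, `<tag>`, and `<your-registry>/<your-repository>` are placeholders for the image names, tag, and private registry details used in your environment, and that the images are pulled from the default Microsoft registry path shown (check your deployment configuration for the exact registry and repository):

``` bash
# Pull an image from the Microsoft registry, retag it for your
# private registry, and push it there.
docker pull mcr.microsoft.com/mssql/bdc/<image-name>:<tag>
docker tag mcr.microsoft.com/mssql/bdc/<image-name>:<tag> <your-registry>/<your-repository>/<image-name>:<tag>
docker push <your-registry>/<your-repository>/<image-name>:<tag>
```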
Lines changed: 49 additions & 0 deletions
# Deploy a SQL Server big data cluster on Azure Kubernetes Service (AKS)

Using this sample Python script, you will deploy a Kubernetes cluster in Azure using AKS and a SQL Server big data cluster that uses this AKS cluster as its environment. The script can be run from any client OS.

## Pre-requisites

1. Install the latest version of the [az cli](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
1. Running the script requires [Python 3.0 or later](https://www.python.org/downloads)
1. Install the latest version of [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
1. Ensure you have installed the `azdata` CLI (previously named `mssqlctl`) and its prerequisites:
   - Install [pip3](https://pip.pypa.io/en/stable/installing/).
   - Install/update the requests package. Run the commands below using elevated privileges (sudo or an admin cmd window):
   ```
   python -m pip install requests
   python -m pip install requests --upgrade
   ```
   - Install the latest version of the cluster management tool **azdata** using the command below. Run it using elevated privileges (sudo or an admin cmd window):
   ```
   pip3 install -r https://aka.ms/azdata
   ```
1. Log in to your Azure account. Run this command:
   ```
   az login
   ```
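Before running the deployment, it can be useful to confirm the tools above are installed and on your `PATH`. A quick check, assuming a bash shell (the version output format differs between tools and releases):

``` bash
# Verify each prerequisite is installed and report its version.
az --version
python3 --version
kubectl version --client
azdata --version
```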
## Instructions

Run the script using:
```
python deploy-sql-big-data-aks.py
```

>**Note**
>
>If you have both python3 and python2 on your client machine and in the path, you will have to run the command using python3:
>```
>python3 deploy-sql-big-data-aks.py
>```

When prompted, provide your input for the Azure subscription ID, the Azure resource group to create the resources in, and the Docker credentials. Optionally, you can also provide your input for the configurations below or use the defaults provided:
- azure_region
- vm_size - we recommend using a VM size that accommodates your workload. For an optimal experience while you are validating basic scenarios, we recommend at least 8 vCPUs and 64 GB of memory across all agent nodes in the cluster. The script uses **Standard_L8s** as the default. A default size configuration also uses about 24 disks for persistent volume claims across all components.
- aks_node_count - this is the number of worker nodes for the AKS cluster, excluding the master node. The script uses a default of 1 agent node. This is the minimum required for this VM size to have enough resources and disks to provision all the necessary persistent volumes.
- cluster_name - this value is used for both the AKS cluster and the SQL big data cluster created on top of AKS. Note that the name of the SQL big data cluster is going to be a Kubernetes namespace.
- password - the same value is going to be used for all accounts that require a user password: the SQL Server master instance account created for the **username** below, the controller user, and the Knox **root** user.
- username - this is the username for the accounts provisioned during deployment for the controller admin account and the SQL Server master instance account. Note that the **sa** SQL Server account is disabled automatically for you, as a best practice. The username for the Knox gateway account is going to be **root**.
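After the script completes, you can sanity-check the deployment from the same client. A minimal sketch, assuming the default cluster name `sqlbigdata` (the SQL big data cluster runs in a Kubernetes namespace of the same name):

``` bash
# All pods in the big data cluster namespace should eventually be Running.
kubectl get pods -n sqlbigdata

# List the external endpoints (the script also prints these at the end).
azdata login -n sqlbigdata
azdata bdc endpoint list -o table
```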
Lines changed: 102 additions & 0 deletions
#
# Prerequisites:
#
# Azure CLI (https://docs.microsoft.com/en-us/cli/azure/install-azure-cli), python3 (https://www.python.org/downloads), azdata CLI (pip3 install -r https://aka.ms/azdata)
#
# Run `az login` at least once BEFORE running this script
#

from subprocess import check_output, CalledProcessError, STDOUT, Popen, PIPE
import os
import getpass

# Run a command as a subprocess and wait for it to complete.
def executeCmd (cmd):
    if os.name == "nt":
        process = Popen(cmd.split(), stdin=PIPE, shell=True)
    else:
        process = Popen(cmd.split(), stdin=PIPE)
    stdout, stderr = process.communicate()
    if (stderr is not None):
        raise Exception(stderr)

#
# MUST INPUT THESE VALUES!!!!!
#
SUBSCRIPTION_ID = input("Provide your Azure subscription ID:").strip()
GROUP_NAME = input("Provide Azure resource group name to be created:").strip()
# Use these only if you are using a private registry different from the default Microsoft registry (mcr).
#DOCKER_USERNAME = input("Provide your Docker username:").strip()
#DOCKER_PASSWORD = getpass.getpass("Provide your Docker password:").strip()

#
# Optionally change these configuration settings
#
AZURE_REGION = input("Provide Azure region - Press ENTER for using `westus`:").strip() or "westus"
VM_SIZE = input("Provide VM size for the AKS cluster - Press ENTER for using `Standard_L8s`:").strip() or "Standard_L8s"
AKS_NODE_COUNT = input("Provide number of worker nodes for AKS cluster - Press ENTER for using `1`:").strip() or "1"

# This is both the Kubernetes cluster name and the SQL big data cluster name.
CLUSTER_NAME = input("Provide name of AKS cluster and SQL big data cluster - Press ENTER for using `sqlbigdata`:").strip() or "sqlbigdata"

# This password will be used for the controller user, Knox user, and SQL Server master SA accounts.
AZDATA_USERNAME = input("Provide username to be used for Controller and SQL Server master accounts - Press ENTER for using `admin`:").strip() or "admin"
AZDATA_PASSWORD = getpass.getpass("Provide password to be used for Controller user, Knox user (root) and SQL Server Master accounts - Press ENTER for using `MySQLBigData2019`:").strip() or "MySQLBigData2019"

# Docker registry details
# Use these only if you are using a private registry different from mcr. If so, make sure you are also setting the environment variables for DOCKER_USERNAME and DOCKER_PASSWORD.
# DOCKER_REGISTRY="<your private registry>"
# DOCKER_REPOSITORY="<your private repository>"
# DOCKER_IMAGE_TAG="<your Docker image tag>"

print ('Setting environment variables')
os.environ['AZDATA_PASSWORD'] = AZDATA_PASSWORD
os.environ['AZDATA_USERNAME'] = AZDATA_USERNAME
# Use these only if you are using a private registry different from mcr. If so, you must set the environment variables for DOCKER_USERNAME and DOCKER_PASSWORD.
# os.environ['DOCKER_USERNAME']=DOCKER_USERNAME
# os.environ['DOCKER_PASSWORD']=DOCKER_PASSWORD
os.environ['ACCEPT_EULA'] = "Yes"

print ("Set Azure context to subscription: " + SUBSCRIPTION_ID)
command = "az account set -s " + SUBSCRIPTION_ID
executeCmd (command)

print ("Creating Azure resource group: " + GROUP_NAME)
command = "az group create --name " + GROUP_NAME + " --location " + AZURE_REGION
executeCmd (command)

print("Creating AKS cluster: " + CLUSTER_NAME)
command = "az aks create --name " + CLUSTER_NAME + " --resource-group " + GROUP_NAME + " --generate-ssh-keys --node-vm-size " + VM_SIZE + " --node-count " + AKS_NODE_COUNT
executeCmd (command)

# Merge the AKS cluster credentials into the local kubeconfig.
command = "az aks get-credentials --overwrite-existing --name " + CLUSTER_NAME + " --resource-group " + GROUP_NAME + " --admin"
executeCmd (command)

print("Creating SQL big data cluster: " + CLUSTER_NAME)
# Generate a custom deployment configuration based on the aks-dev-test profile.
command = "azdata bdc config init --source aks-dev-test --target custom --force"
executeCmd (command)

# Set the big data cluster name (also the Kubernetes namespace) in the custom configuration.
command = "azdata bdc config replace -c custom/bdc.json -j ""metadata.name=" + CLUSTER_NAME + ""
executeCmd (command)

# Use these only if you are using a private registry different from the default Microsoft registry (mcr).
# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.registry=" + DOCKER_REGISTRY + ""
# executeCmd (command)

# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.repository=" + DOCKER_REPOSITORY + ""
# executeCmd (command)

# command = "azdata bdc config replace -c custom/control.json -j ""$.spec.controlPlane.spec.docker.imageTag=" + DOCKER_IMAGE_TAG + ""
# executeCmd (command)

command = "azdata bdc create -c custom --accept-eula yes"
executeCmd (command)

command = "azdata login -n " + CLUSTER_NAME
executeCmd (command)

print("")
print("SQL Server big data cluster endpoints: ")
command = "azdata bdc endpoint list -o table"
executeCmd(command)
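When you are done with the cluster, the simplest way to tear everything down is to delete the resource group the script created, which removes the AKS cluster and all associated resources. A minimal sketch, assuming `<resource-group-name>` is the group name you provided to the script:

``` bash
# Delete the resource group and everything in it (this is irreversible).
az group delete --name <resource-group-name> --yes
```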
Lines changed: 14 additions & 0 deletions
# Create a Kubernetes cluster using kubeadm on Ubuntu 16.04 LTS or 18.04 LTS

## __[ubuntu](ubuntu/)__

This folder contains scripts that provide a template for deploying a Kubernetes cluster using kubeadm on one or more Linux machines.

## __[ubuntu-single-node-vm](ubuntu-single-node-vm/)__

This folder contains a sample script that can be used to create a single-node Kubernetes cluster on a Linux machine and deploy a SQL Server big data cluster on it.

## __[ubuntu-single-node-vm-ad](ubuntu-single-node-vm-ad/)__

This folder contains a sample script that can be used to create a single-node Kubernetes cluster on a Linux machine and deploy a SQL Server big data cluster with Active Directory integration.
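The scripts in these folders automate the standard kubeadm bootstrap flow. A minimal sketch of that flow, assuming Docker, kubeadm, kubelet, and kubectl are already installed and a Flannel pod network is used (the accompanying cleanup script also references Flannel); this is an outline of the general workflow, not the exact commands the scripts run:

``` bash
# Initialize the control plane with a pod CIDR that matches Flannel's default.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

# Configure kubectl for the current user.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install a pod network add-on (for example Flannel) before scheduling workloads.

# On a single-node cluster, allow workloads to run on the control-plane node.
kubectl taint nodes --all node-role.kubernetes.io/master-

# On additional machines, join the cluster using the command printed by `kubeadm init`:
# sudo kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```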
Lines changed: 53 additions & 0 deletions
# Deploy a SQL Server big data cluster on a single-node Kubernetes cluster (kubeadm)

Using this sample bash script, you will deploy a single-node Kubernetes cluster using kubeadm and a SQL Server big data cluster on top of it. The script must be run from the VM you are planning to use for your kubeadm deployment.

## Pre-requisites

1. A vanilla Ubuntu 16.04 or 18.04 virtual or physical machine. All dependencies will be set up by the script. Using Azure Linux VMs is not yet supported.
1. The machine should have at least 8 CPUs, 64 GB RAM, and 100 GB disk space (a quick way to verify this follows the list). After installing the images you will be left with 50 GB for data/logs across all components.
1. Update existing packages using the commands below to ensure that the OS image is up to date:

   ``` bash
   sudo apt update && sudo apt upgrade -y
   sudo systemctl reboot
   ```
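As a quick sanity check that the machine meets the size requirements above, assuming a bash shell:

``` bash
# CPU count, memory, and free disk space on the root filesystem.
nproc
free -h
df -h /
```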
## Recommended Virtual Machine settings

1. Use static memory configuration for the virtual machine. For example, in Hyper-V installations do not use dynamic memory allocation; instead allocate the recommended 64 GB or higher.

1. Use the checkpoint or snapshot capability in your hypervisor so that you can roll back the virtual machine to a clean state.

## Instructions to deploy SQL Server big data cluster

1. Download the script onto the VM you are planning to use for the deployment

   ``` bash
   curl --output setup-bdc.sh https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu-single-node-vm/setup-bdc.sh
   ```

2. Make the script executable

   ``` bash
   chmod +x setup-bdc.sh
   ```

3. Run the script (make sure you are running it with sudo)

   ``` bash
   sudo ./setup-bdc.sh
   ```

4. Refresh the alias setup for azdata

   ``` bash
   source ~/.bashrc
   ```

When prompted, provide your input for the password that will be used for all external endpoints: controller, SQL Server master, and gateway. The password should be sufficiently complex based on the existing SQL Server password rules. The controller username defaults to *admin*.
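Once the script finishes, you can verify that the cluster came up from the same machine. A minimal sketch, assuming `kubectl` and `azdata` were configured by the setup script (the namespace name depends on the configuration used during setup, so the example lists all namespaces):

``` bash
# All big data cluster pods should eventually reach the Running state.
kubectl get pods --all-namespaces

# List the external endpoints exposed by the cluster.
# (Run `azdata login` first if you are not already logged in.)
azdata bdc endpoint list -o table
```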
## Cleanup

1. The [cleanup-bdc.sh](cleanup-bdc.sh) script is provided as a convenience to reset the environment in case of errors. However, we recommend that you use a virtual machine for testing purposes and use the snapshot capability in your hypervisor to roll back the virtual machine to a clean state.
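Like the setup script, the cleanup script needs to run as root (it checks for this and exits otherwise). A minimal invocation:

``` bash
chmod +x cleanup-bdc.sh
sudo ./cleanup-bdc.sh
```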
Lines changed: 99 additions & 0 deletions
#!/bin/bash

if [ "$EUID" -ne 0 ]
  then echo "Please run as root"
  exit
fi

DIR_PREFIX=$1

kubeadm reset --force

# Clean up the azdata-cli package.
#
unalias azdata
sudo dpkg --remove --force-all azdata-cli

systemctl stop kubelet
rm -rf /var/lib/cni/
rm -rf /var/lib/etcd/
rm -rf /run/flannel/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
rm -rf /etc/kubernetes/

ip link set cni0 down
#brctl delbr cni0
ip link set flannel.1 down
#brctl delbr flannel.1
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

rm -rf .azdata/
rm -rf bdcdeploy/

# Stop mount services for pod volumes that no longer exist on disk.
#
SERVICE_STOP_FAILED=0

# Read from process substitution (not a pipeline) so that SERVICE_STOP_FAILED
# set inside the loop is still visible after the loop ends.
while read -r line; do

    # Retrieve the mount path
    #
    MOUNT_PATH=`echo "$line" | grep -v echo | egrep -oh -m 1 "(/var/lib/kubelet/pods).+"`

    if [ -z "$MOUNT_PATH" ]; then
        continue
    fi

    if [[ ! -d "$MOUNT_PATH" ]] && [[ ! -f "$MOUNT_PATH" ]]; then

        SERVICE=$(echo $line | cut -f1 -d' ')

        echo "Mount "$MOUNT_PATH" no longer exists."
        echo "Stopping orphaned mount service: '$SERVICE'"

        systemctl stop $SERVICE

        if [ $? -ne 0 ]; then
            SERVICE_STOP_FAILED=1
        fi

        echo ""
    fi
done < <(systemctl | grep "/var/lib/kubelet/pods")

if [ $SERVICE_STOP_FAILED -ne 0 ]; then
    echo "Not all services were stopped successfully. Please check the above output for more information."
else
    echo "All orphaned services successfully stopped."
fi

# Unmount and remove the local-storage volumes.
#
for i in $(seq 1 40); do

    vol="vol$i"

    sudo umount /mnt/local-storage/$vol

    sudo rm -rf /mnt/local-storage/$vol

done

# Remove the kubeadm/kubelet/kubectl packages and the local kube configuration.
#
sudo apt-get purge -y kubeadm --allow-change-held-packages
sudo apt-get purge -y kubectl --allow-change-held-packages
sudo apt-get purge -y kubelet --allow-change-held-packages
sudo apt-get purge -y kubernetes-cni --allow-change-held-packages
sudo apt-get purge -y kube* --allow-change-held-packages
sudo apt -y autoremove
sudo rm -rf ~/.kube

# Clean up working folders.
#
export AZUREARCDATACONTROLLER_DIR=aadatacontroller
if [ -d "$AZUREARCDATACONTROLLER_DIR" ]; then
    echo "Removing working directory $AZUREARCDATACONTROLLER_DIR."
    rm -f -r $AZUREARCDATACONTROLLER_DIR
fi
