Commit 305cd8a: "Freshness update for dsvm-tools-data-platforms.md" (1 parent: b301461)

articles/machine-learning/data-science-virtual-machine/dsvm-tools-data-platforms.md
ms.service: data-science-vm
author: jesscioffi
ms.author: jcioffi
ms.topic: conceptual
ms.reviewer: franksolomon
ms.date: 04/16/2024
---
# Data platforms supported on the Data Science Virtual Machine

With a Data Science Virtual Machine (DSVM), you can build your analytics resources against a wide range of data platforms. In addition to interfaces to remote data platforms, the DSVM provides a local instance for rapid development and prototyping.

The DSVM supports these data platform tools:

## SQL Server Developer Edition

| Category | Value |
| ------------- | ------------- |
| What is it? | A local relational database instance |
| Supported DSVM editions | Windows 2019, Linux (SQL Server 2019) |
| Typical uses | <ul><li>Rapid local development, with a smaller dataset</li><li>Run in-database R</li></ul> |
| Links to samples | <ul><li>A small sample of a New York City dataset is loaded into the SQL database:<br/> `nyctaxi`</li><li>Find a Jupyter sample that shows Microsoft Machine Learning Server and in-database analytics at:<br/> `~notebooks/SQL_R_Services_End_to_End_Tutorial.ipynb`</li></ul> |
| Related tools on the DSVM | <ul><li>SQL Server Management Studio</li><li>ODBC/JDBC drivers</li><li>pyodbc, RODBC</li></ul> |

> [!NOTE]
> You can use SQL Server Developer Edition only for development and test purposes. You need a license or one of the SQL Server VMs to run it in production.

> [!NOTE]
> Support for Machine Learning Server Standalone ended on July 1, 2021, and it was removed from DSVM images after June 30, 2021. Existing deployments can still access the software, but it's no longer supported.

> [!NOTE]
> SQL Server Developer Edition was removed from DSVM images by the end of November 2021. Existing deployments still have SQL Server Developer Edition installed. In new deployments, you can install and use SQL Server Developer Edition through Docker support. For more information, visit [Quickstart: Run SQL Server container images with Docker](/sql/linux/quickstart-install-connect-docker?view=sql-server-ver15&pivots=cs1-bash&preserve-view=true).

### Windows

#### Setup

The database server is already preconfigured, and the Windows services related to SQL Server (for example, `SQL Server (MSSQLSERVER)`) are set to run automatically. The only manual step is to enable in-database analytics by using Microsoft Machine Learning Server. To enable analytics as a one-time action, log in as the machine administrator, open a new query in SQL Server Management Studio (SSMS), select the `master` database, and then run this command:

```sql
CREATE LOGIN [%COMPUTERNAME%\SQLRUserGroup] FROM WINDOWS
```

(Replace `%COMPUTERNAME%` with your VM name.)

To run SQL Server Management Studio, search for "SQL Server Management Studio" in the program list, or use Windows Search to find and run it. When prompted for credentials, select **Windows Authentication**, and use either the machine name or `localhost` in the **SQL Server Name** field.

#### How to use and run it

By default, the database server with the default database instance runs automatically. You can use tools like SQL Server Management Studio on the VM to access the SQL Server database locally. Local administrator accounts have admin access on the database.

Additionally, the DSVM comes with ODBC and JDBC drivers that applications written in multiple languages, including Python and Machine Learning Server, can use to talk to these resources:

- SQL Server
- Azure SQL Database
- Azure Synapse Analytics
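
As an illustration of the ODBC route, here's a minimal sketch of building a connection string for the local default instance with Windows authentication. The driver name and the use of the `nyctaxi` sample database are assumptions to adapt to your setup:

```python
# Hypothetical connection-string sketch for the local default SQL Server
# instance, using Windows authentication (Trusted_Connection).
conn_parts = {
    "DRIVER": "{ODBC Driver 17 for SQL Server}",  # assumed driver name; check your installed ODBC drivers
    "SERVER": "localhost",                        # local default instance
    "DATABASE": "nyctaxi",                        # sample database preloaded on the DSVM
    "Trusted_Connection": "yes",                  # Windows authentication
}
conn_str = ";".join(f"{key}={value}" for key, value in conn_parts.items())
print(conn_str)

# With the pyodbc package available on the DSVM, you would connect like:
#   import pyodbc
#   connection = pyodbc.connect(conn_str)
```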

#### How is it configured and installed on the DSVM?

SQL Server is installed in the standard way. You can find it at `C:\Program Files\Microsoft SQL Server`. You can find the in-database Machine Learning Server instance at `C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES`. The DSVM also has a separate standalone Machine Learning Server instance, installed at `C:\Program Files\Microsoft\R Server\R_SERVER`. These two Machine Learning Server instances don't share libraries.

### Ubuntu

You must first install SQL Server Developer Edition on an Ubuntu DSVM before you use it. For more information, visit [Quickstart: Install SQL Server and create a database on Ubuntu](/sql/linux/quickstart-install-connect-ubuntu).

## Apache Spark 2.x (Standalone)

| Category | Value |
| ------------- | ------------- |
| What is it? | A standalone (single node, in-process) instance of the popular Apache Spark platform; a system for fast, large-scale data processing and machine learning |
| Supported DSVM editions | Linux |
| Typical uses | <ul><li>Rapid local development of Spark/PySpark applications with a smaller dataset, and later deployment on large Spark clusters such as Azure HDInsight</li><li>Test Microsoft Machine Learning Server Spark context</li><li>Use SparkML or the Microsoft open-source [MMLSpark](https://github.com/Azure/mmlspark) library to build ML applications</li></ul> |
| Links to samples | Jupyter samples:<ul><li>`~/notebooks/SparkML/pySpark`</li><li>`~/notebooks/MMLSpark`</li></ul><p>Microsoft Machine Learning Server (Spark context): `/dsvm/samples/MRS/MRSSparkContextSample.R`</p> |
| Related tools on the DSVM | <ul><li>PySpark, Scala</li><li>Jupyter (Spark/PySpark kernels)</li><li>Microsoft Machine Learning Server, SparkR, Sparklyr</li><li>Apache Drill</li></ul> |

### How to use it

To submit Spark jobs on the command line, run the `spark-submit` or `pyspark` command. To work in a Jupyter notebook, create a new notebook with the Spark kernel.

To use Spark from R, use libraries like SparkR, Sparklyr, and Microsoft Machine Learning Server, which are available on the DSVM. See the links to samples in the preceding table.

### Setup

Before you run in a Spark context in Microsoft Machine Learning Server on the Ubuntu Linux DSVM edition, you must complete a one-time setup step to enable a local, single-node Hadoop HDFS and Yarn instance. By default, Hadoop services are installed but disabled on the DSVM. To enable them, run these commands as root the first time:

```bash
echo -e 'y\n' | ssh-keygen -t rsa -P '' -f ~hadoop/.ssh/id_rsa
# ... (intermediate setup lines not shown in this diff view)
chown hadoop:hadoop ~hadoop/.ssh/authorized_keys
systemctl start hadoop-namenode hadoop-datanode hadoop-yarn
```

To stop the Hadoop-related services when you no longer need them, run `systemctl stop hadoop-namenode hadoop-datanode hadoop-yarn`.

A sample that demonstrates how to develop and test MRS in a remote Spark context (the standalone Spark instance on the DSVM) is provided and available in the `/dsvm/samples/MRS` directory.

### How is it configured and installed on the DSVM?

|Platform|Install Location ($SPARK_HOME)|
|:--------|:--------|
|Linux | /dsvm/tools/spark-X.X.X-bin-hadoopX.X|

Libraries to access data from Azure Blob storage or Azure Data Lake Storage, using the Microsoft MMLSpark machine-learning libraries, are preinstalled in $SPARK_HOME/jars. These JARs are automatically loaded when Spark launches. By default, Spark uses data located on the local disk.

The Spark instance on the DSVM can access data stored in Blob storage or Azure Data Lake Storage. You must first create and configure the `core-site.xml` file, based on the template found in $SPARK_HOME/conf/core-site.xml.template. You must also have the appropriate credentials to access Blob storage and Azure Data Lake Storage. The template files use placeholders for Blob storage and Azure Data Lake Storage configurations.

For more information about creating Azure Data Lake Storage service credentials, visit [Authentication with Azure Data Lake Storage Gen1](../../data-lake-store/data-lake-store-service-to-service-authenticate-using-active-directory.md). After you enter the credentials for Blob storage or Azure Data Lake Storage in the core-site.xml file, you can reference the data stored in those sources through the `wasb://` or `adl://` URI prefix.
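
As a hypothetical sketch of what a filled-in `core-site.xml` entry for Blob storage access might look like (the property name follows the standard hadoop-azure convention; the storage account name and key are placeholders you must replace with your own values):

```xml
<configuration>
  <!-- Hypothetical Blob storage credential entry: replace MYSTORAGEACCOUNT
       and MY_STORAGE_ACCOUNT_KEY with your storage account name and key. -->
  <property>
    <name>fs.azure.account.key.MYSTORAGEACCOUNT.blob.core.windows.net</name>
    <value>MY_STORAGE_ACCOUNT_KEY</value>
  </property>
</configuration>
```

With an entry like this in place, data in that account could then be referenced with a URI such as `wasb://mycontainer@MYSTORAGEACCOUNT.blob.core.windows.net/path`.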