Commit 315ebf5

Freshness update for reference-known-issues.md . . .
1 parent 557f3ab commit 315ebf5

1 file changed: +47 -70 lines changed

articles/machine-learning/data-science-virtual-machine/reference-known-issues.md

Lines changed: 47 additions & 70 deletions
Original file line number | Diff line number | Diff line change
@@ -8,100 +8,86 @@ ms.service: data-science-vm
88
author: michalmar
99
ms.author: mimarusa
1010
ms.topic: reference
11-
ms.date: 08/02/2021
12-
11+
ms.reviewer: franksolomon
12+
ms.date: 04/29/2024
1313
---
1414

15-
# Known issues and troubleshooting the Azure Data Science Virtual Machine
16-
17-
This article helps you find and correct errors or failures you might come across when using the Azure Data Science
18-
Virtual Machine.
15+
# Troubleshooting issues with the Azure Data Science Virtual Machine
1916

17+
This article explains how to find and correct errors or failures you might come across when using the Azure Data Science Virtual Machine.
2018

2119
## Ubuntu
2220

23-
### Fix GPU on NVIDIA A100 GPU Chip - Azure NDasrv4 Series
21+
### Fix GPU on NVIDIA A100 GPU Chip - Azure NDasrv4 Series
2422

25-
The ND A100 v4 series virtual machine is a new flagship addition to the Azure GPU family, designed for high-end Deep Learning training and tightly-coupled scale-up and scale-out HPC workloads.
23+
The ND A100 v4 series virtual machine is a flagship addition to the Azure GPU family. It is designed for high-end deep learning training and for tightly coupled scale-up and scale-out HPC workloads.
2624

27-
Due to different architecture it requires different setup for your high-demanding workloads to benefit from GPU acceleration using TensorFlow or PyTorch frameworks.
25+
Because of its distinct architecture, it needs a different setup before high-demand workloads can benefit from GPU acceleration with the TensorFlow or PyTorch frameworks.
2826

29-
We are working towards supporting the ND A100 machines GPUs out-of-the-box. Meanwhile you can make your GPU working by adding NVIDIA's Fabric Manager and updating drivers.
27+
We're building out-of-the-box support for the GPUs of ND A100 machines. Meanwhile, you can get your GPU working on Ubuntu by adding the NVIDIA Fabric Manager and updating the drivers. Follow these steps at the terminal:
3028

31-
Follow these simple steps while in Terminal:
32-
33-
1. Add NVIDIA's repository to install/update drivers - step-by-step instructions can be found [here](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
34-
2. [OPTIONAL] You can also update your CUDA drivers (from repository above)
35-
3. Install NVIDIA's Fabric Manager drivers:
29+
1. Add the NVIDIA repository to install or update drivers - find step-by-step instructions at [this resource](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
30+
2. [OPTIONAL] You can also update your CUDA drivers, from that repository
31+
3. Install the NVIDIA Fabric Manager drivers:
3632

3733
```
3834
sudo apt-get install cuda-drivers-460
3935
sudo apt-get install cuda-drivers-fabricmanager-460
4036
```
4137
42-
4. Reboot your VM (to get your drivers ready)
43-
5. Enable and start newly installed NVIDIA Fabric Manager service:
38+
4. Reboot your VM (to prepare the drivers)
39+
5. Enable and launch the newly installed NVIDIA Fabric Manager service:
4440
4541
```
4642
sudo systemctl enable nvidia-fabricmanager
4743
sudo systemctl start nvidia-fabricmanager
4844
```
4945
50-
You can now check your drivers and GPU working by running:
46+
Run this command to verify that the Fabric Manager service, and with it your GPU and drivers, works:
5147
```
5248
systemctl status nvidia-fabricmanager.service
53-
```
49+
```
5450
55-
After which you should see Fabric Manager service running
56-
![nvidia-fabric-manager-status](./media/nvidia-fabricmanager-status-ok-marked.png)
51+
This screenshot shows the Fabric Manager service running:
5752
53+
:::image type="content" source="./media/nvidia-fabricmanager-status-ok-marked.png" alt-text="Screenshot showing the Fabric Manager service running." lightbox= "./media/nvidia-fabricmanager-status-ok-marked.png":::
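
NVIDIA's Fabric Manager documentation indicates that the Fabric Manager version has to match the installed driver version. The following is a minimal sketch, not part of the original article: `driver_branch` and `fabricmanager_package` are hypothetical helper names, and the sketch assumes that the `cuda-drivers-fabricmanager-<branch>` naming shown in step 3 also holds for branches other than 460.

```shell
# Hypothetical helper: extract the driver branch (the part before the first dot)
# from a full driver version string such as 460.32.03.
driver_branch() {
  echo "${1%%.*}"
}

# Hypothetical helper: build the Fabric Manager package name for that branch.
# Assumption: the package follows the cuda-drivers-fabricmanager-<branch> pattern from step 3.
fabricmanager_package() {
  echo "cuda-drivers-fabricmanager-$(driver_branch "$1")"
}
```

On a VM where the driver is already installed, `fabricmanager_package "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"` should print the package name that corresponds to the running driver branch.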
5854
5955
### Connection to desktop environment fails
6056
61-
If you can connect to the DSVM over SSH terminal but not over x2go, you might have set the wrong session type in x2go.
62-
To connect to the DSVM's desktop environment, you need the session type in *x2go/session preferences/session* set to
63-
*XFCE*. Other desktop environments are currently not supported.
57+
If you can connect to the DSVM over SSH terminal, but you can't connect over x2go, x2go might have the wrong session type setting. To connect to the DSVM desktop environment, set the session type in *x2go/session preferences/session* to *XFCE*. Other desktop environments are currently not supported.
6458
6559
### Fonts look wrong when connecting to DSVM using x2go
6660
67-
When you connect to x2go and some fonts look wrong, it might be related to a session setting in x2go. Before connecting
68-
to the DSVM, uncheck the "Set display DPI" checkbox in the "Input/Output" tab of the session preferences dialog.
61+
A specific x2go session setting can cause some of the fonts to look wrong when you connect to x2go. Before you connect to the DSVM, uncheck the "Set display DPI" checkbox in the "Input/Output" tab of the session preferences dialog.
6962
7063
### Prompted for unknown password
7164
72-
When you create a DSVM setting *Authentication type* to *SSH Public Key* (which is recommended over using password
73-
authentication), you will not be given a password. However, in some scenarios, some applications will still ask you for
74-
a password. Run `sudo passwd <user_name>` to create a new password for a certain user. With `sudo passwd`, you can
75-
create a new password for the root user.
65+
You can set the DSVM *Authentication type* setting to *SSH Public Key*. This is recommended, instead of password authentication. You don't receive a password if you use *SSH Public Key*. However, in some scenarios, some applications still request a password. Run `sudo passwd <user_name>` to create a new password for a specific user. With `sudo passwd`, you can create a new password for the root user.
7666
77-
Running these command will not change the configuration of SSH, and allowed sign-in mechanisms will be kept the same.
67+
Running these commands doesn't change the SSH configuration, and the permitted sign-in mechanisms remain the same.
7868
7969
### Prompted for password when running sudo command
8070
81-
When running a `sudo` command on an Ubuntu machine, you might be asked to enter your password again and again to confirm
82-
that you are really the user who is logged in. This behavior is expected, and it is the default in Ubuntu. However, in some scenarios, a repeated authentication is not necessary and rather annoying.
71+
When you run a `sudo` command on an Ubuntu machine, you might be asked repeatedly to enter your password to verify that you're the logged-in user. This is the expected default Ubuntu behavior. However, in some situations, repeated authentication is unnecessary and rather annoying.
8372
84-
To disable reauthentication for most cases, you can run the following command in a terminal.
73+
To disable reauthentication for most cases, you can run this command in a terminal:
8574
8675
`echo -e "\n$USER ALL=(ALL) NOPASSWD: ALL\n" | sudo tee -a /etc/sudoers`
8776
88-
After restarting the terminal, sudo will not ask for another login and will consider the authentication from your
89-
session login as sufficient.
77+
After you restart the terminal, sudo won't ask for another sign-in and it will consider the authentication from your
78+
session sign-in as sufficient.
9079
91-
### Cannot use docker as non-root user
80+
### Can't use docker as nonroot user
9281
93-
In order to use docker as a non-root user, your user needs to be member of the docker group. You can run the
94-
`getent group docker` command to check which users belong to that group. To add your user to the docker group, run
95-
`sudo usermod -aG docker $USER`.
82+
To use docker as a nonroot user, your user needs membership in the docker group. The `getent group docker` command returns a list of users that belong to that group. To add your user to the docker group, run `sudo usermod -aG docker $USER`.
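
The membership check can be scripted. The following is a minimal sketch, not part of the original article: `group_line_has_user` is a hypothetical helper that inspects one line in the `name:passwd:gid:member1,member2` format that `getent group` prints. The sample user names and group ID in the comments are made up for illustration.

```shell
# Hypothetical helper: succeed when user $2 appears in the member list of a
# getent-style group line $1, for example "docker:x:998:alice,bob".
group_line_has_user() (
  members=${1##*:}              # text after the last colon is the member list
  case ",$members," in
    *",$2,"*) exit 0 ;;         # exact match between commas, so "al" does not match "alice"
    *)        exit 1 ;;
  esac
)
```

For example, `group_line_has_user "$(getent group docker)" "$USER" || sudo usermod -aG docker "$USER"` adds the current user only when the membership is missing. A new group membership normally takes effect only after you sign out and sign in again.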
9683
97-
### Docker containers cannot interact with the outside via network
84+
### Docker containers can't interact with the outside via network
9885
99-
By default, docker adds new containers to the so-called "bridge network", which is `172.17.0.0/16`. If the subnet of
100-
that bridge network overlaps with the subnet of your DSVM or with another private subnet you have in your subscription,
101-
no network communication between the host and the container is possible. In that case, web applications running in the container cannot be reached, and the container cannot update packages from apt.
86+
By default, Docker adds new containers to the so-called "bridge network": `172.17.0.0/16`. The subnet of
87+
that bridge network could overlap with the subnet of your DSVM, or with another private subnet you have in your subscription. In this case, no network communication between the host and the container is possible. Additionally, web applications that run in the container can't be reached, and the container can't update packages from apt.
10288
103-
To fix the issue, you need to reconfigure docker to use an IP address space for its bridge network that does not overlap
104-
with other networks of your subscription. For example, by adding
89+
To fix the issue, you must reconfigure Docker to use an IP address space for its bridge network that doesn't overlap
90+
with other networks of your subscription. For example, if you add
10591
10692
```json
10793
"default-address-pools": [
@@ -112,43 +98,36 @@ with other networks of your subscription. For example, by adding
11298
]
11399
```
114100

115-
to the JSON document contained in file `/etc/docker/daemon.json`, docker will assign another subnet to the bridge
116-
network. (The file needs to be edited using sudo, for example by running `sudo nano /etc/docker/daemon.json`.)
101+
to the `/etc/docker/daemon.json` JSON file, Docker assigns another subnet to the bridge
102+
network. You must edit the file with sudo, for example by running `sudo nano /etc/docker/daemon.json`.
117103

118-
After the change, the docker service needs to be restarted by running `service docker restart`.
119-
120-
To check if your changes have taken effect, you can run `docker network inspect bridge`. The value under
121-
*IPAM.Config.Subnet* should correspond to the address pool specified above.
104+
After the change, run `service docker restart` to restart the Docker service. To determine whether your changes took effect, you can run `docker network inspect bridge`. The value under *IPAM.Config.Subnet* should correspond to the address pool specified earlier.
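
Before you pick a new address pool, it can help to check whether two IPv4 ranges overlap. The following is a minimal sketch in POSIX shell, not part of the original article: `ip_to_int`, `cidr_range`, and `cidr_overlap` are hypothetical helper names, and the sketch handles IPv4 CIDR notation only.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1                     # split the address into its four octets
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the first and last address (as integers) of a CIDR block such as 172.17.0.0/16.
cidr_range() (
  ip=${1%/*}
  bits=${1#*/}
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  first=$(( $base & $mask ))
  last=$(( $first | (~$mask & 0xFFFFFFFF) ))
  echo "$first $last"
)

# Succeed (exit status 0) when the two CIDR blocks share at least one address.
cidr_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  # $1..$2 hold the first range, $3..$4 hold the second range.
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)
```

For example, `cidr_overlap 172.17.0.0/16 10.0.0.0/24 || echo "no overlap"` compares the default bridge network with a hypothetical DSVM subnet of `10.0.0.0/24`. After the restart, `docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'` prints the subnet that the bridge network currently uses, which you can pass to the same check.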
122105

123106
### GPU(s) not available in docker container
124107

125-
The docker installed on the DSVM supports GPUs by default. However, there is a few prerequisite that must be met.
108+
The Docker installation on the DSVM supports GPUs by default. However, that support has certain prerequisites.
126109

127-
* Obviously, the VM size of the DSVM has to include at least one GPU.
128-
* When starting your docker container with `docker run`, you need to add a *--gpus* parameter, for example, `--gpus all`.
129-
* VM sizes that include NVIDIA A100 GPUs need additional software packages installed, esp. the
110+
* The VM size of the DSVM must include at least one GPU.
111+
* When you start your docker container with `docker run`, you must add a *--gpus* parameter: for example, `--gpus all`.
112+
* VM sizes that include NVIDIA A100 GPUs require additional software packages, especially the
130113
[NVIDIA Fabric Manager](https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf). These packages
131-
might not be pre-installed in your image yet.
132-
114+
might not be preinstalled in your image.
133115

134116
## Windows
135117

136118
### Virtual Machine Generation 2 (Gen 2) not working
137-
When you try to create Data Science VM based on Virtual Machine Generation 2 (Gen 2) it fails.
138-
139-
Currently, we maintain and provide images for Data Science VM based on Windows 2019 Server only for Generation 1 virtual machines. [Gen 2](../../virtual-machines/generation-2.md) are not yet supported and we plan to support them in near future.
119+
When you try to create a Data Science VM based on Virtual Machine Generation 2 (Gen 2), it fails.
140120

121+
At this time, we maintain and provide images for Data Science Virtual Machines (DSVMs) based on Windows Server 2019 only for Generation 1 virtual machines. [Gen 2](../../virtual-machines/generation-2.md) virtual machines aren't yet supported, but we plan to support them in the near future.
141122

142123
### Accessing SQL Server
143124

144-
When you try to connect to the pre-installed SQL Server instance, you might encounter a "login failed" error. To
145-
successfully connect to the SQL Server instance, you need to run the program you are connecting with, for example, SQL Server
146-
Management Studio (SSMS), in administrator mode. The administrator mode is required because by DSVM's default, only
147-
administrators are allowed to connect.
125+
When you try to connect to the preinstalled SQL Server instance, you might encounter a "login failed" error. To
126+
successfully connect to the SQL Server instance, you must run the program you connect with, for example SQL Server Management Studio (SSMS), in administrator mode. Administrator mode is required because, by default on the DSVM, only administrators can connect.
148127

149-
### Hyper-V does not work
128+
### Hyper-V doesn't work
150129

151-
That Hyper-V initially doesn't work on Windows is expected behavior. For boot performance, we've disabled some services.
130+
It's expected behavior that Hyper-V doesn't work initially on Windows. To improve boot performance, we disabled some services.
152131
To enable Hyper-V:
153132

154133
1. Open the search bar on your Windows DSVM
@@ -158,6 +137,4 @@ To enable Hyper-V:
158137

159138
Your final screen should look like this:
160139

161-
162-
163-
![Enable Hyper-V](./media/workaround/hyperv-enable-dsvm.png)
140+
:::image type="content" source="./media/workaround/hyperv-enable-dsvm.png" alt-text="Screenshot showing the Hyper-V service running." lightbox= "./media/workaround/hyperv-enable-dsvm.png":::
