articles/machine-learning/data-science-virtual-machine/reference-known-issues.md
Lines changed: 47 additions & 70 deletions
@@ -8,100 +8,86 @@ ms.service: data-science-vm
 author: michalmar
 ms.author: mimarusa
 ms.topic: reference
-ms.date: 08/02/2021
-
+ms.reviewer: franksolomon
+ms.date: 04/29/2024
 ---

-# Known issues and troubleshooting the Azure Data Science Virtual Machine
-
-This article helps you find and correct errors or failures you might come across when using the Azure Data Science
-Virtual Machine.
+# Troubleshooting issues with the Azure Data Science Virtual Machine

+This article explains how to find and correct errors or failures you might come across when using the Azure Data Science Virtual Machine.

 ## Ubuntu

-### Fix GPU on NVIDIA A100 GPU Chip - Azure NDasrv4 Series
+### Fix GPU on NVIDIA A100 GPU Chip - Azure NDasrv4 Series

-The ND A100 v4 series virtual machine is a new flagship addition to the Azure GPU family, designed for high-end Deep Learning training and tightly-coupled scale-up and scale-out HPC workloads.
+The ND A100 v4 series virtual machine is a flagship addition to the Azure GPU family. It handles high-end Deep Learning training and tightly coupled scale-up and scale-out HPC workloads.

-Due to different architecture it requires different setup for your high-demanding workloads to benefit from GPU acceleration using TensorFlow or PyTorch frameworks.
+Because of its unique architecture, it needs a different setup for high-demand workloads to benefit from GPU acceleration with the TensorFlow or PyTorch frameworks.

-We are working towards supporting the ND A100 machines GPUs out-of-the-box. Meanwhile you can make your GPU working by adding NVIDIA's Fabric Manager and updating drivers.
+We're building out-of-the-box support for ND A100 machine GPUs. Meanwhile, you can get your GPU working on Ubuntu if you add the NVIDIA Fabric Manager and update the drivers. Follow these steps at the terminal:

-Follow these simple steps while in Terminal:
-
-1. Add NVIDIA's repository to install/update drivers - step-by-step instructions can be found [here](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts)
-2.[OPTIONAL] You can also update your CUDA drivers (from repository above)
-3. Install NVIDIA's Fabric Manager drivers:
+1. Add the NVIDIA repository to install or update drivers. Find step-by-step instructions at [this resource](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html#ubuntu-lts).
+2. [OPTIONAL] You can also update your CUDA drivers from that repository.
This screenshot shows the Fabric Manager service running:

+:::image type="content" source="./media/nvidia-fabricmanager-status-ok-marked.png" alt-text="Screenshot showing the Fabric Manager service running." lightbox="./media/nvidia-fabricmanager-status-ok-marked.png":::

 ### Connection to desktop environment fails

-If you can connect to the DSVM over SSH terminal but not over x2go, you might have set the wrong session type in x2go.
-To connect to the DSVM's desktop environment, you need the session type in *x2go/session preferences/session* set to
-*XFCE*. Other desktop environments are currently not supported.
+If you can connect to the DSVM over an SSH terminal but can't connect over x2go, x2go might have the wrong session type setting. To connect to the DSVM desktop environment, set the session type in *x2go/session preferences/session* to *XFCE*. Other desktop environments are currently not supported.

 ### Fonts look wrong when connecting to DSVM using x2go

-When you connect to x2go and some fonts look wrong, it might be related to a session setting in x2go. Before connecting
-to the DSVM, uncheck the "Set display DPI" checkbox in the "Input/Output" tab of the session preferences dialog.
+A specific x2go session setting can cause some fonts to look wrong when you connect to x2go. Before you connect to the DSVM, uncheck the "Set display DPI" checkbox in the "Input/Output" tab of the session preferences dialog.

 ### Prompted for unknown password

-When you create a DSVM setting *Authentication type* to *SSH Public Key* (which is recommended over using password
-authentication), you will not be given a password. However, in some scenarios, some applications will still ask you for
-a password. Run `sudo passwd <user_name>` to create a new password for a certain user. With `sudo passwd`, you can
-create a new password for the root user.
+You can set the DSVM *Authentication type* setting to *SSH Public Key*, which is recommended over password authentication. If you use *SSH Public Key*, you don't receive a password. However, in some scenarios, some applications still request a password. Run `sudo passwd <user_name>` to create a new password for a specific user. With `sudo passwd`, you can create a new password for the root user.

-Running these command will not change the configuration of SSH, and allowed sign-in mechanisms will be kept the same.
+Running these commands doesn't change the SSH configuration, and the permitted sign-in mechanisms remain the same.

 ### Prompted for password when running sudo command

-When running a `sudo` command on an Ubuntu machine, you might be asked to enter your password again and again to confirm
-that you are really the user who is logged in. This behavior is expected, and it is the default in Ubuntu. However, in some scenarios, a repeated authentication is not necessary and rather annoying.
+When you run a `sudo` command on an Ubuntu machine, you might be asked to enter your password repeatedly to verify that you're the logged-in user. This is expected default Ubuntu behavior. However, in some situations, repeated authentication isn't necessary and is rather annoying.

-To disable reauthentication for most cases, you can run the following command in a terminal.
+To disable reauthentication for most cases, you can run this command in a terminal:

 `echo -e "\n$USER ALL=(ALL) NOPASSWD: ALL\n" | sudo tee -a /etc/sudoers`

-After restarting the terminal, sudo will not ask for another login and will consider the authentication from your
-session login as sufficient.
+After you restart the terminal, sudo won't ask for another sign-in, and it considers the authentication from your
+session sign-in as sufficient.

-### Cannot use docker as non-root user
+### Can't use Docker as a nonroot user

-In order to use docker as a non-root user, your user needs to be member of the docker group. You can run the
-`getent group docker` command to check which users belong to that group. To add your user to the docker group, run
-`sudo usermod -aG docker $USER`.
+To use Docker as a nonroot user, your user needs membership in the docker group. The `getent group docker` command returns a list of users that belong to that group. To add your user to the docker group, run `sudo usermod -aG docker $USER`.

-### Docker containers cannot interact with the outside via network
+### Docker containers can't interact with the outside via network

-By default, docker adds new containers to the so-called "bridge network", which is `172.17.0.0/16`. If the subnet of
-that bridge network overlaps with the subnet of your DSVM or with another private subnet you have in your subscription,
-no network communication between the host and the container is possible. In that case, web applications running in the container cannot be reached, and the container cannot update packages from apt.
+By default, Docker adds new containers to the so-called "bridge network": `172.17.0.0/16`. The subnet of
+that bridge network could overlap with the subnet of your DSVM, or with another private subnet you have in your subscription. In this case, no network communication between the host and the container is possible. As a result, web applications that run in the container can't be reached, and the container can't update packages from apt.

-To fix the issue, you need to reconfigure docker to use an IP address space for its bridge network that does not overlap
-with other networks of your subscription. For example, by adding
+To fix the issue, you must reconfigure Docker to use an IP address space for its bridge network that doesn't overlap
+with other networks of your subscription. For example, if you add

 ```json
 "default-address-pools": [
@@ -112,43 +98,36 @@ with other networks of your subscription. For example, by adding
 ]
 ```

-to the JSON document contained in file `/etc/docker/daemon.json`, docker will assign another subnet to the bridge
-network. (The file needs to be edited using sudo, for example by running `sudo nano /etc/docker/daemon.json`.)
+to the `/etc/docker/daemon.json` JSON file, Docker assigns another subnet to the bridge
+network. You must edit the file with sudo, for example by running `sudo nano /etc/docker/daemon.json`.

-After the change, the docker service needs to be restarted by running `service docker restart`.
-
-To check if your changes have taken effect, you can run `docker network inspect bridge`. The value under
-*IPAM.Config.Subnet* should correspond to the address pool specified above.
+After the change, run `sudo service docker restart` to restart the Docker service. To determine whether your changes took effect, you can run `docker network inspect bridge`. The value under *IPAM.Config.Subnet* should correspond to the address pool specified earlier.

 ### GPU(s) not available in docker container

-The docker installed on the DSVM supports GPUs by default. However, there is a few prerequisite that must be met.
+The Docker installation on the DSVM supports GPUs by default. However, that support has certain prerequisites:

-*Obviously, the VM size of the DSVM has to include at least one GPU.
-* When starting your docker container with `docker run`, you need to add a *--gpus* parameter, for example, `--gpus all`.
-* VM sizes that include NVIDIA A100 GPUs need additional software packages installed, esp. the
+* The VM size of the DSVM must include at least one GPU.
+* When you start your Docker container with `docker run`, you must add a *--gpus* parameter, for example, `--gpus all`.
+* VM sizes that include NVIDIA A100 GPUs require additional software packages, especially the
 [NVIDIA Fabric Manager](https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf). These packages
-might not be pre-installed in your image yet.
-
+might not be preinstalled in your image.

 ## Windows

 ### Virtual Machine Generation 2 (Gen 2) not working
-When you try to create Data Science VM based on Virtual Machine Generation 2 (Gen 2) it fails.
-
-Currently, we maintain and provide images for Data Science VM based on Windows 2019 Server only for Generation 1 virtual machines. [Gen 2](../../virtual-machines/generation-2.md) are not yet supported and we plan to support them in near future.
+When you try to create a Data Science VM based on Virtual Machine Generation 2 (Gen 2), it fails.

+At this time, we maintain and provide images for Data Science Virtual Machines (DSVMs) based on Windows Server 2019 only for Generation 1 VMs. [Gen 2](../../virtual-machines/generation-2.md) VMs aren't yet supported, but we plan to support them in the near future.

 ### Accessing SQL Server

-When you try to connect to the pre-installed SQL Server instance, you might encounter a "login failed" error. To
-successfully connect to the SQL Server instance, you need to run the program you are connecting with, for example, SQL Server
-Management Studio (SSMS), in administrator mode. The administrator mode is required because by DSVM's default, only
-administrators are allowed to connect.
+When you try to connect to the preinstalled SQL Server instance, you might encounter a "login failed" error. To
+successfully connect to the SQL Server instance, you must run the program that you use to connect, for example SQL Server Management Studio (SSMS), in administrator mode. Administrator mode is required because, by default on the DSVM, only administrators can connect.

-### Hyper-V does not work
+### Hyper-V doesn't work

-That Hyper-V initially doesn't work on Windows is expected behavior. For boot performance, we've disabled some services.
+Hyper-V doesn't initially work on Windows, and this is expected behavior. To improve boot performance, we disabled some services.
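The Docker section of this diff says that the default bridge network `172.17.0.0/16` breaks container networking when it overlaps the DSVM subnet or another private subnet. You can check for such an overlap before you edit `/etc/docker/daemon.json`. The following is a minimal POSIX shell sketch and isn't part of the commit. `ip_to_int` and `cidrs_overlap` are hypothetical helper names. The sketch assumes IPv4 CIDR strings such as `10.0.0.0/24` with no leading zeros in the octets. The `10.0.0.0/24` value is only an example, not a real DSVM subnet.

```shell
#!/bin/sh
# Hypothetical helpers (a sketch, not part of the DSVM article or commit):
# report whether two IPv4 CIDR blocks, for example Docker's default bridge
# network and a VM subnet, share any addresses.

# Convert a dotted-quad IPv4 address (no leading zeros) to a 32-bit integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1            # split "a.b.c.d" into $1..$4 (IPs contain no glob characters)
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return exit status 0 if CIDR blocks $1 and $2 overlap.
# Two aligned blocks overlap exactly when they agree on the bits covered
# by the shorter of the two prefixes.
cidrs_overlap() {
  local a_ip="${1%/*}" a_len="${1#*/}" b_ip="${2%/*}" b_len="${2#*/}"
  local len=$(( a_len < b_len ? a_len : b_len ))
  # Netmask for the shorter prefix, built without shifting past 32 bits.
  local mask=$(( 0xFFFFFFFF ^ ((1 << (32 - len)) - 1) ))
  local a_int b_int
  a_int=$(ip_to_int "$a_ip")
  b_int=$(ip_to_int "$b_ip")
  [ $(( a_int & mask )) -eq $(( b_int & mask )) ]
}

# Example: compare Docker's default bridge network with an example VM subnet.
if cidrs_overlap 172.17.0.0/16 10.0.0.0/24; then
  echo "overlap: choose a different default-address-pools base"
else
  echo "no overlap"
fi
```

With these example values, the script prints `no overlap`. The sketch compares only one pair of blocks. To cover another private subnet in your subscription, run `cidrs_overlap` again with that subnet.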