### Session duration

SSH sessions have a maximum expected duration of 10 hours. For more information, refer to the [Troubleshooting FAQ](/cloudflare-one/faq/troubleshooting/#long-lived-ssh-sessions-frequently-disconnect).

## Troubleshooting

A failed connection to your SSH endpoint can have several causes. Use the following steps to investigate and resolve the source of the failure.

1. [Review your Access policies](#1-review-access-policies) to verify that they allow the user to reach the target machine.
2. [Check the health of your Cloudflare Tunnel](#2-check-target-machine-connection).
3. [Confirm that the user exists](#3-confirm-user-existence-on-the-target-server) on the target server.
4. [Check your `sshd_config` file](#4-debug-sshd_config-file-misconfiguration) for misconfiguration.

### 1. Review Access policies

An Access policy may block a user from reaching an SSH target because:

- An Access policy explicitly denies that user access, or
- No Access policy explicitly allows the user, so Access denies the user by default.

:::note[Access policies and infrastructure applications]

The Access infrastructure application (created in [step 5](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#5-add-an-infrastructure-application)) is the policy container for your SSH server. Cloudflare refers to your SSH server as a [target](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#4-add-a-target).

[Access policies](/cloudflare-one/policies/access/policy-management/) are the rules attached to this Access infrastructure application. They determine who can connect and which UNIX usernames they can log in as on the server. Cloudflare does not create new users on the target, so UNIX users must already exist on the server.

[Substep 9 of step 5: Add an infrastructure application](#5-add-an-infrastructure-application) walks you through creating an Access policy for your SSH target.

:::

#### End users

As an end user, run [`warp-cli target list`](/cloudflare-one/applications/non-http/infrastructure-apps/#display-available-targets) to verify that you have access to the target machine.

<Render file="tunnel/warp-cli-target-list" product="cloudflare-one" />

- If the target appears in the list, confirm that the output shows the username you are trying to connect with.
  - If the username is not shown, an administrator must find the Access policy associated with the target machine and add that username to it. An administrator should have created this Access policy in [substep 9 of step 5: Add an infrastructure application](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#5-add-an-infrastructure-application).
  - If the username is shown, the Access policy should be granting access. Continue to [step 2](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#2-check-target-machine-connection) to confirm that the tunnel is healthy.

- If the target does not appear in the list, an administrator must audit your organization's policies for the target machine in the Zero Trust dashboard for misconfigurations that may be blocking access.

#### Administrators

As an admin, instead of running `warp-cli target list`, you can use the Access logs to check whether an Access policy is causing connection issues. Reviewing logs is useful when troubleshooting connection issues on behalf of an end user.

:::note

You need Cloudflare dashboard access and log view [permissions](/cloudflare-one/roles-permissions/) to complete this step.

:::

1. In [Zero Trust](https://one.dash.cloudflare.com/), go to **Logs** > **Access**.

2. Select the application you are testing, or filter by _Infrastructure_ as the App Type.

3. Review the **Decision**. If the **Decision** is `Access denied`, select the application and copy the name under **App**.

   If the **Decision** is `Access granted`, Access policies are not interfering with your connection attempts, and your connection issue is due to the Cloudflare Tunnel, the SSH server, or the `sshd_config` file.

4. Go to **Access** > **Applications**.

5. Enter the application name in the search bar and select the application.

6. Select **Configure**.

7. Go to [**Policies**](/cloudflare-one/policies/access/policy-management/#test-your-policies) to review which criteria may be blocking the user.

To resolve the connection issue, edit the [policy](/cloudflare-one/policies/access/) that is explicitly blocking the user, or add a new policy that explicitly allows the user. After saving your policy changes, try to connect to the target machine as the end user.

If you still have connection issues after auditing your Access policies, review Tunnel health in the following step.

### 2. Check target machine connection

If the end user cannot connect to the target SSH machine, the tunnel you set up in [step 1: Connect the server to Cloudflare](#1-connect-the-server-to-cloudflare) may be down or inactive.

To check the status of your Tunnel:

1. In [Zero Trust](https://one.dash.cloudflare.com/), go to **Networks** > **Routes**.

2. Search for the target's IP address to find the Tunnel associated with it. Searching by IP on the **Routes** page is useful when you do not know the name of the Tunnel.

   The IP address is shown in the `warp-cli target list` output from [the previous step](#1-review-access-policies). Admins can also go to **Networks** > **Targets** and find the IP address next to the target's hostname.

3. Copy the Tunnel name.
4. Go to **Networks** > **Tunnels** and search for your Tunnel name.
5. Confirm that the [Tunnel status](/cloudflare-one/connections/connect-networks/monitor-tunnels/notifications/#available-notifications) is `Healthy`, and not `Down`, `Degraded`, or `Inactive`.

| Status | Meaning | Recommended action |
| --- | --- | --- |
| **Healthy** | The tunnel is active and serving traffic through four connections to the Cloudflare global network. | No action is required. Your Tunnel is running correctly. |
| **Inactive** | The Tunnel has been created (via the API or dashboard) but the `cloudflared` connector has never been run to establish a connection. | Run the tunnel as a service (recommended) or use the `cloudflared tunnel run` command on your origin server to connect the tunnel to Cloudflare. Refer to [substep 6 of step 1 in the Create a Tunnel dashboard guide](/cloudflare-one/connections/connect-networks/get-started/create-remote-tunnel/#1-create-a-tunnel) or step 4 in the [Create a Tunnel API guide](/cloudflare-one/connections/connect-networks/get-started/create-remote-tunnel/#1-create-a-tunnel). |
| **Down** | The Tunnel was previously connected but is currently disconnected because the `cloudflared` process has stopped. | 1. Ensure the `cloudflared` service or process is actively running on your server. <br /> 2. Check for server-side issues, such as the machine being powered off, an application crash, or recent network changes. |
| **Degraded** | The `cloudflared` connector is running and the tunnel is serving traffic, but at least one individual connection has failed. Further degradation in [tunnel availability](/cloudflare-one/connections/connect-networks/configure-tunnels/tunnel-availability/) could cause the tunnel to go down and stop serving traffic. | 1. Review your `cloudflared` logs for connection failures or error messages. <br /> 2. Investigate local network and firewall rules to ensure they are not blocking connections to the [Cloudflare Tunnel IPs and ports](/cloudflare-one/connections/connect-networks/configure-tunnels/tunnel-with-firewall/). |

For detailed troubleshooting steps, refer to the [Troubleshooting Tunnel documentation](/cloudflare-one/connections/connect-networks/troubleshoot-tunnels/). Review the [Tunnel with Firewall documentation](/cloudflare-one/connections/connect-networks/configure-tunnels/tunnel-with-firewall/#test-connectivity) to ensure your network is correctly configured to allow `cloudflared` connections.
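On the server itself, a quick way to act on the **Down** row above is to check whether `cloudflared` is running. The following is a minimal sketch, assuming `cloudflared` was installed as a systemd service named `cloudflared`; the `cloudflared_is_running` function name is hypothetical.

```sh
# Hypothetical helper: check whether the cloudflared connector appears to be running.
# Assumption: cloudflared runs as a systemd service (default name "cloudflared").
# Falls back to looking for a process named exactly "cloudflared".
cloudflared_is_running() {
  local service="${1:-cloudflared}"
  if command -v systemctl >/dev/null 2>&1 &&
    systemctl is-active --quiet "$service" 2>/dev/null; then
    echo "systemd service '$service' is active"
    return 0
  fi
  if command -v pgrep >/dev/null 2>&1 && pgrep -x cloudflared >/dev/null; then
    echo "cloudflared process is running (not as systemd service '$service')"
    return 0
  fi
  echo "cloudflared does not appear to be running" >&2
  return 1
}
```

Run `cloudflared_is_running` on the server (or pass a different service name as the first argument). If it reports that `cloudflared` is not running, `sudo journalctl -u cloudflared` (same service-name assumption) shows the connector's recent log output.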

After you have verified that your Tunnel is healthy, confirm that the user exists on the target SSH server in the following step.

### 3. Confirm user existence on the target server

On the target SSH server, run `id <USERNAME>` to verify that the end user's UNIX username exists. If the username does not exist, you must add the user to the server.
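The `id` check above can be wrapped in a small shell function that also states what to do next. This is a minimal sketch; the `user_exists` name is hypothetical, and it relies only on `id`, which looks users up through the system's name service and exits with a non-zero status for unknown users.

```sh
# Hypothetical helper: check that a UNIX user exists on the target server.
# Cloudflare does not create users on the target, so the account must already exist.
user_exists() {
  if [ -z "$1" ]; then
    echo "usage: user_exists <USERNAME>" >&2
    return 2
  fi
  if id "$1" >/dev/null 2>&1; then
    # Print uid, gid, and groups for the existing user.
    id "$1"
    return 0
  fi
  echo "user '$1' not found; create it on the server before connecting" >&2
  return 1
}
```

For example, `user_exists alice` prints the user's UID and groups if `alice` exists. If it does not, an administrator can create the account with a command such as `sudo useradd -m alice` (the username is illustrative).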

If the user exists on the target machine, debug your `sshd_config` file in the following step.

### 4. Debug `sshd_config` file misconfiguration

A user may fail to connect to your SSH endpoint because of a misconfigured `sshd_config` file. Follow the steps below to audit your `sshd_config` file for misconfigurations.

#### Review your `sshd` logs

`sshd` logs can confirm whether the user's connection attempts are reaching the server. Your `sshd_config` file controls how much `sshd` logs (`LogLevel`) and which syslog facility it uses (`SyslogFacility`); where the messages are stored depends on your system's logging setup. Common locations are:

- Ubuntu and Debian: `journalctl -u ssh` or `/var/log/auth.log`
- Red Hat-based distributions: `journalctl -u sshd` or `/var/log/secure`

Using your `sshd` logs, validate that SSH connection attempts are arriving at the SSH target machine.
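To narrow the logs down to connection and authentication events, you can filter them with `grep`. The sketch below is illustrative rather than exhaustive: the `ssh_attempts` function name is hypothetical, and the patterns assume common OpenSSH message wording (for example, `Accepted publickey for ...` and `Invalid user ...`), which can vary by OpenSSH version and `LogLevel`.

```sh
# Hypothetical helper: keep only sshd log lines about connection and authentication attempts.
# Reads log text on stdin, so it works with journalctl output and with log files.
# The patterns are assumptions based on common OpenSSH messages and may need adjusting.
ssh_attempts() {
  grep -E 'Connection from|Connection closed by|Accepted (publickey|password)|Failed (publickey|password)|Invalid user'
}
```

On Ubuntu, for example: `sudo journalctl -u ssh --since "15 minutes ago" | ssh_attempts`. On a Red Hat-based system that logs to a file: `sudo cat /var/log/secure | ssh_attempts`. If no lines appear while a user is trying to connect, the attempts are likely not reaching the server, which points back to the Access policy and Tunnel checks.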

#### Review your `sshd_config` file for misconfigurations

To rule out issues in your `sshd_config` file, compare your existing `sshd_config` file with the example below to check whether any directives are causing authentication issues. The following example `sshd_config` file allows successful authentication:

<details>
<summary>Example `sshd_config` file</summary>

```
# This is the sshd server system-wide configuration file. See
# sshd_config(5) for more information.

# The strategy used for options in the default sshd_config shipped with
# OpenSSH is to specify options with their default value where
# possible, but leave them commented. Uncommented options override the
# default value.

PubkeyAuthentication yes
TrustedUserCAKeys /etc/ssh/ca.pub

Include /etc/ssh/sshd_config.d/*.conf

# When systemd socket activation is used (the default), the socket
# configuration must be re-generated after changing Port, AddressFamily, or
# ListenAddress.
#
# For changes to take effect, run:
#
# systemctl daemon-reload
# systemctl restart ssh.socket
#
#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

#HostKey /etc/ssh/ssh_host_rsa_key
#HostKey /etc/ssh/ssh_host_ecdsa_key
#HostKey /etc/ssh/ssh_host_ed25519_key

# Ciphers and keying
#RekeyLimit default none

# Logging
#SyslogFacility AUTH
LogLevel DEBUG3

# Authentication:

#LoginGraceTime 2m
PermitRootLogin yes
#StrictModes yes
#MaxAuthTries 6
#MaxSessions 10

# Expect .ssh/authorized_keys2 to be disregarded by default in future.
#AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2

#AuthorizedPrincipalsFile none

#AuthorizedKeysCommand none
#AuthorizedKeysCommandUser nobody

# For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
#HostbasedAuthentication no
# Change to yes if you don't trust ~/.ssh/known_hosts for
# HostbasedAuthentication
#IgnoreUserKnownHosts no
# Don't read the user's ~/.rhosts and ~/.shosts files
#IgnoreRhosts yes

# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no

# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
KbdInteractiveAuthentication no

# Kerberos options
#KerberosAuthentication no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes
#KerberosGetAFSToken no

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes
#GSSAPIStrictAcceptorCheck yes
#GSSAPIKeyExchange no

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the KbdInteractiveAuthentication and
# PasswordAuthentication. Depending on your PAM configuration,
# PAM authentication via KbdInteractiveAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and KbdInteractiveAuthentication to 'no'.
UsePAM yes

#AllowAgentForwarding yes
#AllowTcpForwarding yes
#GatewayPorts no
X11Forwarding yes
#X11DisplayOffset 10
#X11UseLocalhost yes
#PermitTTY yes
PrintMotd no
#PrintLastLog yes
#TCPKeepAlive yes
#PermitUserEnvironment no
#Compression delayed
#ClientAliveInterval 0
#ClientAliveCountMax 3
#UseDNS no
#PidFile /run/sshd.pid
#MaxStartups 10:30:100
#PermitTunnel no
#ChrootDirectory none
#VersionAddendum none

# no default banner path
#Banner none

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

# override default of no subsystems
Subsystem sftp /usr/lib/openssh/sftp-server

# Example of overriding settings on a per-user basis
#Match User anoncvs
# X11Forwarding no
# AllowTcpForwarding no
# PermitTTY no
# ForceCommand cvs server
```

</details>
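The example includes `/etc/ssh/sshd_config.d/*.conf`, and `sshd` uses the first value it reads for most keywords, so files in that directory can change the effective value of directives that appear after the `Include` line. One way to see the settings `sshd` will actually use is `sshd -T`, which prints the effective configuration as lowercase `keyword value` pairs. The sketch below (the `check_ca_directives` name is hypothetical) reads that output and checks the two directives the example relies on for certificate-based login: `PubkeyAuthentication yes` and a `TrustedUserCAKeys` file.

```sh
# Hypothetical helper: check two directives in `sshd -T` output read from stdin.
# Assumes `sshd -T` formatting: one lowercase "keyword value" pair per line,
# with "none" printed for options that are not set.
check_ca_directives() {
  awk '
    $1 == "pubkeyauthentication" { pubkey = $2 }
    $1 == "trustedusercakeys"    { cakeys = $2 }
    END {
      ok = 1
      if (pubkey != "yes") {
        print "pubkeyauthentication is \"" pubkey "\", expected \"yes\""
        ok = 0
      }
      if (cakeys == "" || cakeys == "none") {
        print "trustedusercakeys is not set"
        ok = 0
      } else {
        print "trustedusercakeys: " cakeys
      }
      if (!ok) exit 1
    }
  '
}
```

Usage: `sudo sshd -T | check_ca_directives`. If your configuration uses `Match` blocks, `sshd -T` also accepts `-C` with connection parameters (for example, `-C user=alice,host=localhost,addr=127.0.0.1`, values illustrative) to show the settings that apply to a specific user.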

#### Replace and test with example configuration

In the following steps, you temporarily replace your existing `sshd_config` file with the provided example to rule out configuration issues. Before proceeding, carefully [review and compare both files](#review-your-sshd_config-file-for-misconfigurations) to identify any conflicting directives.

:::caution[You may lose access to your SSH server]

These troubleshooting steps could lock you out of your SSH server, because your current authentication method may depend on configuration that is not in the [example file](#review-your-sshd_config-file-for-misconfigurations). Keep your current SSH session open until you have confirmed that you can log in with the new configuration. To roll back, move `/etc/ssh/sshd_config.bak` back to `/etc/ssh/sshd_config` and reload your SSH server. Proceed with caution.

:::

1. Back up the existing `sshd_config` file.

   ```sh
   sudo mv /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
   ```

2. Create a new `sshd_config` file.

   ```sh
   sudo vi /etc/ssh/sshd_config
   ```

3. Press the `i` key to enter insert mode.

4. Paste in the [example file](#review-your-sshd_config-file-for-misconfigurations).

5. Press the `Esc` key to exit insert mode.
6. Enter `:x` to save and exit.
7. [Reload](#reload-your-ssh-server) your SSH server.

:::caution[Do not restart]
Restarting your `sshd` service can terminate your current SSH connection. Reload the service instead of restarting it to avoid closing your open SSH sessions.
:::

<Render file="ssh/restart-server" product="cloudflare-one" />
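If you prefer not to edit the file interactively, the replacement steps above can be scripted so that the new file is validated before you reload. The sketch below makes a few assumptions: you have already saved the example configuration to a file of your choice, and the `swap_sshd_config` function and `SSHD_BIN` override variable are hypothetical. Unlike step 1, it uses `cp` rather than `mv`, and it validates the result with `sshd -t` (test mode, which checks the configuration file and keys without starting a server), restoring the backup if validation fails.

```sh
# Hypothetical helper: install a candidate sshd_config, validate it, and roll back on failure.
# Run as root. SSHD_BIN is a hypothetical override for the sshd binary (defaults to "sshd").
swap_sshd_config() {
  local candidate="$1"
  local target="${2:-/etc/ssh/sshd_config}"
  if [ ! -f "$candidate" ]; then
    echo "candidate file not found: $candidate" >&2
    return 2
  fi
  # Keep a copy of the current file; this overwrites any existing .bak file.
  cp -p "$target" "$target.bak" || return 1
  cp "$candidate" "$target" || return 1
  # `sshd -t -f FILE` checks the configuration without starting the daemon.
  if ! "${SSHD_BIN:-sshd}" -t -f "$target"; then
    echo "validation failed; restoring $target from $target.bak" >&2
    cp -p "$target.bak" "$target"
    return 1
  fi
  echo "$target is valid; reload (do not restart) the SSH service to apply it"
}
```

For example, after saving the example as `/root/sshd_config.example` (an illustrative path), run `swap_sshd_config /root/sshd_config.example` from a root shell (such as one opened with `sudo -i`, since `sudo` cannot call shell functions directly), then reload the SSH service as described above. Running it twice overwrites the `.bak` file, so keep a separate copy of your original configuration.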

By completing all four troubleshooting steps, you should have resolved any connection issues caused by a misconfigured SSH server. If issues persist, [recheck your `sshd` logs](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#review-your-sshd-logs). The example [`sshd_config` file shared above](/cloudflare-one/connections/connect-networks/use-cases/ssh/ssh-infrastructure-access/#review-your-sshd_config-file-for-misconfigurations) enables debug logging (`LogLevel DEBUG3`), which may expose more specific errors.