Commit 89c5c0d

incomplete JAAS, ZK, diagnostics checklists. A new error message!
1 parent 06caa47 commit 89c5c0d

7 files changed

+322
-9
lines changed

SUMMARY.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,16 +6,17 @@
66
* [HDFS and Kerberos](sections/hdfs.md)
77
* [UGI](sections/ugi.md)
88
* [Java and JDK Versions](sections/jdk_versions.md)
9+
* [JAAS](sections/jaas.md)
10+
* [Keytabs](sections/keytabs.md)
911
* [Hadoop IPC Security](sections/ipc.md)
1012
* [Web and REST](sections/web_and_rest.md)
1113
* [YARN and YARN Applications](sections/yarn.md)
1214
* [Zookeeper](sections/zookeeper.md)
13-
* [JJAS](sections/jaas.md)
1415
* [Testing](sections/testing.md)
1516
* [Low-Level Secrets](sections/secrets.md)
1617
* [Error Messages to Fear](sections/errors.md)
1718
* [The Limits of Hadoop Security](sections/the_limits_of_hadoop_security.md)
1819
* [Checklists](sections/checklists.md)
1920
* [Glossary](sections/glossary.md)
2021
* [Bibliography](sections/biblography.md)
21-
* [Acknowledgements](sections/acknowledgements.md)
22+
* [Acknowledgements](sections/acknowledgements.md)

sections/checklists.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -101,3 +101,17 @@
101101

102102
[ ] Code invoking Jersey Client reacts to 401/403 exception responses when using Authentication Token by deleting the old token, creating a new Auth Token and re-issuing the request (this triggers re-authentication).
103103

104+
### Debugging Workflow
105+
106+
[ ] host has an IP address (`ifconfig` / `ipconfig`)
107+
[ ] host has an FQDN: `hostname -f`
108+
[ ] FQDN resolves to hostname `nslookup $hostname`
109+
[ ] hostname responds to pings `ping $hostname`
110+
[ ] reverse DNS lookup of IPAddr returns hostname
111+
[ ] clock is in sync with rest of cluster: `date`
112+
113+
[ ] keytab exists
114+
[ ] keytab is readable by account running service.
115+
[ ] keytab contains principals in listing `klist -kt $keytab`
116+
[ ] keytab FQDN is in entry of form `shortname/$FQDN`
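
The forward and reverse DNS items in this checklist can be automated. The following is a minimal sketch, not part of the original checklist: the function name `check_dns` and its injectable `forward`/`reverse` parameters are assumptions made for illustration (the defaults are the standard library resolvers).

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Run the forward and reverse DNS checks for one hostname.

    Returns a list of (check name, passed, detail) tuples.
    The resolver functions are parameters so that they can be stubbed in tests.
    """
    results = []
    try:
        addr = forward(hostname)
        results.append(("forward lookup", True, addr))
    except OSError as e:  # socket.gaierror is a subclass of OSError
        results.append(("forward lookup", False, str(e)))
        return results
    try:
        name, aliases, _addrs = reverse(addr)
        # The reverse lookup should hand back the name we started with.
        known = {n.lower() for n in [name, *aliases]}
        results.append(("reverse lookup", hostname.lower() in known, name))
    except OSError as e:  # socket.herror is a subclass of OSError
        results.append(("reverse lookup", False, str(e)))
    return results
```

Calling `check_dns(socket.getfqdn())` on the host under test gives one result per check; any entry whose second field is `False` is worth investigating before looking at Kerberos itself.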
117+

sections/errors.md

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,13 @@ In `krb5.conf`:
107107
[libdefaults]
108108
udp_preference_limit = 1
109109

110+
## `GSSException: No valid credentials provided (Mechanism level: Connection reset)`
111+
112+
We've seen this triggered in Hadoop tests after the MiniKDC threw an exception; its thread
113+
exited and hence the Kerberos client got a connection error.
114+
115+
When you see this, assume network connectivity problems, or something wrong at the KDC itself.
116+
110117
## Principal not found
111118

112119
The hostname is wrong (or there is >1 hostname listed with different IP addrs) and so a principal
@@ -124,6 +131,68 @@ This apparently surfaces in [Java 8 after 8u40](http://sourceforge.net/p/spnego/
124131
if Kerberos server doesn't support the first authentication mechanism which the client
125132
offers, then the client fails. Workaround: don't use those versions of Java.
126133

134+
This is [now acknowledged by Oracle](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8080129) and
135+
has been fixed in 8u60.
136+
137+
138+
## `Specified version of key is not available (44)`
139+
140+
```
141+
Client failed to SASL authenticate: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))]
142+
```
143+
144+
The meaning of this message —or how to fix it— is a mystery to all.
145+
146+
There is [some tentative coverage in Stack Overflow](http://stackoverflow.com/questions/24511812/krbexception-specified-version-of-key-is-not-available-44).
147+
148+
One possibility is that the keys in your keytab have expired. Did you know that can happen? It does.
149+
One day your cluster works happily. The next your client requests are failing, with this message
150+
surfacing in the logs.
151+
152+
```
153+
klist -kt zk.service.keytab
154+
Keytab name: FILE:zk.service.keytab
155+
KVNO Timestamp Principal
156+
---- ----------------- --------------------------------------------------------
157+
5 12/16/14 11:46:05 zookeeper/devix.cotham.uk@COTHAM
158+
5 12/16/14 11:46:05 zookeeper/devix.cotham.uk@COTHAM
159+
5 12/16/14 11:46:05 zookeeper/devix.cotham.uk@COTHAM
160+
5 12/16/14 11:46:05 zookeeper/devix.cotham.uk@COTHAM
161+
```
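
When comparing keytabs across hosts, it can help to turn such a listing into data so that key version numbers (KVNO) and timestamps are easy to inspect. The sketch below assumes the plain `klist -kt` layout shown above (no `-e` encryption column); the function name `parse_klist` is made up for illustration. It only extracts the fields and does not diagnose the error.

```python
import re

# Matches data rows such as "   5 12/16/14 11:46:05 zookeeper/devix.cotham.uk@COTHAM".
# Header and separator rows do not start with a number, so they are skipped.
_ROW = re.compile(r"^\s*(\d+)\s+(\S+\s+\S+)\s+(\S+)\s*$")


def parse_klist(output):
    """Return a list of (kvno, timestamp, principal) tuples from `klist -kt` text."""
    entries = []
    for line in output.splitlines():
        m = _ROW.match(line)
        if m:
            entries.append((int(m.group(1)), m.group(2), m.group(3)))
    return entries
```

The text passed in would typically be captured from the command itself, for example with `subprocess.run(["klist", "-kt", keytab], capture_output=True, text=True).stdout`.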
162+
163+
164+
## `javax.security.auth.login.LoginException: No password provided`
165+
166+
When this surfaces in a server log, it means the server couldn't log in as the user. That is,
167+
there isn't an entry in the supplied keytab for that user.
168+
169+
Some of the possible causes:
170+
171+
* The wrong keytab was specified
172+
* There isn't an entry in the keytab for the user
173+
* The hostname of the machine doesn't match that of a user in the keytab, so a match of `service/host`
174+
fails.
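
The last cause, a `service/host` mismatch, can be checked mechanically once the principals in the keytab are known (for example, from a `klist -kt` listing). This is a minimal sketch; the function name `find_service_principals` is hypothetical, and the comparison is deliberately exact, so a difference in case counts as a mismatch.

```python
def find_service_principals(principals, service, fqdn):
    """Return the keytab principals of the form service/fqdn or service/fqdn@REALM."""
    wanted = f"{service}/{fqdn}"
    matches = []
    for principal in principals:
        # Strip the realm, if any, before comparing.
        name = principal.split("@", 1)[0]
        if name == wanted:
            matches.append(principal)
    return matches
```

An empty result for the host's own FQDN suggests that either the wrong keytab is in use or the hostname the service sees is not the one the keytab was created for.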
175+
176+
Ideally, services list the keytab and username at fault here. In a less than ideal world (that is,
177+
the one we live in) things are less helpful.
178+
179+
Here, for example, is a Zookeeper trace, saying it is the user `null` that is at fault.
180+
181+
```
182+
2015-12-15 17:16:23,517 - WARN [main:SaslServerCallbackHandler@105] - No password found for user: null
183+
2015-12-15 17:16:23,536 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
184+
java.io.IOException: Could not configure server because SASL configuration did not allow the ZooKeeper server to authenticate itself properly: javax.security.auth.login.LoginException: No password provided
185+
at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:207)
186+
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:87)
187+
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:111)
188+
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
189+
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
190+
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
191+
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
192+
193+
```
194+
195+
127196
# Hadoop Web/REST APIs
128197

129198
## AuthenticationToken ignored

sections/ipc.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -142,3 +142,32 @@ something in the service, if not the calls is rejected.
142142
Note how failures are logged to an audit log; successful operations should be logged too.
143143
The purpose of the audit log is to determine the actions of a principal, both successful
144144
and unsuccessful.
145+
146+
### Downgrading to unauthed IPC
147+
148+
IPC can be set up on the client to fall back to unauthenticated IPC if it can't negotiate
149+
a kerberized connection. While convenient, this opens up some security vulnerabilities, hence
150+
the feature is generally disabled on secure clusters. It can/should be enabled when needed:
151+
152+
```
153+
-D ipc.client.fallback-to-simple-auth-allowed=true
154+
```
155+
156+
As an example, this is the option on the command line for DistCp to copy from a secure cluster
157+
to an insecure cluster, the destination only supporting simple authentication.
158+
159+
```
160+
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://secure:8020/lovecraft/books hdfs://insecure:8020/lovecraft/books
161+
```
162+
163+
Although you can set it in a core-site.xml, this is dangerous from a security perspective:
164+
165+
```
166+
<property>
167+
<name>ipc.client.fallback-to-simple-auth-allowed</name>
168+
<value>true</value>
169+
</property>
170+
```
171+
172+
*Warning*: it's tempting to turn this on during development, as it makes problems go away. It is
173+
not recommended in production: avoid it except on the CLI while debugging problems.

sections/jaas.md

Lines changed: 35 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -20,14 +20,43 @@ JAAS is a nightmare from the Enterprise Java Bean era, one which surfaces from t
2020

2121
JAAS provides for a standard configuration file format for specifying a *login context*; how code trying to run in a specific context/role should login and authenticate.
2222

23-
As a single jaas.conf file can have multiple contexts, the same file can be used to configure the server and clients of a service, each with different binding information. Different contexts can have different login/auth mechanisms, including Kerberos and LDAP, so that you can even specify different auth mechanisms for different roles.
23+
As a single `jaas.conf` file can have multiple contexts, the same file can be used to configure the server and clients of a service, each with different binding information. Different contexts can have different login/auth mechanisms, including Kerberos and LDAP, so that you can even specify different auth mechanisms for different roles.
2424

25-
In Hadoop, the JAAS context is invariably Kerberos when it comes to talking to HDFS, YARN, etc. However, if Zookeeper enters the mix, it may be interacted with differently —and so need a different JAAS context.
25+
In Hadoop, the JAAS context is invariably Kerberos when it comes to talking to HDFS, YARN, etc.
26+
However, if Zookeeper enters the mix, it may be interacted with differently —and so need a different JAAS context.
2627

2728
Fun facts about JAAS
2829

2930
1. Nobody ever mentions it, but the file takes backslashed-escapes like a Java string.
30-
1. It needs escaped backlash directory separators on Windows, such as: `C:\\security\\krb5.conf`. Get that wrong and your code will fail with what will inevitably be an unintuitive message.
31-
1. Each context must declare the authentication module to use. The kerberos authentication model on IBM JVMs is different from that on Oracle and OpenJDK JVMs. You need to know the target JVM for the context —or create separate contexts for the different JVMs.
32-
33-
Hadoop's UGI class will dynamically create a JAAS context for Hadoop logins, dynamically determining the name of the kerberos module to use. For interacting purely with HDFS and YARN, you may be able to avoid needing to know about or understand JAAS.
31+
1. It needs escaped backslash directory separators on Windows, such as: `C:\\security\\krb5.conf`.
32+
Get that wrong and your code will fail with what will inevitably be an unintuitive message.
33+
1. Each context must declare the authentication module to use.
34+
The Kerberos authentication module on IBM JVMs is different from that on Oracle and OpenJDK JVMs.
35+
You need to know the target JVM for the context —or create separate contexts for the different JVMs.
36+
1. The rules about when to use `=` within an entry, and when to complete an entry with a `;`, appear to be:
37+
start with the login module and its control flag, add one `key=value` option per line, quote strings, and finish the entry with a `;`.
38+
The enclosing context block is then closed with `};`.
39+
40+
Hadoop's UGI class will dynamically create a JAAS context for Hadoop logins, dynamically determining the name of the kerberos module to use. For interacting purely with HDFS and YARN, you may be able to avoid needing to know about or understand JAAS.
41+
42+
Example of a JAAS file valid for Sun
43+
44+
A basic JAAS client configuration which uses the ticket cache rather than a keytab:
45+
```
46+
Client {
47+
com.sun.security.auth.module.Krb5LoginModule required
48+
useKeyTab=false
49+
useTicketCache=true
50+
doNotPrompt=true;
51+
};
52+
```
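
The quoting, escaping and `;` rules listed under the fun facts can be captured in a small generator. This is a sketch under stated assumptions, not a tool from the Hadoop codebase: `render_jaas_context` is a hypothetical helper, it assumes option values contain no double quotes, and it writes booleans unquoted as in the example above.

```python
def render_jaas_context(name, module, flag, options):
    """Render one JAAS login context as text.

    name    -- context name, e.g. "Client" or "Server"
    module  -- login module class name
    flag    -- control flag, e.g. "required"
    options -- dict of option name to bool or string value
    """
    lines = [f"{name} {{", f"  {module} {flag}"]
    for key, value in options.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            # Backslashes must be doubled, as in a Java string literal.
            escaped = str(value).replace("\\", "\\\\")
            rendered = f'"{escaped}"'
        lines.append(f"  {key}={rendered}")
    # The module entry, including all its options, ends with a single ';'.
    lines[-1] += ";"
    # The context block itself is closed with '};'.
    lines.append("};")
    return "\n".join(lines)
```

For example, passing `{"useKeyTab": True, "keyTab": "C:\\security\\zk.keytab"}` (a Python string holding single backslashes) produces a `keyTab` line with doubled backslashes, which is what the JAAS parser expects on Windows.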
53+
54+
55+
# Setting a JAAS Config file for a Java process
56+
57+
```
58+
-Djava.security.auth.login.config=/path/to/server/jaas/file.conf
59+
```
60+
61+
In Hadoop applications, this has to be set in whichever environment variable is picked up
62+
by the command which you are invoking.

sections/keytabs.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
<!---
2+
Licensed under the Apache License, Version 2.0 (the "License");
3+
you may not use this file except in compliance with the License.
4+
You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software
9+
distributed under the License is distributed on an "AS IS" BASIS,
10+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
11+
See the License for the specific language governing permissions and
12+
limitations under the License. See accompanying LICENSE file.
13+
-->
14+
15+
# Keytabs
16+
17+
Keytabs are critical for secure Hadoop clusters, as they allow the services to be launched
18+
without prompts for passwords.
19+
20+
21+
## Creating a Keytab
22+
23+
If your management tool sets up keytabs for you: use it.
24+
25+
```
26+
kadmin.local
27+
28+
ktadd -k zk.service.keytab -norandkey zookeeper/devix@COTHAM
29+
ktadd -k zk.service.keytab -norandkey zookeeper/devix.cotham.uk@COTHAM
30+
exit
31+
```
32+
33+
And of course, make it accessible to the service account:
34+
35+
```
36+
chgrp hadoop zk.service.keytab
37+
chown zookeeper zk.service.keytab
38+
```
39+
40+
Check that the user can log in:
41+
42+
```
43+
# sudo -u zookeeper klist -e -kt zk.service.keytab
44+
# sudo -u zookeeper kinit -kt zk.service.keytab zookeeper/devix.cotham.uk
45+
# sudo -u zookeeper klist
46+
```
47+
48+
### Keytab Expiry
49+
50+
Keytabs expire.
51+
52+
That is: entries in them have a limited lifespan (default: 1 year).
53+
54+
This is actually a feature —it limits how long a lost/stolen keytab can have access to the system.
55+
56+
At the same time, it's a major inconvenience as (a) the keytabs expire and (b) it's never
57+
immediately obvious why your cluster has stopped working.
58+
59+
### Keytab security
60+
61+
Keytabs are sensitive items. They need to be treated as granting all the access to data that their principal has.
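
One way to act on this is to check the file permissions of a keytab before a service starts. The following is a minimal sketch for POSIX systems; the function name `keytab_permission_problems` and the choice of which permission bits to flag are assumptions, not a Hadoop policy. Group read access is allowed here, since a keytab may be shared with a group such as `hadoop`.

```python
import os
import stat


def keytab_permission_problems(path):
    """Return a list of descriptions of risky permission bits on a keytab file."""
    mode = os.stat(path).st_mode
    problems = []
    if mode & stat.S_IROTH:
        problems.append("world-readable")
    if mode & stat.S_IWOTH:
        problems.append("world-writable")
    if mode & stat.S_IWGRP:
        problems.append("group-writable")
    return problems
```

An empty list means none of the flagged bits are set; it does not prove the keytab is safe, only that it is not exposed through these particular permissions.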
62+
63+
### Keytabs and YARN applications
64+
65+
