sections/errors.md
Lines changed: 20 additions & 2 deletions
@@ -19,6 +19,22 @@
19
19
> *[Supernatural Horror in Literature](https://en.wikisource.org/wiki/Supernatural_Horror_in_Literature), HP Lovecraft, 1927.*
20
20
21
21
22
+
Security error messages appear to take pride in providing limited information. In particular,
23
+
they are usually some generic `IOException` wrapping a generic security exception. There is some
24
+
text in the message, but it is often `Failure unspecified at GSS-API level`, which means
25
+
"something went wrong".
26
+
27
+
Generally a stack trace with UGI in it is a security problem, *though it can be a network problem
28
+
surfacing in the security code*.
29
+
30
+
The underlying causes of problems are usually the standard ones of distributed systems: networking
31
+
and configuration.
32
+
33
+
34
+
In [HADOOP-12426](https://issues.apache.org/jira/browse/HADOOP-12426) I've proposed a CLI entry point
35
+
for health checking this. Volunteers to implement it are welcome.
36
+
37
+
22
38
# OS/JVM Layer; GSS library
23
39
24
40
Some of these are covered in Oracle's Troubleshooting Kerberos docs.
@@ -27,7 +43,8 @@ This section just highlights some of the common causes, other causes that Oracle
27
43
## Server not found in Kerberos database (7)
28
44
29
45
* DNS is a mess and your machine does not know its own name.
30
-
* Your machine has a hostname, but it's not one there's an entry in the keytab for
46
+
* Your machine has a hostname, but the service principal is a `/_HOST` wildcard and the hostname
47
+
is not one that has an entry in the keytab.
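
A quick way to see what the JVM believes the machine is called is to ask it directly. The following is a minimal sketch, not part of Hadoop: the class name `HostnameCheck` and the helper `looksLikeUsableFqdn` are hypothetical, and the heuristic for a "usable" name is an assumption for illustration only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class HostnameCheck {

    /**
     * Hypothetical heuristic: a name is "usable" if it contains letters,
     * has at least one dot, is not an IPv6 literal, and is not a
     * localhost alias. A bare IP address or short name fails the check.
     */
    static boolean looksLikeUsableFqdn(String name) {
        boolean hasLetter = name.chars().anyMatch(Character::isLetter);
        return hasLetter
            && name.indexOf('.') > 0
            && name.indexOf(':') < 0
            && !name.toLowerCase(Locale.ROOT).startsWith("localhost");
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            // The canonical name is what a reverse DNS lookup returns.
            String canonical = local.getCanonicalHostName();
            System.out.println("hostname:  " + local.getHostName());
            System.out.println("canonical: " + canonical);
            System.out.println("address:   " + local.getHostAddress());
            if (!looksLikeUsableFqdn(canonical)) {
                System.out.println("Warning: canonical name does not look fully qualified");
            }
        } catch (UnknownHostException e) {
            // The machine cannot resolve its own name at all.
            System.out.println("Cannot resolve local host: " + e.getMessage());
        }
    }
}
```

If the canonical name printed here differs from the host part of the principals in the keytab, that mismatch is a likely place to start looking.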
31
48
32
49
## No valid credentials provided (Mechanism level: Illegal key size)]
33
50
@@ -59,6 +76,7 @@ This comes from the clocks on the machines being too far out of sync.
59
76
60
77
This can surface if you are doing Hadoop work on some VMs and have been suspending and resuming them;
61
78
they've lost track of when they are. Reboot them.
79
+
62
80
If it's a physical cluster, make sure that your NTP daemons are pointing at the same NTP server, one that is actually reachable from the Hadoop cluster, and that the timezone settings of all the hosts are consistent.
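
The check that fails here can be sketched as a comparison of two timestamps against a tolerance. This is an illustration only: the class `ClockSkewCheck` and its method are hypothetical, and the five-minute tolerance is the commonly used Kerberos default, which is an assumption about your KDC's configuration.

```java
import java.time.Duration;
import java.time.Instant;

public class ClockSkewCheck {

    // Assumption: five minutes is the usual default tolerance; a KDC may be configured differently.
    static final Duration MAX_SKEW = Duration.ofMinutes(5);

    /** Returns true if the two clocks are within maxSkew of each other, in either direction. */
    static boolean withinSkew(Instant local, Instant remote, Duration maxSkew) {
        return Duration.between(local, remote).abs().compareTo(maxSkew) <= 0;
    }

    public static void main(String[] args) {
        Instant local = Instant.now();
        // Hypothetical remote reading: pretend the other host is seven minutes behind.
        Instant remote = local.minus(Duration.ofMinutes(7));
        System.out.println("within tolerance: " + withinSkew(local, remote, MAX_SKEW));
    }
}
```

How the remote timestamp is obtained is deliberately left out; the sketch only shows the comparison itself.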
63
81
64
82
## KDC has no support for encryption type
@@ -79,7 +97,7 @@ an error about checksums.
79
97
## Principal not found
80
98
81
99
The hostname is wrong (or there is more than one hostname listed with different IP addresses) and so a principal
82
-
of the form `USER/HOST@DOMAIN` is coming back with the wrong host, and the KDC doesn't find it.
100
+
of the form `user/_HOST@REALM` is coming back with the wrong host, and the KDC doesn't find it.
83
101
84
102
See the comments above about DNS for some more possibilities.
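
To make the `_HOST` expansion concrete, here is a simplified stand-in for the substitution. In Hadoop itself this is handled by `SecurityUtil.getServerPrincipal()`; the class `HostPrincipal` and its `resolve` method below are hypothetical names for this sketch, which leaves out the DNS lookup Hadoop performs when no hostname is supplied.

```java
import java.util.Locale;

public class HostPrincipal {

    static final String HOST_PATTERN = "_HOST";

    /**
     * Replaces the _HOST component of a three-part principal
     * (name/_HOST@REALM) with the lower-cased hostname.
     * Any other principal is returned unchanged.
     */
    static String resolve(String principal, String hostname) {
        String[] parts = principal.split("[/@]");
        if (parts.length != 3 || !HOST_PATTERN.equals(parts[1])) {
            return principal;
        }
        return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
    }

    public static void main(String[] args) {
        // If DNS hands back the wrong hostname, the wrong principal is built here,
        // and that is the principal the KDC is then asked about.
        System.out.println(resolve("hdfs/_HOST@EXAMPLE.COM", "NN1.Example.COM"));
    }
}
```

Whatever hostname goes in determines the principal that comes out, which is why a bad DNS answer turns into a "principal not found" error.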
public interface MyRpc extends VersionedProtocol {
47
+
long versionID = 0x01;
48
+
...
49
+
}
50
+
```
51
+
42
52
### `SecurityInfo` subclass
43
53
44
54
Every exported RPC service will need its own extension of the `SecurityInfo` class, to provide two things:
@@ -48,16 +58,40 @@ Every exported RPC service will need its own extension of the `SecurityInfo` cla
48
58
49
59
### `PolicyProvider` subclass
50
60
51
-
A `PolicyProvider` subclass. This is used to inform the RPC infrastructure of the ACL policy: who may talk to the service. It must be explicitly passed to the RPC server
52
61
53
-
rpcService.getServer()
54
-
.refreshServiceAcl(serviceConf, new MyRPCPolicyProvider());
62
+
```
63
+
public class MyRpcPolicyProvider extends PolicyProvider {
64
+
65
+
public Service[] getServices() {
66
+
return new Service[] {
67
+
new Service("my.protocol.acl", MyRpc.class)
68
+
};
69
+
}
70
+
71
+
}
72
+
73
+
```
74
+
75
+
This is used to inform the RPC infrastructure of the ACL policy: who may talk to the service. It must be explicitly passed to the RPC server:
76
+
77
+
```
78
+
rpcService.getServer().refreshServiceAcl(serviceConf, new MyRpcPolicyProvider());
79
+
```
80
+
81
+
In practice, the ACL is usually configured with a list of groups, rather than individual users.
82
+
83
+
### `SecurityInfo` class
84
+
85
+
```
86
+
public class MyRpcSecurityInfo extends SecurityInfo { ... }
87
+
88
+
```
55
89
56
90
### `SecurityInfo` resource file
57
91
58
92
The resource file `META-INF/services/org.apache.hadoop.security.SecurityInfo` lists all RPC APIs which have a matching SecurityInfo subclass in that JAR.
59
93
60
-
org.apache.example.appmaster.rpc.RPCSecurityInfo
94
+
org.example.rpc.MyRpcSecurityInfo
61
95
62
96
The RPC framework will read this file and build up the security information for the APIs (server side? Client side? both?)
63
97
@@ -70,32 +104,37 @@ the server can determine the identity of the principal.
70
104
71
105
This is something it can ask for when handling the RPC Call: