
Commit 755d417

tokens, HDFS

1 parent 6acf2cb commit 755d417

File tree

5 files changed: +212 -40 lines

sections/errors.md

Lines changed: 18 additions & 8 deletions
@@ -21,7 +21,8 @@

# OS/JVM Layer; GSS library

-Some of these are covered in Oracle's Troubleshooting Kerberos docs. This section just highlights some of the common causes, other causes that Oracle don't mention —and messages they haven't covered.
+Some of these are covered in Oracle's Troubleshooting Kerberos docs.
+This section just highlights some of the common causes, other causes that Oracle don't mention —and messages they haven't covered.

## Server not found in Kerberos database (7)

@@ -30,7 +31,8 @@ Some of these are covered in Oracle's Troubleshooting Kerberos docs. This sectio

## No valid credentials provided (Mechanism level: Illegal key size)

-Your JVM doesn't have the extended cryptography package and can't talk to the KDC. Switch to openjdk or go to your JVM supplier (Oracle, IBM) and download the JCE extension package, and install it in the hosts where you want Kerberos to work.
+Your JVM doesn't have the extended cryptography package and can't talk to the KDC.
+Switch to openjdk or go to your JVM supplier (Oracle, IBM) and download the JCE extension package, and install it on the hosts where you want Kerberos to work.

## No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

@@ -41,14 +43,22 @@ This may appear in a stack trace starting with something like:
Possible causes:

1. You aren't logged in via `kinit`.
-2. You did specify a keytab but it isn't there or is somehow otherwise invalid
-3. You don't have the Java Cryptography Extensions installed.
+1. You have logged in with `kinit`, but the tickets you were issued with have expired.
+1. You did specify a keytab, but it isn't there or is somehow otherwise invalid.
+1. You don't have the Java Cryptography Extensions installed.
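+If in doubt about the state of your ticket cache, a service can bypass it
+entirely by logging in from a keytab in code; `UserGroupInformation`
+acquires and renews its own tickets from the keytab, so no `kinit` is
+involved. A minimal sketch (the principal and keytab path here are
+illustrative):
+
+    Configuration conf = new Configuration();
+    conf.set("hadoop.security.authentication", "kerberos");
+    UserGroupInformation.setConfiguration(conf);
+    // substitute your own principal and keytab
+    UserGroupInformation.loginUserFromKeytab(
+        "myservice/host.example.org@EXAMPLE.ORG",
+        "/etc/security/keytabs/myservice.keytab");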

## Clock skew too great

GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null)) . . . Caused by: javax.security.auth.login.LoginException: Clock skew too great

-This comes from the clocks on the machines being too far out of sync. This can surface if you are doing Hadoop work on some VMs and have been suspending and resuming them; they've lost track of when they are. Reboot them.
+GSSException: No valid credentials provided (Mechanism level: Clock skew too great (37) - PROCESS_TGS
+
+kinit: krb5_get_init_creds: time skew (343) larger than max (300)
+
+This comes from the clocks on the machines being too far out of sync.
+
+This can surface if you are doing Hadoop work on some VMs and have been suspending and resuming them;
+they've lost track of when they are. Reboot them.
If it's a physical cluster, make sure that your NTP daemons are pointing at the same NTP server, one that is actually reachable from the Hadoop cluster, and that the timezone settings of all the hosts are consistent.

## KDC has no support for encryption type
@@ -62,14 +72,14 @@ to prove to the KDC that the caller has the password. If the password is wrong,
an error about checksums.
1. Kerberos is very strict about hostnames and DNS; this can somehow trigger the problem.
[http://stackoverflow.com/questions/12229658/java-spnego-unwanted-spn-canonicalization](http://stackoverflow.com/questions/12229658/java-spnego-unwanted-spn-canonicalization);
-1. Java 8 behaves differently from Java 6 & 7 here which can cause problems
+1. Java 8 behaves differently from Java 6 and 7 here, which can cause problems
([HADOOP-11628](https://issues.apache.org/jira/browse/HADOOP-11628)).


## Principal not found

The hostname is wrong (or there is >1 hostname listed with different IP addrs) and so a principal
-of the form USER/HOST@DOMAIN is coming back with the wrong host, and the KDC doesn't find it.
+of the form `USER/HOST@DOMAIN` is coming back with the wrong host, and the KDC doesn't find it.

See the comments above about DNS for some more possibilities.

@@ -89,7 +99,7 @@ offers, then the client fails. Workaround: don't use those versions of Java.

This has been seen in the HTTP logs of Hadoop REST/Web UIs:

-2015-06-26 13:49:02,239 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature
+WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signature

This means that the caller did not have the credentials to talk to a Kerberos-secured channel.

sections/hadoop_tokens.md

Lines changed: 87 additions & 9 deletions
@@ -64,7 +64,7 @@ public class BlockTokenIdentifier extends TokenIdentifier {

Alongside the fields covering the block and permissions, that `cache` data contains

-## Tickets vs Tokens
+## Kerberos Tickets vs Hadoop Tokens


| Token | Function |
@@ -119,13 +119,7 @@ Alongside the fields covering the block and permissions, that `cache` data conta
a thread in the background.


-
-
-
-## Token Propagation in YARN Applications
-
-
-
+

Imagine a user deploying a YARN application in a cluster, one which needs
access to the user's data stored in HDFS. The user would be required to be authenticated with
@@ -173,4 +167,88 @@ return info about the provider. One class `AnnotatedSecurityInfo`, examines the
on the class to determine these values, including looking in the Hadoop configuration
to determine the kerberos principal declared for that service (see [IPC](ipc.html) for specifics).

-## Delegation Token internals
+## Implementation Details
+
+What is inside a Hadoop Token? Whatever the
+service provider wishes to supply.
+
+A token is treated as a byte array to be passed
+in communications, such as when setting up an IPC
+connection, or as data to include in an HTTP header
+while negotiating with a remote REST endpoint.
+
+The code on the server which issues tokens,
+the `SecretManager`, is free to fill its byte arrays with
+structures of its choice. Sometimes serialized Java objects
+are used; more recent code, such as that in YARN, serializes
+data as a protobuf structure and provides that in the byte array
+(example: `NMTokenIdentifier`).
+
+### `Token`
+
+The abstract class `org.apache.hadoop.yarn.api.records.Token` is
+used to represent a token in Java code; it contains
+
+| field | type | role |
+|-------|------|------|
+| identifier | `ByteBuffer` | the service-specific data within a token |
+| password | `ByteBuffer` | a password |
+| tokenKind | `String` | the kind of token, used when looking tokens up |
+
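+As a rough sketch of how these fields surface in client code (using the
+serializable `org.apache.hadoop.security.token.Token` class rather than
+the YARN record; the getters are real API, the loop is just illustrative):
+
+    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
+    for (Token<? extends TokenIdentifier> token : ugi.getTokens()) {
+      // kind and service select the token; the identifier and password
+      // are the raw byte arrays described above
+      System.out.println("kind=" + token.getKind()
+          + " service=" + token.getService()
+          + " identifier length=" + token.getIdentifier().length);
+    }
+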
+### `SecretManager`
+
+Every server which issues tokens must implement and run
+a `org.apache.hadoop.security.token.SecretManager` subclass.
+
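+A bare-bones sketch of the contract, not a production implementation.
+`MyTokenIdentifier` is a hypothetical `TokenIdentifier` subclass;
+`generateSecret()` and the static `createPassword()` are helpers
+inherited from `SecretManager`:
+
+    public class MySecretManager extends SecretManager<MyTokenIdentifier> {
+      private final SecretKey key = generateSecret();
+
+      @Override
+      protected byte[] createPassword(MyTokenIdentifier id) {
+        // HMAC of the serialized identifier with the current secret
+        return createPassword(id.getBytes(), key);
+      }
+
+      @Override
+      public byte[] retrievePassword(MyTokenIdentifier id) throws InvalidToken {
+        // recompute the HMAC; a real implementation would also
+        // validate expiry and handle key rollover here
+        return createPassword(id.getBytes(), key);
+      }
+
+      @Override
+      public MyTokenIdentifier createIdentifier() {
+        return new MyTokenIdentifier();
+      }
+    }
+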
+### `DelegationKey`
+
+This contains a "secret" (generated by the `javax.crypto` libraries), adding serialization
+and equality checks. Because of this, the keys can be persisted (as HDFS does) or sent
+over a secure channel. Uses crop up in YARN's `ZKRMStateStore`, the MapReduce History Server
+and the YARN Application Timeline Service.
+
+### How tokens are issued
+
+TODO: how a connection bootstraps from Kerberos auth to Tokens
+
+### How tokens are refreshed
+
+TODO
+
+### How delegation tokens are shared
+
+DTs can be serialized; that is done when they are issued or renewed.
+
+When making requests over Hadoop RPC, you don't need to include the DT itself; simply
+include its hash to indicate that you have it.
+
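+A token round-trips to a URL-safe string; this is, for example, the form
+WebHDFS accepts in its `delegation=` query parameter. A small sketch,
+assuming `token` is a `Token` instance already at hand:
+
+    // serialize for the wire, then rebuild a token from the string
+    String wireForm = token.encodeToUrlString();
+    Token<TokenIdentifier> copy = new Token<>();
+    copy.decodeFromUrlString(wireForm);
+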
+### Delegation Tokens
+
+### Token Propagation in YARN Applications
+
+YARN applications depend on delegation tokens to gain access to cluster
+resources and data on behalf of the principal. It is the task of
+the client-side launcher code to collect the tokens needed and pass them
+to the launch context used to launch the Application Master, as in the
+sketch below.
+
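+A rough sketch of that launcher-side step, assuming `conf` is the client's
+`Configuration`, `amContainer` the `ContainerLaunchContext` being prepared,
+and `rmPrincipal` the principal permitted to renew the tokens:
+
+    // collect the filesystem's delegation tokens into a Credentials set,
+    // then marshal them into the byte buffer the launch context expects
+    Credentials credentials = new Credentials();
+    FileSystem fs = FileSystem.get(conf);
+    fs.addDelegationTokens(rmPrincipal, credentials);
+    DataOutputBuffer dob = new DataOutputBuffer();
+    credentials.writeTokenStorageToStream(dob);
+    amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
+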
+## What does this mean for my application?
+
+If you are writing an application, what does this mean?
+
+You need to worry about tokens in servers if
+
+1. You want to support secure connections without requiring a fresh Kerberos
+authentication at the rate of the maximum life of a kerberos ticket.
+1. You want to allow applications to delegate authority, such
+as to YARN applications, or other services. (Example: filesystem delegation tokens
+provided to a Hive Thrift server could be used to access the filesystem
+as that user.)
+1. You want a consistent client/server authentication and identification
+mechanism across secure and insecure clusters. This is exactly what YARN does:
+a token is issued by the YARN Resource Manager to an application instance's
+Application Master at launch time; this is used in all communications from
+the AM to the RM. Using tokens *always* means there is no separate codepath
+between insecure and secure clusters.

sections/hdfs.md

Lines changed: 77 additions & 18 deletions
@@ -1,4 +1,18 @@
-# HDFS and Kerberos
+<!---
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. See accompanying LICENSE file.
+-->
+
+# HDFS

> It seemed to be a sort of monster, or symbol representing a monster, of a form which only a diseased fancy could conceive. If I say that my somewhat extravagant imagination yielded simultaneous pictures of an octopus, a dragon, and a human caricature, I shall not be unfaithful to the spirit of the thing. A pulpy, tentacled head surmounted a grotesque and scaly body with rudimentary wings; but it was the general outline of the whole which made it most shockingly frightful.
> *[The Call of Cthulhu](https://en.wikisource.org/wiki/The_Call_of_Cthulhu), HP Lovecraft, 1926.*
@@ -29,19 +43,47 @@ the HDFS team from implementing user-specific priority/throttling of HDFS data a
and allow multi-tenant Hadoop clusters to prioritise high-SLA applications over lower-priority
code.

-## HDFS Namenode
+## HDFS NameNode

-### TODO

-1. Namenode reads in a keytab and initializes itself from there (i.e. no need to `kinit`; ticket
+1. NN reads in a keytab and initializes itself from there (i.e. no need to `kinit`; ticket
renewal handled by `UGI`).
-1. In a secure cluster, Web HDFS requires SPNEGO
-1. If web auth is enabled in a secure cluster, both the DN web UI will requires SPNEGO
-1. In a secure cluster, if webauth is disabled, kerberos/SPNEGO auth may still be needed
-to access the HDFS browser. This is a point of contention: its implicit from the delegation
-to WebHDFS --but a change across Hadoop versions, as before an unauthed user could still browse
-as "dr who".
+1. Generates a *Secret*.
+
+Delegation tokens in the NN are persisted to the edit log, with the operations `OP_GET_DELEGATION_TOKEN`,
+`OP_RENEW_DELEGATION_TOKEN` and `OP_CANCEL_DELEGATION_TOKEN` covering the actions. This ensures
+that on failover, the tokens are still valid.
+
+### Block Keys
+
+A `BlockKey` is the secret used to show that the caller has been granted access to a block
+in a DN.
+
+The NN issues a block token to a client, which then asks a DN for that block, supplying
+the token as proof of authorization.
+
+Block Keys are managed in the `BlockTokenSecretManager`; there is one in the NN
+and another in every DN to track the block keys to which it has access.
+It is the NN which issues the block keys; the DNs receive the current set
+of keys when they heartbeat to the NN.
+
+### Block Tokens
+
+A `BlockToken` is the token issued for access to a block; it includes
+
+    (userId, (BlockPoolId, BlockId), keyId, expiryDate, access-modes)
+
+The block key itself isn't included, just the `keyId` referencing the key used to sign the token.
+The access modes declare what access rights the caller has to the data.
+
+    public enum AccessMode {
+      READ, WRITE, COPY, REPLACE
+    }
+
+It is the NN which has the permissions/ACLs on each file —DNs don't have access to that data.
+Thus it is the BlockToken which passes this information to the DN, by way of the client.
+Obviously, block tokens need to be tamper-proof.
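+How that tamper-proofing works, as a conceptual sketch only: the DN
+recomputes the HMAC of the token's identifier with the shared block key and
+compares it with the password the client presented. In HDFS itself this is
+done inside `BlockTokenSecretManager`; the algorithm name and variables
+below are illustrative.
+
+    Mac mac = Mac.getInstance("HmacSHA1");
+    mac.init(new SecretKeySpec(blockKeyBytes, "HmacSHA1"));
+    byte[] expected = mac.doFinal(tokenIdentifierBytes);
+    // compare against the password field presented with the token
+    boolean valid = MessageDigest.isEqual(expected, presentedPassword);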

## DataNodes
@@ -50,24 +92,41 @@ DataNodes do not use Hadoop RPC —they transfer data over HTTP. This delivers b
though the (historical) use of Jetty introduced other problems. At scale, obscure race conditions
in Jetty surfaced. Hadoop now uses Netty for its DN block protocol.

+### DataNodes and SASL
+
Pre-2.6, all that could be done to secure the DN was to bring it up on a secure (&lt;1024) port
and so demonstrate that an OS superuser started the process. Hadoop 2.6 supports SASL
authenticated HTTP connections, which works *provided all clients are running Hadoop 2.6+*

-
See [Secure DataNode](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode)

-### TODO

## HDFS Client interaction

-1. Client asks NN for access to a path, identifying via KST or DT.
-1. NN authenticates caller, if access to path is authorized, returns BT to the client.
-1. Client talks to 1+ DNs with the block, using the BT.
-1. DN authenticates BT using shared-secret with NN.
-1. if authenticated, DN compares permissions in BT with operation requested, grants or rejects it.
+1. Client asks NN for access to a path, identifying via Kerberos or delegation token.
+1. NN authenticates the caller; if access to the path is authorized, the NN returns a Block Token to the client.
+1. Client talks to 1+ DNs with the block, using the Block Token.
+1. DN authenticates the Block Token using the shared secret with the NameNode.
+1. If authenticated, DN compares permissions in the Block Token with the operation requested, then
+grants or rejects the request.

The client does not have its identity checked by the DNs. That is done by the NN. This means
-that the client can in theory pass a BT on to another process for delegated access to a single
+that the client can in theory pass a Block Token on to another process for delegated access to a single
file. It has another implication: DNs can't do IO throttling on a per-user basis, as they do
not know the user requesting data.
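+All of this is hidden from client code behind `FileSystem`; a plain read
+exercises the whole exchange (a sketch; the path is illustrative):
+
+    FileSystem fs = FileSystem.get(conf);
+    try (FSDataInputStream in = fs.open(new Path("/data/part-0000"))) {
+      int firstByte = in.read();
+    }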
+
+### NN/WebHDFS
+
+1. In a secure cluster, WebHDFS requires SPNEGO.
+1. After authenticating with a SPNEGO-negotiated mechanism, WebHDFS sends an HTTP redirect,
+including the Block Token in the redirect.
+
+### NN/Web UI
+
+1. In a secure cluster, Web HDFS requires SPNEGO.
+1. If web auth is enabled in a secure cluster, the DN web UI will require SPNEGO.
+1. In a secure cluster, if web auth is disabled, kerberos/SPNEGO auth may still be needed
+to access the HDFS browser. This is a point of contention: it's implicit from the delegation
+to WebHDFS --but a change across Hadoop versions, as before an unauthed user could still browse
+as "dr who".

sections/ugi.md

Lines changed: 24 additions & 1 deletion
@@ -111,7 +111,7 @@ This returns the logged in user

    UserGroupInformation user = UserGroupInformation.getLoginUser();

-If there is no current user --that is, the login process hasn't started yet,
+If there is no logged-in user --that is, the login process hasn't started yet,
this triggers the login and the starting of the background refresh thread.

This makes it a point where the security kicks in: all configuration resources
@@ -139,6 +139,29 @@ when a service performs an action on the user's behalf

### `doAs()`

+This method is at the core of UGI. A call to `doAs()` executes the inner code
+*as the user*. In a secure cluster, that means using the Kerberos tickets and Hadoop delegation
+tokens belonging to them.
+
+Example: loading a filesystem as a user
+
+    // getDefaultUri() needs the configuration to read the default FS from
+    FileSystem systemFS = FileSystem.get(FileSystem.getDefaultUri(conf), conf);
+    // the login (service) user acting on behalf of "user"
+    UserGroupInformation proxyUser = UserGroupInformation.createProxyUser(
+        user, UserGroupInformation.getLoginUser());
+    FileSystem userFS = proxyUser.doAs(new PrivilegedExceptionAction<FileSystem>() {
+      @Override
+      public FileSystem run() throws Exception {
+        return FileSystem.get(systemFS.getUri(), systemFS.getConf());
+      }
+    });
+
+Here the variable `userFS` contains a client of the Hadoop filesystem with
+the home directory and access rights of the user `user`. If the user identity
+had come in via an RPC call, they'd
+
## Environment variable-managed UGI Initialization

There are some environment variables which configure UGI.

sections/what_is_kerberos.md

Lines changed: 6 additions & 4 deletions
@@ -205,10 +205,12 @@ To look at and work with keytabs, the `ktutil` command line program is the tool

## Tickets

-Kerberos is built around the notion of tickets.
+Kerberos is built around the notion of *tickets*.

-A ticket is something which may be passed to a service to indicate that the caller
-has the permissions contained within the ticket —for the duration of the ticket's lifetime.
+A ticket is something which can be passed to a server to identify the caller
+and to provide a secret key that can be used between the client and the server
+—for the duration of the ticket's lifetime. It is all that a server needs to
+authenticate a client: there's no need for the server to talk to the KDC.

What's important is that tickets can be passed on: an authenticated principal
can obtain a ticket to a service, and pass that on to another process in the distributed
@@ -218,7 +220,7 @@ using that ticket. That recipient only has the permissions granted to the ticket
also provided), and those permissions are only valid for as long as the ticket
is valid.

-Limited-lifetime tickets ensure that even if a ticket is captured by a malicious
+The limited lifetime of tickets ensures that even if a ticket is captured by a malicious
attacker, they can only make use of the credential for the lifetime of the ticket.
The ops team doesn't need to worry about lost/stolen tickets, or to have a process for
revoking them, as they expire within a short time period, usually a couple of days.
