
Commit 6acf2cb

Lots more on kerberos, some on UGI
1 parent dea80a5 commit 6acf2cb

19 files changed: +889 -121 lines

sections/biblography.md

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@
Hortonworks, 2014.
1. [Cloudera15] Cloudera,
[Integrating Hadoop Security with Active Directory](http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_sg_hadoop_security_active_directory_integrate.html),
2015
1. [Coluris01] Coulouris et al, *Distributed Systems: Concepts and Design*, 2001

sections/checklists.md

Lines changed: 11 additions & 0 deletions
@@ -52,12 +52,23 @@

[ ] Container Credentials are retrieved in AM and containers.

## YARN Web UIs and REST endpoints

[ ] Primary Web server: `AmFilterInitializer` used to redirect requests to the RM Proxy.

[ ] Other web servers: a custom authentication strategy is chosen and implemented.

## YARN Service

[ ] A strategy for token renewal is chosen and implemented.

## Web Service

[ ] `AuthenticationFilter` added to web filter chain.

[ ] Token renewal policy defined and implemented. (Look at `TimelineClientImpl` for an example of this.)

## Clients

### All clients

sections/concepts.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@

<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Concepts

This is the maths bit.
sections/hadoop_and_kerberos.md

Lines changed: 0 additions & 97 deletions
@@ -31,100 +31,3 @@ to interact with a Hadoop cluster and applications running in it *do need to kno

This is what this book attempts to cover.

(Deleted here: the "Tickets vs Tokens" and "Token Propagation in YARN Applications" sections, moved verbatim into sections/hadoop_tokens.md; see that file's diff below.)

sections/hadoop_tokens.md

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@

<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Introducing Hadoop Tokens

So far we've covered Kerberos and *Kerberos Tickets*. Hadoop complicates
things by adding another form of delegated authentication, *Hadoop Tokens*.

### Why does Hadoop have another layer on top of Kerberos?

That's a good question, one developers ask on a regular basis —at least once
every hour based on our limited experiments.

Hadoop clusters are some of the largest "single" distributed systems on the planet
in terms of numbers of services: a YARN cluster of 10,000 nodes would have
10,000 hdfs principals, 10,000 yarn principals and the principals of the users
running the applications. That's a lot of principals, all talking to each other,
all re-authenticating all the time, and making calls to the KDC whenever they
wish to talk to another principal in the system.

Tokens are wire-serializable objects issued by Hadoop services, which grant access
to services. Some services issue tokens to callers, which those callers then use
to interact directly with other services *without involving the KDC at all*.

As an example, the HDFS NameNode has to give callers access to the blocks comprising a file.
This isn't done in the DataNodes: all the filenames and permissions are stored in the NN;
all the DNs have is their set of blocks.

To get at these blocks, HDFS gives an authenticated caller a *Block Token* for every block
they need to read in a file. The caller then requests a block from any of the DataNodes
hosting it, including the block token in the request.

These HDFS Block Tokens do not contain any specific knowledge of the principal running the
DataNodes; instead they declare that the caller has the stated access rights to the specific
block, up until the token expires.

```
public class BlockTokenIdentifier extends TokenIdentifier {
  static final Text KIND_NAME = new Text("HDFS_BLOCK_TOKEN");

  private long expiryDate;
  private int keyId;
  private String userId;
  private String blockPoolId;
  private long blockId;
  private final EnumSet<AccessMode> modes;
  private byte [] cache;

  ...
```

Alongside the fields covering the block and permissions, the `cache` field holds the
serialized form of the identifier, built on demand and then reused.

## Tickets vs Tokens

| Token                    | Function                                            |
|--------------------------|-----------------------------------------------------|
| Authentication Token     | Directly authenticates a caller.                    |
| Delegation Token         | A token which can be passed to another process.     |

### Authentication Tokens

Authentication Tokens are explicitly issued by services to allow the caller to
interact with the service without having to re-request tickets from the TGS.

When an Authentication Token expires, the caller must request a new one from the service.
If the Kerberos ticket needed to interact with the service has expired, this may include
re-requesting a ticket from the TGS, or even logging in to Kerberos again to obtain a new TGT.

As such, they are almost equivalent to Kerberos Tickets, except that it is the
distributed services themselves issuing the Authentication Token, not the TGS.

### Delegation Tokens

A delegation token is requested by a client of a service; it can be passed to
other processes.

When the token expires, the original client must request a new delegation token
and pass it on to the other process, again.

What is more important is: *delegation tokens can be renewed before they expire.*

This is a fundamental difference between Kerberos Tickets and Hadoop Delegation Tokens.

Holders of delegation tokens may renew them with a token-specific `TokenRenewer` service,
refreshing them without needing any Kerberos credentials to log in to Kerberos.

More subtly:

1. The tokens must be renewed before they expire: once expired, a token is worthless.
1. Token renewers can be implemented as a Hadoop RPC service, or by other means, *including HTTP*.

For the HDFS Client protocol, the client protocol itself is the token renewer. A client may
talk to the NameNode using its current token, and request a new one, so refreshing it.

In contrast, the YARN timeline service is a pure REST API, which implements its token renewal over
HTTP/HTTPS. To refresh the token, the client must issue an HTTP request (a PUT operation, interestingly
enough), receiving a new token as a response.

Other delegation token renewal mechanisms alongside Hadoop RPC and HTTP could be implemented;
that is a detail which client applications do not need to care about. All that matters is that
they have the code to refresh tokens, usually code which lives alongside the RPC/REST client,
*and keep renewing the tokens on a regular basis*. Generally this is done by starting
a thread in the background.
## Token Propagation in YARN Applications

Imagine a user deploying a YARN application in a cluster, one which needs
access to the user's data stored in HDFS. The user would be required to be authenticated with
the KDC, and have been granted a *Ticket Granting Ticket*: the ticket needed to work with
the TGS.

The client-side launcher of the YARN application would be able to talk to HDFS and the YARN
resource manager, because the user was logged in to Kerberos. This would be managed in the Hadoop
RPC layer, requesting tickets to talk to the HDFS NameNode and YARN ResourceManager, if needed.

To give the YARN application the same rights to HDFS, the client-side application must
request a Delegation Token to talk to HDFS, a token which is then passed to the YARN application in
the `ContainerLaunchContext` within the `ApplicationSubmissionContext` used to define the
application to launch: its required container resources, artifacts to download ("localize"),
environment to set up and command to run.
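
A hedged sketch of the client-side half of this, using the public `FileSystem`,
`Credentials` and YARN record APIs; `conf`, `rmPrincipal` (normally the resource manager's
principal, as it is the token renewer), `localResources`, `environment`, `commands` and
`applicationACLs` are placeholders the application supplies:

```
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// ask HDFS for delegation tokens, naming the RM's principal as the renewer
Credentials credentials = new Credentials();
FileSystem fs = FileSystem.get(conf);
fs.addDelegationTokens(rmPrincipal, credentials);

// marshal the credentials into the byte buffer the launch context expects
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
ByteBuffer tokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
    localResources,   // artifacts to download/localize
    environment,      // environment to set up
    commands,         // command to run
    null,             // service data
    tokens,           // the delegation tokens
    applicationACLs);
```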

The YARN resource manager finds a location for the Application Master, and requests that
host's Node Manager start the container/application.

The Node Manager uses the "delegated HDFS token" to download the launch-time resources into
a local directory space, then executes the application.

*Somehow*, the HDFS token (and any other supplied tokens) are passed to the application that
has been launched.

The launched application master can use this token to interact with HDFS *as the original user*.

The AM can also pass token(s) on to launched containers, so that they too have access to HDFS.

The HDFS NameNode does not need to care whether the caller is the user themselves, the Node Manager
localizing the container, the launched application or any launched containers. All it verifies is
that when a caller requests access to the HDFS filesystem metadata or the contents of a file,
the caller holds a ticket/token which declares that it is the specific user, and that the
ticket/token is currently considered valid (based on the expiry time and the clock value of the
NameNode).

## Determining the Kerberos Principal for a service

1. The service name is derived from the URI (see `SecurityUtil.buildDTServiceName`); different
services on the same host have different service names.
1. Every service has a protocol (usually defined by the RPC protocol API).
1. To find a token for a service, the client enumerates all `SecurityInfo` instances; these
return information about the provider. One implementation, `AnnotatedSecurityInfo`, examines
the annotations on the protocol class to determine these values, including looking in the
Hadoop configuration to determine the Kerberos principal declared for that service (see the
sketch below and [IPC](ipc.html) for specifics).
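
As a sketch of what `AnnotatedSecurityInfo` looks for, here is a made-up RPC protocol
interface tagged with the real `@KerberosInfo` and `@TokenInfo` annotations; the protocol,
configuration key and token selector class are invented for illustration:

```
import java.io.IOException;
import org.apache.hadoop.security.KerberosInfo;
import org.apache.hadoop.security.token.TokenInfo;

// serverPrincipal names the configuration key whose value is the
// service's Kerberos principal
@KerberosInfo(serverPrincipal = "my.service.kerberos.principal")
@TokenInfo(MyTokenSelector.class)
public interface MyProtocol {
  String doSomething(String argument) throws IOException;
}
```
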
## Delegation Token internals

sections/hdfs.md

Lines changed: 12 additions & 1 deletion
@@ -44,7 +44,18 @@ to access the HDFS browser. This is a point of contention: its implicit from the

## DataNodes

DataNodes do not use Hadoop RPC: block data is streamed over a dedicated TCP protocol,
which delivers better performance, while HTTP covers the rest. The (historical) use of
Jetty on the HTTP side introduced other problems: at scale, obscure race conditions in
Jetty surfaced. Hadoop now uses Netty in the DataNode instead.

Pre-2.6, all that could be done to secure the DN was to bring it up on a privileged (<1024)
port, and so demonstrate that an OS superuser started the process. Hadoop 2.6 adds SASL
authentication of the data transfer protocol, which works *provided all clients are
running Hadoop 2.6+*.

See [Secure DataNode](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html#Secure_DataNode)
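
A hedged sketch of the relevant settings, using the keys documented on that page
(`authentication` can be raised to `integrity` or `privacy`; the values shown are
illustrative, not a recommendation):

```
import org.apache.hadoop.conf.Configuration;

// SASL on the data transfer protocol requires Hadoop 2.6+ on all clients
Configuration conf = new Configuration();
conf.set("dfs.data.transfer.protection", "authentication");
conf.set("dfs.http.policy", "HTTPS_ONLY");  // required alongside SASL data transfer
```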

### TODO

sections/ipc.md

Lines changed: 41 additions & 0 deletions
@@ -62,3 +62,44 @@ The resource file `META-INF/services/org.apache.hadoop.security.SecurityInfo` li

The RPC framework will read this file and build up the security information for the APIs (server side? Client side? both?)

### Authenticating a caller

How does an IPC endpoint validate the caller? If security is turned on,
the client will have had to authenticate with Kerberos, ensuring that
the server can determine the identity of the principal.

This is something it can ask for when handling the RPC call:

    UserGroupInformation callerUGI;

    // #1: get the current user identity
    try {
      callerUGI = UserGroupInformation.getCurrentUser();
    } catch (IOException ie) {
      LOG.info("Error getting UGI ", ie);
      AuditLogger.logFailure("UNKNOWN", "Error getting UGI");
      throw RPCUtil.getRemoteException(ie);
    }

The `callerUGI` variable is now set to the identity of the caller. If the caller
has delegated authority (tickets, tokens) then they still authenticate as
the principal they were acting for (possibly via a `doAs()` call).

    // #2: verify their permissions
    if (!checkAccess(callerUGI, ApplicationAccessType.MODIFY_APP)) {
      AuditLogger.logFailure(callerUGI.getShortUserName(),
          AuditConstants.KILL_CONTAINER_REQUEST,
          "User doesn't have permissions to " + ApplicationAccessType.MODIFY_APP,
          AuditConstants.UNAUTHORIZED_USER);
      throw RPCUtil.getRemoteException(new AccessControlException("User "
          + callerUGI.getShortUserName() + " cannot perform operation "
          + ApplicationAccessType.MODIFY_APP.name()));
    }

In this example, there's a check to see whether the caller can make a request which modifies
something in the service; if not, the call is rejected.

Note how failures are logged to an audit log; successful operations should be logged too.
The purpose of the audit log is to determine the actions of a principal —both successful
and unsuccessful.
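
To act as that authenticated caller, for example to access HDFS with the caller's rights
rather than the service's, the server can wrap the work in `UserGroupInformation.doAs()`.
A minimal sketch, assuming the `callerUGI` from the example above and a `conf` variable
in scope (`PrivilegedExceptionAction` is `java.security.PrivilegedExceptionAction`):

    // executes the enclosed action with the caller's identity
    FileSystem callerFs = callerUGI.doAs(
        new PrivilegedExceptionAction<FileSystem>() {
          @Override
          public FileSystem run() throws Exception {
            // anything here runs as the caller, not as the service principal
            return FileSystem.get(conf);
          }
        });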
