<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Introducing Hadoop Tokens

So far we've covered Kerberos and *Kerberos Tickets*. Hadoop complicates
things by adding another form of delegated authentication, *Hadoop Tokens*.


## Why does Hadoop have another layer on top of Kerberos?

That's a good question, one developers ask on a regular basis, at least once
an hour based on our limited experiments.

Hadoop clusters are some of the largest "single" distributed systems on the planet
in terms of the number of services: a YARN cluster of 10,000 nodes would have
10,000 `hdfs` principals, 10,000 `yarn` principals and the principals of the users
running the applications. That's a lot of principals, all talking to each other,
all having to re-authenticate regularly, and all calling the KDC whenever they
wish to talk to another principal in the system.

Tokens are wire-serializable objects issued by Hadoop services, granting the holder
access to a service. Some services issue tokens to callers, which those callers can
then use to interact directly with other services *without involving the KDC at all*.

As an example, the HDFS NameNode has to give callers access to the blocks comprising
a file. This isn't done in the DataNodes: all the filenames and permissions are stored
in the NameNode. All the DataNodes have is their set of blocks.

To get at those blocks, HDFS gives an authenticated caller a *Block Token* for every
block they need to read in a file. The caller then requests a block from any of the
DataNodes hosting it, including the block token in the request.

These HDFS Block Tokens do not contain any specific knowledge of the principal running
the DataNodes; instead they declare that the caller has the stated access rights to the
specific block, up until the token expires.
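
This flow can be sketched as a toy model (not the real HDFS implementation; all class,
method and field names here are illustrative): the issuing service signs an identifier
with a secret it shares with the DataNodes, so a DataNode can verify the token locally,
with no call back to the NameNode or the KDC.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of the block-token idea, NOT the real HDFS implementation.
public class ToyBlockToken {

    // Identifier: who may do what to which block, and until when.
    static String identifier(String user, long blockId, String mode, long expiryMillis) {
        return user + ":" + blockId + ":" + mode + ":" + expiryMillis;
    }

    // The issuing service computes an HMAC over the identifier; the caller
    // cannot forge or alter the token without the shared secret.
    static byte[] sign(byte[] sharedKey, String id) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedKey, "HmacSHA256"));
            return mac.doFinal(id.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // A DataNode verifies locally: recompute the MAC and check the expiry.
    static boolean verify(byte[] sharedKey, String id, byte[] password, long now) {
        long expiry = Long.parseLong(id.substring(id.lastIndexOf(':') + 1));
        return now < expiry && Arrays.equals(password, sign(sharedKey, id));
    }
}
```

The point of the sketch: verification needs only the shared key and a clock, which is
why a holder of a valid block token never has to talk to the KDC to read a block.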


```java
public class BlockTokenIdentifier extends TokenIdentifier {
  static final Text KIND_NAME = new Text("HDFS_BLOCK_TOKEN");

  private long expiryDate;
  private int keyId;
  private String userId;
  private String blockPoolId;
  private long blockId;
  private final EnumSet<AccessMode> modes;
  private byte [] cache;

  ...
```

Alongside the fields covering the block and permissions, that `cache` field holds
a cached copy of the serialized form of the identifier, so it does not need to be
rebuilt every time the token is used.

## Tickets vs Tokens

| Token                    | Function                                           |
|--------------------------|----------------------------------------------------|
| Authentication Token     | Directly authenticates a caller.                   |
| Delegation Token         | A token which can be passed to another process.    |

### Authentication Tokens

Authentication Tokens are explicitly issued by services to allow the caller to
interact with the service without having to re-request service tickets from the TGS.

When an Authentication Token expires, the caller must request a new one from the
service. If the Kerberos ticket needed to interact with the service has itself expired,
this may include re-requesting a ticket from the TGS, or even logging in to Kerberos
again to obtain a new TGT.

As such, they are almost equivalent to Kerberos Tickets, except that it is the
distributed services themselves issuing the Authentication Token, not the TGS.
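
That re-acquisition ladder can be modelled as a sketch; none of these names are Hadoop
APIs, and the numeric timeline is invented purely to make the fallbacks visible:

```java
// Toy model of the credential re-acquisition ladder: try the token, then a
// fresh token via a valid service ticket, then fall back to the KDC.
public class CredentialLadder {
    final long now, tokenExpiry, serviceTicketExpiry, tgtExpiry;
    int kdcCalls = 0;   // round trips to the KDC (TGS request or re-login)

    CredentialLadder(long now, long tokenExpiry, long serviceTicketExpiry, long tgtExpiry) {
        this.now = now;
        this.tokenExpiry = tokenExpiry;
        this.serviceTicketExpiry = serviceTicketExpiry;
        this.tgtExpiry = tgtExpiry;
    }

    // Returns the cheapest step that yields a usable credential.
    String authenticate() {
        if (now < tokenExpiry) return "use-token";             // no KDC involved
        if (now < serviceTicketExpiry) return "reissue-token"; // service issues a new token
        kdcCalls++;                                            // ask the TGS for a service ticket
        if (now < tgtExpiry) return "request-service-ticket";
        kdcCalls++;                                            // full re-login for a new TGT
        return "kinit";
    }
}
```

Only the bottom rungs of the ladder touch the KDC, which is the whole point of the
extra token layer.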

### Delegation Tokens

A delegation token is requested by a client of a service; it can then be passed on
to other processes.

When the token expires, the original client must request a new delegation token
and pass it on to the other process again.

What is more important is: *delegation tokens can be renewed before they expire.*

This is a fundamental difference between Kerberos Tickets and Hadoop Delegation Tokens.

Holders of delegation tokens may renew them with a token-specific `TokenRenewer` service,
refreshing them without needing the Kerberos credentials to log in to Kerberos.

More subtly:

1. The tokens must be renewed before they expire: once expired, a token is worthless.
1. Token renewers can be implemented as a Hadoop RPC service, or by other means, *including HTTP*.
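
These renewal semantics can be sketched in a toy class (illustrative names and numbers,
not Hadoop's implementation): each renewal pushes the expiry forward by a renewal
interval, capped by a hard maximum lifetime, and renewing an already-expired token fails.

```java
// Toy model of delegation-token renewal semantics.
public class ToyDelegationToken {
    final long maxLifetime;     // hard upper bound, fixed at issue time
    final long renewInterval;   // how far each renewal pushes the expiry
    long expiry;

    ToyDelegationToken(long issued, long renewInterval, long maxLifetime) {
        this.renewInterval = renewInterval;
        this.maxLifetime = issued + maxLifetime;
        this.expiry = Math.min(issued + renewInterval, this.maxLifetime);
    }

    // Renewal must happen *before* expiry: once expired, a token is worthless.
    long renew(long now) {
        if (now >= expiry) {
            throw new IllegalStateException("token expired");
        }
        expiry = Math.min(now + renewInterval, maxLifetime);
        return expiry;
    }
}
```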

For the HDFS Client protocol, the client protocol itself is the token renewer. A client
may talk to the NameNode using its current token, and request a new one, so refreshing it.

In contrast, the YARN timeline service is a pure REST API, which implements its token
renewal over HTTP/HTTPS. To refresh the token, the client must issue an HTTP request
(a PUT operation, interestingly enough), receiving a new token as a response.
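
The shape of HTTP-based renewal can be sketched with the JDK's own HTTP classes. This
is not the timeline service's actual REST API: the `/token` path, payload and response
format are made up for illustration, and a stub in-process server stands in for the
real service.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch of token renewal over HTTP: a PUT returns a replacement token.
public class HttpRenewalSketch {
    public static String renewOverHttp() throws Exception {
        // Stub "renewal service": answers PUT /token with a fresh token string.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/token", exchange -> {
            if ("PUT".equals(exchange.getRequestMethod())) {
                byte[] body = "renewed-token-2".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } else {
                exchange.sendResponseHeaders(405, -1);   // renewal is PUT-only here
            }
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/token"))
                .PUT(HttpRequest.BodyPublishers.ofString("token=current-token-1"))
                .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();   // the replacement token
        } finally {
            server.stop(0);
        }
    }
}
```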

Other delegation token renewal mechanisms alongside Hadoop RPC and HTTP could be
implemented; that is a detail which client applications do not need to care about.
All that matters is that they have the code to refresh tokens, usually code which
lives alongside the RPC/REST client, *and keep renewing the tokens on a regular
basis*. Generally this is done by starting a thread in the background.
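
The usual background-renewal pattern looks roughly like the sketch below: schedule
renewals at a safe fraction of the token lifetime so the token never lapses. The
fraction (here one half) and the stand-in renewal action are illustrative choices,
not what any particular Hadoop client hard-codes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a background token-renewer thread.
public class RenewerThreadSketch {
    public static int renewInBackground(long lifetimeMillis, long runForMillis)
            throws InterruptedException {
        AtomicInteger renewals = new AtomicInteger();
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "token-renewer");
                t.setDaemon(true);   // don't keep the JVM alive just to renew
                return t;
            });
        long interval = lifetimeMillis / 2;   // renew well before expiry
        scheduler.scheduleAtFixedRate(
            renewals::incrementAndGet,        // stand-in for the real renew() call
            interval, interval, TimeUnit.MILLISECONDS);
        Thread.sleep(runForMillis);           // simulate the application's lifetime
        scheduler.shutdownNow();
        return renewals.get();
    }
}
```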


## Token Propagation in YARN Applications

Imagine a user deploying a YARN application in a cluster, one which needs access to
the user's data stored in HDFS. The user would be required to be authenticated with
the KDC, and to have been granted a *Ticket Granting Ticket*: the ticket needed to
work with the TGS.

The client-side launcher of the YARN application would be able to talk to HDFS and
the YARN Resource Manager, because the user was logged in to Kerberos. This would be
managed in the Hadoop RPC layer, requesting tickets to talk to the HDFS NameNode and
YARN ResourceManager, as needed.

To give the YARN application the same rights to HDFS, the client-side application must
request a Delegation Token for HDFS, a token which is then passed to the YARN application
in the `ContainerLaunchContext` within the `ApplicationSubmissionContext` used to define
the application to launch: its required container resources, artifacts to download
("localize"), environment to set up and command to run.

The YARN Resource Manager finds a location for the Application Master, and requests that
host's Node Manager start the container/application.

The Node Manager uses the delegated HDFS token to download the launch-time resources into
a local directory space, then executes the application.

*Somehow*, the HDFS token, and any other supplied tokens, are passed to the application
that has been launched.

The launched Application Master can use this token to interact with HDFS *as the original user*.

The AM can also pass the token(s) on to launched containers, so that they too have access to HDFS.
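
The mechanics of that hand-off can be modelled as a round trip through bytes. This is
a toy stand-in for Hadoop's real `Credentials` class (whose wire format differs): the
client serializes its tokens, the bytes ride along in the launch context, and the
launched AM or container deserializes them again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of how tokens travel with a launch context; NOT Hadoop's Credentials.
public class ToyCredentials {
    // alias -> opaque token bytes
    final Map<String, byte[]> tokens = new LinkedHashMap<>();

    byte[] serialize() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(tokens.size());
            for (Map.Entry<String, byte[]> e : tokens.entrySet()) {
                out.writeUTF(e.getKey());           // token alias
                out.writeInt(e.getValue().length);  // token length, then bytes
                out.write(e.getValue());
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ToyCredentials deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            ToyCredentials creds = new ToyCredentials();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String alias = in.readUTF();
                byte[] token = new byte[in.readInt()];
                in.readFully(token);
                creds.tokens.put(alias, token);
            }
            return creds;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the tokens are just bytes, neither the Node Manager nor YARN itself needs to
understand them; only the service that issued a token can interpret it.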

The HDFS NameNode does not need to care whether the caller is the user themselves, the
Node Manager localizing the container, the launched application, or any launched
containers. All it verifies is that whenever a caller requests access to the HDFS
filesystem metadata or the contents of a file, that caller holds a ticket/token
declaring that they are the specific user, and that the token is currently considered
valid (based on the expiry time and the clock value of the NameNode).

## Determining the Kerberos Principal for a service

1. The service name is derived from the URI (see `SecurityUtil.buildDTServiceName`);
different services on the same host have different service names.
1. Every service has a protocol (usually defined by the RPC protocol API).
1. To find a token for a service, the client enumerates all `SecurityInfo` instances; these
return information about the provider. One class, `AnnotatedSecurityInfo`, examines the
annotations on the class to determine these values, including looking in the Hadoop
configuration to determine the Kerberos principal declared for that service (see
[IPC](ipc.html) for specifics).
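
The first step can be sketched as follows. This is a simplified model in the spirit of
`SecurityUtil.buildDTServiceName`, not its actual code: the real method resolves the
hostname and may use the IP address, while this sketch just keeps `host:port`, which is
enough to show why two services on one host get different service names.

```java
import java.net.URI;

// Illustrative model of deriving a token "service" name from a filesystem URI.
public class ServiceNameSketch {
    static String serviceName(String uri, int defaultPort) {
        URI u = URI.create(uri);
        // Fall back to the scheme's default port when the URI omits one.
        int port = u.getPort() == -1 ? defaultPort : u.getPort();
        return u.getHost() + ":" + port;
    }
}
```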

## Delegation Token internals