
Commit b30a7d2

Marcelo Vanzin authored and committed
[SPARK-23572][DOCS] Bring "security.md" up to date.
This change basically rewrites the security documentation so that it is up to date with new features, more correct, and more complete.

Because security is such an important feature, I chose to move all the relevant configuration documentation to the security page, instead of having it peppered all over the place in the configuration page. This allows an almost one-stop shop for security configuration in Spark. The only exceptions are some YARN-specific minor features, which I left in the YARN page.

I also re-organized the page's topics, since they didn't make a lot of sense. You had Kerberos features described inside paragraphs talking about UI access control, and other oddities. It should be easier now to find information about specific Spark security features. I also enabled TOCs for both the Security and YARN pages, since that makes it easier to see what is covered.

I removed most of the comments from the SecurityManager javadoc since they just replicated information in the security doc, with different levels of out-of-dateness.

Author: Marcelo Vanzin <[email protected]>

Closes apache#20742 from vanzin/SPARK-23572.
1 parent eb48edf commit b30a7d2

File tree

6 files changed (+673, -703 lines)


.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -76,6 +76,7 @@ streaming-tests.log
 target/
 unit-tests.log
 work/
+docs/.jekyll-metadata
 
 # For Hive
 TempStatsStore/
```

core/src/main/scala/org/apache/spark/SecurityManager.scala

Lines changed: 3 additions & 141 deletions
```diff
@@ -42,148 +42,10 @@ import org.apache.spark.util.Utils
  * should access it from that. There are some cases where the SparkEnv hasn't been
  * initialized yet and this class must be instantiated directly.
  *
- * Spark currently supports authentication via a shared secret.
- * Authentication can be configured to be on via the 'spark.authenticate' configuration
- * parameter. This parameter controls whether the Spark communication protocols do
- * authentication using the shared secret. This authentication is a basic handshake to
- * make sure both sides have the same shared secret and are allowed to communicate.
- * If the shared secret is not identical they will not be allowed to communicate.
- *
- * The Spark UI can also be secured by using javax servlet filters. A user may want to
- * secure the UI if it has data that other users should not be allowed to see. The javax
- * servlet filter specified by the user can authenticate the user and then once the user
- * is logged in, Spark can compare that user versus the view acls to make sure they are
- * authorized to view the UI. The configs 'spark.acls.enable', 'spark.ui.view.acls' and
- * 'spark.ui.view.acls.groups' control the behavior of the acls. Note that the person who
- * started the application always has view access to the UI.
- *
- * Spark has a set of individual and group modify acls (`spark.modify.acls`) and
- * (`spark.modify.acls.groups`) that controls which users and groups have permission to
- * modify a single application. This would include things like killing the application.
- * By default the person who started the application has modify access. For modify access
- * through the UI, you must have a filter that does authentication in place for the modify
- * acls to work properly.
- *
- * Spark also has a set of individual and group admin acls (`spark.admin.acls`) and
- * (`spark.admin.acls.groups`) which is a set of users/administrators and admin groups
- * who always have permission to view or modify the Spark application.
- *
- * Starting from version 1.3, Spark has partial support for encrypted connections with SSL.
- *
- * At this point spark has multiple communication protocols that need to be secured and
- * different underlying mechanisms are used depending on the protocol:
- *
- * - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
- *   for the HttpServer. Jetty supports multiple authentication mechanisms -
- *   Basic, Digest, Form, Spnego, etc. It also supports multiple different login
- *   services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the HashLoginService
- *   to authenticate using DIGEST-MD5 via a single user and the shared secret.
- *   Since we are using DIGEST-MD5, the shared secret is not passed on the wire
- *   in plaintext.
- *
- *   We currently support SSL (https) for this communication protocol (see the details
- *   below).
- *
- *   The Spark HttpServer installs the HashLoginServer and configures it to DIGEST-MD5.
- *   Any clients must specify the user and password. There is a default
- *   Authenticator installed in the SecurityManager to how it does the authentication
- *   and in this case gets the user name and password from the request.
- *
- * - BlockTransferService -> The Spark BlockTransferServices uses java nio to asynchronously
- *   exchange messages. For this we use the Java SASL
- *   (Simple Authentication and Security Layer) API and again use DIGEST-MD5
- *   as the authentication mechanism. This means the shared secret is not passed
- *   over the wire in plaintext.
- *   Note that SASL is pluggable as to what mechanism it uses. We currently use
- *   DIGEST-MD5 but this could be changed to use Kerberos or other in the future.
- *   Spark currently supports "auth" for the quality of protection, which means
- *   the connection does not support integrity or privacy protection (encryption)
- *   after authentication. SASL also supports "auth-int" and "auth-conf" which
- *   SPARK could support in the future to allow the user to specify the quality
- *   of protection they want. If we support those, the messages will also have to
- *   be wrapped and unwrapped via the SaslServer/SaslClient.wrap/unwrap API's.
- *
- *   Since the NioBlockTransferService does asynchronous messages passing, the SASL
- *   authentication is a bit more complex. A ConnectionManager can be both a client
- *   and a Server, so for a particular connection it has to determine what to do.
- *   A ConnectionId was added to be able to track connections and is used to
- *   match up incoming messages with connections waiting for authentication.
- *   The ConnectionManager tracks all the sendingConnections using the ConnectionId,
- *   waits for the response from the server, and does the handshake before sending
- *   the real message.
- *
- *   The NettyBlockTransferService ensures that SASL authentication is performed
- *   synchronously prior to any other communication on a connection. This is done in
- *   SaslClientBootstrap on the client side and SaslRpcHandler on the server side.
- *
- * - HTTP for the Spark UI -> the UI was changed to use servlets so that javax servlet filters
- *   can be used. Yarn requires a specific AmIpFilter be installed for security to work
- *   properly. For non-Yarn deployments, users can write a filter to go through their
- *   organization's normal login service. If an authentication filter is in place then the
- *   SparkUI can be configured to check the logged in user against the list of users who
- *   have view acls to see if that user is authorized.
- *   The filters can also be used for many different purposes. For instance filters
- *   could be used for logging, encryption, or compression.
- *
- * The exact mechanisms used to generate/distribute the shared secret are deployment-specific.
- *
- * For YARN deployments, the secret is automatically generated. The secret is placed in the Hadoop
- * UGI which gets passed around via the Hadoop RPC mechanism. Hadoop RPC can be configured to
- * support different levels of protection. See the Hadoop documentation for more details. Each
- * Spark application on YARN gets a different shared secret.
- *
- * On YARN, the Spark UI gets configured to use the Hadoop YARN AmIpFilter which requires the user
- * to go through the ResourceManager Proxy. That proxy is there to reduce the possibility of web
- * based attacks through YARN. Hadoop can be configured to use filters to do authentication. That
- * authentication then happens via the ResourceManager Proxy and Spark will use that to do
- * authorization against the view acls.
- *
- * For other Spark deployments, the shared secret must be specified via the
- * spark.authenticate.secret config.
- * All the nodes (Master and Workers) and the applications need to have the same shared secret.
- * This again is not ideal as one user could potentially affect another users application.
- * This should be enhanced in the future to provide better protection.
- * If the UI needs to be secure, the user needs to install a javax servlet filter to do the
- * authentication. Spark will then use that user to compare against the view acls to do
- * authorization. If not filter is in place the user is generally null and no authorization
- * can take place.
- *
- * When authentication is being used, encryption can also be enabled by setting the option
- * spark.authenticate.enableSaslEncryption to true. This is only supported by communication
- * channels that use the network-common library, and can be used as an alternative to SSL in those
- * cases.
- *
- * SSL can be used for encryption for certain communication channels. The user can configure the
- * default SSL settings which will be used for all the supported communication protocols unless
- * they are overwritten by protocol specific settings. This way the user can easily provide the
- * common settings for all the protocols without disabling the ability to configure each one
- * individually.
- *
- * All the SSL settings like `spark.ssl.xxx` where `xxx` is a particular configuration property,
- * denote the global configuration for all the supported protocols. In order to override the global
- * configuration for the particular protocol, the properties must be overwritten in the
- * protocol-specific namespace. Use `spark.ssl.yyy.xxx` settings to overwrite the global
- * configuration for particular protocol denoted by `yyy`. Currently `yyy` can be only `fs` for
- * broadcast and file server.
- *
- * Refer to [[org.apache.spark.SSLOptions]] documentation for the list of
- * options that can be specified.
- *
- * SecurityManager initializes SSLOptions objects for different protocols separately. SSLOptions
- * object parses Spark configuration at a given namespace and builds the common representation
- * of SSL settings. SSLOptions is then used to provide protocol-specific SSLContextFactory for
- * Jetty.
- *
- * SSL must be configured on each node and configured for each component involved in
- * communication using the particular protocol. In YARN clusters, the key-store can be prepared on
- * the client side then distributed and used by the executors as the part of the application
- * (YARN allows the user to deploy files before the application is started).
- * In standalone deployment, the user needs to provide key-stores and configuration
- * options for master and workers. In this mode, the user may allow the executors to use the SSL
- * settings inherited from the worker which spawned that executor. It can be accomplished by
- * setting `spark.ssl.useNodeLocalConf` to `true`.
+ * This class implements all of the configuration related to security features described
+ * in the "Security" document. Please refer to that document for specific features implemented
+ * here.
  */
-
 private[spark] class SecurityManager(
     sparkConf: SparkConf,
     val ioEncryptionKey: Option[Array[Byte]] = None)
```
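For context, the javadoc removed above was the only place on this page describing the shared-secret authentication settings, so here is a minimal, illustrative sketch of how those properties could be set from application code. The property names (`spark.authenticate`, `spark.authenticate.secret`, `spark.authenticate.enableSaslEncryption`) come from the removed text; the application name and secret value are placeholders, and in practice these settings normally live in spark-defaults.conf or on the spark-submit command line rather than being hardcoded:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: property names are taken from the removed javadoc above;
// values are placeholders, not recommendations.
val conf = new SparkConf()
  .setAppName("authenticated-app")                           // hypothetical application name
  .set("spark.authenticate", "true")                         // enable shared-secret authentication
  .set("spark.authenticate.secret", "replace-with-a-secret") // required outside YARN; same value on all nodes
  .set("spark.authenticate.enableSaslEncryption", "true")    // SASL-based encryption on supported channels

val sc = new SparkContext(conf)
```

As the removed text notes, on YARN the secret is generated automatically per application, so `spark.authenticate.secret` is only needed for other deployment modes.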
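The view/modify/admin ACL properties described in the removed javadoc can be sketched the same way. This is illustrative only: every user and group name below is invented, the user who started the application always retains view and modify access, and an authentication filter must be in place for the UI to know which user to check against the ACLs:

```scala
import org.apache.spark.SparkConf

// Hedged sketch of the ACL properties named in the removed javadoc; all
// user/group names below are placeholders.
val aclConf = new SparkConf()
  .set("spark.acls.enable", "true")             // turn ACL checking on
  .set("spark.ui.view.acls", "alice,bob")       // extra users allowed to view the UI
  .set("spark.ui.view.acls.groups", "analysts") // groups allowed to view the UI
  .set("spark.modify.acls", "alice")            // users allowed to modify (e.g. kill) the application
  .set("spark.modify.acls.groups", "operators") // groups allowed to modify the application
  .set("spark.admin.acls", "admin")             // users who can always view and modify
  .set("spark.admin.acls.groups", "wheel")      // groups who can always view and modify
```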
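Finally, the removed text describes a two-level SSL configuration: `spark.ssl.xxx` sets a global default for all supported protocols, and `spark.ssl.yyy.xxx` overrides it for the protocol namespace `yyy` (at the time that comment was written, only `fs`, the broadcast/file server). A hedged sketch of that layout follows; `spark.ssl.useNodeLocalConf` and the namespacing pattern come from the removed text, the other property names follow Spark's documented `spark.ssl.*` options, and the paths, password, and `fs` namespace are purely illustrative (the authoritative list for current versions is the rewritten security page itself):

```scala
import org.apache.spark.SparkConf

// Hedged sketch of the global-vs-namespace SSL layout described in the removed
// javadoc. Paths, passwords, and the `fs` namespace are illustrative only.
val sslConf = new SparkConf()
  .set("spark.ssl.enabled", "true")                         // global default: enable SSL where supported
  .set("spark.ssl.keyStore", "/path/to/keystore.jks")       // global key-store used by all protocols
  .set("spark.ssl.keyStorePassword", "keystore-password")   // placeholder password
  .set("spark.ssl.fs.keyStore", "/path/to/fs-keystore.jks") // per-namespace override for the file server
  .set("spark.ssl.useNodeLocalConf", "true")                // standalone mode: executors inherit the worker's SSL settings
```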
