This draft first introduces several use cases for stateless encryption, analyzes and compares some existing stateless encryption schemes in the industry, and then attempts to propose a general and flexible stateless encryption scheme based on the summarized requirements.
66
+
This draft first introduces several use cases for stateless encryption, analyzes and compares some existing stateless encryption schemes in the industry, and then attempts to propose a general and flexible stateless encryption scheme based on the summarized requirements.
67
67
68
68
--- middle
69
69
70
70
# Introduction {#intro}
71
71
72
72
Recently, new scenarios such as high-performance cloud services, AI large model computing, and 5G mobile backhaul networks have placed higher requirements on the hardware friendliness, performance, and flexibility of the IPsec ESP protocol. A new protocol design, EESP {{?I-D.ietf-ipsecme-eesp}} {{?I-D.ietf-ipsecme-eesp-ikev2}}, is being discussed and formulated. EESP focuses on introducing more fine-grained sub-child-SAs, adapting the ESP header and trailer format, allowing parts of the transport layer header to remain unencrypted, and enabling flexible extension with new EESP features through options.
73
73
74
-
In addition to the issues listed above that are being addressed, stateless encryption is also a very important point. Its basic idea is to dynamically calculate data keys based on a small number of master keys (for AES-GCM, the encryption key and authentication key are combined), which helps optimize hardware resource limitations, performance optimization, and key negotiation complexity in large-scale IPSec session scenarios. This draft first introduces several use cases for stateless encryption, analyzes and compares some existing stateless encryption schemes in the industry, and then attempts to propose a general and flexible stateless encryption scheme based on the summarized requirements.
74
+
In addition to the issues listed above, stateless encryption is also an important topic. Its basic idea is to dynamically calculate data keys from a small number of master keys (for AES-GCM, the encryption key and authentication key are combined), which helps reduce hardware resource consumption, improve performance, and lower key negotiation complexity in large-scale IPsec session scenarios. This draft first introduces several use cases for stateless encryption, analyzes and compares some existing stateless encryption schemes in the industry, and then attempts to propose a general and flexible stateless encryption scheme based on the summarized requirements.
75
75
76
76
77
77
# Use Cases
78
78
79
79
80
80
## General Computing of Cloud Service
81
81
82
-
Public cloud services provide IPSec VPN access for massive users, and the servers in their infrastructure need to support massive IPSec session access. If hardware supports IPSec, the hardware should support session-based encryption and decryption, and the data keys of different sessions are isolated. The server needs to maintain the security connection context between the server and a large number of clients, and the hardware with limited memory cannot store the huge context. Note that the client and server do not belong to the same trusted domain in this case.
82
+
Public cloud services provide IPsec VPN access for a massive number of users, and the servers in their infrastructure need to support a correspondingly large number of IPsec sessions. If hardware supports IPsec, it should support session-based encryption and decryption, with the data keys of different sessions isolated from each other. The server needs to maintain the security connection context between itself and a large number of clients, but hardware with limited memory cannot store such a huge context. Note that the client and server do not belong to the same trusted domain in this case.
83
83
84
84
The stateless encryption scheme in the {{PSP}} solution proposed by Google is used to address the above hardware memory overhead problem. Its main principle is to derive a data key based on the master key on the server side, and the client side obtains the data key through an out-of-band method. It has:
85
85
@@ -133,8 +133,8 @@ As shown in the below figure, encrypted communication is required between differ
133
133
The stateless encryption scheme defined by {{UEC TSS}} can be used to solve the above problem. The main principle is that all communication instances of an HPC job belong to the same trust domain and share the same master key for both receiving and sending directions. It has:
134
134
135
135
- Pros:
136
-
- Better than Google PSP,it saves all security session contexts;
137
-
- The communication parties do not need to store data keys, and the increase of the number of instances and connections of the HPC job does not affect the number of security contexts;
136
+
- Compared with Google PSP, it avoids storing any security session contexts;
137
+
- The communication parties do not need to store data keys, and the increase of the number of instances and connections of the HPC job does not affect the number of security contexts;
138
138
- Without out-of-band slow-path data key negotiation, the first-packet delay is small;
139
139
- Data keys can be updated through the TSC.epoch.
140
140
- Cons:
@@ -184,7 +184,7 @@ Similarly, the NIC resource pool can also be used for east-west traffic access b
184
184
## AI Computing
185
185
186
186
187
-
187
+
As shown in the figure below, in an AI computing network, a computing task is collaboratively executed by a group of CPUs & XPUs located in the same trust domain or across trust domains (in the cross-trust-domain case, they are interconnected through DPUs acting as proxies). For CPUs & XPUs within the same trust domain, stateless encryption sharing the same master key can eliminate the complexity and latency of key negotiation between chips. For interconnection across trust domains, the DPU needs to act as an encrypted connection proxy between two trust domains (local trusted domain and global trusted domain). In this case, the DPU holds the master keys of both trust domains, calculates the data key for intra-domain communication in each domain based on its context, and then uses the two calculated data keys to complete the secure connection proxy across trust domains.
188
188
189
189
~~~
190
190
@@ -228,22 +228,98 @@ Similarly, the NIC resource pool can also be used for east-west traffic access b
228
228
229
229
Based on the above use cases, the requirements for a general and flexible stateless encryption scheme are as follows:
230
230
231
-
- Support nodes within a trusted trust domain to share the same master key;
232
-
- Master key supports multi-level combination design. In a trust domain, the master key is composed of multiple root keys of different types and levels, such as trust domain root key, tenant root key, task group root key, etc. This enhances the overall security of the master key and supports fine-grained encryption traffic isolation (e.g., all nodes in a trust domain, nodes of the same tenant in a trust domain, nodes of the same computing task in a trust domain, etc.);
231
+
- Support entities within a trust group to share the same master key;
232
+
- Master key supports multi-level combination design. In a trust group, the master key is composed of multiple root keys of different types and levels, such as trust region root key, user group root key, task group root key, etc. This enhances the overall security of the master key and supports fine-grained encryption traffic isolation (e.g., all entities in a trust region, entities of the same user group in a trust region, entities of the same task group in a trust region, etc.);
233
233
- Different types of root keys have different security levels and lifecycles, and corresponding key rotation mechanisms need to be defined. The master key update will trigger the data key update;
234
234
- The key rotation of each type of root key should support multiple key rotations, such as pre_key, current_key, and next_key, to support rapid rotation while ensuring that real-time encryption and decryption are not affected;
235
235
- The key derivation of the data key is based on the master key, context, and KDF. KDF must support packet-by-packet data key calculation in most cases (except when the data key is cached in memory), which requires extremely high performance and must support cryptographically secure, hardware-concurrent high-performance algorithms;
236
236
- To support real-time derivation of the Data Key, context information and IV information need to be carried with the message. To support different scenarios and different granularities of data key calculation and encryption traffic isolation (based on stream, based on source IP, based on source ID, etc.), multiple combinations of context and IV need to be supported, and different combination algorithms need to be distinguished through specific fields in the message;
237
237
- Context information enables dynamic updates of the data key, such as carrying an epoch in the context. When the epoch changes, the data key is also refreshed accordingly;
238
-
- It is necessary to support encryption proxy capabilities across trust domains. At the edge nodes across trust domains (such as DPU, Switch, etc.), support for master keys and stateless encryption of two trust domains (local trust domain and global trust domain) is required, and proxy conversion of message encryption and decryption between the two trust domains must be completed.
238
+
- It is necessary to support encryption proxy capabilities across trust regions. At the edge nodes across trust regions (such as DPU, Switch, etc.), support for master keys and stateless encryption of two trust groups (one is in local trust region and the other is in global trust region) is required, and proxy conversion of message encryption and decryption between the two trust groups must be completed.
239
239
240
240
# EESP Stateless Encryption Scheme
241
-
TBD.
241
+
Stateless Encryption is designed for large-scale general-purpose computing, AI computing, and pooled networks. It addresses the challenges of storing and managing security contexts by replacing storage with computation (key derivation) and by supporting flexible encryption and decryption, thereby enabling secure communication between nodes within and across domains. To ensure that an endpoint can encrypt and decrypt correctly without storing and managing security contexts, the stateless encryption extension must include the fields required for calculating the data key and performing the subsequent encryption and decryption:
242
+
- Key Derivation Fields: Used to calculate the data key for data packets;
243
+
- Initialization Vector Fields: Since AES-GCM is the primary data encryption algorithm, the per-packet initialization vector (IV) must never be repeated for the same encryption key. A single duplicate IV can undermine the encryption of the entire stream;
244
+
- Confidentiality and Integrity Protection Range Fields: Provide flexibility in the range of the message covered by confidentiality and integrity protection.
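As an illustration of the IV uniqueness requirement above, the following is a minimal Python sketch of one possible deterministic IV construction. It is not the format defined by this draft: the 96-bit layout (a 32-bit source identifier followed by a 64-bit per-sender packet counter) and the names `build_iv` and `IvGenerator` are assumptions chosen only for this example.

```python
import struct


def build_iv(source_id: int, counter: int) -> bytes:
    """Build a 96-bit IV: 32-bit source identifier + 64-bit packet counter.

    The layout is an assumption for illustration, not taken from the draft.
    """
    if not 0 <= source_id < 2**32:
        raise ValueError("source_id must fit in 32 bits")
    if not 0 <= counter < 2**64:
        raise ValueError("counter exhausted; the data key must be refreshed")
    # "!" selects network byte order with standard sizes (4 + 8 = 12 bytes).
    return struct.pack("!IQ", source_id, counter)


class IvGenerator:
    """Hands out IVs for one sender; the counter never repeats for a key."""

    def __init__(self, source_id: int) -> None:
        self.source_id = source_id
        self.counter = 0

    def next_iv(self) -> bytes:
        iv = build_iv(self.source_id, self.counter)
        self.counter += 1
        return iv
```

With this kind of construction, two senders that share a data key still produce distinct IVs as long as their source identifiers differ, and a single sender produces distinct IVs until its counter is exhausted, at which point the data key would have to change (for example through a new epoch).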
245
+
246
+
## Master Key Management
247
+
Each trust group shares a master key. The master key can be composed of multiple root keys, including the trust region root key, the user group root key, and the task group root key. This mechanism enhances the overall security of the master key and supports fine-grained encryption traffic isolation. The root keys that make up the master key are securely distributed by different administrators (infrastructure providers, user group administrators, task group administrators) through their respective controllers/KMS. An example of the data structure definition for the root key is as follows:
248
+
249
+
~~~
250
+
251
+
RootKeyStruct ::= SEQUENCE {
252
+
root_key_id OCTET STRING,
253
+
root_keys_index SEQUENCE (SIZE(3)) OF INTEGER,
254
+
root_keys_value SEQUENCE (SIZE(3)) OF OCTET STRING
255
+
}
256
+
257
+
~~~
258
+
259
+
Based on the trust region, user group, and task group under the trust group, the corresponding root_key_id can be found for each. Then, within the structure corresponding to this ID, the root_keys_index and root_keys_value arrays together form three sets of root key information (pre_key, current_key, and next_key) used for key rotation. This three-key rotation ensures the timely update of the root key (when the root key is rotated, the latest current_key takes effect) and guarantees that real-time encryption and decryption are not affected.
260
+
The specific method for key rotation is as follows: a new next_key is generated, the original next_key becomes the new current_key, and the original current_key becomes the new pre_key.
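The rotation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `RootKeySlots` is hypothetical, the new next_key is generated locally with a random generator purely for the sake of a runnable example (in a deployment it would presumably be supplied by the controller/KMS), and the original pre_key is assumed to be dropped because only three slots exist.

```python
import secrets
from dataclasses import dataclass


@dataclass
class RootKeySlots:
    """Three rotation slots for one root key (hypothetical helper type)."""

    pre_key: bytes
    current_key: bytes
    next_key: bytes

    def rotate(self, key_len: int = 32) -> None:
        """Shift next -> current -> pre and install a fresh next_key."""
        # Assumption: the old pre_key is discarded, since only three slots exist.
        self.pre_key = self.current_key
        self.current_key = self.next_key
        # Assumption: generated locally here; normally delivered by the KMS.
        self.next_key = secrets.token_bytes(key_len)
```

Because the previous current_key remains available as pre_key after a rotation, packets protected just before the rotation can still be processed, which is one way to read the requirement that real-time encryption and decryption are not affected.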
261
+
262
+
## Data Key Derivation at Both Ends of the Communication
263
+
When secure communication is required within a trust group, the source point performs the following processing:
264
+
- data key derivation:
265
+
- Obtain the master key: Based on the trust group information, combine the relevant root keys (e.g., through XOR calculation) to derive it;
266
+
- Calculate the context information: Based on the source point IP/ID, or connection ID, etc., along with Epoch, the context is calculated using a specific algorithm. Using the source point IP/ID to calculate the context ensures that different secure sessions at the destination point have different data keys, thereby preventing the compromise of encryption security that could occur if different sessions had the same data key and the IV was also the same;
267
+
- Execute KDF to derive the data key: use the aforementioned master key and context as inputs to the KDF;
268
+
- IV Calculation: Based on the source point IP/ID or connection ID, along with Epoch, random numbers, and counters, the IV is computed using a specific algorithm;
269
+
- Determine the scope of confidentiality and integrity protection: COffset and IOffset respectively;
270
+
- Encrypt the message using the data key and IV, and construct the security header: The security header field contains all the information mentioned above. The example diagram is as follows:
{: #fig-ipsecme-eesp-stateless-security-header-option title="Example of the Master Key Option of Security Header Format for Stateless Encryption"}
304
+
305
+
306
+
Correspondingly, the destination node performs the following processing:
307
+
- Read the security header: Obtain all parameters required for key derivation;
308
+
- Data key derivation:
309
+
- Obtain the master key: Based on the master key option in the security header, combine the relevant root keys (e.g., through XOR calculation) to obtain it;
310
+
- Calculate the context information: Based on the source point IP/ID or connection ID in the security header, along with Epoch, compute the context using a specific algorithm;
311
+
- Execute KDF to derive the data key: use the aforementioned master key and context as inputs to the KDF;
312
+
- IV Calculation: Based on the source point IP/ID in the security header, or connection ID, etc., along with Epoch, random numbers, and counters, the IV is calculated according to a specific algorithm;
313
+
- Determine the scope of confidentiality and integrity protection: COffset and IOffset respectively;
314
+
- Decrypt the message using the data key and IV.
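The key derivation steps performed at both ends can be sketched in Python as follows. This is a minimal sketch and not the algorithm specified by this draft: the XOR combination follows the example given in the text, while the context layout (32-bit source ID plus 32-bit epoch), the use of HMAC-SHA256 as the KDF, the `b"data-key"` label, and all function names are assumptions made only for illustration.

```python
import hashlib
import hmac
import struct


def combine_root_keys(root_keys: list[bytes]) -> bytes:
    """XOR equal-length root keys into a master key (example from the text)."""
    if not root_keys:
        raise ValueError("at least one root key is required")
    length = len(root_keys[0])
    if any(len(key) != length for key in root_keys):
        raise ValueError("root keys must have equal length")
    master = bytes(length)
    for key in root_keys:
        master = bytes(a ^ b for a, b in zip(master, key))
    return master


def build_context(source_id: int, epoch: int) -> bytes:
    """Assumed context layout: 32-bit source ID followed by 32-bit epoch."""
    return struct.pack("!II", source_id, epoch)


def derive_data_key(master_key: bytes, context: bytes, length: int = 32) -> bytes:
    """Derive a data key; HMAC-SHA256 stands in for the unspecified KDF."""
    if not 0 < length <= hashlib.sha256().digest_size:
        raise ValueError("this sketch derives at most 32 bytes")
    digest = hmac.new(master_key, b"data-key" + context, hashlib.sha256).digest()
    return digest[:length]
```

Since the destination reads the source ID and Epoch from the security header and holds the same root keys, calling `derive_data_key(combine_root_keys(roots), build_context(source_id, epoch))` at either end yields the same data key without any stored per-session state, and a change of epoch or source ID yields a different key.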
243
315
244
316
# Security Considerations
245
317
246
-
TBD.
318
+
- A highly secure control plane is required to ensure that the master keys managed by users/systems are not leaked or lost;
319
+
-
320
+
The control channel establishment phase requires mutual authentication and authorization to ensure the integrity and confidentiality of the master key during distribution. It must also ensure that the group master key is distributed only to the corresponding group members;
321
+
- The endpoint requires secure storage of the master key and data key locally.