Merged
Changes from 50 commits
Commits
57 commits
3b11c13
Removing start, stop with ec2.py, adding validations
abuabraham-ttd Dec 9, 2024
6034c5e
Removing start, stop with ec2.py, adding validations
abuabraham-ttd Dec 9, 2024
d1f1756
Updates
abuabraham-ttd Dec 10, 2024
f42f872
Updates
abuabraham-ttd Dec 10, 2024
2542232
Add virtual env and start it in systemd
abuabraham-ttd Dec 10, 2024
edf85f3
Add virtual env and start it in systemd
abuabraham-ttd Dec 10, 2024
3e95e4c
Add virtual env and start it in systemd
abuabraham-ttd Dec 10, 2024
cb70032
use venv like flask service
abuabraham-ttd Dec 10, 2024
5fe844c
use versions
abuabraham-ttd Dec 10, 2024
937e7a2
Add URL validation
abuabraham-ttd Dec 10, 2024
2b23ff0
Move validations around
abuabraham-ttd Dec 10, 2024
44aa71f
Move validations around
abuabraham-ttd Dec 10, 2024
711d50b
Move validations around
abuabraham-ttd Dec 10, 2024
4c694e7
Remove aws implemnttion from typedict
abuabraham-ttd Dec 10, 2024
62cc490
Remove aws implemnttion from typedict
abuabraham-ttd Dec 10, 2024
5de70be
Adding more logs
abuabraham-ttd Dec 11, 2024
77f1f4a
Adding min capacity
abuabraham-ttd Dec 11, 2024
a4241fc
Loop every sec for 10sec for confg server to be up
abuabraham-ttd Dec 11, 2024
0bff456
Fix regex
abuabraham-ttd Dec 11, 2024
e669887
validate after default
abuabraham-ttd Dec 11, 2024
85fc3e7
Add tested min values for capacity
abuabraham-ttd Dec 11, 2024
d7b24c7
[CI Pipeline] Released Snapshot version: 5.43.1-alpha-93-SNAPSHOT
Dec 12, 2024
4499dcf
Add to build eif stage
abuabraham-ttd Dec 12, 2024
3dd967d
[CI Pipeline] Released Snapshot version: 5.43.2-alpha-94-SNAPSHOT
Dec 12, 2024
45b2908
Dont check for enclave, kill all
abuabraham-ttd Dec 12, 2024
d890e5d
Change version on ami build
abuabraham-ttd Dec 12, 2024
8eeaf9a
[CI Pipeline] Released Snapshot version: 5.43.3-alpha-100-SNAPSHOT
Dec 13, 2024
eb8955c
Use AuxilaryConfig to store and return URLs
abuabraham-ttd Dec 13, 2024
502e9d7
fix log level
abuabraham-ttd Dec 17, 2024
b210817
Remove optout URL references, we only need core URL and it infers opt…
abuabraham-ttd Dec 17, 2024
4398ac6
use custom shared action
abuabraham-ttd Dec 17, 2024
69968b2
[CI Pipeline] Released Snapshot version: 5.43.4-alpha-101-SNAPSHOT
Dec 17, 2024
47b841c
[CI Pipeline] Released Snapshot version: 5.43.5-alpha-102-SNAPSHOT
Dec 17, 2024
51e5c6b
FIx CFN templates
abuabraham-ttd Dec 18, 2024
0aa4df8
[CI Pipeline] Released Snapshot version: 5.43.6-alpha-103-SNAPSHOT
Dec 18, 2024
251f812
[CI Pipeline] Released Snapshot version: 5.43.7-alpha-104-SNAPSHOT
Dec 18, 2024
e39c125
[CI Pipeline] Released Snapshot version: 5.43.8-alpha-105-SNAPSHOT
Dec 18, 2024
9c9d81b
Add FB in FB
abuabraham-ttd Dec 18, 2024
616964c
Bypass validations
abuabraham-ttd Dec 19, 2024
0219170
Add new param for skipping validations
abuabraham-ttd Dec 19, 2024
061050d
SkipValidations
abuabraham-ttd Dec 19, 2024
9ee0d14
SkipValidations
abuabraham-ttd Dec 19, 2024
a6650e0
Force debug, better error handle
abuabraham-ttd Dec 19, 2024
53495aa
Fix code
abuabraham-ttd Dec 19, 2024
791b6ae
Adding more logs inside enclave
abuabraham-ttd Dec 20, 2024
607fe20
Adding more logs inside enclave
abuabraham-ttd Dec 20, 2024
73acc2f
Remove all e2e related
abuabraham-ttd Dec 20, 2024
a595a55
Remove all e2e related
abuabraham-ttd Dec 20, 2024
e597e53
remove space
abuabraham-ttd Dec 20, 2024
e4a9fca
remove space
abuabraham-ttd Dec 20, 2024
5d2e54a
Add values based on condition
abuabraham-ttd Dec 20, 2024
30fcbbe
Add values based on condition
abuabraham-ttd Dec 20, 2024
1449563
removing from cfn interface
abuabraham-ttd Dec 20, 2024
30d4a8b
Remove skipvalidation as a param in CFN
abuabraham-ttd Dec 20, 2024
8b249a1
Updates to CFN templates
abuabraham-ttd Dec 21, 2024
108ab34
Add an env validation
abuabraham-ttd Dec 23, 2024
f249e8f
Revert noop chnage
abuabraham-ttd Dec 23, 2024
5 changes: 3 additions & 2 deletions .github/actions/build_aws_eif/action.yaml
@@ -96,8 +96,9 @@ runs:

cp ${{ steps.buildFolder.outputs.BUILD_FOLDER }}/identity_scope.txt ${ARTIFACTS_OUTPUT_DIR}/
cp ${{ steps.buildFolder.outputs.BUILD_FOLDER }}/version_number.txt ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/start.sh ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/stop.sh ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/ec2.py ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/confidential_compute.py ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/requirements.txt ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/proxies.host.yaml ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/sockd.conf ${ARTIFACTS_OUTPUT_DIR}/
cp ./scripts/aws/uid2operator.service ${ARTIFACTS_OUTPUT_DIR}/
2 changes: 1 addition & 1 deletion pom.xml
@@ -6,7 +6,7 @@

<groupId>com.uid2</groupId>
<artifactId>uid2-operator</artifactId>
<version>5.43.0</version>
<version>5.43.8-alpha-105-SNAPSHOT</version>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
25 changes: 24 additions & 1 deletion scripts/aws/EUID_CloudFormation.template.yml
@@ -5,6 +5,18 @@ Parameters:
Description: EUID API Token
Type: String
NoEcho: true
CoreBaseURL:
Description: CoreBaseURL
Type: String
NoEcho: true
OptoutBaseURL:
Description: OptoutBaseURL
Type: String
NoEcho: true
SkipValidations:
Description: SkipValidations on starting enclave
Type: String
Default: "false"
DeployToEnvironment:
Description: Environment to deploy to prod/integ
Type: String
@@ -65,6 +77,8 @@ Metadata:
Parameters:
- APIToken
- DeployToEnvironment
- CoreBaseURL
- OptoutBaseURL
- Label:
default: Instance Configuration
Parameters:
@@ -86,6 +100,12 @@ Metadata:
default: OPERATOR_KEY provided by EUID Administrator.
DeployToEnvironment:
default: EUID environment to deploy to. Prod - production; Integ - integration test.
CoreBaseURL:
default: CoreBaseURL
OptoutBaseURL:
default: OptoutBaseURL
SkipValidations:
default: Skip configuration validation before starting enclave
InstanceType:
default: Instance Type for EC2. Minimum 4 vCPUs needed. M5, M5a, M5n, M6i and R6i Instance types are tested. Choose 2xlarge or 4xlarge.
SSHKeyName:
@@ -159,7 +179,10 @@ Resources:
"service_instances":6,
"enclave_cpu_count":6,
"enclave_memory_mb":24576,
"environment":"${DeployToEnvironment}"
"environment":"${DeployToEnvironment}",
"core_base_url": "${CoreBaseURL}",
"optout_base_url": "${OptoutBaseURL}",
"skip_validations": "${SkipValidations}"
}'
WorkerRole:
Type: 'AWS::IAM::Role'
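
The secret string in this template is hand-written JSON with `${...}` placeholders, so a single missing comma between keys only surfaces when the operator later calls `json.loads` on the secret. A minimal sketch of a pre-deployment check follows. It assumes the placeholders are substituted by plain text replacement; the helper names (`render`, `load_secret_string`), the sample values, and `SAMPLE_TEMPLATE` are hypothetical and not part of this PR.

```python
import json

# Hypothetical stand-in values; CloudFormation substitutes the real parameters.
SAMPLE_VALUES = {
    "DeployToEnvironment": "integ",
    "CoreBaseURL": "https://core.example.com",
    "OptoutBaseURL": "https://optout.example.com",
    "SkipValidations": "false",
}

# Illustrative copy of the keys visible in the diff above (not the full template).
SAMPLE_TEMPLATE = """{
  "service_instances":6,
  "enclave_cpu_count":6,
  "enclave_memory_mb":24576,
  "environment":"${DeployToEnvironment}",
  "core_base_url": "${CoreBaseURL}",
  "optout_base_url": "${OptoutBaseURL}",
  "skip_validations": "${SkipValidations}"
}"""


def render(template: str, values: dict) -> str:
    """Replace each ${Name} placeholder with its sample value."""
    for name, value in values.items():
        template = template.replace("${" + name + "}", value)
    return template


def load_secret_string(template: str, values: dict = SAMPLE_VALUES) -> dict:
    """Parse the rendered template; json.JSONDecodeError signals malformed JSON."""
    return json.loads(render(template, values))
```

Running `load_secret_string` against each template's secret string in CI would flag a dropped comma before a stack is ever created.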
26 changes: 25 additions & 1 deletion scripts/aws/UID_CloudFormation.template.yml
@@ -5,6 +5,18 @@ Parameters:
Description: UID2 API Token
Type: String
NoEcho: true
CoreBaseURL:
Description: CoreBaseURL
Type: String
NoEcho: true
OptoutBaseURL:
Description: OptoutBaseURL
Type: String
NoEcho: true
SkipValidations:
Description: SkipValidations on starting enclave
Type: String
Default: "false"
DeployToEnvironment:
Description: Environment to deploy to prod/integ
Type: String
@@ -65,6 +77,9 @@ Metadata:
Parameters:
- APIToken
- DeployToEnvironment
- CoreBaseURL
- OptoutBaseURL
- SkipValidations
- Label:
default: Instance Configuration
Parameters:
@@ -84,6 +99,12 @@ Metadata:
ParameterLabels:
APIToken:
default: OPERATOR_KEY provided by UID2 Administrator.
CoreBaseURL:
default: CoreBaseURL provided by UID2 Administrator.
OptoutBaseURL:
default: OptoutBaseURL provided by UID2 Administrator.
SkipValidations:
default: Skip configuration validation before starting enclave
DeployToEnvironment:
default: UID2 environment to deploy to. Prod - production; Integ - integration test.
InstanceType:
@@ -187,7 +208,10 @@ Resources:
"service_instances":6,
"enclave_cpu_count":6,
"enclave_memory_mb":24576,
"environment":"${DeployToEnvironment}"
"environment":"${DeployToEnvironment}",
"core_base_url": "${CoreBaseURL}",
"optout_base_url": "${OptoutBaseURL}",
"skip_validations": "${SkipValidations}"
}'
WorkerRole:
Type: 'AWS::IAM::Role'
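
`SkipValidations` is declared as a `String` parameter and written into the secret as `"skip_validations": "${SkipValidations}"`, so the operator receives the text `"false"` rather than a JSON boolean. In Python any non-empty string is truthy, which means a check such as `not configs.get("skip_validations")` would treat `"false"` as "skip". Whether `confidential_compute.py` already normalizes this value is not visible in this diff. The sketch below shows one way such a value could be normalized; `parse_bool` and its accepted spellings are assumptions for illustration, not code from this PR.

```python
# Hypothetical helper: normalizes string flags coming from CloudFormation parameters.
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no", ""}


def parse_bool(value, default: bool = False) -> bool:
    """Convert a JSON boolean or a string such as "false" into a real bool."""
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# A bare truthiness test gets this wrong: bool("false") is True.
skip = parse_bool({"skip_validations": "false"}.get("skip_validations"))
```

Raising on unknown spellings, rather than silently defaulting, keeps a typo in the stack parameter from quietly disabling validation.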
2 changes: 1 addition & 1 deletion scripts/aws/config-server/requirements.txt
@@ -1,3 +1,3 @@
Flask==2.3.2
Werkzeug==3.0.3
setuptools==70.0.0
setuptools==70.0.0
259 changes: 259 additions & 0 deletions scripts/aws/ec2.py
@@ -0,0 +1,259 @@
#!/usr/bin/env python3

import boto3
import json
import os
import subprocess
import re
import multiprocessing
import requests
import signal
import argparse
from botocore.exceptions import ClientError
from typing import Dict
import sys
import time
import yaml

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from confidential_compute import ConfidentialCompute, ConfidentialComputeConfig, SecretNotFoundException, ConfidentialComputeStartupException

class AWSConfidentialComputeConfig(ConfidentialComputeConfig):
enclave_memory_mb: int
enclave_cpu_count: int

class AuxiliaryConfig:
FLASK_PORT: str = "27015"
LOCALHOST: str = "127.0.0.1"
AWS_METADATA: str = "169.254.169.254"

@classmethod
def get_socks_url(cls) -> str:
return f"socks5://{cls.LOCALHOST}:3306"

@classmethod
def get_config_url(cls) -> str:
return f"http://{cls.LOCALHOST}:{cls.FLASK_PORT}/getConfig"

@classmethod
def get_user_data_url(cls) -> str:
return f"http://{cls.AWS_METADATA}/latest/user-data"

@classmethod
def get_token_url(cls) -> str:
return f"http://{cls.AWS_METADATA}/latest/api/token"

@classmethod
def get_meta_url(cls) -> str:
return f"http://{cls.AWS_METADATA}/latest/dynamic/instance-identity/document"


class EC2(ConfidentialCompute):

def __init__(self):
super().__init__()

def __get_aws_token(self) -> str:
"""Fetches a temporary AWS EC2 metadata token."""
try:
response = requests.put(
AuxiliaryConfig.get_token_url(), headers={"X-aws-ec2-metadata-token-ttl-seconds": "3600"}, timeout=2
)
response.raise_for_status()
return response.text
except requests.RequestException as e:
raise RuntimeError(f"Failed to fetch aws token: {e}")

def __get_current_region(self) -> str:
"""Fetches the current AWS region from EC2 instance metadata."""
token = self.__get_aws_token()
headers = {"X-aws-ec2-metadata-token": token}
try:
response = requests.get(AuxiliaryConfig.get_meta_url(), headers=headers, timeout=2)
response.raise_for_status()
return response.json()["region"]
except requests.RequestException as e:
raise RuntimeError(f"Failed to fetch region: {e}")

def __validate_aws_specific_config(self, secret):
if "enclave_memory_mb" in secret or "enclave_cpu_count" in secret:
max_capacity = self.__get_max_capacity()
min_capacity = {"enclave_memory_mb": 11000, "enclave_cpu_count" : 2 }
for key in ["enclave_memory_mb", "enclave_cpu_count"]:
if int(secret.get(key, 0)) > max_capacity.get(key):
raise ValueError(f"{key} value ({secret.get(key, 0)}) exceeds the maximum allowed ({max_capacity.get(key)}).")
if min_capacity.get(key) > int(secret.get(key, 10**9)):
raise ValueError(f"{key} value ({secret.get(key, 0)}) needs to be higher than the minimum required ({min_capacity.get(key)}).")

def _get_secret(self, secret_identifier: str) -> AWSConfidentialComputeConfig:
"""Fetches a secret value from AWS Secrets Manager and adds defaults"""

def add_defaults(configs: Dict[str, any]) -> AWSConfidentialComputeConfig:
"""Adds default values to configuration if missing."""
default_capacity = self.__get_max_capacity()
configs.setdefault("enclave_memory_mb", default_capacity["enclave_memory_mb"])
configs.setdefault("enclave_cpu_count", default_capacity["enclave_cpu_count"])
configs.setdefault("debug_mode", False)
return configs

region = self.__get_current_region()
print(f"Running in {region}")
try:
client = boto3.client("secretsmanager", region_name=region)
except Exception as e:
raise RuntimeError("Please use IAM instance profile for your instance and make sure that has permission to access Secret Manager", e)
try:
secret = add_defaults(json.loads(client.get_secret_value(SecretId=secret_identifier)["SecretString"]))
self.__validate_aws_specific_config(secret)
return secret
except ClientError as _:
raise SecretNotFoundException(f"{secret_identifier} in {region}")

@staticmethod
def __get_max_capacity():
try:
with open("/etc/nitro_enclaves/allocator.yaml", "r") as file:
nitro_config = yaml.safe_load(file)
return {"enclave_memory_mb": nitro_config['memory_mib'], "enclave_cpu_count": nitro_config['cpu_count']}
except Exception as e:
raise RuntimeError("/etc/nitro_enclaves/allocator.yaml does not have CPU, memory allocated")

def __setup_vsockproxy(self, log_level: int) -> None:
"""
Sets up the vsock proxy service.
"""
thread_count = (multiprocessing.cpu_count() + 1) // 2
command = [
"/usr/bin/vsockpx", "-c", "/etc/uid2operator/proxy.yaml",
"--workers", str(thread_count), "--log-level", str(log_level), "--daemon"
]
self.run_command(command)

def __run_config_server(self) -> None:
"""
Starts the Flask configuration server.
"""
os.makedirs("/etc/secret/secret-value", exist_ok=True)
config_path = "/etc/secret/secret-value/config"
with open(config_path, 'w') as config_file:
json.dump(self.configs, config_file)
os.chdir("/opt/uid2operator/config-server")
command = ["./bin/flask", "run", "--host", AuxiliaryConfig.LOCALHOST, "--port", AuxiliaryConfig.FLASK_PORT]
self.run_command(command, seperate_process=True)

def __run_socks_proxy(self) -> None:
"""
Starts the SOCKS proxy service.
"""
command = ["sockd", "-D"]
self.run_command(command)

def __get_secret_name_from_userdata(self) -> str:
"""Extracts the secret name from EC2 user data."""
token = self.__get_aws_token()
response = requests.get(AuxiliaryConfig.get_user_data_url(), headers={"X-aws-ec2-metadata-token": token})
user_data = response.text

with open("/opt/uid2operator/identity_scope.txt") as file:
identity_scope = file.read().strip()

default_name = f"{identity_scope.lower()}-operator-config-key"
hardcoded_value = f"{identity_scope.upper()}_CONFIG_SECRET_KEY"
match = re.search(rf'^export {hardcoded_value}="(.+?)"$', user_data, re.MULTILINE)
return match.group(1) if match else default_name

def _setup_auxiliaries(self) -> None:
"""Sets up the vsock tunnel, socks proxy and flask server"""
log_level = 1 if self.configs["debug_mode"] else 3
self.__setup_vsockproxy(log_level)
self.__run_config_server()
self.__run_socks_proxy()
print("Finished setting up all auxiliaries")

def _validate_auxiliaries(self) -> None:
"""Validates connection to flask server direct and through socks proxy."""
print("Validating auxiliaries")
try:
for attempt in range(10):
try:
response = requests.get(AuxiliaryConfig.get_config_url())
print("Config server is reachable")
break
except requests.exceptions.ConnectionError as e:
print(f"Connecting to config server, attempt {attempt + 1} failed with ConnectionError: {e}")
time.sleep(1)
else:
raise RuntimeError("Config server unreachable")
response.raise_for_status()
except requests.RequestException as e:
raise RuntimeError(f"Failed to get config from config server: {e}")
proxies = {"http": AuxiliaryConfig.get_socks_url(), "https": AuxiliaryConfig.get_socks_url()}
try:
response = requests.get(AuxiliaryConfig.get_config_url(), proxies=proxies)
response.raise_for_status()
except requests.RequestException as e:
raise RuntimeError(f"Cannot connect to config server via SOCKS proxy: {e}")
print("Connectivity check to config server passes")

def __run_nitro_enclave(self):
command = [
"nitro-cli", "run-enclave",
"--eif-path", "/opt/uid2operator/uid2operator.eif",
"--memory", str(self.configs["enclave_memory_mb"]),
"--cpu-count", str(self.configs["enclave_cpu_count"]),
"--enclave-cid", "42",
"--enclave-name", "uid2operator"
]
if self.configs.get('debug_mode', False):
print("Running in debug_mode")
command += ["--debug-mode", "--attach-console"]
self.run_command(command, seperate_process=True)

def run_compute(self) -> None:
"""Main execution flow for confidential compute."""
secret_manager_key = self.__get_secret_name_from_userdata()
self.configs = self._get_secret(secret_manager_key)
print(f"Fetched configs from {secret_manager_key}")
if not self.configs.get("skip_validations"):
self.validate_configuration()
self._setup_auxiliaries()
self._validate_auxiliaries()
self.__run_nitro_enclave()

def cleanup(self) -> None:
"""Terminates the Nitro Enclave and auxiliary processes."""
try:
self.run_command(["nitro-cli", "terminate-enclave", "--all"])
self.__kill_auxiliaries()
except subprocess.SubprocessError as e:
raise RuntimeError(f"Error during cleanup: {e}") from e

def __kill_auxiliaries(self) -> None:
"""Kills all auxiliary processes spawned."""
try:
for process_name in ["vsockpx", "sockd", "flask"]:
result = subprocess.run(["pgrep", "-f", process_name], stdout=subprocess.PIPE, text=True, check=False)
if result.stdout.strip():
for pid in result.stdout.strip().split("\n"):
os.kill(int(pid), signal.SIGKILL)
print(f"Killed process '{process_name}'.")
else:
print(f"No process named '{process_name}' found.")
except Exception as e:
print(f"Error killing process '{process_name}': {e}")


if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Manage EC2-based confidential compute workflows.")
parser.add_argument("-o", "--operation", choices=["stop", "start"], default="start", help="Operation to perform.")
args = parser.parse_args()
try:
ec2 = EC2()
if args.operation == "stop":
ec2.cleanup()
else:
ec2.run_compute()
except ConfidentialComputeStartupException as e:
print("Failed starting up Confidential Compute. Please check the logs for errors and retry \n", e)
except Exception as e:
print("Unknown failure while starting up Confidential Compute. Please contact UID support team with this log \n ", e)
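
The capacity check in `__validate_aws_specific_config` compares the requested enclave memory and CPU count against the allocator maximum and a tested minimum. A minimal standalone sketch of the same bounds check is shown below, written as a pure function so it can be unit tested without an EC2 instance. The function name `validate_capacity` and the sample maximum in the usage note are hypothetical; the minimum values are the ones used in the script above.

```python
from typing import Mapping

# Minimum values taken from the script above.
MIN_CAPACITY = {"enclave_memory_mb": 11000, "enclave_cpu_count": 2}


def validate_capacity(
    requested: Mapping[str, int],
    max_capacity: Mapping[str, int],
    min_capacity: Mapping[str, int] = MIN_CAPACITY,
) -> None:
    """Raise ValueError if a requested value falls outside [minimum, maximum]."""
    for key, minimum in min_capacity.items():
        value = int(requested[key])
        maximum = max_capacity[key]
        if value > maximum:
            raise ValueError(f"{key} value ({value}) exceeds the maximum allowed ({maximum}).")
        if value < minimum:
            raise ValueError(f"{key} value ({value}) is below the minimum required ({minimum}).")
```

For example, with a hypothetical allocator maximum of `{"enclave_memory_mb": 24576, "enclave_cpu_count": 6}`, a request for 8000 MB of memory would raise a `ValueError`, while a request for 24576 MB and 6 CPUs would pass.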
