
Commit 65305c2

Ray Load Tests CDK Stack and Instructions for Load Testing (#1583)

* adding load test instructions and ray stack
* flake8
* black
* isort
* Tutorials updating paths (#1584)
* sync
* sync
* sync
* fixing pip install syntax
* updating region env var
* pylint

1 parent f877a0e · commit 65305c2
File tree

8 files changed: +184 −23 lines

CONTRIBUTING.md

Lines changed: 116 additions & 17 deletions
@@ -94,13 +94,6 @@ You can choose from three different environments to test your fixes/changes, bas
 * Pick up a Linux or MacOS.
 * Install Python 3.7, 3.8 or 3.9 with [poetry](https://github.com/python-poetry/poetry) for package management
 * Fork the AWS SDK for pandas repository and clone that into your development environment
-* Go to the project's directory create a Python's virtual environment for the project
-
-`python3 -m venv .venv && source .venv/bin/activate`
-
-or
-
-`python -m venv .venv && source .venv/bin/activate`
 
 * Install dependencies:
 
@@ -125,13 +118,6 @@ or
 * Pick up a Linux or MacOS.
 * Install Python 3.7, 3.8 or 3.9 with [poetry](https://github.com/python-poetry/poetry) for package management
 * Fork the AWS SDK for pandas repository and clone that into your development environment
-* Go to the project's directory create a Python's virtual environment for the project
-
-`python3 -m venv .venv && source .venv/bin/activate`
-
-or
-
-`python -m venv .venv && source .venv/bin/activate`
 
 * Install dependencies:
 
@@ -186,9 +172,6 @@ or
 * Pick up a Linux or MacOS.
 * Install Python 3.7, 3.8 or 3.9 with [poetry](https://github.com/python-poetry/poetry) for package management
 * Fork the AWS SDK for pandas repository and clone that into your development environment
-* Go to the project's directory create a Python's virtual environment for the project
-
-`python -m venv .venv && source .venv/bin/activate`
 
 * Then run the command below to install all dependencies:
 
@@ -262,6 +245,122 @@ or
 
 ``./test_infra/scripts/delete-stack.sh databases``
 
+## Ray Load Tests Environment
+**DISCLAIMER**: Make sure you know what you are doing. These steps will incur charges for some services on your AWS account and require minimum security skills to keep your environment safe.
+
+* Pick up a Linux or MacOS.
+* Install Python 3.7, 3.8 or 3.9 with [poetry](https://github.com/python-poetry/poetry) for package management
+* Fork the AWS SDK for pandas repository and clone that into your development environment
+
+* Then run the command below to install all dependencies:
+
+``poetry install``
+
+* Go to the ``test_infra`` directory
+
+``cd test_infra``
+
+* Install CDK dependencies:
+
+``poetry install``
+
+* [OPTIONAL] Set AWS_DEFAULT_REGION to define the region the Ray Test environment will deploy into. You may want to choose a region which you don't currently use:
+
+``export AWS_DEFAULT_REGION=ap-northeast-1``
+
+* Go to the ``scripts`` directory
+
+``cd scripts``
+
+* Deploy the `ray` CDK stack.
+
+``./deploy-stack.sh ray``
+
+* Configure Ray Cluster
+
+``vi ray-cluster-config.yaml``
+
+```
+# Update the following file to match your environment
+# The following is an example
+cluster_name: ray-cluster
+
+initial_workers: 2
+min_workers: 2
+max_workers: 2
+
+provider:
+  type: aws
+  region: us-east-1 # change region as required
+  availability_zone: us-east-1a,us-east-1b,us-east-1c # change azs as required
+  security_group:
+    GroupName: ray_client_security_group
+  cache_stopped_nodes: False
+
+available_node_types:
+  ray.head.default:
+    node_config:
+      InstanceType: r5n.2xlarge # change instance type as required
+      IamInstanceProfile:
+        Arn: arn:aws:iam::{UPDATE YOUR ACCOUNT ID HERE}:instance-profile/ray-cluster-instance-profile
+      ImageId: ami-0ea510fcb67686b48 # latest ray images -> https://github.com/amzn/amazon-ray#amazon-ray-images
+      NetworkInterfaces:
+        - AssociatePublicIpAddress: True
+          SubnetId: {replace with subnet within above AZs}
+          Groups: [{ID of group `ray_client_security_group` created by the step above}]
+          DeviceIndex: 0
+
+  ray.worker.default:
+    min_workers: 2
+    max_workers: 2
+    node_config:
+      InstanceType: r5n.2xlarge
+      IamInstanceProfile:
+        Arn: arn:aws:iam::{UPDATE YOUR ACCOUNT ID HERE}:instance-profile/ray-cluster-instance-profile
+      ImageId: ami-0ea510fcb67686b48 # latest ray images -> https://github.com/amzn/amazon-ray#amazon-ray-images
+      NetworkInterfaces:
+        - AssociatePublicIpAddress: True
+          SubnetId: {replace with subnet within above AZs}
+          Groups: [{ID of group `ray_client_security_group` created by the step above}]
+          DeviceIndex: 0
+
+setup_commands:
+  - pip install "awswrangler[distributed]==3.0.0a2"
+  - pip install pytest
+
+```
+
+* Create Ray Cluster
+
+``ray up -y ray-cluster-config.yaml``
+
+* Push Load Tests to Ray Cluster
+
+``ray rsync-up ray-cluster-config.yaml tests/load /home/ubuntu/``
+
+* Submit Pytest Run to Ray Cluster
+
+```
+echo '''
+import os
+
+import pytest
+
+args = "-v load/"
+
+if not os.getenv("AWS_DEFAULT_REGION"):
+    os.environ["AWS_DEFAULT_REGION"] = "us-east-1"  # Set your region as necessary
+
+result = pytest.main(args.split(" "))
+
+print(f"result: {result}")
+''' > handler.py
+ray submit ray-cluster-config.yaml handler.py
+```
+
+* Teardown Cluster
+
+``ray down -y ray-cluster-config.yaml``
+
+[More on launching Ray Clusters on AWS](https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/aws.html#)
+
 
 ## Recommended Visual Studio Code Recommended setting
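Before submitting the full load-test suite, it can help to confirm the cluster accepts work at all. Below is a minimal sanity-check sketch, not part of this commit, assuming Ray Client is reachable on the head node's public IP (printed by ``ray up``) at its default port 10001; the placeholder address is yours to fill in.

```python
# Minimal connectivity check (assumed setup): connect via Ray Client and
# run one trivial remote task. Port 10001 is Ray Client's default.
import ray

ray.init(address="ray://<head-node-public-ip>:10001")  # placeholder address

@ray.remote
def ping() -> str:
    return "pong"

print(ray.get(ping.remote()))  # "pong" confirms tasks execute on the cluster
ray.shutdown()
```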

test_infra/app.py

Lines changed: 4 additions & 0 deletions
@@ -4,6 +4,7 @@
 from stacks.databases_stack import DatabasesStack
 from stacks.lakeformation_stack import LakeFormationStack
 from stacks.opensearch_stack import OpenSearchStack
+from stacks.ray_stack import RayStack
 
 app = App()
 
@@ -27,4 +28,7 @@
     base.get_key,
 )
 
+RayStack(app, "aws-sdk-pandas-ray")
+
+
 app.synth()
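A quick way to verify the stack registers and synthesizes cleanly is a synth-time smoke test. This is a sketch, not part of the commit, using aws-cdk-lib's assertions module against the resources the stack defines below.

```python
# Sketch: assert the synthesized RayStack template contains the expected
# IAM resources (one role, two inline policies, one instance profile).
from aws_cdk import App
from aws_cdk.assertions import Template

from stacks.ray_stack import RayStack

app = App()
template = Template.from_stack(RayStack(app, "aws-sdk-pandas-ray"))

template.resource_count_is("AWS::IAM::Role", 1)
template.resource_count_is("AWS::IAM::Policy", 2)
template.resource_count_is("AWS::IAM::InstanceProfile", 1)
```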

test_infra/stacks/ray_stack.py

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+from aws_cdk import Stack
+from aws_cdk import aws_iam as iam
+from constructs import Construct
+
+
+class RayStack(Stack):  # type: ignore
+    def __init__(self, scope: Construct, construct_id: str, **kwargs: str) -> None:
+        """
+        Ray Cluster Infrastructure.
+
+        Includes IAM role and instance profile.
+        """
+        super().__init__(scope, construct_id, **kwargs)
+
+        # Ray execution role
+        ray_exec_role = iam.Role(
+            self,
+            "ray-execution-role",
+            assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"),
+            managed_policies=[
+                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2FullAccess"),
+                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonS3FullAccess"),
+                iam.ManagedPolicy.from_aws_managed_policy_name("CloudWatchFullAccess"),
+                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSSMFullAccess"),
+            ],
+        )
+
+        # Add IAM pass role for a head instance to launch worker nodes
+        # w/ an instance profile
+        iam.Policy(
+            self,
+            "ray-execution-role-policy-pass-role",
+            policy_name="IAMPassRole",
+            roles=[ray_exec_role],
+            statements=[
+                iam.PolicyStatement(
+                    effect=iam.Effect.ALLOW, actions=["iam:PassRole"], resources=[ray_exec_role.role_arn]
+                ),
+            ],
+        )
+
+        # Add additional permissions for Pandas SDK Load Tests
+        iam.Policy(
+            self,
+            "ray-load-test-permissions",
+            policy_name="AdditionalLoadTestPermissions",
+            roles=[ray_exec_role],
+            statements=[
+                iam.PolicyStatement(effect=iam.Effect.ALLOW, actions=["timestream:WriteRecords"], resources=["*"]),
+            ],
+        )
+
+        # Add instance profile
+        iam.CfnInstanceProfile(
+            self,
+            "ray-instance-profile",
+            roles=[ray_exec_role.role_name],
+            instance_profile_name="ray-cluster-instance-profile",
+        )
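The `timestream:WriteRecords` grant above exists so the cluster nodes can run Timestream load tests. For illustration, here is a hypothetical sketch of the kind of pytest case that could live under `tests/load`; the database and table names are invented, and the Timestream table must already exist.

```python
# Hypothetical load-test case in the style of tests/load (names are
# illustrative, not from the commit). Requires a pre-existing Timestream
# database/table and credentials with timestream:WriteRecords.
import awswrangler as wr
import pandas as pd


def test_timestream_write() -> None:
    df = pd.DataFrame(
        {
            # Distinct timestamps so Timestream does not reject duplicates
            "time": pd.date_range("2022-01-01", periods=1000, freq="s"),
            "measure": [float(i) for i in range(1000)],
            "region": "us-east-1",
        }
    )
    # wr.timestream.write returns the records rejected by the service
    rejected = wr.timestream.write(
        df=df,
        database="load_test_db",  # hypothetical names
        table="load_test_table",
        time_col="time",
        measure_col="measure",
        dimensions_cols=["region"],
    )
    assert len(rejected) == 0
```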

tutorials/006 - Amazon Athena.ipynb

Lines changed: 2 additions & 2 deletions
@@ -119,7 +119,7 @@
 "cols = [\"id\", \"dt\", \"element\", \"value\", \"m_flag\", \"q_flag\", \"s_flag\", \"obs_time\"]\n",
 "\n",
 "df = wr.s3.read_csv(\n",
-"    path=\"s3://noaa-ghcn-pds/csv/189\",\n",
+"    path=\"s3://noaa-ghcn-pds/csv/by_year/189\",\n",
 "    names=cols,\n",
 "    parse_dates=[\"dt\", \"obs_time\"]) # Read 10 files from the 1890 decade (~1GB)\n",
 "\n",
@@ -381,4 +381,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}

tutorials/008 - Redshift - Copy & Unload.ipynb

Lines changed: 1 addition & 1 deletion
@@ -276,7 +276,7 @@
 "cols = [\"id\", \"dt\", \"element\", \"value\", \"m_flag\", \"q_flag\", \"s_flag\", \"obs_time\"]\n",
 "\n",
 "df = wr.s3.read_csv(\n",
-"    path=\"s3://noaa-ghcn-pds/csv/1897.csv\",\n",
+"    path=\"s3://noaa-ghcn-pds/csv/by_year/1897.csv\",\n",
 "    names=cols,\n",
 "    parse_dates=[\"dt\", \"obs_time\"]) # ~127MB, ~4MM rows\n",
 "\n",

tutorials/010 - Parquet Crawler.ipynb

Lines changed: 1 addition & 1 deletion
@@ -244,7 +244,7 @@
 "cols = [\"id\", \"dt\", \"element\", \"value\", \"m_flag\", \"q_flag\", \"s_flag\", \"obs_time\"]\n",
 "\n",
 "df = wr.s3.read_csv(\n",
-"    path=\"s3://noaa-ghcn-pds/csv/189\",\n",
+"    path=\"s3://noaa-ghcn-pds/csv/by_year/189\",\n",
 "    names=cols,\n",
 "    parse_dates=[\"dt\", \"obs_time\"]) # Read 10 files from the 1890 decade (~1GB)\n",
 "\n",

tutorials/019 - Athena Cache.ipynb

Lines changed: 1 addition & 1 deletion
@@ -272,7 +272,7 @@
 "cols = [\"id\", \"dt\", \"element\", \"value\", \"m_flag\", \"q_flag\", \"s_flag\", \"obs_time\"]\n",
 "\n",
 "df = wr.s3.read_csv(\n",
-"    path=\"s3://noaa-ghcn-pds/csv/189\",\n",
+"    path=\"s3://noaa-ghcn-pds/csv/by_year/189\",\n",
 "    names=cols,\n",
 "    parse_dates=[\"dt\", \"obs_time\"]) # Read 10 files from the 1890 decade (~1GB)\n",
 "\n",

tutorials/022 - Writing Partitions Concurrently.ipynb

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@
 }
 ],
 "source": [
-"noaa_path = \"s3://noaa-ghcn-pds/csv/193\"\n",
+"noaa_path = \"s3://noaa-ghcn-pds/csv/by_year/193\"\n",
 "\n",
 "cols = [\"id\", \"dt\", \"element\", \"value\", \"m_flag\", \"q_flag\", \"s_flag\", \"obs_time\"]\n",
 "dates = [\"dt\", \"obs_time\"]\n",
