Commit de29a70

Merge pull request #181 from aws-samples/solana
Solana. Network throttling feature.
2 parents fa51662 + 0613583 commit de29a70

21 files changed

+213
-52
lines changed

lib/solana/README.md

Lines changed: 43 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,16 @@ Solana nodes on AWS can be deployed in 2 different configurations: base RPC and
2626
3. The Solana nodes use all required secrets locally, but optionally can store a copy in [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html) as secure backup.
2727
4. The Solana nodes send various monitoring metrics for both EC2 and Solana nodes to Amazon CloudWatch.
2828

29+
### Optimizing Data Transfer Costs
30+
31+
Solana Agave clients generate significant outbound traffic: 80 to 200+ TiB per month in recent years. To manage the associated costs, the blueprint includes an outbound traffic optimization feature that monitors the node's sync status and adjusts the outbound bandwidth limit automatically.
32+
33+
The system tracks the node's "Slots Behind" metric once the initial sync is complete. When the metric reaches zero, meaning the node is fully synced, the system applies the outbound bandwidth limit set in the `SOLANA_LIMIT_OUT_TRAFFIC_MBPS` variable of your `.env` file. If the metric exceeds 100, the limit is removed until the node catches up. The default limit is 20 Mbit/s (~6.5 TB/month if fully used), and testing has shown that nodes can stay synchronized at limits as low as 15 Mbit/s. Inbound bandwidth remains unrestricted.
34+
35+
Internal network traffic is excluded from the limit. Traffic to private and link-local address ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16) remains unrestricted, so AWS services reached over internal IPs continue to function normally. This optimization can reduce data transfer costs by over 90%.
36+
37+
This feature is effective for RPC nodes but should not be enabled on consensus nodes. Restricting outbound traffic on consensus nodes can compromise their performance and network participation.
38+
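As a rough check of the figures in this section, the sketch below converts a sustained outbound rate into a monthly volume and estimates the saving against an unthrottled baseline. It assumes a 30-day month, decimal units (1 TB = 10^12 bytes), and a link saturated at the limit the whole time. The function names are illustrative and not part of the blueprint.

```bash
#!/bin/bash

# Monthly volume in TB for a link saturated at $1 Mbit/s (30-day month assumed).
mbps_to_tb_per_month() {
  awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps * 1e6 / 8 * 86400 * 30 / 1e12 }'
}

# Percentage reduction when going from $1 TB/month (baseline) to $2 TB/month.
reduction_pct() {
  awk -v base="$1" -v limited="$2" 'BEGIN { printf "%.1f\n", (1 - limited / base) * 100 }'
}
```

For example, `mbps_to_tb_per_month 20` prints `6.48`, and `reduction_pct 100 6.48` prints `93.5`, which is consistent with the "over 90%" estimate when the unthrottled baseline is 100 TB/month or more.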
2939
## Additional materials
3040

3141
<details>
@@ -84,8 +94,8 @@ This is the Well-Architected checklist for Solana nodes implementation of the AW
8494

8595
| Usage pattern | Ideal configuration | Primary option on AWS | Data Transfer Estimates | Config reference |
8696
|---|---|---|---|---|
87-
| 1/ Base RPC node (no secondary indexes) | 48 vCPU, 384 GiB RAM, Accounts volume: EBS gp3, 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: EBS gp3, 2TB, 9K IOPS, 700 MB/s throughput | r7a.12xlarge, Accounts volume: EBS gp3, 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: EBS gp3, 2TB, 9K IOPS, 700 MB/s throughput | 13-15TB/month (no staking) | [.env-sample-baserpc-x86](./sample-configs/.env-sample-baserpc-x86) |
88-
| 2/ Extended RPC node (with all secondary indexes) | 96 vCPU, 768 GiB RAM, Accounts volume: 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: 2TB, 9K IOPS, 700 MB/s throughput | I8g.18xlarge, Accounts volume: Instance Store, Data volume: Instance Store | 20-38TB/month (no staking) | [.env-sample-extendedrpc-arm](./sample-configs/.env-sample-extendedrpc-arm) |
97+
| 1/ Base RPC node (no secondary indexes) | 48 vCPU, 384 GiB RAM, Accounts volume: EBS gp3, 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: EBS gp3, 2TB, 9K IOPS, 700 MB/s throughput | r7a.12xlarge, Accounts volume: EBS gp3, 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: EBS gp3, 2TB, 9K IOPS, 700 MB/s throughput | 100-200TB/month (no staking) | [.env-sample-baserpc-x86](./sample-configs/.env-sample-baserpc-x86) |
98+
| 2/ Extended RPC node (with all secondary indexes) | 96 vCPU, 768 GiB RAM, Accounts volume: 500GiB, 7K IOPS, 700 MB/s throughput, Data volume: 2TB, 9K IOPS, 700 MB/s throughput | r7a.24xlarge, Accounts volume: EBS io2, 500GiB, 10K IOPS, Data volume: EBS io2, 2000GiB, 30K IOPS | 100-200TB/month (no staking) | [.env-sample-extendedrpc-arm](./sample-configs/.env-sample-extendedrpc-arm) |
8999
</details>
90100

91101
## Setup Instructions
@@ -305,6 +315,37 @@ free -g
305315
sudo sysctl vm.swappiness=10
306316
```
307317

318+
6. How can I check the network throttling configuration currently applied to the instance?
319+
320+
```bash
321+
# Check the iptables mangle table
322+
iptables -t mangle -L -n -v
323+
324+
# Set network interface ID
325+
INTERFACE=$(ip -br addr show | grep -v '^lo' | awk '{print $1}' | head -n1)
326+
327+
# Check traffic control (tc) tool configuration
328+
tc qdisc show
329+
330+
# Show tc statistics (sent bytes, drops, overlimits) for the interface
331+
tc -s qdisc ls dev $INTERFACE
332+
333+
# Monitor bandwidth in real-time
334+
iftop -i $INTERFACE
335+
```
336+
337+
7. How can I manually remove all iptables and tc rules?
338+
339+
```bash
340+
# Remove tc rules
341+
tc qdisc del dev $INTERFACE root
342+
343+
# Remove iptables rules
344+
iptables -t mangle -D OUTPUT -j MARKING
345+
iptables -t mangle -F MARKING
346+
iptables -t mangle -X MARKING
347+
```
348+
308349
## Upgrades
309350

310351
When nodes need to be upgraded or downgraded, [use blue/green pattern to do it](https://aws.amazon.com/blogs/devops/performing-bluegreen-deployments-with-aws-codedeploy-and-auto-scaling-groups/). This is not yet automated and contributions are welcome!

lib/solana/app.ts

Lines changed: 3 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -21,34 +21,15 @@ new SolanaSingleNodeStack(app, "solana-single-node", {
2121
stackName: `solana-single-node-${config.baseNodeConfig.nodeConfiguration}`,
2222
env: { account: config.baseConfig.accountId, region: config.baseConfig.region },
2323

24-
instanceType: config.baseNodeConfig.instanceType,
25-
instanceCpuType: config.baseNodeConfig.instanceCpuType,
26-
solanaCluster: config.baseNodeConfig.solanaCluster,
27-
solanaVersion: config.baseNodeConfig.solanaVersion,
28-
nodeConfiguration: config.baseNodeConfig.nodeConfiguration,
29-
dataVolume: config.baseNodeConfig.dataVolume,
30-
accountsVolume: config.baseNodeConfig.accountsVolume,
31-
solanaNodeIdentitySecretARN: config.baseNodeConfig.solanaNodeIdentitySecretARN,
32-
voteAccountSecretARN: config.baseNodeConfig.voteAccountSecretARN,
33-
authorizedWithdrawerAccountSecretARN: config.baseNodeConfig.authorizedWithdrawerAccountSecretARN,
34-
registrationTransactionFundingAccountSecretARN: config.baseNodeConfig.registrationTransactionFundingAccountSecretARN,
24+
...config.baseNodeConfig
3525
});
3626

3727
new SolanaHANodesStack(app, "solana-ha-nodes", {
3828
stackName: `solana-ha-nodes-${config.baseNodeConfig.nodeConfiguration}`,
3929
env: { account: config.baseConfig.accountId, region: config.baseConfig.region },
4030

41-
instanceType: config.baseNodeConfig.instanceType,
42-
instanceCpuType: config.baseNodeConfig.instanceCpuType,
43-
solanaCluster: config.baseNodeConfig.solanaCluster,
44-
solanaVersion: config.baseNodeConfig.solanaVersion,
45-
nodeConfiguration: config.baseNodeConfig.nodeConfiguration,
46-
dataVolume: config.baseNodeConfig.dataVolume,
47-
accountsVolume: config.baseNodeConfig.accountsVolume,
48-
49-
albHealthCheckGracePeriodMin: config.haNodeConfig.albHealthCheckGracePeriodMin,
50-
heartBeatDelayMin: config.haNodeConfig.heartBeatDelayMin,
51-
numberOfNodes: config.haNodeConfig.numberOfNodes,
31+
...config.baseNodeConfig,
32+
...config.haNodeConfig,
5233
});
5334

5435

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
#!/bin/bash
2+
3+
# Read the outbound bandwidth limit (in Mbps) from the first command-line argument
4+
if [ -n "$1" ]; then
5+
LIMIT_OUT_TRAFFIC_MBPS=$1
6+
else
7+
echo "Warning: Specify max value for outbound data traffic in Mbps."
8+
echo "Usage: net-rules-start.sh <max_bandwidth_mbps>"
9+
exit 1;
10+
fi
11+
12+
# Step 1: Create an iptables rule to mark packets going to public IPs
13+
# Create a new chain for our marking rules
14+
iptables -t mangle -N MARKING
15+
16+
# Add rules to return (skip marking) for private IP ranges
17+
iptables -t mangle -A MARKING -d 10.0.0.0/8 -j RETURN
18+
iptables -t mangle -A MARKING -d 172.16.0.0/12 -j RETURN
19+
iptables -t mangle -A MARKING -d 192.168.0.0/16 -j RETURN
20+
iptables -t mangle -A MARKING -d 169.254.0.0/16 -j RETURN
21+
22+
# Mark remaining traffic (public IPs)
23+
iptables -t mangle -A MARKING -j MARK --set-mark 1
24+
25+
# Jump to our MARKING chain from OUTPUT
26+
iptables -t mangle -A OUTPUT -j MARKING
27+
28+
# Step 2: Set up tc with filter for marked packets
29+
INTERFACE=$(ip -br addr show | grep -v '^lo' | awk '{print $1}' | head -n1)
30+
31+
tc qdisc add dev $INTERFACE root handle 1: prio
32+
33+
# Step 3: Add the tbf filter for marked packets
34+
tc filter add dev $INTERFACE parent 1: protocol ip handle 1 fw flowid 1:1
35+
tc qdisc add dev $INTERFACE parent 1:1 tbf rate "${LIMIT_OUT_TRAFFIC_MBPS}mbit" burst 20kb latency 50ms
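To reason about which destinations the RETURN rules above leave unmarked (and therefore unthrottled), the following pure-bash sketch tests an IPv4 address against the same four CIDR blocks. The helper names `ip_to_int`, `in_cidr`, and `is_exempt` are hypothetical; the script itself relies on iptables for this matching.

```bash
#!/bin/bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if address $1 is inside CIDR block $2 (for example 10.0.0.0/8).
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Return 0 if traffic to $1 would skip marking, and so bypass the rate limit.
is_exempt() {
  local cidr
  for cidr in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16; do
    in_cidr "$1" "$cidr" && return 0
  done
  return 1
}
```

With these helpers, `is_exempt 172.31.0.5` succeeds (inside 172.16.0.0/12), while `is_exempt 172.32.0.1` and `is_exempt 8.8.8.8` fail, so traffic to those addresses would be marked and shaped.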
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
#!/bin/bash
2+
3+
INTERFACE=$(ip -br addr show | grep -v '^lo' | awk '{print $1}' | head -n1)
4+
5+
# Remove tc rules
6+
/usr/sbin/tc qdisc del dev $INTERFACE root
7+
8+
# Remove iptables rules
9+
/usr/sbin/iptables -t mangle -D OUTPUT -j MARKING
10+
/usr/sbin/iptables -t mangle -F MARKING
11+
/usr/sbin/iptables -t mangle -X MARKING
12+
13+
exit 0;
Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
[Unit]
2+
Description="iptables and Traffic Control Rules"
3+
After=network.target
4+
5+
[Service]
6+
Type=oneshot
7+
RemainAfterExit=yes
8+
ExecStart=/opt/instance/network/net-rules-start.sh _LIMIT_OUT_TRAFFIC_MBPS_
9+
ExecStop=/opt/instance/network/net-rules-stop.sh
10+
11+
[Install]
12+
WantedBy=multi-user.target
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
[Unit]
2+
Description="Net sync checker for blockchain node"
3+
4+
[Service]
5+
ExecStart=/opt/instance/network/net-syncchecker.sh
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
[Unit]
2+
Description="Run Network Sync checker service every 1 min"
3+
4+
[Timer]
5+
OnCalendar=*:0/1
6+
Unit=net-sync-checker.service
7+
8+
[Install]
9+
WantedBy=multi-user.target
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
#!/bin/bash
2+
3+
INIT_COMPLETED_FILE=/data/data/init-completed
4+
5+
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
6+
EC2_INTERNAL_IP=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/local-ipv4)
7+
8+
# Start checking the sync node status only after the node has finished the initial sync
9+
if [ -f "$INIT_COMPLETED_FILE" ]; then
10+
SOLANA_SLOTS_BEHIND_DATA=$(curl -s -X POST -H "Content-Type: application/json" -d ' {"jsonrpc":"2.0","id":1, "method":"getHealth"}' http://$EC2_INTERNAL_IP:8899 | jq .error.data)
11+
SOLANA_SLOTS_BEHIND=$(echo $SOLANA_SLOTS_BEHIND_DATA | jq .numSlotsBehind -r)
12+
13+
if [ "$SOLANA_SLOTS_BEHIND" == "null" ] || [ -z "$SOLANA_SLOTS_BEHIND" ]
14+
then
15+
SOLANA_SLOTS_BEHIND=0
16+
fi
17+
18+
if [ $SOLANA_SLOTS_BEHIND -gt 100 ]
19+
then
20+
if systemctl is-active --quiet net-rules; then
21+
systemctl stop net-rules
22+
fi
23+
fi
24+
25+
if [ $SOLANA_SLOTS_BEHIND -eq 0 ]
26+
then
27+
if ! systemctl is-active --quiet net-rules; then
28+
systemctl start net-rules
29+
fi
30+
fi
31+
fi
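The checker above has a dead band: between 1 and 100 slots behind it leaves the current state untouched, which avoids toggling the limit on small fluctuations. A minimal sketch of that decision as a side-effect-free function follows; `throttle_action` is a hypothetical name, and the second argument stands in for the result of `systemctl is-active net-rules`.

```bash
#!/bin/bash

# Print the action the checker would take: start, stop, or none.
#   $1: slots behind (empty or "null" is treated as 0, as in the script)
#   $2: "active" if the net-rules service is running, anything else otherwise
throttle_action() {
  local slots=$1 state=$2
  if [ -z "$slots" ] || [ "$slots" = "null" ]; then
    slots=0
  fi
  if [ "$slots" -gt 100 ] && [ "$state" = "active" ]; then
    echo stop    # node fell behind: lift the limit
  elif [ "$slots" -eq 0 ] && [ "$state" != "active" ]; then
    echo start   # node is synced: apply the limit
  else
    echo none    # dead band, or already in the desired state
  fi
}
```

For instance, `throttle_action 50 active` prints `none`: a throttled node that is 50 slots behind stays throttled until it either catches up or drops more than 100 slots behind.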
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
#!/bin/bash
2+
3+
# Read the outbound bandwidth limit (in Mbps) from the first command-line argument
4+
if [ -n "$1" ]; then
5+
LIMIT_OUT_TRAFFIC_MBPS=$1
6+
else
7+
echo "Warning: Specify max value for outbound data traffic in Mbps."
8+
echo "Usage: instance/network/setup.sh <max_bandwidth_mbps>"
9+
exit 1;
10+
fi
11+
12+
INTERFACE=$(ip -br addr show | grep -v '^lo' | awk '{print $1}' | head -n1)
13+
NET_SCRIPTS_PATH="/opt/instance/network"
14+
15+
# Replace the _LIMIT_OUT_TRAFFIC_MBPS_ placeholder in $NET_SCRIPTS_PATH/net-rules.service with the value of LIMIT_OUT_TRAFFIC_MBPS
16+
sed -i "s/_LIMIT_OUT_TRAFFIC_MBPS_/${LIMIT_OUT_TRAFFIC_MBPS}/g" $NET_SCRIPTS_PATH/net-rules.service
17+
sed -i "s/_INTERFACE_/${INTERFACE}/g" $NET_SCRIPTS_PATH/net-rules.service
18+
19+
# Copy the file $NET_SCRIPTS_PATH/net-rules.service to /etc/systemd/system/net-rules.service
20+
cp $NET_SCRIPTS_PATH/net-rules.service /etc/systemd/system/net-rules.service
21+
22+
echo "Enabling net rules service"
23+
systemctl enable net-rules.service
24+
25+
echo "Setting up sync-checker service"
26+
mv $NET_SCRIPTS_PATH/net-sync-checker.service /etc/systemd/system/net-sync-checker.service
27+
28+
# The timer runs the sync checker every minute
29+
echo "Setting up sync-checker timer"
30+
mv $NET_SCRIPTS_PATH/net-sync-checker.timer /etc/systemd/system/net-sync-checker.timer
31+
32+
echo "Starting net sync checker timer"
33+
systemctl start net-sync-checker.timer
34+
systemctl enable net-sync-checker.timer
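The placeholder substitution in this script can be tried without touching systemd. The sketch below renders a unit template to standard output and rejects a limit that is not a plain non-negative integer, since the value is interpolated into a `sed` expression. `render_unit` is a hypothetical helper, and the validation is an addition that the script above does not perform.

```bash
#!/bin/bash

# Print the contents of template $1 with _LIMIT_OUT_TRAFFIC_MBPS_ replaced by $2.
# Returns 1 without printing the template if $2 is not a non-negative integer.
render_unit() {
  local template=$1 limit=$2
  case "$limit" in
    ''|*[!0-9]*)
      echo "limit must be a non-negative integer" >&2
      return 1
      ;;
  esac
  sed "s/_LIMIT_OUT_TRAFFIC_MBPS_/${limit}/g" "$template"
}
```

A call such as `render_unit /opt/instance/network/net-rules.service 20 > /tmp/net-rules.service` leaves the template unchanged, so the result can be reviewed before it is copied into `/etc/systemd/system`.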

lib/solana/lib/assets/user-data-ubuntu.sh

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@ chmod 600 /etc/cdk_environment
2222
echo "SOLANA_CLUSTER=${_SOLANA_CLUSTER_}"
2323
echo "LIFECYCLE_HOOK_NAME=${_LIFECYCLE_HOOK_NAME_}"
2424
echo "ASG_NAME=${_ASG_NAME_}"
25+
echo "LIMIT_OUT_TRAFFIC_MBPS=${_LIMIT_OUT_TRAFFIC_MBPS_}"
2526
} >> /etc/cdk_environment
2627
source /etc/cdk_environment
2728

@@ -127,6 +128,11 @@ systemctl restart amazon-cloudwatch-agent
127128

128129
systemctl daemon-reload
129130

131+
if [[ "$LIMIT_OUT_TRAFFIC_MBPS" -gt 0 ]]; then
132+
echo "Limiting out traffic"
133+
/opt/instance/network/setup.sh "$LIMIT_OUT_TRAFFIC_MBPS"
134+
fi
135+
130136
echo "Starting up the node service"
131137
systemctl enable --now node
132138
