
Commit 8024082

Script to get sizes of all s3 buckets (#6484)
This code was used when recently analyzing our S3 buckets to see which ones were the largest and may have room for optimization (resulting in us setting a new lifecycle policy: meta-pytorch/pytorch-gha-infra#626).

What it does:

* It gets the size of each bucket using CloudWatch, which is much faster than checking directly in S3. The latter iterates over every item in the bucket and can take hours to complete.
* It sorts the output to show the largest buckets.

Checking in the code in case we want to use it again.

Output looks like this (dummy data below):

```
> python bucket_size_metrics.py
🔍 Fetching S3 bucket list...
✅ Found 7 buckets. Fetching storage metrics...
(1/7): Bucket marketing-assets
  💿 Storage used: 100.0 GB
(2/7): Bucket devops-tools
  💿 Storage used: 50.0 GB
(3/7): Bucket software-releases
  💿 Storage used: 200.0 GB
(4/7): Bucket analytics-reports
  🫥 No metrics found for bucket: analytics-reports. Is it unused?
(5/7): Bucket product-images
  💿 Storage used: 10.0 GB
(6/7): Bucket company-videos
  💿 Storage used: 500.0 GB
(7/7): Bucket sales-documents
  💿 Storage used: 20.0 GB


📌 Storage Usage Summary:
Size (GB)   Bucket Name
----------------------------
   500.00   company-videos
   200.00   software-releases
   100.00   marketing-assets
    50.00   devops-tools
    20.00   sales-documents
    10.00   product-images
     0.00   analytics-reports
```
1 parent e2efc38 commit 8024082
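For contrast, here is a minimal sketch (not part of this commit) of the direct-in-S3 approach the commit message describes as slow: paginating over every object in a bucket and summing sizes. `my-bucket` is a hypothetical name, and AWS credentials are assumed to be configured as for the script below.

```
# Hypothetical sketch of the slow approach: list every object, sum the sizes.
# "my-bucket" is a placeholder; AWS credentials are assumed configured.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

total_bytes = 0
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]

print(f"my-bucket: {total_bytes / 1073741824:.1f} GB")
```

Each page holds at most 1,000 objects, so a bucket with millions of objects needs thousands of sequential API calls; the CloudWatch approach in this commit replaces all of that with a single metric query per bucket.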

File tree

bucket_size_metrics.py

1 file changed: 110 additions & 0 deletions
#!/usr/bin/env python3

"""
This script retrieves the storage size of all S3 buckets in your AWS account
using CloudWatch metrics. It lists the buckets and their respective sizes
in gigabytes (GB).

Requirements:
Before running this script, ensure you have the boto3 library installed
and your AWS credentials are configured.

Installation:
pip install boto3
"""

import sys
from datetime import datetime, timedelta, timezone

import boto3


def main():
    print("🔍 Fetching S3 bucket list...")

    # Create the S3 client
    s3 = boto3.client("s3")

    try:
        # Get all buckets in the account
        response = s3.list_buckets()
        buckets = [bucket["Name"] for bucket in response["Buckets"]]
    except Exception as e:
        print(f"❌ Error accessing AWS: {e}")
        sys.exit(1)

    if not buckets:
        print("🚫 No S3 buckets found.")
        sys.exit(0)

    bucket_count = len(buckets)
    print(f"✅ Found {bucket_count} buckets. Fetching storage metrics...")

    # Store (size_gb, bucket_name) tuples for the summary
    results = []

    # Process each bucket
    for i, bucket in enumerate(buckets, 1):
        print(f"({i}/{bucket_count}): Bucket {bucket}")

        # Get the bucket's region. Note that get_bucket_location returns a
        # LocationConstraint of None for buckets in us-east-1.
        try:
            location = s3.get_bucket_location(Bucket=bucket)
            region = location.get("LocationConstraint") or "us-east-1"
        except Exception as e:
            print(f"  ⚠️ Error getting region for {bucket}: {e}")
            # Assume it's in us-east-1
            region = "us-east-1"

        # S3 storage metrics are published in the bucket's own region, so
        # create a CloudWatch client there
        cloudwatch = boto3.client("cloudwatch", region_name=region)

        # Get the bucket size. BucketSizeBytes is published once per day, so
        # a one-day window with a daily period yields at most one datapoint.
        try:
            response = cloudwatch.get_metric_statistics(
                Namespace="AWS/S3",
                MetricName="BucketSizeBytes",
                Dimensions=[
                    {"Name": "BucketName", "Value": bucket},
                    {"Name": "StorageType", "Value": "StandardStorage"},
                ],
                StartTime=datetime.now(timezone.utc) - timedelta(days=1),
                EndTime=datetime.now(timezone.utc),
                Period=86400,
                Statistics=["Average"],
            )

            if response["Datapoints"]:
                size = response["Datapoints"][0]["Average"]
            else:
                size = 0
        except Exception as e:
            print(f"  ⚠️ Error getting metrics for {bucket}: {e}")
            size = 0

        size_gb = size / 1073741824  # Convert bytes to GB

        if size <= 0:
            print(f"  🫥 No metrics found for bucket: {bucket}. Is it unused?")
        else:
            print(f"  💿 Storage used: {size_gb:.1f} GB")

        results.append((size_gb, bucket))

    # Sort by size in descending order
    results.sort(reverse=True)

    # Display results
    print("\n\n📌 Storage Usage Summary:")
    print("Size (GB)\tBucket Name")
    print("----------------------------")
    for size_gb, bucket in results:
        print(f"{size_gb:9.2f}\t{bucket}")

    print("\n✅ Done!")


if __name__ == "__main__":
    main()
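One limitation worth noting: the script queries only the `StandardStorage` dimension, so a bucket whose data lives entirely in other storage classes (Standard-IA, Glacier, etc.) is reported as 0.00. Below is a sketch of how the metrics loop could sum across several classes instead, assuming the `cloudwatch` client, `bucket` variable, and datetime imports from the script above; the `StorageType` values listed are assumptions to verify against the CloudWatch S3 metrics documentation.

```
# Sketch: sum BucketSizeBytes across several storage classes instead of
# StandardStorage alone. Assumes `cloudwatch`, `bucket`, datetime, timedelta,
# and timezone from the script above; verify the StorageType values against
# the CloudWatch docs before relying on them.
STORAGE_TYPES = [
    "StandardStorage",
    "StandardIAStorage",
    "IntelligentTieringFAStorage",
    "GlacierStorage",
    "DeepArchiveStorage",
]

size = 0
for storage_type in STORAGE_TYPES:
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": bucket},
            {"Name": "StorageType", "Value": storage_type},
        ],
        StartTime=datetime.now(timezone.utc) - timedelta(days=1),
        EndTime=datetime.now(timezone.utc),
        Period=86400,
        Statistics=["Average"],
    )
    if response["Datapoints"]:
        size += response["Datapoints"][0]["Average"]
```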

0 commit comments
