Replies: 15 comments 5 replies
-
I was thinking of setting up an S3-backed static web server and then using `export EDGAR_DATA_URL="https://xyz"` to serve up the data. If there is an easier way, it would be good to know. I am also not sure what the best way is to keep S3 up to date with all the filings.

```go
package main

import (
	"log"

	"github.com/gofiber/fiber/v2"
)

func main() {
	initS3() // Initialize the S3 client (defined elsewhere)
	app := fiber.New()
	// Use a route with a wildcard to catch all files under /static.
	// The wildcard parameter can be accessed via c.Params("*1").
	app.Get("/static/*", s3FileHandler)
	log.Fatal(app.Listen(":3000"))
}
```
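In case it helps anyone reading along, the fiddly part of that handler is mapping the wildcard path to an S3 object key safely. A sketch of that mapping in Python (the function name and prefix are mine, not from any library):

```python
from pathlib import PurePosixPath

def s3_key_for(request_path: str, prefix: str = "/static/") -> str:
    """Map a wildcard route path like '/static/edgar/data.json' to an S3 key.

    Rejects empty keys and '..' segments so a crafted URL cannot escape
    the intended prefix in the bucket.
    """
    if not request_path.startswith(prefix):
        raise ValueError(f"path must start with {prefix!r}")
    key = request_path[len(prefix):]
    if not key or ".." in PurePosixPath(key).parts:
        raise ValueError("invalid object key")
    return key
```

In the Go handler, the wildcard value from `c.Params("*1")` would want the same traversal check before the S3 GetObject call.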
-
What is your primary motivation? Bypassing the SEC's limit of 10 requests per second?
-
Thank you @saul-data for this suggestion and for sharing your implementation approach!

## TL;DR: EdgarTools Already Supports Cloud Storage (Two Ways!)

You actually have two options that work today with existing features.

### Option 1: `EDGAR_DATA_URL` (Your Current Approach) ✅

Your workaround using an S3-backed static server:

```bash
# Point EdgarTools to your S3-backed static server
export EDGAR_DATA_URL="https://your-s3-static-site.example.com"
```

```python
from edgar import get_filings

filings = get_filings()  # Fetches from your S3 endpoint
```

Why this is great for reading: EdgarTools fetches everything through that base URL, so reads come from your endpoint instead of the SEC.

### Option 2: `EDGAR_LOCAL_DATA_DIR` + FUSE Mount 🆕

For writing to cloud storage (like S3), mount the bucket as a local filesystem with a FUSE tool such as `s3fs`. Then configure EdgarTools:

```bash
export EDGAR_LOCAL_DATA_DIR=/mnt/edgar-data
export EDGAR_USE_LOCAL_DATA=1
```

Now downloads land in the bucket:

```python
from edgar.storage import download_filings

# Downloads filings directly to S3 via the FUSE mount
download_filings('2025-01-01:2025-01-31')
```

Why this works: EdgarTools uses standard filesystem operations, so anything mounted as a local directory works as local storage.

### Recommended Hybrid Approach

For best performance, use both methods together:

```bash
# 1. Mount S3 for downloading/writing
s3fs edgar-bucket /mnt/edgar-data
export EDGAR_LOCAL_DATA_DIR=/mnt/edgar-data
export EDGAR_USE_LOCAL_DATA=1

# 2. Download filings (writes to S3 via FUSE)
python -c "from edgar.storage import download_filings; download_filings('2025-01-01:')"

# 3. For reading, use the static website endpoint (faster with CDN caching)
export EDGAR_DATA_URL="https://edgar-bucket.s3-website.amazonaws.com"
```

This gives you FUSE-backed writes for keeping the bucket current, and CDN-cached reads for serving it.

### Keeping S3 Up to Date

For incremental updates, you have several options.

**Option A: Scheduled FUSE Downloads**

```bash
# Cron job or Lambda that runs daily
export EDGAR_LOCAL_DATA_DIR=/mnt/s3-edgar
python -c "from edgar.storage import download_filings; download_filings()"  # Downloads the latest filings
```

**Option B: Local Download + S3 Sync**

```python
# Download locally first (faster), then sync
from edgar.storage import download_filings

download_filings('2025-01-01:')
```

```bash
# Then sync to S3
aws s3 sync ~/.edgar/filings s3://edgar-bucket/filings
```

### Why We're Not Adding Native Cloud SDK Support

We considered adding native boto3/azure-storage-blob support, but your current approaches are actually better.

### Documentation Coming

We've created a tracking issue (Beads: edgartools-5i3) to document these patterns properly. Target: v4.34.0 or v4.35.0 documentation.

### Summary

Your current solution is solid! The FUSE mount option gives you an additional tool for the download/sync workflow.

Questions? Let us know what cloud provider you're using and we can provide more specific guidance.
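One more small piece for the cron option: a daily job needs a date range to pass to `download_filings`. Assuming the open-ended `'2025-01-01:'` range format used in the examples above, the window can be computed like this (the helper name is mine, not part of EdgarTools):

```python
from datetime import date, timedelta

def incremental_range(days_back=1, today=None):
    """Build an open-ended EdgarTools date range such as '2025-01-01:'.

    Covers the last `days_back` days, for a daily catch-up job that
    re-downloads anything filed since the previous run.
    """
    today = today or date.today()
    start = today - timedelta(days=days_back)
    return f"{start.isoformat()}:"
```

The cron job would then call `download_filings(incremental_range())`.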
-
Been working on this all weekend and it is quite the setup. It looks simple, but not really. I would highly recommend adding native S3-compatible support by passing environment variables for keys and secrets. When it downloads, it can write to S3, and it can also fetch from S3.
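To make the request concrete, something like this is what I mean. The `AWS_*` names are the standard AWS ones; the `EDGAR_S3_*` variables and the helper itself are just hypothetical:

```python
import os

def s3_config_from_env():
    """Assemble S3-compatible settings from environment variables."""
    missing = [name for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
               if name not in os.environ]
    if missing:
        raise RuntimeError(f"missing credentials: {missing}")
    return {
        "key": os.environ["AWS_ACCESS_KEY_ID"],
        "secret": os.environ["AWS_SECRET_ACCESS_KEY"],
        "bucket": os.environ.get("EDGAR_S3_BUCKET", "edgar-data"),
        "endpoint_url": os.environ.get("EDGAR_S3_ENDPOINT"),  # e.g. an R2/MinIO URL
    }
```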
-
## Update: Native Cloud Storage Support Coming

Based on @saul-data's feedback about the complexity of the FUSE approach ("spent all weekend on setup"), we've created a plan for native cloud storage support via optional dependencies.

### What's Changing

Instead of requiring FUSE mounts, you'll be able to do:

```bash
# Install cloud extras
pip install edgartools[s3]    # AWS S3
pip install edgartools[gcs]   # Google Cloud Storage
pip install edgartools[azure] # Azure Blob Storage
pip install edgartools[cloud] # All providers
```

```python
import edgar

# Simple one-liner configuration
edgar.use_cloud_storage('s3://my-edgar-bucket/data/')

# Or via environment variable
# export EDGAR_STORAGE_URL=s3://my-edgar-bucket/data/
edgar.use_cloud_storage()

# Now all storage operations use cloud
from edgar.storage import download_filings
download_filings('2025-01-01:2025-01-31')  # Downloads directly to S3

# Reading also works from cloud
filing = edgar.find("AAPL", form="10-K")[0]
html = filing.html()  # Reads from S3 cache if available
```

### S3-Compatible Services (MinIO, Cloudflare R2, etc.)

```python
edgar.use_cloud_storage(
    's3://my-bucket/',
    client_kwargs={'endpoint_url': 'https://minio.example.com'}
)
```

### Timeline

Targeting v4.35.0 or v4.36.0 for this feature.

### Feedback Welcome

Does this approach address your needs? Any specific cloud provider requirements or use cases we should consider?
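For anyone curious how a single URL can select among the providers: the scheme would pick the backend, and the rest splits into bucket and prefix. This is only an illustrative sketch, not the planned implementation:

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"s3", "gs", "az"}

def parse_storage_url(url):
    """Split e.g. 's3://my-edgar-bucket/data/' into (scheme, bucket, prefix)."""
    parts = urlparse(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported storage scheme: {parts.scheme!r}")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")
```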
-
My only feedback would have been about S3-compatible services, but you seem to have that covered. We are using R2, and we sometimes need to input the region. This is great, thank you!
-
## 🚀 Native Cloud Storage Implementation Available for Testing

The native cloud storage feature is now implemented and available on the `feature/cloud-storage` branch.

### Installation

```bash
pip install "edgartools[s3] @ git+https://github.com/dgunning/edgartools.git@feature/cloud-storage"
```

Or for other providers:

```bash
pip install "edgartools[gcs] @ git+https://github.com/dgunning/edgartools.git@feature/cloud-storage"    # GCS
pip install "edgartools[azure] @ git+https://github.com/dgunning/edgartools.git@feature/cloud-storage"  # Azure
```

### Usage

```python
import edgar

# AWS S3 (uses default credentials from ~/.aws or environment)
edgar.use_cloud_storage("s3://my-edgar-bucket/")

# Cloudflare R2 (S3-compatible with custom endpoint)
edgar.use_cloud_storage(
    "s3://my-bucket/",
    client_kwargs={
        "endpoint_url": "https://ACCOUNT_ID.r2.cloudflarestorage.com",
        "region_name": "auto"  # R2 requires this
    }
)

# Google Cloud Storage
edgar.use_cloud_storage("gs://my-edgar-bucket/")

# Azure Blob Storage
edgar.use_cloud_storage("az://my-container/edgar/")

# Now reading filings works from cloud storage
filing = edgar.find("0000320193-24-000123")
html = filing.html()  # Reads from cloud if available
```

### Current Limitations

- Reading: ✅ fully supported
- Writing: for now, populate cloud storage separately (e.g. download locally, then sync the files to your bucket). We are tracking cloud write support as a follow-up enhancement.

Please try it out and let us know your experience, especially with R2 @saul-data!
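For clarity on what "reads from cloud if available" implies: the lookup order is presumably local-cache-first with a cloud fallback. A generic sketch, with stub getters standing in for the real cache and cloud client:

```python
def read_with_fallback(key, local_get, cloud_get):
    """Return the locally cached copy if present, else fetch from cloud storage.

    `local_get` and `cloud_get` are callables returning bytes or None.
    """
    data = local_get(key)
    if data is not None:
        return data
    return cloud_get(key)

# Stub backends to demonstrate the lookup order
local_cache = {"aapl-10k.html": b"<html>local</html>"}
cloud_store = {
    "aapl-10k.html": b"<html>cloud</html>",
    "msft-10k.html": b"<html>cloud</html>",
}
```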
-
Released 4.34.0 with better support for cloud storage: https://edgartools.readthedocs.io/en/latest/guides/cloud-storage/

I plan to test and make some additional improvements.
-
This is great; still going through it. You may want to remove Goofys from the documentation; I don't think it is being maintained anymore. AWS came out with a Rust-based alternative: https://github.com/awslabs/mountpoint-s3
-
@dgunning - this didn't work for me; I got an error message:

```bash
# AWS S3, Cloudflare R2, MinIO, DigitalOcean Spaces
uv pip install edgartools[s3]
```
-
Try:

```bash
uv pip install "edgartools[s3]"
```

The quotes stop the shell (zsh in particular) from treating the square brackets as a glob pattern. I will update the documentation.
-
Sure, will look into that.
-
@dgunning Does this also download to cloud, or just `download_filings()`?

```python
# Download all data types (submissions, facts, reference data)
download_edgar_data()
```
-
@dgunning - this code: does it upload all the filings for that day, or just the forms specified? I think it downloads all the forms and, I am guessing, filters for the forms specified?

```python
from edgar import get_filings, download_filings

filings = get_filings(form=["10-K", "10-Q", "13F-HR"], filing_date="2025-11-01:")
download_filings(filings=filings, upload_to_cloud=True)
```
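What I am imagining it does internally, i.e. pull the day's filing index and filter client-side. Filings are plain dicts here purely for illustration:

```python
def filter_by_form(filings, forms):
    """Keep only filings whose form type is in the requested list."""
    wanted = set(forms)
    return [f for f in filings if f["form"] in wanted]

# A day's index, reduced to the fields that matter here
day_index = [
    {"form": "10-K", "company": "Example Corp A"},
    {"form": "8-K", "company": "Example Corp B"},
    {"form": "13F-HR", "company": "Example Fund C"},
]
```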

Uh oh!
There was an error while loading. Please reload this page.
-
In addition to local file storage, it would be great to be able to configure a cloud storage endpoint with the same file/folder structure as local storage. This would make it easier to download all the history and keep it in, say, S3, and even keep S3 up to date with the latest submissions.