Option to disable automatic SEC downloads when using cloud storage #627
Replies: 2 comments
Hi @pablograba,

Thank you for the kind words and for this well-articulated feature request! This is a great idea that will benefit production and enterprise users.

Current Behavior

As you noted, the automatic download from the SEC is great for interactive/exploratory work but problematic for controlled production environments.

Proposed Solution

Add a fallback_to_sec parameter to edgar.use_cloud_storage():

# Current behavior (default, unchanged)
edgar.use_cloud_storage("s3://my-edgar-bucket/")
# Equivalent to:
edgar.use_cloud_storage("s3://my-edgar-bucket/", fallback_to_sec=True)
# New: Strict cloud-only mode
edgar.use_cloud_storage("s3://my-edgar-bucket/", fallback_to_sec=False)
# When filing not in bucket → raises FileNotFoundError
# No network calls to sec.gov
# Guarantees read-only behavior from cloud storage

Behavior when fallback_to_sec=False

| Scenario | Result |
|---|---|
| Filing exists in S3 | Returns filing |
| Filing missing from S3 | Raises FileNotFoundError |
| Network calls to SEC | Never happens |
| Writes to S3 bucket | Never happens |
This gives you exactly what you need:
- Pre-filtered/pre-approved subsets only
- No unexpected network calls
- Guaranteed read-only from cloud storage
- Predictable behavior for testing
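The semantics in the table can be illustrated with a small stand-in. This is only a sketch of the proposed behavior, not edgartools code: `FilingStore`, its `objects` dict (playing the role of the bucket), and the `download` callable are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class FilingStore:
    """Hypothetical stand-in for a cloud bucket; not part of edgartools."""

    def __init__(
        self,
        uri: str,
        objects: Dict[str, bytes],
        download: Callable[[str], bytes],
        fallback_to_sec: bool = True,
    ) -> None:
        self.uri = uri
        self.objects = objects          # in-memory substitute for the bucket
        self.download = download        # substitute for an SEC download
        self.fallback_to_sec = fallback_to_sec

    def get(self, accession_no: str) -> bytes:
        # Filing exists in the bucket: return it in either mode.
        if accession_no in self.objects:
            return self.objects[accession_no]
        # Strict mode: fail without downloading or writing anything.
        if not self.fallback_to_sec:
            raise FileNotFoundError(f"{accession_no} not found in {self.uri}")
        # Default mode: download, save to the bucket, then return.
        data = self.download(accession_no)
        self.objects[accession_no] = data
        return data
```

In strict mode the `download` callable is never invoked and `objects` is never modified, which corresponds to the two "Never happens" rows above.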
Questions
- Does fallback_to_sec=False clearly convey the intent, or would you prefer something like offline_mode=True or strict=True?
- Should the error message include helpful context like:
  FileNotFoundError: Filing 0001234567-24-000001 not found in s3://my-edgar-bucket/ (fallback_to_sec=False, no SEC download attempted)
- Any other configuration options you would find useful for production pipelines?
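To show why the extra context in the message helps, here is a rough sketch of a pipeline that catches the error and reports which filings are missing instead of stopping at the first one. The helper names `missing_filing_error` and `load_approved`, and the `fetch` callable, are assumptions made for this example; only the message wording comes from the question above.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def missing_filing_error(accession_no: str, bucket_uri: str) -> FileNotFoundError:
    """Build an error using the wording suggested in this thread."""
    return FileNotFoundError(
        f"Filing {accession_no} not found in {bucket_uri} "
        "(fallback_to_sec=False, no SEC download attempted)"
    )


def load_approved(
    fetch: Callable[[str], bytes],
    accession_numbers: Iterable[str],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch each filing, collecting error messages for the missing ones."""
    found: Dict[str, bytes] = {}
    missing: List[str] = []
    for accession_no in accession_numbers:
        try:
            found[accession_no] = fetch(accession_no)
        except FileNotFoundError as exc:
            missing.append(str(exc))
    return found, missing


# Demo with an in-memory dict standing in for the bucket (hypothetical data).
BUCKET_URI = "s3://my-edgar-bucket/"
bucket = {"0001234567-24-000002": b"<filing/>"}


def demo_fetch(accession_no: str) -> bytes:
    if accession_no not in bucket:
        raise missing_filing_error(accession_no, BUCKET_URI)
    return bucket[accession_no]


found, missing = load_approved(
    demo_fetch, ["0001234567-24-000001", "0001234567-24-000002"]
)
for message in missing:
    print(message)
```

Because each message names the accession number, the bucket, and the mode, the resulting report can be used directly to decide which filings still need to be pre-loaded.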
This is a straightforward addition - I will prioritize it for an upcoming release.
Thanks again for the suggestion!
Hi @dgunning,
Thank you so much for the quick response! I like the proposed solution of adding the fallback_to_sec flag.
A couple of small thoughts on the naming (just personal preference, completely happy either way):
As an alternative, something like
edgar.use_cloud_storage("s3://...", allow_download_from_sec=False)
But honestly, either name works for me.
And yes please! Including that little context in the error message would be really helpful during debugging. The suggested wording is perfect.
I can’t think of anything else right now. This flag + improved error message already covers the main production/testing needs very nicely.
Thank you again for being so open and fast to consider this!
Hi @dgunning (and everyone),
First of all, thank you so much for creating and maintaining edgartools! It’s become a really valuable tool for many of us working with EDGAR data.
I’ve been happily using the cloud storage feature like this: edgar.use_cloud_storage("s3://my-edgar-bucket/")
It works beautifully but I’ve noticed that when a filing isn’t already in the S3 bucket, the library automatically downloads it from the SEC EDGAR site and saves it to the bucket. This is super convenient for interactive / exploratory work!
However, in some production or restricted environments, we’d actually prefer to never make outbound requests to the SEC, even if that means getting a FileNotFoundError when a filing is missing from our bucket. The main reasons are:
Would it be possible to add an option to turn off this auto-download behavior?
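Until such an option exists, one way to confirm in a test that a code path makes no outbound requests is to make connection attempts fail loudly. The sketch below uses only the Python standard library; `no_outbound_connections` is a hypothetical helper name, and it only catches code that connects through `socket.socket.connect`, so it is a safety net rather than a guarantee.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def no_outbound_connections():
    """Make any socket.socket.connect call raise while the block is active."""

    def refuse(*args, **kwargs):
        raise RuntimeError("outbound connection attempted")

    # patch.object restores the original connect method on exit.
    with mock.patch.object(socket.socket, "connect", side_effect=refuse):
        yield
```

Wrapping a pipeline run or a test body in `with no_outbound_connections():` turns an unexpected download attempt into an immediate `RuntimeError` instead of a silent request.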
Thanks!