Replies: 9 comments
-
Thank You for This Detailed Analysis!
@mpreiss9 - Thank you for taking the time to explore EdgarTools' standardization infrastructure and for this thorough feature request. Your observations are spot-on, and I appreciate the level of detail you've provided about the specific questions you have.
Current Status: Production-Ready Feature, Missing Documentation
You've discovered a production-ready feature that lacks user-facing documentation. The XBRL standardization system is:
✅ Fully implemented in
The gap you've identified is 100% valid - while the feature exists and works well, we haven't created comprehensive user-facing documentation for customization.
Your Specific Questions (All Valid!)
I'll address each of your questions:
1. Custom Mapping Files - Where Do They Reside?
Current Implementation:
Path Configuration: Currently hardcoded to package directory. We should document this and potentially add user-configurable paths in a future enhancement.
2. Ambiguous XBRL Tags (e.g., CurrentAndNoncurrent)
Great catch on the ~200 ambiguous tags! The system handles this through:
Priority-based resolution:
Example (from
{
    "Automotive Revenue": [
        "tsla_AutomotiveRevenue",
        "tsla_AutomotiveSales"
    ]
}
3. Company-Specific Mappings
Separate files per company is the recommended approach:
See
4. StandardConcept Enum - Why Not Just JSON?
The enum serves IDE autocomplete and type safety:
from edgar.xbrl.standardization import StandardConcept
# IDE autocomplete works here:
revenue_concept = StandardConcept.REVENUE.value
However, the JSON is the source of truth for mappings. The enum provides a curated set of standard concepts with semantic meaning, while JSON allows unlimited custom mappings.
Restrictions on modifying the enum: It's Python code in the package, so users shouldn't edit it directly. For custom standard concepts, the JSON approach is preferred. We may explore JSON-based StandardConcept loading in the future.
Our Commitment: Comprehensive Documentation in v4.29.0
I've created Beads issue edgartools-i5s (linked to this GitHub issue) to track comprehensive documentation:
Target: v4.29.0 (next minor release)
Documentation Scope
We'll create a new user-facing guide:
Sections:
We'll also add cross-references in
Questions for You
To make this documentation as useful as possible, I'd love to understand your use case better:
Your feedback will help us prioritize which examples and edge cases to cover in depth.
Next Steps
Thank you again for this excellent issue report. The level of detail you provided makes it clear you've done a thorough investigation of the codebase. We're excited to make this powerful feature more accessible through comprehensive documentation! Feel free to share your specific use case or any clarifying questions in the meantime.
-
Thanks so much for always responding to users in such a thoughtful way. It's a pleasure seeing this package evolve.
Now to your questions:
Balance Sheet Current/Noncurrent ambiguity
Some companies use this as a total, some as a line item.
There are also a few Interest Income vs Interest Expense vs Non-operating Income ambiguities. All of these are ambiguous either directly in the name or through my observations of how the tags have been used in different filings.
-
Can you explain how your ambiguous tags work? I am trying to see how far we can have edgartools assist with custom standardization without being too user-specific.
-
This is going to get pretty complicated to explain, but I'll try. My tag map is reversed from yours - I have xbrl tags as a primary key (since they are unique) and then standard tags attached. So an xbrl tag can be mapped to more than one standard tag.
Let's take the balance sheet since it has the bulk of the problem. First I assign standard tags to all items in the statement (whether dataframe or other structure, but assumed to be in order as filed). If an item isn't in the map yet, I log it as described before. I have a dictionary with balance sheet sections as keys (using a standard tag) and all the possible standard tags for that section as a set attached to the key. So for example Current Assets would be a section key and all possible standard tags that belong in that section are in a set.
So then, working backwards, I assign a section name to each item in the statement. Ideally there are no gaps due to missing standard tags. Then, again working backwards up the balance sheet, for any item that has more than one standard tag I look to see which of the standard tags matches what should be in that section (using the dictionary just described). I then remove the incorrect ones from that item. Working backwards is helpful because the subtotals are the trigger for a new section. That handles most of them. It all works on the assumption that filers don't scatter items around at random - that we get the rows in order, which is almost always true. (Very occasionally I've seen a netting item for receivables stuck to the bottom of a balance sheet filing, which is a mess.)
There is one special case in the balance sheet where different filers will use an xbrl tag either as a line item or as a total (Noncurrent liabilities). That one has to be dealt with first before doing the above process. For that one I look at the label field to see if the words Other or Total are used to help decide if it is a detail or a total respectively. If that's not helpful, I look at total liabilities minus current liabilities and if it matches the item in question, it's a noncurrent liability total, otherwise I assume it isn't.
In this one respect the Income Statement is easier (in everything else it's a nightmare) in that we only have to deal with the non-operating income/interest income/interest expense ambiguity. Sometimes using the sign is enough of a clue, but better is if there's a footnote that decomposes the ambiguous item. It's one of many reasons I want good footnote data.
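The balance sheet procedure described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual code behind it: the tag map, the section dictionary, the standard tag labels, and the function names `resolve_sections` and `is_noncurrent_total` are all made up for the example, and the tolerance used for the arithmetic check is an assumption.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Reverse map keyed by XBRL tag; an ambiguous tag carries several candidates.
# Tag names and standard labels are illustrative, not a real mapping file.
TAG_MAP: Dict[str, Set[str]] = {
    "CashAndCashEquivalentsAtCarryingValue": {"Cash"},
    "DeferredTaxAssetsNet": {"Current Deferred Tax Assets",
                             "Noncurrent Deferred Tax Assets"},
    "AssetsCurrent": {"Total Current Assets"},
    "PropertyPlantAndEquipmentNet": {"PP&E"},
    "Assets": {"Total Assets"},
}

# Section key (the subtotal's standard tag) -> standard tags allowed in it.
SECTIONS: Dict[str, Set[str]] = {
    "Total Current Assets": {"Cash", "Current Deferred Tax Assets",
                             "Total Current Assets"},
    "Total Assets": {"PP&E", "Noncurrent Deferred Tax Assets", "Total Assets"},
}


def resolve_sections(tags: List[str]) -> Tuple[List[Set[str]], List[str]]:
    """Return per-row candidate sets after section filtering, plus unmapped tags."""
    candidates = [set(TAG_MAP.get(tag, ())) for tag in tags]
    unmapped = [tag for tag in tags if tag not in TAG_MAP]  # to be logged
    section: Optional[str] = None
    # Walk bottom-up: each subtotal row opens the section that sits above it.
    for i in range(len(tags) - 1, -1, -1):
        subtotal = [t for t in candidates[i] if t in SECTIONS]
        if len(candidates[i]) == 1 and subtotal:
            section = subtotal[0]
        elif len(candidates[i]) > 1 and section is not None:
            narrowed = candidates[i] & SECTIONS[section]
            if narrowed:  # keep all candidates if none fits the section
                candidates[i] = narrowed
    return candidates, unmapped


def is_noncurrent_total(label: str, value: float, total_liabilities: float,
                        current_liabilities: float, tol: float = 0.5) -> bool:
    """Label check first (Total / Other), then the arithmetic fallback."""
    words = re.findall(r"[a-z]+", label.lower())
    if "total" in words:
        return True
    if "other" in words:
        return False
    return abs((total_liabilities - current_liabilities) - value) <= tol


rows = ["CashAndCashEquivalentsAtCarryingValue", "DeferredTaxAssetsNet",
        "AssetsCurrent", "PropertyPlantAndEquipmentNet",
        "DeferredTaxAssetsNet", "UnknownTag", "Assets"]
candidates, unmapped = resolve_sections(rows)
```

In this sketch the same ambiguous tag resolves differently depending on which subtotal follows it in the filed order, which is the point of walking backwards. A row that stays ambiguous after filtering keeps all of its candidates so it can be reviewed by hand.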
-
Research Update: Comparing Standardization Approaches
Thank you @mpreiss9 for sharing your detailed methodology! I've completed comprehensive research comparing your approach with EdgarTools' current system.
Key Finding: The Approaches Are Complementary ✨
Your method and EdgarTools both have strengths, and they work beautifully together:
EdgarTools Strengths:
Your Method's Strengths:
What We Learned
Your approach offers 7 specific innovations that could enhance EdgarTools:
Current Status
Documentation ✅:
Research ✅:
Recommended Next Steps
Option A: Keep current system as-is
Option B: Incremental enhancements
Option C: Full hybrid implementation
Questions for You
Your real-world experience with 200+ companies would be invaluable for guiding these enhancements!
Research Documents:
-
✅ Documentation Request Complete
The original request for XBRL standardization customization documentation has been fully completed:
📚 Deliverables:
🎯 Issue Status: Closing as complete
💬 Continuing the Conversation:
Thank you @mpreiss9 for the detailed methodology you shared - it was invaluable for our research!
-
I'm attaching my .csv mapping files. A few caveats:
-
A couple more things. You've made me think a little more about how mapping might change for different users (something I didn't consider for my own work). There are really two reasons to map an xbrl tag to a standard tag. The first reason is to take what is exactly the same kind of fact coded in different ways into a common tag (for example the seemingly countless revenue tag flavors). The second reason is often overlooked but very important - a user may want to consolidate multiple kinds of facts into a single concept because the distinction is immaterial to them. For example, I gave you a pretty granular mapping, distinguishing between tax liabilities, retirement liabilities and other non-operating liabilities. Another user might just collapse all those xbrl tags into a single non-operating liability tag. This is why a flexible mapping scheme is so important.
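One way to picture the two reasons for mapping is as two separate layers: a base map that normalizes differently coded versions of the same fact, and an optional per-user rollup that collapses distinctions the user does not care about. The snippet below is a minimal sketch under that assumption. The map contents, the labels, and the `standardize` function are hypothetical and are not taken from the attached files or from EdgarTools.

```python
from collections import defaultdict
from typing import Dict, Optional

# Layer 1 (illustrative): same kind of fact, coded different ways -> one tag.
BASE_MAP: Dict[str, str] = {
    "Revenues": "Revenue",
    "SalesRevenueNet": "Revenue",
    "RevenueFromContractWithCustomerExcludingAssessedTax": "Revenue",
    "DeferredIncomeTaxLiabilitiesNet": "Tax Liabilities",
    "PensionAndOtherPostretirementDefinedBenefitPlansLiabilitiesNoncurrent":
        "Retirement Liabilities",
    "OtherLiabilitiesNoncurrent": "Other Non-Operating Liabilities",
}

# Layer 2 (illustrative): a user who finds the distinction immaterial.
ROLLUP: Dict[str, str] = {
    "Tax Liabilities": "Non-Operating Liabilities",
    "Retirement Liabilities": "Non-Operating Liabilities",
    "Other Non-Operating Liabilities": "Non-Operating Liabilities",
}


def standardize(facts: Dict[str, float],
                rollup: Optional[Dict[str, str]] = None) -> Dict[str, float]:
    """Sum fact values by standard tag, optionally collapsing via a rollup."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, value in facts.items():
        concept = BASE_MAP.get(tag)
        if concept is None:
            continue  # unmapped tags would be logged in a real workflow
        if rollup is not None:
            concept = rollup.get(concept, concept)
        totals[concept] += value
    return dict(totals)


facts = {
    "Revenues": 100.0,
    "DeferredIncomeTaxLiabilitiesNet": 10.0,
    "PensionAndOtherPostretirementDefinedBenefitPlansLiabilitiesNoncurrent": 5.0,
    "OtherLiabilitiesNoncurrent": 2.0,
}
granular = standardize(facts)
collapsed = standardize(facts, rollup=ROLLUP)
```

Keeping the rollup separate from the base map means the granular mapping can be shared while each user decides how much of it to collapse.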
-
User-Configurable XBRL Standardization - Design Proposal
Overview
This proposal outlines a new architecture for XBRL financial statement standardization in EdgarTools that gives users full control over how financial data is mapped and aggregated, while maintaining EdgarTools' commitment to accuracy, robustness, and ease of use.
Key Insight: There are two fundamentally different reasons users map XBRL tags:
Different users need different levels of detail. A researcher analyzing tax strategies wants granular breakdowns, while someone building portfolio screens just needs high-level summaries.
The Problem
Current State: EdgarTools uses a fixed standardization mapping that works for many use cases but doesn't accommodate:
Community Contribution: @mpreiss9 shared production mapping files that demonstrate a sophisticated approach with context-aware resolution and flexible granularity. These files represent real-world validation of what users need.
Proposed Solution: 7-Stage Pipeline Architecture
We propose thinking of XBRL processing as a data pipeline with clear transformation stages:
Pipeline Stages
Stage 1-2: Parsing & Building (EdgarTools maintains)
Stage 3: Base Standardization (EdgarTools maintains)
Stage 4: Granularity Transformation (User configurable - NEW)
Stage 5: Context-Aware Resolution (EdgarTools + User config)
Stage 6: Period Selection (EdgarTools maintains)
Stage 7: Rendering (EdgarTools maintains)
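To make the staged idea concrete, here is a small sketch of how a library-owned stage and a user-supplied stage could be chained. It covers only stages 3 and 4, and every name in it (`run_pipeline`, `base_standardize`, `identity`, `collapse_revenue`, the row layout, the map contents) is a hypothetical stand-in rather than anything in EdgarTools.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Stage = Callable[[List[Row]], List[Row]]

# Illustrative base mapping for stage 3; not EdgarTools' actual mapping data.
BASE_MAP: Dict[str, str] = {"Revenues": "Revenue", "SalesRevenueNet": "Revenue"}


def base_standardize(rows: List[Row]) -> List[Row]:
    """Stage 3: attach a standard concept, falling back to the raw tag."""
    return [{**row, "concept": BASE_MAP.get(row["tag"], row["tag"])}
            for row in rows]


def identity(rows: List[Row]) -> List[Row]:
    """Default stage 4: leave granularity unchanged."""
    return rows


def run_pipeline(rows: List[Row], granularity: Stage = identity) -> List[Row]:
    """Apply the library-owned stage, then the user-supplied one, in order."""
    stages: List[Stage] = [base_standardize, granularity]
    for stage in stages:
        rows = stage(rows)
    return rows


def collapse_revenue(rows: List[Row]) -> List[Row]:
    """Example user stage: relabel every Revenue row as Top Line."""
    return [{**row, "concept": "Top Line"} if row["concept"] == "Revenue" else row
            for row in rows]


raw = [{"tag": "SalesRevenueNet"}, {"tag": "CostOfRevenue"}]
default_rows = run_pipeline(raw)
custom_rows = run_pipeline(raw, granularity=collapse_revenue)
```

Passing the user stage as a plain callable keeps the maintained stages fixed while leaving one well-defined slot open for customization.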
Three Levels of User Customization
Level 1: Choose a Profile (Easiest)
# Pick from built-in profiles
statement = xbrl.statements.balance_sheet(granularity='detailed')
Level 2: Custom Profile File (Power users)
# Provide your own mapping CSV
profile = Profile.from_csv('my_mappings.csv')
statement = xbrl.statements.balance_sheet().with_profile(profile)
Level 3: Programmatic Transformation (Maximum control)
# Compose custom transformations
custom = (statement
          .with_granularity('detailed')
          .with_profile('my_rollups.json')
          .apply_custom_rules(my_function))
Design Principles
✅ EdgarTools provides infrastructure (parsing, validation, rendering)
Implementation Roadmap
Phase 1-2: Foundation (v4.30.0 - v4.31.0)
Phase 3-4: Context Resolution (v4.31.0 - v4.32.0)
Phase 5: Logging & Observability (v5.0.0)
Phase 6: User-Configurable Granularity (v5.1.0)
Real-World Validation
@mpreiss9's contribution includes:
This validates that:
Questions for Community
Example Use Cases
Financial Analyst (Level 1):
# Just wants more detail than default
balance_sheet = xbrl.statements.balance_sheet(granularity='detailed')
Researcher (Level 2):
# Custom mappings for tax research
profile = Profile.from_csv('tax_research_mappings.csv')
balance_sheet = xbrl.statements.balance_sheet().with_profile(profile)
Quant Fund (Level 3):
# Programmatic transformations for portfolio screening
screen = (xbrl.statements.balance_sheet()
          .with_granularity('summarized')
          .apply_sector_adjustments()
          .calculate_screening_ratios())
Next Steps
Documentation
Detailed planning documents available:
We'd love your feedback! Please comment with:
-
Feature Category
Problem Statement
From examining your code, it seems like you have the ability for the user to provide tag mapping from xbrl to standardized tags. However, I can't find much in the way of documentation on how to use this and there's even some question in my mind about which code is actually being used. I'm referring to xbrl.standardization.py and entity.mappings_loader.py and more. It looks like there was an intention to allow users to supply their own .json mapping files and from there's a suggestion that the StandardConcept class could also be modified (although this would be an awkward way to do it vs. gathering from the .json file).
Who would benefit from this feature?
Proposed Solution
Document the preferred approach for a user to implement their own mapping scheme. Clarify where the json files are supposed to reside (where are the path configurations made?). Make sure to indicate restrictions on the mapping scheme (for example, what to do with ambiguous xbrl tags that could map two ways? There are quite a few that include the substring 'CurrentAndNoncurrent'. I've identified over 200 ambiguous xbrl tags). Document how company-specific mapping should be done. Again, the code is unclear, in one case suggesting it all goes into the same mapping json and in another case suggesting a file per company. Similarly, what restrictions are there in modifying the StandardConcept Enum? (This is beyond the scope of this feature request, but you should consider providing a cleaner way of injecting the StandardConcept data than an Enum class in a larger module. It's still not clear to me why the json isn't sufficient.)
Feature requests are evaluated based on EdgarTools' core principles: Simple yet powerful, accurate financials, beginner-friendly, and joyful UX.