Replies: 3 comments 1 reply
-
|
**Replies to your questions **
Please let me know if you need more input on this. I'm really looking forward to seeing where this leads. |
Beta Was this translation helpful? Give feedback.
-
|
Thanks, I think in the code I did so far I used my labels not yours but I can check. Also can you review this PR #573 .. this also does standardization. |
Beta Was this translation helpful? Give feedback.
-
|
For what it's worth here's the code I use to assign sections to the balance sheet. The mapping of section to tags should be pretty obvious. My data sits in a class that's basically a list of dataclass rows - so that's self.tbl in this code. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Overview
This discussion seeks feedback on Phase 0 of the Context-Aware XBRL Standardization initiative. Phase 0 is the critical foundation that must be completed before we can implement context-aware disambiguation.
The Problem
Analysis revealed a massive coverage gap in EdgarTools XBRL standardization:
Before implementing context-aware disambiguation for the 214 ambiguous tags, we must first expand our base mapping coverage.
Proposed Approach: Hybrid Taxonomy
We recommend Option C (Hybrid) - use @mpreiss9's proven taxonomy internally, with user-friendly display names for backwards compatibility.
New Data Structures
1.
gaap_mappings.json(2,343 mappings){ "AccountsPayableCurrentAndNoncurrent": { "standard_tags": ["TradePayables", "OtherOperatingNonCurrentLiabilities"], "ambiguous": true, "comment": "Curr/NonCurr ambiguity" }, "AccountsPayableCurrent": { "standard_tags": ["TradePayables"], "ambiguous": false } }2.
exclusions.py(276 tags)3.
display_names.json(backwards compatibility){ "TradePayables": "Accounts Payable", "TradeReceivables": "Accounts Receivable", "PlantPropertyEquipmentNet": "Property, Plant and Equipment", "CostOfGoodsAndServicesSold": "Cost of Revenue" }4. Reverse Index (O(1) lookup)
Built at runtime from gaap_mappings.json for fast company concept → standard concept resolution.
Deliverables
gaap_mappings.json- 2,343 GAAP tag mappingsexclusions.py- 276 "DropThisItem" tagsdisplay_names.json- User-friendly display namesreverse_index.py- O(1) lookup structurescripts/import_mpreiss9_mappings.py- Migration scriptMappingStoreto use new structuresQuestions for Discussion
For @mpreiss9:
Taxonomy Naming: Is the hybrid approach (your taxonomy internally + display names) acceptable? Or would you recommend full adoption of your naming conventions?
Multi-mapping Format: Your CSV uses colon separator (e.g.,
TagA:TagB) for ambiguous tags. Should we preserve this format or convert to a list structure in JSON?DropThisItem Tags: Are there any tags in the 276 exclusions that might actually be useful in certain contexts? Should exclusions be configurable?
Priority Order: When a tag maps to multiple standard concepts (like
AccountsPayableCurrentAndNoncurrent→TradePayables:OtherOperatingNonCurrentLiabilities), does the order indicate priority/preference?Deprecated Tags: Are any of the 2,343 GAAP mappings for deprecated/obsolete tags we should flag?
For the Community:
Display Names: Are there any standard concept names in EdgarTools today that you'd prefer we keep vs. adopting mpreiss9's naming?
Breaking Changes: Would a minor breaking change in output names be acceptable for v6.0.0 if it means better standardization?
Timeline
Related
docs-internal/planning/future-enhancements/context-aware-standardization.mdcc: @mpreiss9 - Your input on this approach would be invaluable given your production experience with 390+ companies. Thank you for sharing your mapping data!
Beta Was this translation helpful? Give feedback.
All reactions