
Feature/account mapping #1457

Open
martrost wants to merge 8 commits into aws-solutions-library-samples:main from martrost:feature/account-mapping

martrost commented Apr 8, 2026

cid-cmd map — Account Mapping

The map command creates an account_map Athena view that enriches your AWS account data with custom taxonomy dimensions (business unit, environment, cost center, etc.). These dimensions can then be used in Cloud Intelligence Dashboards for grouping, filtering, and reporting.

How It Works

The command follows a six-phase workflow:

  1. Discovery — auto-detects the organization_data table (from AWS Organizations data collection) and the target Athena database
  2. Configuration — prompts you to select data sources and define taxonomy dimensions (or reuses a saved configuration)
  3. Data Loading — reads organization data from Athena and optionally loads an external file
  4. Transformation — applies taxonomy rules to produce the enriched account map
  5. Preview — shows a sample of the output and the generated SQL for confirmation
  6. Write — creates the Athena views (account_map, account_map_config, and optionally account_map_file_source)
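The six phases run strictly in order, and nothing is written to Athena until the preview has been confirmed. The sketch below illustrates only that control flow and is not the PR's code: MapRun, run_map, and the callback parameters are hypothetical names invented for this example (the PR's own orchestration lives in a UnifiedWorkflow class whose interface is not shown here).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MapRun:
    """Hypothetical state object handed from phase to phase."""
    database: str = ""
    table: str = ""
    config: dict = field(default_factory=dict)
    rows: list = field(default_factory=list)
    sql: str = ""
    written: bool = False


def run_map(
    discover: Callable[[], tuple[str, str]],
    configure: Callable[[MapRun], dict],
    load: Callable[[MapRun], list],
    transform: Callable[[MapRun], str],
    confirm: Callable[[MapRun], bool],
    write: Callable[[MapRun], None],
) -> MapRun:
    run = MapRun()
    run.database, run.table = discover()  # 1. find the database and organization_data
    run.config = configure(run)           # 2. new or reused configuration
    run.rows = load(run)                  # 3. org data, plus the optional file
    run.sql = transform(run)              # 4. build the enriched view SQL
    if confirm(run):                      # 5. preview gates the write
        write(run)                        # 6. create the Athena views
        run.written = True
    return run
```

Passing each phase in as a plain callable keeps the phases testable in isolation; in this sketch, -y would correspond to a confirm callback that always returns True.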

Usage

# Interactive mode (default) — walks you through configuration
cid-cmd map

# Provide a file with additional taxonomy data
cid-cmd map --file accounts.csv

# Legacy mode — simple account_id/account_name mapping from organization_data
cid-cmd map --simple

# Custom output view name
cid-cmd map --view-name my_account_map

Options

Option            Description
--file PATH       Path to a CSV, Excel, or JSON file containing additional taxonomy columns to join by account ID
--simple          Use legacy mode: creates a basic account_map view with just account_id and account_name from the organization_data table
--view-name TEXT  Name of the output Athena view (default: account_map)
-v / --verbose    Increase log verbosity (can be repeated: -vv)
-y / --yes        Auto-confirm all prompts
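For comparison with the full workflow, the --simple view can be pictured as a single projection over organization_data. This is a sketch under assumptions, not the PR's SQL: it assumes the source columns are named id and name, the helper simple_account_map_sql is hypothetical, and cid_data_export is only an illustrative database name. It also shows where --view-name would take effect.

```python
def simple_account_map_sql(
    database: str,
    view_name: str = "account_map",          # what --view-name would override
    source_table: str = "organization_data",
) -> str:
    """Hypothetical SQL for --simple mode: account_id and account_name only."""
    # ASSUMPTION: organization_data exposes the account ID as `id`
    # and the account name as `name`.
    return (
        f'CREATE OR REPLACE VIEW "{database}"."{view_name}" AS\n'
        f"SELECT DISTINCT id AS account_id, name AS account_name\n"
        f'FROM "{database}"."{source_table}"'
    )


print(simple_account_map_sql("cid_data_export", view_name="my_account_map"))
```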

Taxonomy Dimension Sources

During interactive configuration you choose one or more data sources for your taxonomy dimensions:

Tags from source table

Extracts values from the hierarchytags column in organization_data. Each selected tag key becomes a column in the output view.

Example: if your accounts are tagged with Environment=Production and CostCenter=Engineering, selecting those tag keys produces environment and cost_center columns.
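The tag example can be sketched in Python as a per-account mapping from selected tag keys to dimension columns. The PR does not document the shape of hierarchytags, so this sketch assumes a list of {"key": ..., "value": ...} entries and keeps the first value when a key repeats; extract_tag_dimensions and sanitize_dimension_name are hypothetical names. The sanitization (spaces to underscores) and alphabetical column order follow the PR's commit notes.

```python
from typing import Optional


def sanitize_dimension_name(name: str) -> str:
    """Spaces become underscores, per the PR's sanitization note."""
    return name.strip().replace(" ", "_")


def extract_tag_dimensions(
    hierarchytags: list[dict[str, str]],
    tag_to_dimension: dict[str, str],
) -> dict[str, Optional[str]]:
    """Turn selected tag keys into dimension columns for one account.

    ASSUMPTION: hierarchytags is a list of {"key": ..., "value": ...} dicts.
    If a key appears more than once, this sketch keeps the first value seen.
    """
    found: dict[str, str] = {}
    for tag in hierarchytags:
        found.setdefault(tag["key"], tag["value"])
    row = {
        sanitize_dimension_name(dimension): found.get(key)  # None if untagged
        for key, dimension in tag_to_dimension.items()
    }
    # Dimension columns are emitted in alphabetical order.
    return dict(sorted(row.items()))
```

For the example above, mapping Environment to "environment" and CostCenter to "cost center" yields {'cost_center': 'Engineering', 'environment': 'Production'}.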

Additional file (--file)

Joins columns from a CSV/Excel/JSON file by account ID. The file must contain an account ID column; all other selected columns become taxonomy dimensions.

Example file (accounts.csv):

account_id,business_unit,team
123456789012,Retail,Frontend
234567890123,Platform,Data
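Joining the file works like a left join keyed on account ID: every account keeps its row, and accounts missing from the file get empty values. The sketch below is an illustration under assumptions, not the PR's DataLoader: the accepted header spellings in ACCOUNT_ID_HEADERS are a guess at the "common account ID columns" the PR pattern-matches, join_file_dimensions is a hypothetical name, and only CSV input is handled.

```python
import csv
import io
import re

# Normalized header spellings treated as an account ID column (assumed list).
ACCOUNT_ID_HEADERS = {"accountid", "awsaccountid", "linkedaccountid"}


def find_account_id_column(headers: list[str]) -> str:
    """Match headers like account_id, Account ID, or aws_account_id."""
    for header in headers:
        if re.sub(r"[^a-z]", "", header.lower()) in ACCOUNT_ID_HEADERS:
            return header
    raise ValueError("file has no recognizable account ID column")


def join_file_dimensions(
    accounts: list[dict], csv_text: str, columns: list[str]
) -> list[dict]:
    """Left-join the selected CSV columns onto accounts by account ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    id_column = find_account_id_column(list(reader.fieldnames or []))
    # zfill(12) restores leading zeros that spreadsheets often strip.
    by_id = {row[id_column].strip().zfill(12): row for row in reader}
    enriched = []
    for account in accounts:
        extra = by_id.get(account["account_id"], {})
        enriched.append({**account, **{c: extra.get(c) for c in columns}})
    return enriched
```

The zero-padding step is a choice made in this sketch, not something the PR states; it guards against spreadsheet tools turning 012345678901 into a number.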

Split account name

Extracts dimensions by splitting the account_name string on a separator character. You specify the separator and the positional index to extract.

Example: for the account name aws-retail-prod, splitting on - at index 1 yields retail and at index 2 yields prod. Indices are zero-based, so index 0 would yield aws.
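The PR notes that name splitting is implemented with Athena's split_part(), which counts fields from 1, while the example above counts from 0. A minimal sketch of that offset, assuming the user-facing index is zero-based as the example implies: split_dimension mirrors the behavior locally (returning None past the last field, as split_part returns NULL), and split_dimension_sql emits the Athena expression. Both helper names are hypothetical.

```python
from typing import Optional


def split_dimension(account_name: str, separator: str, index: int) -> Optional[str]:
    """Zero-based split; None past the last field, like Athena's NULL."""
    parts = account_name.split(separator)
    return parts[index] if 0 <= index < len(parts) else None


def split_dimension_sql(column: str, separator: str, index: int) -> str:
    """Equivalent Athena expression; split_part() counts fields from 1."""
    escaped = separator.replace("'", "''")  # escape quotes in the SQL literal
    return f"split_part({column}, '{escaped}', {index + 1})"
```

Here split_dimension("aws-retail-prod", "-", 1) returns "retail", and split_dimension_sql("account_name", "-", 1) returns split_part(account_name, '-', 2).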

Configuration Persistence

The command saves its configuration as an Athena view (<view_name>_config). On subsequent runs it detects the existing config and offers to reuse it, so you don't have to reconfigure every time.
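The PR does not say how the configuration is encoded inside the <view_name>_config view. One plausible approach, shown here purely as a hypothetical sketch, is to embed the configuration as a JSON string literal and parse it back on the next run; config_view_sql, load_config, and the single config column are all assumptions made for illustration.

```python
import json


def config_view_sql(database: str, view_name: str, config: dict) -> str:
    """Hypothetical: store the config as a JSON string literal inside a view."""
    # sort_keys makes the SQL stable across runs; '' escapes quotes for SQL.
    payload = json.dumps(config, sort_keys=True).replace("'", "''")
    return (
        f'CREATE OR REPLACE VIEW "{database}"."{view_name}_config" AS '
        f"SELECT '{payload}' AS config"
    )


def load_config(config_value: str) -> dict:
    """Parse the value read back from SELECT config FROM <view_name>_config."""
    return json.loads(config_value)
```

Keeping the configuration in Athena itself, rather than in a local file, means any machine with access to the database can detect and reuse it.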

Views Created

View                     Purpose
account_map              The main enriched account mapping view used by dashboards
account_map_config       Stores the mapping configuration for reuse
account_map_file_source  (Only when --file is used) Stores the file data as an Athena view for joins

Prerequisites

  • An organization_data table in Athena (typically created by the CID data collection CloudFormation (CFN) stack)
  • Athena workgroup with a configured query result location
  • Appropriate IAM permissions for Athena and Glue operations

Examples

Create an account map using AWS Organizations tags:

cid-cmd map
# → Select "Tags from source table"
# → Pick tag keys like Environment, CostCenter, Team
# → Preview and confirm

Create an account map enriched with data from a spreadsheet:

cid-cmd map --file ~/Downloads/account_taxonomy.xlsx
# → Select "Additional file" and/or "Tags from source table"
# → Pick columns from the file to use as dimensions
# → Preview and confirm

Re-run with saved configuration (no prompts):

cid-cmd map -y

…le, and file discovery

Add AutoDiscovery class to account_mapper_helpers.py with intelligent
auto-selection logic and interactive prompts:

- discover_databases: Auto-select single database or prompt for multiple
- discover_tables: Auto-select single table or prompt for multiple
- discover_tag_keys: Query and parse hierarchytags to extract available keys
- discover_account_id_column: Pattern matching for common account ID columns
- prompt_file_selection: InquirerPy file picker with extension filtering

Implements requirements 3.4-3.11, 4.5-4.7, 9.1-9.2, 16.2-16.3, 16.10, 16.12

Phase 3 of account mapper refactor complete.
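The "auto-select single, prompt for multiple" rule shared by discover_databases and discover_tables can be sketched as one small helper. This is an illustration, not the PR's AutoDiscovery code: auto_select is a hypothetical name, and the InquirerPy prompt is replaced by a plain callable so the sketch has no dependencies.

```python
from typing import Callable, Sequence


def auto_select(
    options: Sequence[str],
    prompt: Callable[[Sequence[str]], str],
    what: str,
) -> str:
    """Pick the only candidate silently; ask the user only when there is a choice."""
    if not options:
        raise LookupError(f"no {what} found")
    if len(options) == 1:
        return options[0]
    return prompt(options)
```

Skipping the prompt when there is exactly one candidate is what lets a typical single-database setup run with almost no questions.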

Refactor account mapper into modular classes (AutoDiscovery, ConfigManager,
DataLoader, TransformEngine, AthenaWriter, UnifiedWorkflow) replacing the
monolithic map_config.py. Key improvements:

- Add --simple flag for backwards compatibility
- Use managementaccountid instead of payer_id from org table
- Add payer naming feature with CASE WHEN SQL generation
- Add file source view lifecycle management (create, reuse, cleanup)
- Add name_split dimension support with split_part() in Athena
- Add checkbox retry-on-empty-selection UX helper
- Add dimension name sanitization (spaces to underscores)
- Sort taxonomy dimension columns alphabetically in output
- Persist and reload payer_names from config view
- Add stale file_source_view cleanup when no file dims exist
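The payer-naming change generates a CASE WHEN expression from the saved payer_names mapping. A minimal sketch of that kind of generation, under assumptions: the output column name payer_name, the fallback to the raw account ID for unnamed payers, and the helper names are all hypothetical, and managementaccountid is assumed to be a string column.

```python
def sql_string(value: str) -> str:
    """Quote a value as an Athena string literal."""
    return "'" + value.replace("'", "''") + "'"


def payer_name_sql(
    payer_names: dict[str, str], column: str = "managementaccountid"
) -> str:
    """Map management (payer) account IDs to friendly names with CASE WHEN."""
    if not payer_names:
        return f"{column} AS payer_name"
    whens = " ".join(
        f"WHEN {column} = {sql_string(acct)} THEN {sql_string(name)}"
        for acct, name in sorted(payer_names.items())  # stable output order
    )
    # Unnamed payers fall back to their raw account ID.
    return f"CASE {whens} ELSE {column} END AS payer_name"
```

Generating the names inline in SQL, instead of joining a separate lookup table, keeps the account_map view self-contained.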

Remove unused functions, methods, and module-level state:
- _discover_database/_discover_table wrappers (replaced by discover_source)
- extract_payer_info (replaced by payer_names config + CASE WHEN SQL)
- _get_config_hash, clear_athena_cache, _athena_data_cache (never called)
- Account.add_tag/get_tags/get_tag/get_business_unit (never called)
- DataLoader.get_available_columns (never called)
- DataLoader.auto_detect_account_column (duplicate of AutoDiscovery)
- DataLoader.validate_account_ids and validate_ids param (never used)
- Account._account_tags init field (no longer needed)

martrost commented Apr 9, 2026

Added the docs/cid-cmd update so the documentation is available as well.
