aws-solutions-library-samples
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 16 additions & 0 deletions
diff --git a/VERSION b/VERSION
Lines changed: 1 addition & 1 deletion
diff --git a/docs/reporting-database.md b/docs/reporting-database.md
Lines changed: 123 additions & 4 deletions
@@ -5,6 +5,8 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+
 ### Fixed
 
 
@@ -41,6 +43,20 @@ SPDX-License-Identifier: MIT-0
   - **Backward Compatibility**: Maintains same interface as standard assessment service with seamless migration path
   - **Enhanced Documentation**: Comprehensive documentation in `docs/assessment.md` and example notebooks for both standard and granular approaches
 
+- **Reporting Database now has Document Sections Tables to enable querying across document fields**
+  - Added a document sections storage system that automatically creates a table for each section type (classification)
+  - **Dynamic Table Creation**: AWS Glue Crawler automatically discovers new section types and creates corresponding tables (e.g., `invoice`, `receipt`, `bank_statement`)
+  - **Configurable Crawler Schedule**: Supports manual, every-15-minutes, hourly (default), or daily crawler runs via the `DocumentSectionsCrawlerFrequency` parameter
+  - **Partitioned Storage**: Data organized by section type and date for efficient querying with Amazon Athena
+
+- **Partition Projections for Evaluation and Metering tables**
+  - **Automated Partition Management**: Partition projection removes the need for `MSCK REPAIR TABLE` operations to discover new partitions
+  - **Performance Benefits**: Athena can efficiently prune partitions based on date ranges without manual partition loading
+  - **Backward Compatibility Warning**: The partition structure change from `year=2024/month=03/day=15/` to `date=2024-03-15/` means that data saved in the evaluation or metering tables prior to v0.3.7 will not be visible in Athena queries after updating. To retain access to historical data, you can either:
+    - Manually reorganize existing S3 data to match the new partition structure
+    - Create separate Athena tables pointing to the old partition structure for historical queries
+
+
 - **Optimize the classification process for single class configurations in Pattern-2**
   - Detects when only a single document class is defined in the configuration
   - Automatically classifies all document pages as that single class
 
@@ -1 +1 @@
-0.3.7-beta
+0.3.7-gamma
@@ -12,6 +12,9 @@ The GenAI IDP Accelerator includes a comprehensive reporting database that captu
   - [Section Evaluations](#section-evaluations)
   - [Attribute Evaluations](#attribute-evaluations)
 - [Metering Table](#metering-table)
+- [Document Sections Tables](#document-sections-tables)
+  - [Dynamic Section Tables](#dynamic-section-tables)
+  - [Crawler Configuration](#crawler-configuration)
 - [Using the Reporting Database with Athena](#using-the-reporting-database-with-athena)
   - [Sample Queries](#sample-queries)
   - [Creating Dashboards](#creating-dashboards)
@@ -37,7 +40,7 @@ The `document_evaluations` table contains document-level evaluation metrics:
 | false_discovery_rate | double | False discovery rate (0-1) |
 | execution_time | double | Time taken to evaluate (seconds) |
 
-This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.
+This table is partitioned by date (YYYY-MM-DD format).
 
 ### Section Evaluations
 
@@ -56,7 +59,7 @@ The `section_evaluations` table contains section-level evaluation metrics:
 | false_discovery_rate | double | Section false discovery rate (0-1) |
 | evaluation_date | timestamp | When the evaluation was performed |
 
-This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.
+This table is partitioned by date (YYYY-MM-DD format).
 
 ### Attribute Evaluations
 
@@ -78,7 +81,7 @@ The `attribute_evaluations` table contains attribute-level evaluation metrics:
 | confidence_threshold | string | Confidence threshold used |
 | evaluation_date | timestamp | When the evaluation was performed |
 
-This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.
+This table is partitioned by date (YYYY-MM-DD format).
 
 ## Metering Table
 
@@ -94,14 +97,65 @@ The `metering` table captures detailed usage metrics for each document processin
 | number_of_pages | int | Number of pages in the document |
 | timestamp | timestamp | When the operation was performed |
 
-This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.
+This table is partitioned by date (YYYY-MM-DD format).
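+
+The evaluation and metering tables use Athena partition projection, so Athena derives the partitions to scan from the `date` predicate in a query instead of requiring them to be loaded with `MSCK REPAIR TABLE`. The sketch below only illustrates that idea and is not how Athena is implemented; the `metering` S3 prefix and the `projected_partitions` helper are hypothetical. It enumerates the `date=YYYY-MM-DD/` prefixes a daily projection covers for an inclusive date range:
+
+```python
+from datetime import date, timedelta
+
+
+def projected_partitions(table_prefix: str, start: date, end: date) -> list[str]:
+    """List daily partition prefixes between start and end, inclusive.
+
+    Illustrative only: the table prefix is a hypothetical S3 location.
+    """
+    if end < start:
+        return []
+    # One prefix per calendar day, formatted as date=YYYY-MM-DD/.
+    return [
+        f"{table_prefix}/date={(start + timedelta(days=i)).isoformat()}/"
+        for i in range((end - start).days + 1)
+    ]
+
+
+# 2024 is a leap year, so this range covers three daily partitions.
+for prefix in projected_partitions("metering", date(2024, 2, 28), date(2024, 3, 1)):
+    print(prefix)
+```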
 
 The metering table is particularly valuable for:
 - Cost analysis and allocation
 - Usage pattern identification
 - Resource optimization
 - Performance benchmarking across different document types and sizes
 
+## Document Sections Tables
+
+The document sections tables store the extracted data from document sections in a structured format suitable for analytics. These tables are discovered automatically by an AWS Glue crawler and are organized by section type (classification).
+
+### Dynamic Section Tables
+
+Document sections are stored in tables created dynamically from the section classification. Each section type gets its own table (e.g., `invoice`, `receipt`, `bank_statement`) with the following characteristics:
+
+**Common Metadata Columns:**
+| Column | Type | Description |
+|--------|------|-------------|
+| section_id | string | Unique identifier for the section |
+| document_id | string | Unique identifier for the document |
+| section_classification | string | Type/class of the section |
+| section_confidence | double | Confidence score for the section classification |
+| timestamp | timestamp | When the document was processed |
+
+**Dynamic Data Columns:**
+The remaining columns are inferred from the JSON extraction results and vary by section type. They follow these conventions:
+- Nested JSON objects are flattened using dot notation (e.g., `customer.name`, `customer.address.street`)
+- Arrays are converted to JSON strings
+- Primitive values (strings, numbers, booleans) are preserved as their native types
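+
+The flattening conventions above can be illustrated with a short Python sketch. `flatten_section` and the sample invoice fields are hypothetical, not taken from the accelerator's code; the function simply applies the three documented rules to a nested extraction result:
+
+```python
+import json
+from typing import Any
+
+
+def flatten_section(data: dict[str, Any], prefix: str = "") -> dict[str, Any]:
+    """Flatten nested extraction results into dot-notation columns."""
+    flat: dict[str, Any] = {}
+    for key, value in data.items():
+        column = f"{prefix}{key}"
+        if isinstance(value, dict):
+            # Nested objects: recurse, extending the dotted column prefix.
+            flat.update(flatten_section(value, prefix=f"{column}."))
+        elif isinstance(value, list):
+            # Arrays: stored as a JSON string rather than exploded into rows.
+            flat[column] = json.dumps(value)
+        else:
+            # Strings, numbers, booleans (and None) pass through unchanged.
+            flat[column] = value
+    return flat
+
+
+example = {
+    "customer": {"name": "Acme Corp", "address": {"street": "1 Main St", "city": "Seattle"}},
+    "line_items": [{"sku": "A1", "qty": 2}],
+    "total_amount": 125.5,
+    "paid": False,
+}
+print(flatten_section(example))
+```
+
+The result contains columns such as `customer.name` and `customer.address.city`, plus a `line_items` column holding a JSON string, which is why dotted column names must be double-quoted in Athena (for example `"customer.name"`).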
+
+**Partitioning:**
+Each section type table is partitioned by date (YYYY-MM-DD format) for efficient querying.
+
+**File Organization:**
+```
+document_sections/
+├── invoice/
+│   └── date=2024-01-15/
+│       ├── doc-123_section_1.parquet
+│       └── doc-456_section_3.parquet
+├── receipt/
+│   └── date=2024-01-15/
+│       └── doc-789_section_2.parquet
+└── bank_statement/
+    └── date=2024-01-15/
+        └── doc-abc_section_1.parquet
+```
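+
+The layout above can be expressed as a key-building function. This is a minimal sketch assuming the file naming implied by the example (`<document_id>_section_<section_id>.parquet`); the `section_object_key` name and the normalization of classification names to lowercase underscore form are assumptions, not documented behavior:
+
+```python
+import re
+from datetime import datetime, timezone
+
+
+def section_object_key(
+    section_type: str, document_id: str, section_id: str, processed_at: datetime
+) -> str:
+    """Build an S3 key for one section, following the layout shown above."""
+    # Assumed normalization: lowercase, runs of non-alphanumerics become "_".
+    table_name = re.sub(r"[^a-z0-9]+", "_", section_type.lower()).strip("_")
+    # The partition value is the YYYY-MM-DD processing date.
+    partition = processed_at.strftime("%Y-%m-%d")
+    return (
+        f"document_sections/{table_name}/date={partition}/"
+        f"{document_id}_section_{section_id}.parquet"
+    )
+
+
+key = section_object_key(
+    "invoice", "doc-123", "1", datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
+)
+print(key)  # document_sections/invoice/date=2024-01-15/doc-123_section_1.parquet
+```
+
+Because the section type is the first path component, the crawler can map each top-level folder to its own table, and the `date=` folder becomes that table's partition column.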
+
+### Crawler Configuration
+
+An AWS Glue crawler automatically discovers new section types and creates corresponding tables. Its schedule is set by the `DocumentSectionsCrawlerFrequency` parameter and can be:
+- Manually (on-demand)
+- Every 15 minutes
+- Every hour (default)
+- Daily
+
+With any scheduled option, new section types become available for querying without manual intervention. With the manual option, tables for new section types appear only after the crawler has been run.
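+
+As an illustration of how the four options could map to AWS Glue crawler schedules, here is a sketch using Glue's six-field `cron(...)` syntax. The option keys and the midnight-UTC time for the daily run are hypothetical; check the template for the values `DocumentSectionsCrawlerFrequency` actually accepts:
+
+```python
+from typing import Optional
+
+# Hypothetical option names; the real parameter values may differ.
+CRAWLER_SCHEDULES: dict[str, Optional[str]] = {
+    "manual": None,                              # no schedule: run on demand
+    "every_15_minutes": "cron(0/15 * * * ? *)",
+    "hourly": "cron(0 * * * ? *)",               # default
+    "daily": "cron(0 0 * * ? *)",                # assumed: midnight UTC
+}
+
+
+def crawler_schedule(frequency: str = "hourly") -> Optional[str]:
+    """Return a Glue cron expression, or None for an unscheduled crawler."""
+    try:
+        return CRAWLER_SCHEDULES[frequency]
+    except KeyError:
+        raise ValueError(f"Unsupported crawler frequency: {frequency!r}") from None
+
+
+print(crawler_schedule())  # cron(0 * * * ? *)
+```
+
+Returning `None` for the manual option corresponds to a crawler with no schedule, which can be started from the AWS Glue console or with the Glue `StartCrawler` API.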
+
 ## Using the Reporting Database with Athena
 
 Amazon Athena provides a serverless query service to analyze data directly in Amazon S3. The reporting database tables are automatically registered in the AWS Glue Data Catalog, making them immediately available for querying in Athena.
@@ -190,6 +244,71 @@ ORDER BY
   avg_tokens_per_page DESC;
 ```
 
+**Document sections analysis by type:**
+```sql
+-- Query invoice sections for customer analysis
+-- (column names are examples; actual columns depend on your invoice extraction attributes)
+SELECT 
+  document_id,
+  section_id,
+  "customer.name" as customer_name,
+  "customer.address.city" as customer_city,
+  "total_amount" as invoice_total,
+  date
+FROM 
+  invoice
+WHERE 
+  date BETWEEN '2024-01-01' AND '2024-01-31'
+ORDER BY 
+  date DESC;
+```
+
+**Section processing volume by date:**
+```sql
+-- Count sections processed by type and date
+SELECT 
+  date,
+  section_classification,
+  COUNT(*) as section_count,
+  COUNT(DISTINCT document_id) as document_count
+FROM (
+  SELECT date, section_classification, document_id FROM invoice
+  UNION ALL
+  SELECT date, section_classification, document_id FROM receipt
+  UNION ALL
+  SELECT date, section_classification, document_id FROM bank_statement
+)
+GROUP BY 
+  date, section_classification
+ORDER BY 
+  date DESC, section_count DESC;
+```
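+
+Because section tables are created dynamically, the hard-coded `UNION ALL` list above needs updating whenever a new section type appears. One option is to generate the query from the current list of section tables. The sketch below assumes you already have that list (for example from the AWS Glue `GetTables` API, filtered to exclude the evaluation and metering tables); `section_volume_query` is a hypothetical helper:
+
+```python
+import re
+
+
+def section_volume_query(section_tables: list[str]) -> str:
+    """Build the per-type volume query for any set of section tables."""
+    if not section_tables:
+        raise ValueError("At least one section table is required")
+    for table in section_tables:
+        # Reject unexpected names so table names cannot inject SQL.
+        if not re.fullmatch(r"[a-z0-9_]+", table):
+            raise ValueError(f"Unexpected table name: {table!r}")
+    union = "\n  UNION ALL\n".join(
+        f"  SELECT date, section_classification, document_id FROM {table}"
+        for table in section_tables
+    )
+    return (
+        "SELECT date, section_classification,\n"
+        "  COUNT(*) AS section_count,\n"
+        "  COUNT(DISTINCT document_id) AS document_count\n"
+        f"FROM (\n{union}\n)\n"
+        "GROUP BY date, section_classification\n"
+        "ORDER BY date DESC, section_count DESC"
+    )
+
+
+print(section_volume_query(["invoice", "receipt", "bank_statement"]))
+```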
+
+**Date range queries on the `date` partition:**
+```sql
+-- Date range query that prunes partitions using the single date column
+SELECT 
+  COUNT(*) as total_documents,
+  AVG(accuracy) as avg_accuracy
+FROM 
+  document_evaluations
+WHERE 
+  date BETWEEN '2024-01-01' AND '2024-01-31';
+
+-- Monthly aggregation
+SELECT 
+  SUBSTR(date, 1, 7) as month,
+  COUNT(*) as document_count,
+  AVG(accuracy) as avg_accuracy
+FROM 
+  document_evaluations
+WHERE 
+  date >= '2024-01-01'
+GROUP BY 
+  SUBSTR(date, 1, 7)
+ORDER BY 
+  month;
+```
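+
+These queries compare `date` as a string. That works because zero-padded `YYYY-MM-DD` values sort lexicographically in chronological order, and `SUBSTR(date, 1, 7)` yields the `YYYY-MM` month. A quick Python check of both properties, using arbitrary example dates that cross a year boundary:
+
+```python
+from datetime import date, timedelta
+
+# Twenty consecutive ISO-format partition values starting 2023-12-25.
+start = date(2023, 12, 25)
+days = [(start + timedelta(days=i)).isoformat() for i in range(20)]
+
+# String order equals date order, so BETWEEN on the strings is a true date range.
+assert sorted(days) == days
+in_january = [d for d in days if "2024-01-01" <= d <= "2024-01-31"]
+
+# SUBSTR(date, 1, 7) in SQL (1-indexed) corresponds to the slice d[:7].
+months = sorted({d[:7] for d in days})
+print(in_january[0], in_january[-1], months)  # 2024-01-01 2024-01-13 ['2023-12', '2024-01']
+```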
+
 ### Creating Dashboards
 
 For more advanced visualization and dashboarding: