Commit d313d86

Merge branch 'fix/quality-database-table-partitions' into 'develop'
Improved S3 Partition Key Format for Better Date Range Filtering

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!187
2 parents 85912d1 + 5b416b3 commit d313d86

3 files changed: +20 -16 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -61,6 +61,10 @@ SPDX-License-Identifier: MIT-0
 - Defend against non-numeric confidence_threshold values in the configuration - avoid float conversion or numeric comparison exceptions in Assessement step
 - Prevent creation of empty configuration fields in UI
 - Firefox browser issues with signed URLs (PR #14)
+- Improved S3 Partition Key Format for Better Date Range Filtering:
+  - Updated reporting data partition keys to use YYYY-MM format for month and YYYY-MM-DD format for day
+  - Enables easier date range filtering in analytics queries across different months and years
+  - Partition structure now: `year=2024/month=2024-03/day=2024-03-15/` instead of `year=2024/month=03/day=15/`

 ## [0.3.3]
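The changelog entry says the new keys make date range filtering easier. A minimal sketch of why, assuming partition values are compared as plain strings: ISO-style `YYYY-MM-DD` values sort lexicographically in chronological order, while the old two-digit day value carries no month or year. The function names below are hypothetical and not part of this commit.

```python
from datetime import date


def day_partition(d: date) -> str:
    """Day partition value in the new layout (YYYY-MM-DD)."""
    return d.strftime("%Y-%m-%d")


def in_range_new(day_value: str, start: date, end: date) -> bool:
    """Range check on the new day key using plain string comparison."""
    return day_partition(start) <= day_value <= day_partition(end)


def in_range_old_naive(day_value: str, start: date, end: date) -> bool:
    """Same comparison on the old two-digit day key, ignoring month and year."""
    return start.strftime("%d") <= day_value <= end.strftime("%d")


# A range that crosses a year boundary: 2023-12-28 through 2024-01-03.
start, end = date(2023, 12, 28), date(2024, 1, 3)
new_year_matches_new = in_range_new("2024-01-01", start, end)
new_year_matches_old = in_range_old_naive("01", start, end)
```

With the new key, 1 January 2024 falls inside the range (`new_year_matches_new` is `True`). The naive check on the old day key misses it (`new_year_matches_old` is `False`), because a correct filter under the old layout would also have to combine the `year` and `month` partitions.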

docs/reporting-database.md

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ The `document_evaluations` table contains document-level evaluation metrics:
 | false_discovery_rate | double | False discovery rate (0-1) |
 | execution_time | double | Time taken to evaluate (seconds) |

-This table is partitioned by year, month, day, and document ID.
+This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.

 ### Section Evaluations

@@ -56,7 +56,7 @@ The `section_evaluations` table contains section-level evaluation metrics:
 | false_discovery_rate | double | Section false discovery rate (0-1) |
 | evaluation_date | timestamp | When the evaluation was performed |

-This table is partitioned by year, month, day, and document ID.
+This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.

 ### Attribute Evaluations

@@ -78,7 +78,7 @@ The `attribute_evaluations` table contains attribute-level evaluation metrics:
 | confidence_threshold | string | Confidence threshold used |
 | evaluation_date | timestamp | When the evaluation was performed |

-This table is partitioned by year, month, day, and document ID.
+This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.

 ## Metering Table

@@ -94,7 +94,7 @@ The `metering` table captures detailed usage metrics for each document processin
 | number_of_pages | int | Number of pages in the document |
 | timestamp | timestamp | When the operation was performed |

-This table is partitioned by year, month, day, and document ID.
+This table is partitioned by year, month (YYYY-MM format), day (YYYY-MM-DD format), and document ID.

 The metering table is particularly valuable for:
 - Cost analysis and allocation
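As an illustration of the documented layout, the sketch below lists the key prefixes that cover a date range, following the `year=.../month=.../day=.../` structure given in the changelog. The function name is hypothetical, and the document ID level is left out because its key name does not appear in this diff.

```python
from datetime import date, timedelta
from typing import Iterator


def partition_prefixes(start: date, end: date) -> Iterator[str]:
    """Yield one partition prefix per day in [start, end], inclusive."""
    current = start
    while current <= end:
        # date objects accept strftime codes in format specs.
        yield (
            f"year={current:%Y}/"
            f"month={current:%Y-%m}/"
            f"day={current:%Y-%m-%d}/"
        )
        current += timedelta(days=1)


# A range spanning the end of February 2024 (a leap year).
prefixes = list(partition_prefixes(date(2024, 2, 28), date(2024, 3, 1)))
```

Here `prefixes` holds three entries, the last being `year=2024/month=2024-03/day=2024-03-01/`.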

lib/idp_common_pkg/idp_common/reporting/save_reporting_data.py

Lines changed: 12 additions & 12 deletions
@@ -250,8 +250,8 @@ def save_evaluation_results(self, document: Document) -> Optional[Dict[str, Any]
     evaluation_date = doc_time
     year, month, day = (
         doc_time.strftime("%Y"),
-        doc_time.strftime("%m"),
-        doc_time.strftime("%d"),
+        doc_time.strftime("%Y-%m"),
+        doc_time.strftime("%Y-%m-%d"),
     )
     logger.info(
         f"Using document initial_event_time: {document.initial_event_time} for partitioning"
@@ -263,8 +263,8 @@ def save_evaluation_results(self, document: Document) -> Optional[Dict[str, Any]
     evaluation_date = datetime.datetime.now()
     year, month, day = (
         evaluation_date.strftime("%Y"),
-        evaluation_date.strftime("%m"),
-        evaluation_date.strftime("%d"),
+        evaluation_date.strftime("%Y-%m"),
+        evaluation_date.strftime("%Y-%m-%d"),
     )
 else:
     logger.warning(
@@ -273,8 +273,8 @@ def save_evaluation_results(self, document: Document) -> Optional[Dict[str, Any]
     evaluation_date = datetime.datetime.now()
     year, month, day = (
         evaluation_date.strftime("%Y"),
-        evaluation_date.strftime("%m"),
-        evaluation_date.strftime("%d"),
+        evaluation_date.strftime("%Y-%m"),
+        evaluation_date.strftime("%Y-%m-%d"),
     )

 # Escape document ID by replacing slashes with underscores
@@ -435,8 +435,8 @@ def save_metering_data(self, document: Document) -> Optional[Dict[str, Any]]:
     timestamp = doc_time
     year, month, day = (
         doc_time.strftime("%Y"),
-        doc_time.strftime("%m"),
-        doc_time.strftime("%d"),
+        doc_time.strftime("%Y-%m"),
+        doc_time.strftime("%Y-%m-%d"),
     )
     logger.info(
         f"Using document initial_event_time: {document.initial_event_time} for partitioning"
@@ -448,8 +448,8 @@ def save_metering_data(self, document: Document) -> Optional[Dict[str, Any]]:
     timestamp = datetime.datetime.now()
     year, month, day = (
         timestamp.strftime("%Y"),
-        timestamp.strftime("%m"),
-        timestamp.strftime("%d"),
+        timestamp.strftime("%Y-%m"),
+        timestamp.strftime("%Y-%m-%d"),
     )
 else:
     logger.warning(
@@ -458,8 +458,8 @@ def save_metering_data(self, document: Document) -> Optional[Dict[str, Any]]:
     timestamp = datetime.datetime.now()
     year, month, day = (
         timestamp.strftime("%Y"),
-        timestamp.strftime("%m"),
-        timestamp.strftime("%d"),
+        timestamp.strftime("%Y-%m"),
+        timestamp.strftime("%Y-%m-%d"),
     )

 # Escape document ID by replacing slashes with underscores
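The same three `strftime` calls appear in six places in this diff. As a sketch only, the formatting could be collected in one helper like the hypothetical `partition_values` below, which is not part of the commit. It assumes the caller has already parsed `document.initial_event_time` into a `datetime`, since that parsing is not shown here, and it falls back to the current time as the diff does.

```python
import datetime
from typing import Optional, Tuple


def partition_values(
    event_time: Optional[datetime.datetime] = None,
) -> Tuple[str, str, str]:
    """Return (year, month, day) partition values in the new key format.

    Falls back to the current time when no event time is supplied,
    mirroring the fallback branches in the diff.
    """
    moment = event_time or datetime.datetime.now()
    return (
        moment.strftime("%Y"),        # e.g. "2024"
        moment.strftime("%Y-%m"),     # e.g. "2024-03"
        moment.strftime("%Y-%m-%d"),  # e.g. "2024-03-15"
    )


year, month, day = partition_values(datetime.datetime(2024, 3, 15, 10, 30))
prefix = f"year={year}/month={month}/day={day}/"
```

For the sample timestamp, `prefix` equals `year=2024/month=2024-03/day=2024-03-15/`, matching the structure given in the changelog entry.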
