Commit 7ede8e5

Merge branch 'feature/ocr-confidence-in-ui' into 'develop'
**Text Confidence View for Document Pages** See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!222
2 parents e3e1d7f + 9387cec commit 7ede8e5

11 files changed: +178 -86 lines changed

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -7,6 +7,23 @@ SPDX-License-Identifier: MIT-0

 ### Added

+- **Text Confidence View for Document Pages**
+  - Added support for displaying OCR text confidence data through new `TextConfidenceUri` field
+  - New "Text Confidence View" option in the UI pages panel alongside existing Markdown and Text views
+  - Fixed issues with view persistence - Text Confidence View button now always visible with appropriate messaging when content unavailable
+  - Fixed view toggle behavior - switching between views no longer closes the viewer window
+  - Reordered view buttons to: Markdown View, Text Confidence View, Text View for better user experience
+
+### Changed
+- **Converted text confidence data format from JSON to markdown table for improved readability and reduced token usage**
+  - Removed unnecessary "page_count" field
+  - Changed "text_blocks" array to "text" field containing a markdown table with Text and Confidence columns
+  - Reduces prompt size for assessment service while improving UI readability
+  - OCR confidence values now rounded to 1 decimal point (e.g., 99.1, 87.3) for cleaner display
+  - Markdown table headers now explicitly left-aligned using `|:-----|:-----------|` format for consistent appearance
+
+
+
 ### Fixed

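The "reduced token usage" entry can be sanity-checked by serializing the same lines in both layouts. The following is a minimal sketch, assuming character count as a rough proxy for tokens (no tokenizer is run) and using the two sample lines from the OCR README in this commit. The helper names `old_format` and `new_format` are hypothetical and do not exist in the repository.

```python
import json
from typing import Any, Dict, List, Tuple

# Sample (text, confidence) pairs taken from the README example in this commit.
SAMPLE_LINES: List[Tuple[str, float]] = [
    ("WESTERN DARK FIRED TOBACCO GROWERS' ASSOCIATION", 99.35),
    ("206 Maple Street", 91.41),
]


def old_format(lines: List[Tuple[str, float]]) -> Dict[str, Any]:
    """Previous layout: page_count plus a text_blocks array (hypothetical helper)."""
    return {
        "page_count": 1,
        "text_blocks": [
            {"text": text, "confidence": conf, "type": "PRINTED"}
            for text, conf in lines
        ],
    }


def new_format(lines: List[Tuple[str, float]]) -> Dict[str, str]:
    """New layout: one markdown table in the "text" field (hypothetical helper)."""
    rows = ["| Text | Confidence |", "|:-----|:-----------|"]
    for text, conf in lines:
        escaped = text.replace("|", "\\|")  # keep literal pipes from breaking cells
        rows.append(f"| {escaped} | {round(conf, 1)} |")
    return {"text": "\n".join(rows)}


# Compare the serialized size of both layouts for the same input.
old_size = len(json.dumps(old_format(SAMPLE_LINES)))
new_size = len(json.dumps(new_format(SAMPLE_LINES)))
print(f"old: {old_size} chars, new: {new_size} chars")
```

The table pays its header cost once, while the JSON layout repeats three keys per block, so the gap should widen as the number of lines grows.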

lib/idp_common_pkg/idp_common/appsync/service.py

Lines changed: 2 additions & 0 deletions
@@ -117,6 +117,7 @@ def _document_to_update_input(self, document: Document) -> Dict[str, Any]:
     "Class": page.classification or "",
     "ImageUri": page.image_uri or "",
     "TextUri": page.parsed_text_uri or page.raw_text_uri or "",
+    "TextConfidenceUri": page.text_confidence_uri or "",
 }
 pages_data.append(page_data)

@@ -290,6 +291,7 @@ def _appsync_to_document(self, appsync_data: Dict[str, Any]) -> Document:
     page_id=page_id,
     image_uri=page_data.get("ImageUri"),
     raw_text_uri=page_data.get("TextUri"),
+    text_confidence_uri=page_data.get("TextConfidenceUri"),
     classification=page_data.get("Class"),
 )
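The two hunks above add the new field in both directions of the AppSync mapping. The following is a minimal sketch of that round trip, assuming a simplified `Page` dataclass and an `"Id"` key. Both are hypothetical stand-ins, since the real model and the full payload are not shown in this diff.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Page:
    """Hypothetical stand-in for the project's page model (only fields used here)."""

    page_id: str
    image_uri: Optional[str] = None
    raw_text_uri: Optional[str] = None
    parsed_text_uri: Optional[str] = None
    text_confidence_uri: Optional[str] = None
    classification: Optional[str] = None


def page_to_appsync(page: Page) -> Dict[str, Any]:
    """Serialize a page; missing URIs become empty strings, as in the diff."""
    return {
        "Id": page.page_id,  # assumed key name, not shown in the diff
        "Class": page.classification or "",
        "ImageUri": page.image_uri or "",
        "TextUri": page.parsed_text_uri or page.raw_text_uri or "",
        "TextConfidenceUri": page.text_confidence_uri or "",
    }


def appsync_to_page(page_data: Dict[str, Any]) -> Page:
    """Deserialize a page; absent keys come back as None via dict.get."""
    return Page(
        page_id=page_data["Id"],
        image_uri=page_data.get("ImageUri"),
        raw_text_uri=page_data.get("TextUri"),
        text_confidence_uri=page_data.get("TextConfidenceUri"),
        classification=page_data.get("Class"),
    )
```

With this shape, a page without a confidence file is written as `""` but read back as `None` only when the key is missing entirely, so consumers may want to treat both values as "not available".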

lib/idp_common_pkg/idp_common/ocr/README.md

Lines changed: 23 additions & 22 deletions
@@ -15,21 +15,21 @@ The service supports three OCR backends, each with different capabilities and us

 ### 1. Textract Backend (Default - Recommended for Assessment)
 - **Technology**: AWS Textract OCR service
-- **Confidence Data**: ✅ Full granular confidence scores per text block
+- **Confidence Data**: ✅ Full granular confidence scores per text line (displayed as markdown table)
 - **Features**: Basic text detection + enhanced document analysis (tables, forms, signatures, layout)
 - **Assessment Quality**: ⭐⭐⭐ Optimal - Real OCR confidence enables accurate assessment
 - **Use Cases**: Standard document processing, when assessment is enabled, production workflows

 ### 2. Bedrock Backend (LLM-based OCR)
 - **Technology**: Amazon Bedrock LLMs (Claude, Nova) for text extraction
-- **Confidence Data**: ❌ No confidence data (empty text_blocks array)
+- **Confidence Data**: ❌ No confidence data (displays "No confidence data available from LLM OCR")
 - **Features**: Advanced text understanding, better handling of challenging/degraded documents
 - **Assessment Quality**: ❌ No confidence data for assessment
 - **Use Cases**: Challenging documents where traditional OCR fails, specialized text extraction needs

 ### 3. None Backend (Image-only)
 - **Technology**: No OCR processing
-- **Confidence Data**: ❌ Empty confidence data
+- **Confidence Data**: ❌ No confidence data (displays "No OCR performed")
 - **Features**: Image extraction and storage only
 - **Assessment Quality**: ❌ No text confidence for assessment
 - **Use Cases**: Image-only workflows, custom OCR integration
@@ -104,36 +104,37 @@ The format varies by OCR backend:
 **Textract Backend (with confidence data):**
 ```json
 {
-  "page_count": 1,
-  "text_blocks": [
-    {
-      "text": "WESTERN DARK FIRED TOBACCO GROWERS' ASSOCIATION",
-      "confidence": 99.35,
-      "type": "PRINTED"
-    },
-    {
-      "text": "206 Maple Street",
-      "confidence": 91.41,
-      "type": "PRINTED"
-    }
-  ]
+  "text": "| Text | Confidence |\n|------|------------|\n| WESTERN DARK FIRED TOBACCO GROWERS' ASSOCIATION | 99.4 |\n| 206 Maple Street | 91.4 |\n| Murray, KY 42071 | 98.7 |"
+}
+```
+
+The `text` field contains a markdown table with two columns:
+- **Text**: The extracted text content (with pipe characters escaped as `\|`)
+- **Confidence**: OCR confidence score rounded to 1 decimal point
+- Handwriting is indicated with "(HANDWRITING)" suffix in the text column
+
+**Bedrock Backend (no confidence data):**
+```json
+{
+  "text": "| Text | Confidence |\n|------|------------|\n| *No confidence data available from LLM OCR* | N/A |"
 }
 ```

-**Bedrock/None Backend (no confidence data):**
+**None Backend (no OCR):**
 ```json
 {
-  "page_count": 1,
-  "text_blocks": []
+  "text": "| Text | Confidence |\n|------|------------|\n| *No OCR performed* | N/A |"
 }
 ```

 ### Benefits

-- **80-90% token reduction** compared to raw Textract output
-- **Preserved assessment data**: Text content, OCR confidence scores, text type (PRINTED/HANDWRITING)
-- **Removed overhead**: Geometric data, relationships, block IDs, and verbose metadata
+- **85-95% token reduction** compared to raw Textract output (markdown table format is more compact than JSON)
+- **Preserved assessment data**: Text content, OCR confidence scores (rounded to 1 decimal), text type (PRINTED/HANDWRITING)
+- **Removed overhead**: Geometric data, relationships, block IDs, verbose metadata, and unnecessary JSON syntax
+- **Improved readability**: Markdown table format is human-readable in both UI and assessment prompts
 - **Cost efficiency**: Significantly reduced LLM inference costs for assessment workflows
+- **UI compatibility**: Displays beautifully in the Text Confidence View using existing markdown rendering
 - **Automated generation**: Created during initial OCR processing, not repeatedly during assessment

 ### Usage in Assessment Prompts
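A consumer that needs the numbers back, for example to flag low-confidence lines, has to undo the table encoding described in this README hunk. The following is a minimal sketch, assuming the two-column layout and `\|` escaping shown above. The function name `parse_confidence_table` is hypothetical, and text that itself ends in a backslash is not handled.

```python
import re
from typing import List, Optional, Tuple

# Split on pipes that are NOT preceded by a backslash (escaped pipes stay in the cell).
_CELL_SPLIT = re.compile(r"(?<!\\)\|")


def parse_confidence_table(table: str) -> List[Tuple[str, Optional[float]]]:
    """Turn the markdown table back into (text, confidence) pairs."""
    rows: List[Tuple[str, Optional[float]]] = []
    for line in table.split("\n")[2:]:  # skip the header and alignment rows
        cells = [cell.strip() for cell in _CELL_SPLIT.split(line.strip())]
        # A well-formed row "| a | b |" splits into ["", "a", "b", ""].
        if len(cells) != 4:
            continue
        text = cells[1].replace("\\|", "|")  # undo the pipe escaping
        try:
            confidence: Optional[float] = float(cells[2])
        except ValueError:
            confidence = None  # placeholder rows carry "N/A"
        rows.append((text, confidence))
    return rows


example = (
    "| Text | Confidence |\n"
    "|:-----|:-----------|\n"
    "| Total \\| Due | 99.4 |\n"
    "| *No OCR performed* | N/A |"
)
for text, confidence in parse_confidence_table(example):
    print(text, confidence)
```

Returning `None` for the `N/A` placeholder keeps the Bedrock and None backends distinguishable from a genuine score of zero.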

lib/idp_common_pkg/idp_common/ocr/service.py

Lines changed: 33 additions & 32 deletions
@@ -622,10 +622,9 @@ def _process_single_page_bedrock(
         )

         # Generate and store text confidence data
-        # For Bedrock, we use empty confidence data since LLM OCR doesn't provide real confidence scores
+        # For Bedrock, we use empty markdown table since LLM OCR doesn't provide real confidence scores
         text_confidence_data = {
-            "page_count": 1,
-            "text_blocks": [], # Empty - no confidence data available from LLM OCR
+            "text": "| Text | Confidence |\n|:-----|:------------|\n| *No confidence data available from LLM OCR* | N/A |"
         }

         text_confidence_key = f"{prefix}/pages/{page_id}/textConfidence.json"
@@ -703,8 +702,10 @@ def _process_single_page_none(
             content_type="application/json",
         )

-        # Generate minimal text confidence data (empty)
-        text_confidence_data = {"page_count": 1, "text_blocks": []}
+        # Generate minimal text confidence data (empty markdown table)
+        text_confidence_data = {
+            "text": "| Text | Confidence |\n|:-----|:------------|\n| *No OCR performed* | N/A |"
+        }

         text_confidence_key = f"{prefix}/pages/{page_id}/textConfidence.json"
         s3.write_content(
@@ -807,11 +808,9 @@ def _generate_text_confidence_data(
         """
         Generate text confidence data from raw OCR to reduce token usage while preserving essential information.

-        This method transforms verbose Textract output into a minimal format containing:
+        This method transforms verbose Textract output into a markdown table format containing:
         - Essential text content (LINE blocks only)
-        - OCR confidence scores
-        - Text type (PRINTED/HANDWRITING)
-        - Page count
+        - OCR confidence scores (rounded to 1 decimal point)

         Removes geometric data, relationships, block IDs, and other verbose metadata
         that aren't needed for assessment purposes.
@@ -820,29 +819,30 @@ def _generate_text_confidence_data(
             raw_ocr_data: Raw Textract API response

         Returns:
-            Text confidence data with ~80-90% token reduction
+            Text confidence data as markdown table with ~80-90% token reduction
         """
-        text_confidence_data = {
-            "page_count": raw_ocr_data.get("DocumentMetadata", {}).get("Pages", 1),
-            "text_blocks": [],
-        }
+        # Start building the markdown table with explicit left alignment
+        markdown_lines = ["| Text | Confidence |", "|:-----|:-----------|"]

         blocks = raw_ocr_data.get("Blocks", [])

         for block in blocks:
             if block.get("BlockType") == "LINE" and block.get("Text"):
-                text_block = {
-                    "text": block.get("Text", ""),
-                    "confidence": block.get("Confidence"),
-                }
+                text = block.get("Text", "").replace(
+                    "|", "\\|"
+                ) # Escape pipe characters
+                confidence = round(block.get("Confidence", 0.0), 1)

-                # Include text type if available (PRINTED vs HANDWRITING)
-                if "TextType" in block:
-                    text_block["type"] = block["TextType"]
+                # Add text type indicator if it's handwriting
+                if block.get("TextType") == "HANDWRITING":
+                    markdown_lines.append(f"| {text} (HANDWRITING) | {confidence} |")
+                else:
+                    markdown_lines.append(f"| {text} | {confidence} |")

-                text_confidence_data["text_blocks"].append(text_block)
+        # Join all lines into a single markdown string
+        markdown_table = "\n".join(markdown_lines)

-        return text_confidence_data
+        return {"text": markdown_table}

     def _parse_textract_response(
         self, response: Dict[str, Any], page_id: int = None
@@ -1070,15 +1070,16 @@ def _process_converted_page(
             content_type="application/json",
         )

-        # Generate text confidence data
-        text_confidence_data = {
-            "page_count": 1,
-            "text_blocks": [
-                {"text": line, "confidence": 99.0, "type": "PRINTED"}
-                for line in page_text.split("\n")
-                if line.strip()
-            ],
-        }
+        # Generate text confidence data as markdown table with explicit left alignment
+        markdown_lines = ["| Text | Confidence |", "|:-----|:-----------|"]
+        for line in page_text.split("\n"):
+            if line.strip():
+                # Escape pipe characters in text
+                escaped_line = line.replace("|", "\\|")
+                markdown_lines.append(f"| {escaped_line} | 99.0 |")
+
+        markdown_table = "\n".join(markdown_lines)
+        text_confidence_data = {"text": markdown_table}

         text_confidence_key = f"{prefix}/pages/{page_id}/textConfidence.json"
         s3.write_content(
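The table-building logic in `_generate_text_confidence_data` can be exercised in isolation. The following is a minimal standalone sketch, assuming a Textract-style response dict with a `Blocks` list. The function name `build_confidence_table` is hypothetical. It is also a slightly defensive variant, not the committed code: `block.get("Confidence", 0.0)` in the diff would return `None` if the key were present with a null value, so the sketch uses `or 0.0` instead.

```python
from typing import Any, Dict, List


def build_confidence_table(raw_ocr_data: Dict[str, Any]) -> Dict[str, str]:
    """Build the {"text": <markdown table>} payload from Textract-style blocks."""
    lines: List[str] = ["| Text | Confidence |", "|:-----|:-----------|"]
    for block in raw_ocr_data.get("Blocks", []):
        # Only non-empty LINE blocks contribute rows; PAGE and WORD blocks are skipped.
        if block.get("BlockType") != "LINE" or not block.get("Text"):
            continue
        text = block["Text"].replace("|", "\\|")  # escape pipes so cells stay intact
        if block.get("TextType") == "HANDWRITING":
            text += " (HANDWRITING)"
        # `or 0.0` covers both a missing key and an explicit None value.
        confidence = round(block.get("Confidence") or 0.0, 1)
        lines.append(f"| {text} | {confidence} |")
    return {"text": "\n".join(lines)}


sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Total | Due", "Confidence": 99.349, "TextType": "PRINTED"},
        {"BlockType": "LINE", "Text": "John", "Confidence": 87.31, "TextType": "HANDWRITING"},
        {"BlockType": "LINE", "Text": "x", "Confidence": None},
        {"BlockType": "WORD", "Text": "Total", "Confidence": 99.9},
    ]
}
print(build_confidence_table(sample)["text"])
```

Appending the suffix before formatting the row removes the duplicated `append` branches while producing the same rows as the committed version for valid input.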

lib/idp_common_pkg/tests/unit/ocr/test_ocr_service.py

Lines changed: 16 additions & 14 deletions
@@ -528,20 +528,22 @@ def test_generate_text_confidence_data(self, mock_textract_response):
         service = OcrService()
         result = service._generate_text_confidence_data(mock_textract_response)

-        # Verify structure
-        assert "page_count" in result
-        assert "text_blocks" in result
-        assert result["page_count"] == 1
-        assert len(result["text_blocks"]) == 2 # Two LINE blocks
-
-        # Verify text blocks
-        assert result["text_blocks"][0]["text"] == "Sample text line 1"
-        assert result["text_blocks"][0]["confidence"] == 98.5
-        assert result["text_blocks"][0]["type"] == "PRINTED"
-
-        assert result["text_blocks"][1]["text"] == "Sample text line 2"
-        assert result["text_blocks"][1]["confidence"] == 97.2
-        assert result["text_blocks"][1]["type"] == "PRINTED"
+        # Verify structure - now returns markdown table in 'text' field
+        assert "text" in result
+        assert "page_count" not in result # Removed in new format
+        assert "text_blocks" not in result # Replaced with markdown table
+
+        # Verify markdown table content
+        markdown_table = result["text"]
+        lines = markdown_table.split("\n")
+
+        # Check header
+        assert lines[0] == "| Text | Confidence |"
+        assert lines[1] == "|:-----|:-----------|"
+
+        # Check data rows
+        assert lines[2] == "| Sample text line 1 | 98.5 |"
+        assert lines[3] == "| Sample text line 2 | 97.2 |"

     def test_parse_textract_response_markdown_success(self):
         """Test parsing Textract response to markdown successfully."""

src/api/schema.graphql

Lines changed: 2 additions & 0 deletions
@@ -46,6 +46,7 @@ type Page @aws_cognito_user_pools @aws_iam {
   Class: String
   ImageUri: String
   TextUri: String
+  TextConfidenceUri: String
 }

 type DocumentList @aws_cognito_user_pools @aws_iam {
@@ -125,6 +126,7 @@ input PageInput {
   Class: String
   ImageUri: String
   TextUri: String
+  TextConfidenceUri: String
 }

 type CopyToBaselineResponse @aws_cognito_user_pools {

src/ui/src/components/common/map-document-attributes.js

Lines changed: 5 additions & 1 deletion
@@ -88,7 +88,11 @@ const mapDocumentsAttributes = (documents) => {
     workflowStatus,
     duration: getDuration(completionTime, initialEventTime),
     sections,
-    pages,
+    pages:
+      pages?.map((page) => ({
+        ...page,
+        TextConfidenceUri: page.TextConfidenceUri || null,
+      })) || [],
     pageCount,
     metering,
     evaluationReportUri,
