feat: automated semantic mapping #85

zhuohangu · 2025-12-18T23:01:12Z

sem_map.py implements an automated semantic mapping pipeline that turns unstructured {id, text} documents into structured fields based on natural-language “concepts.” It uses an LLM for dynamic schema generation and Palimpzest to execute the semantic extraction.

Core Functionality

Dynamic schema generation: Converts concept phrases into typed {name, type, desc} columns with two strategies:
- FLAT: Direct mapping of concepts to canonical fields.
- HIERARCHY_FIRST: May decompose a concept into more granular sub-fields when useful.
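
To make the difference between the two strategies concrete, here is a minimal sketch of the column shape each one might produce. It is not the code in sem_map.py: the PR uses an LLM to generate the schema, whereas this stand-in uses string normalization and a hand-written decomposition table. The names `Column`, `to_field_name`, `flat_schema`, and `hierarchy_first_schema` are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One typed output column, mirroring the {name, type, desc} shape."""
    name: str
    type: str
    desc: str


def to_field_name(concept: str) -> str:
    """Normalize a concept phrase into a snake_case field name."""
    return re.sub(r"[^a-z0-9]+", "_", concept.lower()).strip("_")


def flat_schema(concepts):
    """FLAT (assumed behavior): one column per concept."""
    return [
        Column(to_field_name(c), "str", f"Value of '{c}' mentioned in the text")
        for c in concepts
    ]


def hierarchy_first_schema(concepts, decompositions):
    """HIERARCHY_FIRST (assumed behavior): split a concept into sub-fields
    when a decomposition is available, otherwise fall back to one column.

    `decompositions` stands in for what an LLM would propose.
    """
    columns = []
    for concept in concepts:
        sub_fields = decompositions.get(concept)
        if not sub_fields:
            columns.extend(flat_schema([concept]))
            continue
        parent = to_field_name(concept)
        for sub in sub_fields:
            columns.append(
                Column(f"{parent}_{to_field_name(sub)}", "str",
                       f"'{sub}' aspect of '{concept}'")
            )
    return columns


concepts = ["product category", "customer location"]
flat = flat_schema(concepts)
nested = hierarchy_first_schema(
    concepts, {"customer location": ["city", "country"]}
)
print([c.name for c in flat])
print([c.name for c in nested])
```

In this sketch, FLAT keeps "customer location" as a single field, while HIERARCHY_FIRST splits it into city and country sub-fields and leaves "product category" untouched because no decomposition was supplied for it.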
Semantic extraction (sem_map): Runs pz.sem_map over the dataset to extract values for the generated columns.
Tagification and stats (expand_sem_map_results_to_tags): A post-processing step that flattens extracted entities into boolean tags and calculates selectivity statistics for each tag.
The script also includes a runnable usage example that loads text data, generates schemas, runs the semantic extraction with Palimpzest, and tagifies the resulting columns.
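
The tagification step can be illustrated with a small, self-contained sketch. It assumes each extracted row is a dict whose column values are a string, a list of strings, or None, that a tag is written as `column:value`, and that selectivity means the fraction of documents for which a tag is true. None of these details are confirmed by the PR description, and `expand_to_tags` is a hypothetical helper, not the PR's `expand_sem_map_results_to_tags`.

```python
from collections import Counter


def expand_to_tags(rows, columns):
    """Flatten extracted values into boolean tags and compute selectivity.

    Returns (tagged_rows, selectivity), where selectivity[tag] is the
    fraction of rows in which that tag is True (an assumed definition).
    """
    per_row_tags = []
    for row in rows:
        tags = set()
        for col in columns:
            value = row.get(col)
            if value is None:
                continue
            # Treat a scalar as a one-element list so both shapes work.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                text = str(v).strip().lower()
                if text:
                    tags.add(f"{col}:{text}")
        per_row_tags.append(tags)

    # Each row contributes at most once per tag because tags are sets.
    counts = Counter(tag for tags in per_row_tags for tag in tags)
    all_tags = sorted(counts)

    tagged_rows = [
        {"id": row["id"], **{tag: (tag in tags) for tag in all_tags}}
        for row, tags in zip(rows, per_row_tags)
    ]
    total = len(rows)
    selectivity = {tag: counts[tag] / total for tag in all_tags} if total else {}
    return tagged_rows, selectivity


rows = [
    {"id": 1, "topic": ["billing", "refund"], "country": "US"},
    {"id": 2, "topic": ["billing"], "country": None},
    {"id": 3, "topic": [], "country": "us"},
    {"id": 4, "topic": "Refund", "country": "DE"},
]
tagged, selectivity = expand_to_tags(rows, ["topic", "country"])
print(selectivity)
```

Lower-casing values before building tags is a design choice of this sketch so that "US" and "us" collapse into one tag. A low selectivity value marks a tag that filters out most documents, which is typically what makes it useful for narrowing a retrieval step.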

zhuohangu added 5 commits December 16, 2025 15:51

Baseline import: auto_retrieval from zg-dev-index-mgmt

ec652d8

feat: Semantic Mapping

eb3fc1d

feat: Introduce Semantic Mapping with Hierarchy and Flat Strategies

769b7c9

feat: semantic mapping + tag expansion

5839668

Include example usage.

457be27

zhuohangu requested a review from mdr223 December 18, 2025 23:01
