An async LangGraph-based agent that:
- Extracts named entities from a video transcript
- Verifies canonical names for each entity using web search context
- Iteratively replaces entity mentions in the transcript with their canonical names
- Reviews each replacement and limits the loop to at most 2 replacement passes per entity
- Python: 3.13+
- OS: macOS/Linux/Windows
git clone <your-fork-or-repo-url>
cd enrichment-agent
This project uses uv with a lockfile (`uv.lock`). Install uv and sync dependencies from the lockfile:
# Install uv (see https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: pipx install uv
# From the project root, create/sync the project environment from pyproject + uv.lock
uv sync
Notes:
- `uv sync` creates and manages a project-local virtualenv automatically.
- Dependencies are declared in `pyproject.toml` and pinned by `uv.lock`.
Copy `.env.example` to `.env` and fill in the values. The agent needs `OPENAI_API_KEY` (for the LLMs) and `TAVILY_API_KEY` (for web search).
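To check your configuration before a run, the sketch below parses simple `KEY=VALUE` lines from `.env` and reports which required keys are missing or empty. It is a stdlib-only illustration and not necessarily how the project loads `.env` (it may use a library such as python-dotenv); `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, with the key list taken from the troubleshooting notes in this README.

```python
import os
from pathlib import Path

# Keys the agent needs, per the troubleshooting notes in this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env")
    file_values = parse_env(path.read_text()) if path.exists() else {}
    # In this sketch, variables already exported in the shell override the file.
    merged = {**file_values, **os.environ}
    missing = missing_keys(merged)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the project root with `uv run` followed by the script path; an empty result means both keys are present.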
uv run main.py
- By default, `main.py` runs the agent on a sample transcript string. Edit `main.py` to supply your own transcript.
- To render a Mermaid PNG diagram of the graph, call `DemoEnrichmentAgent.draw_graph()` in `main.py`.
The agent is implemented in `src/enrichment_agent/agent.py` as `DemoEnrichmentAgent`, using LangGraph.
Graph state:
- `transcript_text: str`: the original transcript input
- `extracted_entities: NamedEntities`: entities extracted by the LLM
- `verified_entities: list[VerifiedEntity]`: the canonical name for each entity
- `updated_transcript_text: str`: the in-progress transcript with replacements applied
- `replacement_loop_idx: int`: the loop counter/index over verified entities (additive)
- `replacement_pass_count: int`: the number of replacement attempts for the current entity
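To make "additive" concrete, here is a minimal, dependency-free sketch of the state. LangGraph reads reducers from `Annotated[...]` metadata on the state schema; `apply_update` below is a hypothetical stand-in that mimics that merge so it runs without LangGraph. Apart from `canonical_name`, the entity fields, and the names `EnrichmentState` and `apply_update`, are assumptions rather than the project's actual definitions.

```python
import operator
from dataclasses import dataclass
from typing import Annotated, Any, TypedDict, get_type_hints


@dataclass
class NamedEntity:
    name: str  # hypothetical field: the mention as found in the transcript


@dataclass
class NamedEntities:
    entities: list[NamedEntity]  # hypothetical wrapper for structured output


@dataclass
class VerifiedEntity:
    name: str            # hypothetical field: the original mention
    canonical_name: str  # canonical name produced by verification


class EnrichmentState(TypedDict):
    transcript_text: str
    extracted_entities: NamedEntities
    verified_entities: list[VerifiedEntity]
    updated_transcript_text: str
    # Reducer in the metadata: updates are added to the current value.
    replacement_loop_idx: Annotated[int, operator.add]
    # No reducer: updates overwrite the current value.
    replacement_pass_count: int


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Merge a node's partial update: use an Annotated reducer if present, else overwrite."""
    hints = get_type_hints(EnrichmentState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        reducer = next((m for m in metadata if callable(m)), None)
        merged[key] = reducer(merged[key], value) if reducer and key in merged else value
    return merged
```

Two nodes each returning `{"replacement_loop_idx": 1}` move the index from 0 to 2, while `replacement_pass_count` simply takes whatever value was last returned.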
Nodes:
- `extractor`: uses an LLM to extract `NamedEntity` items from `transcript_text`.
- `get_verified_entity_worker`: for each `NamedEntity`, researches and produces a `VerifiedEntity` with a `canonical_name`.
- `replace_entity`: uses an LLM to replace occurrences of the current entity with its canonical name, updates `updated_transcript_text`, and increments `replacement_pass_count`.
- `replacement_reviewer`: uses an LLM to check whether the current entity has been fully replaced in `updated_transcript_text`, and controls whether to advance the loop.
Loop control:
- The graph uses `replacement_loop_idx` with additive semantics: nodes return the increment, not the absolute index (e.g., `{"replacement_loop_idx": 1}` advances by 1).
- `replacement_reviewer_node()` decides whether to move to the next entity:
  - Advance when the entity is fully replaced, or when `replacement_pass_count >= 2`.
  - Reset `replacement_pass_count` to 0 when advancing or when there are no more entities.
- This guarantees a maximum of 2 replacement attempts per entity.
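The routing rules above can be exercised without LangGraph or an API key. In this sketch, deterministic regex substitution and a leftover-mention check stand in for the replacer and reviewer LLMs (a swapped-in technique, not the project's prompts), and a plain `while` loop stands in for the graph edges. `VerifiedEntity.name`, `review`, and `run_replacement_loop` are hypothetical names; only `canonical_name`, the two state keys, and the 2-pass cap come from this README.

```python
import re
from dataclasses import dataclass
from typing import Callable

MAX_PASSES = 2  # per-entity cap described above


@dataclass
class VerifiedEntity:
    name: str            # hypothetical field: the mention as it appears in the transcript
    canonical_name: str  # canonical name from the verification step


def _mention(entity: VerifiedEntity) -> re.Pattern[str]:
    # Whole-word, case-insensitive match on the original mention.
    return re.compile(rf"\b{re.escape(entity.name)}\b", re.IGNORECASE)


def replace_entity(text: str, entity: VerifiedEntity, pass_count: int) -> tuple[str, int]:
    """Stand-in for the LLM replacer: substitute the mention, then count the pass."""
    # A function replacement keeps backslashes in canonical names literal.
    new_text = _mention(entity).sub(lambda _m: entity.canonical_name, text)
    return new_text, pass_count + 1


def review(text: str, entity: VerifiedEntity, pass_count: int, idx: int, n: int) -> tuple[dict[str, int], str]:
    """Stand-in for the LLM reviewer: returns (state update, next node)."""
    # Ignore mentions that sit inside the canonical name (e.g. "Musk" in "Elon Musk").
    leftover = _mention(entity).search(text.replace(entity.canonical_name, ""))
    if leftover is None or pass_count >= MAX_PASSES:
        if idx + 1 < n:
            # Additive semantics: return the increment, not the absolute index.
            return {"replacement_loop_idx": 1, "replacement_pass_count": 0}, "replace_entity"
        return {"replacement_pass_count": 0}, "__end__"
    return {}, "replace_entity"  # retry the same entity


Replacer = Callable[[str, VerifiedEntity, int], tuple[str, int]]


def run_replacement_loop(
    text: str, entities: list[VerifiedEntity], replacer: Replacer = replace_entity
) -> tuple[str, list[int]]:
    """Drive replace -> review until the reviewer routes to __end__; return text and passes per entity."""
    passes = [0] * len(entities)
    idx, pass_count = 0, 0
    while entities:
        entity = entities[idx]
        text, pass_count = replacer(text, entity, pass_count)
        passes[idx] += 1
        update, route = review(text, entity, pass_count, idx, len(entities))
        idx += update.get("replacement_loop_idx", 0)  # additive merge
        pass_count = update.get("replacement_pass_count", pass_count)  # overwrite merge
        if route == "__end__":
            break
    return text, passes
```

With a replacer that never makes progress, each entity still gets exactly two passes before the reviewer advances, which is the termination guarantee described above.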
Models are instantiated with LangChain’s `init_chat_model` and structured output:
- Extractor LLM: `gpt-4o-mini`
- Entity Verifier LLM: `gpt-4o-mini`
- Entity Replacer LLM: `gpt-4o-mini`
- Replacement Reviewer LLM: `gpt-4o-mini`

You will need a valid `OPENAI_API_KEY` for these to work.
Entry point: `main.py`
- Instantiate `DemoEnrichmentAgent`.
- Optionally generate a graph image with `draw_graph()`.
- Provide a transcript string and run `asyncio.run(enrichment_agent.start(vid_transcript))`.
- The agent prints intermediate logs about extraction, verification, replacement passes, and loop routing. A `recursion_limit` higher than LangGraph's default is set for graph execution, because each entity can add up to two replace/review rounds to the run.
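The sketch below shows one way a `recursion_limit` can be passed to a compiled LangGraph graph (through the run config of `ainvoke`) and gives a rough upper bound on the steps the replacement loop needs. It assumes extraction plus verification take two super-steps and that each replacement pass is one replace step plus one review step; `run_agent`, `estimate_recursion_limit`, and the default of 100 are hypothetical, and whether `start()` passes the config this way is not confirmed by this README.

```python
import asyncio
from typing import Any, Protocol


class AsyncGraph(Protocol):
    """Anything with LangGraph's async invoke shape (e.g. a compiled graph)."""

    async def ainvoke(self, input: Any, config: Any = None) -> Any: ...


def estimate_recursion_limit(n_entities: int, passes_per_entity: int = 2, fixed_steps: int = 2) -> int:
    """Rough upper bound on super-steps under the assumptions stated above."""
    return fixed_steps + n_entities * passes_per_entity * 2


async def run_agent(graph: AsyncGraph, transcript: str, recursion_limit: int = 100) -> Any:
    # recursion_limit travels in the run config, not in the graph state.
    return await graph.ainvoke(
        {"transcript_text": transcript},
        config={"recursion_limit": recursion_limit},
    )
```

For example, `asyncio.run(run_agent(compiled_graph, vid_transcript, estimate_recursion_limit(10)))` would allow up to 42 steps for ten entities, where `compiled_graph` is a hypothetical compiled LangGraph graph.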
/(repo root)
├── main.py
├── README.md
├── pyproject.toml
├── uv.lock
├── .env.example
├── .env (git-ignored; you create this)
└── src/
    └── enrichment_agent/
        ├── __init__.py
        └── agent.py
- Missing API keys: ensure `OPENAI_API_KEY` and `TAVILY_API_KEY` are set.
- Import errors for LangChain/LangGraph: verify that dependencies are installed (`uv sync`) and that you are using the project environment, for example by running commands through `uv run`.
- Graph rendering issues: skip the `draw_graph()` call, or make sure your environment supports graph image generation.
- Infinite-loop concerns: the reviewer enforces a maximum of 2 passes per entity and resets the pass counter when advancing.
License: MIT.