
CodeWithOz/demo-enrichment-agent


Enrichment Agent

An async LangGraph-based agent that:

  • Extracts named entities from a video transcript
  • Verifies canonical names for each entity using web search context
  • Iteratively replaces entity mentions in the transcript with their canonical names
  • Reviews each replacement and controls looping with a max of 2 passes per entity

Quick Start

  • Python: 3.13+
  • OS: macOS/Linux/Windows

1) Clone and enter the project

git clone <your-fork-or-repo-url>
cd demo-enrichment-agent  # or whatever directory your clone created

2) Set up the environment with uv

This project uses uv with a lockfile (uv.lock). Install uv and sync deps from the lock:

# Install uv (see https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: pipx install uv

# From the project root, create/sync the project environment from pyproject + uv.lock
uv sync

Notes:

  • uv sync creates and manages a project-local virtual environment (.venv) automatically.
  • Dependencies are declared in pyproject.toml and pinned by uv.lock.

3) Configure environment variables

Copy .env.example to .env and fill in the values. At minimum, set OPENAI_API_KEY and TAVILY_API_KEY.
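To fail fast when a key is missing, you could check the environment before starting the agent. The helper below is a hypothetical sketch, not part of the repository; the key names come from the Troubleshooting section. It only inspects the mapping you pass in, so call it with os.environ after the .env values have been loaded into the process environment.

```python
from collections.abc import Mapping

# Key names taken from the Troubleshooting section of this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or blank in `env` (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

For example, missing_keys(os.environ) returns an empty list when both keys are set, and the names of the missing keys otherwise.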

4) Run the agent (via uv)

uv run main.py
  • By default, main.py runs the agent with a sample transcript string. Edit main.py to provide your own transcript input as needed.
  • To render a Mermaid PNG diagram of the graph, call draw_graph() on the DemoEnrichmentAgent instance in main.py.

How It Works

The agent is implemented in src/enrichment_agent/agent.py as DemoEnrichmentAgent using LangGraph.

State (AgentState)

  • transcript_text: str — original transcript input
  • extracted_entities: NamedEntities — entities extracted by LLM
  • verified_entities: list[VerifiedEntity] — per-entity canonical names
  • updated_transcript_text: str — the in-progress, replaced transcript
  • replacement_loop_idx: int — index of the verified entity currently being replaced; updated additively (nodes return increments)
  • replacement_pass_count: int — number of replacement attempts made so far for the current entity
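The README describes replacement_loop_idx as additive but does not show the state definition. In LangGraph, an additive field is usually declared with a reducer such as Annotated[int, operator.add]; the sketch below assumes that. NamedEntities and VerifiedEntity are hypothetical stand-ins for the project's models, and apply_update is a hypothetical helper that imitates how a reducer-aware graph merges a node's partial update.

```python
import operator
from dataclasses import dataclass, field
from typing import Annotated, TypedDict, get_type_hints


# Hypothetical stand-ins for the project's NamedEntities / VerifiedEntity models.
@dataclass
class NamedEntities:
    entities: list[str] = field(default_factory=list)


@dataclass
class VerifiedEntity:
    name: str
    canonical_name: str


class AgentState(TypedDict):
    transcript_text: str
    extracted_entities: NamedEntities
    verified_entities: list[VerifiedEntity]
    updated_transcript_text: str
    # Assumed reducer: a node's returned value is ADDED to the current index.
    replacement_loop_idx: Annotated[int, operator.add]
    # No reducer: a returned value simply overwrites the current one.
    replacement_pass_count: int


def apply_update(state: AgentState, update: dict) -> AgentState:
    """Merge a partial update: reducer-annotated keys are combined, others overwritten."""
    hints = get_type_hints(AgentState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata:
            merged[key] = metadata[0](merged[key], value)  # e.g. operator.add(old, new)
        else:
            merged[key] = value
    return merged  # type: ignore[return-value]
```

With this declaration, a node that returns {"replacement_loop_idx": 1} moves the index from 0 to 1, then from 1 to 2, rather than setting it to 1 each time.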

Graph Nodes

  • extractor — Uses an LLM to extract NamedEntity items from transcript_text
  • get_verified_entity_worker — For each NamedEntity, gathers web search context and produces a VerifiedEntity with a canonical_name
  • replace_entity — Uses an LLM to replace occurrences of the current entity with the canonical name, updates updated_transcript_text, and increments replacement_pass_count
  • replacement_reviewer — Uses an LLM to check whether the current entity has been fully replaced in updated_transcript_text, and decides whether to advance to the next entity.
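The node code is not reproduced here, so the following is only a stand-in: a sketch of replace_entity and of the reviewer's "fully replaced?" check that uses plain regex substitution where the real nodes call an LLM. The VerifiedEntity field name and both function names are hypothetical. Like LangGraph nodes, replace_entity returns only the keys it changes.

```python
import re
from dataclasses import dataclass


@dataclass
class VerifiedEntity:  # hypothetical stand-in for the project's model
    name: str            # surface form as it appears in the transcript
    canonical_name: str  # verified canonical form


def _mention_pattern(entity: VerifiedEntity) -> re.Pattern[str]:
    # Whole-word, case-insensitive match; assumes the name starts and ends
    # with word characters (so \b boundaries apply).
    return re.compile(rf"\b{re.escape(entity.name)}\b", re.IGNORECASE)


def replace_entity(state: dict) -> dict:
    """Stub of the replace_entity node: regex substitution instead of an LLM."""
    entity = state["verified_entities"][state["replacement_loop_idx"]]
    new_text = _mention_pattern(entity).sub(
        lambda _match: entity.canonical_name,  # lambda avoids backslash escapes
        state["updated_transcript_text"],
    )
    return {
        "updated_transcript_text": new_text,
        "replacement_pass_count": state["replacement_pass_count"] + 1,
    }


def is_fully_replaced(state: dict) -> bool:
    """Stub of the reviewer's check: no whole-word mention of the old name remains."""
    entity = state["verified_entities"][state["replacement_loop_idx"]]
    return _mention_pattern(entity).search(state["updated_transcript_text"]) is None
```

A check this literal can report "not fully replaced" forever, for example when the canonical name contains the original name as a whole word. A per-entity pass cap protects against exactly that case.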

Loop Control

  • The graph uses replacement_loop_idx with additive semantics. Nodes return the increment, not the absolute index (e.g., {"replacement_loop_idx": 1} to advance by 1).
  • replacement_reviewer_node() decides if we move to the next entity:
    • Advance when the entity is fully replaced, or when replacement_pass_count >= 2.
    • Reset replacement_pass_count to 0 when advancing or when there are no more entities.
    • This guarantees a maximum of 2 replacement attempts per entity.
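The rules above can be written as a small decision function. This is a sketch under stated assumptions, not the repository's replacement_reviewer_node(): the names MAX_PASSES, review and simulate are hypothetical, and the string "END" stands in for LangGraph's END sentinel. simulate drives the loop with a fake replacer so you can confirm that no entity gets more than two attempts.

```python
MAX_PASSES = 2  # per-entity cap described in this README


def review(state: dict, fully_replaced: bool) -> tuple[dict, str]:
    """Return (partial_update, next_node) following the loop-control rules."""
    done_with_entity = fully_replaced or state["replacement_pass_count"] >= MAX_PASSES
    if not done_with_entity:
        return {}, "replace_entity"  # retry the same entity
    if state["replacement_loop_idx"] + 1 < len(state["verified_entities"]):
        # Additive index: return the increment (1), not the new absolute index.
        return {"replacement_loop_idx": 1, "replacement_pass_count": 0}, "replace_entity"
    return {"replacement_pass_count": 0}, "END"  # no more entities


def simulate(num_entities: int, passes_needed: int) -> list[int]:
    """Run the loop with a fake replacer that needs `passes_needed` attempts
    per entity; return the attempts actually made for each entity."""
    state = {"verified_entities": [None] * num_entities,
             "replacement_loop_idx": 0, "replacement_pass_count": 0}
    attempts = [0] * num_entities
    node = "replace_entity" if num_entities else "END"
    while node != "END":
        # replace_entity: bump the pass count (overwritten, not additive).
        state["replacement_pass_count"] += 1
        attempts[state["replacement_loop_idx"]] += 1
        fully_replaced = state["replacement_pass_count"] >= passes_needed
        update, node = review(state, fully_replaced)
        # Merge: the loop index is additive, the pass count is overwritten.
        state["replacement_loop_idx"] += update.get("replacement_loop_idx", 0)
        if "replacement_pass_count" in update:
            state["replacement_pass_count"] = update["replacement_pass_count"]
    return attempts
```

For example, simulate(3, 5) returns [2, 2, 2]: even though every entity would need five passes, each one stops after two attempts.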

Models and Providers

Models are instantiated with LangChain’s init_chat_model and structured output:

  • Extractor LLM: gpt-4o-mini
  • Entity Verifier LLM: gpt-4o-mini
  • Entity Replacer LLM: gpt-4o-mini
  • Replacement Reviewer LLM: gpt-4o-mini

You will need a valid OPENAI_API_KEY for these to work.

Running Details

Entry point: main.py

  • Instantiate DemoEnrichmentAgent
  • Optionally generate a graph image with draw_graph()
  • Provide a transcript string and run asyncio.run(enrichment_agent.start(vid_transcript))
  • The agent prints intermediate logs about extraction, verification, replacement passes, and loop routing. Because the replace/review loop adds several graph steps per entity, the agent raises LangGraph's recursion_limit above its default.
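To see why the default limit can be too low, here is a rough, hypothetical estimate of how many graph steps one run may take. It assumes one step for extractor, one step for all get_verified_entity_worker runs (treated as running in parallel), and at most two replace_entity plus replacement_reviewer rounds per entity. The real graph may count steps differently.

```python
# LangGraph's documented default recursion_limit (check your installed version).
LANGGRAPH_DEFAULT_RECURSION_LIMIT = 25


def estimate_supersteps(num_entities: int, max_passes: int = 2) -> int:
    """Rough upper bound on graph steps for one run (an assumption, not measured)."""
    extract_and_verify = 2       # extractor + one parallel verification step
    per_entity = 2 * max_passes  # (replace_entity + replacement_reviewer) per pass
    return extract_and_verify + per_entity * num_entities
```

Under these assumptions, six entities already need 26 steps, more than the default of 25. The limit is supplied through the run config when invoking the compiled graph, e.g. config={"recursion_limit": ...}.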

Project Structure

/(repo root)
├── main.py
├── README.md
├── pyproject.toml
├── uv.lock
├── .env.example
├── .env (git-ignored; you create this)
└── src/
    └── enrichment_agent/
        ├── __init__.py
        └── agent.py

Troubleshooting

  • Missing API keys: ensure OPENAI_API_KEY and TAVILY_API_KEY are set.
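  • Import errors for LangChain/LangGraph: run uv sync, then launch with uv run so the project virtualenv is used. If you invoke python directly, activate that virtualenv first.
  • Graph rendering issues: remove or comment out the draw_graph() call in main.py if you don't need the image. If draw_graph() uses LangGraph's default Mermaid PNG renderer, it calls the remote mermaid.ink service and therefore needs network access.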
  • Infinite loop concerns: the reviewer enforces a 2-pass maximum per entity and resets the counter when advancing.

License

MIT.

About

Demo AI agent that enriches named entities in transcripts of YouTube videos.
