Implement langextract tools for interactive text visualization #24

@praisonai-triage-agent

Summary

Implement langextract tools that let PraisonAI agents create interactive HTML visualizations of text with highlighted extractions. This addresses the architectural violation identified in MervinPraison/PraisonAI#1424, where these tools were incorrectly placed in the core SDK.

Background

These tools are currently being implemented in MervinPraison/PraisonAI#1424, but placing them in the core SDK violates the AGENTS.md architecture:

  • Heavy optional dependency (langextract ~50MB+ with ML models)
  • External integration tool (third-party library)
  • Not core SDK functionality
  • Belongs in praisonai-tools per AGENTS.md Section 2.2

Required Implementation

1. Langextract Extract Tool

from typing import Any, Dict, List, Optional

from praisonai_tools import tool

@tool
def langextract_extract(
    text: str,
    extractions: Optional[List[str]] = None,
    document_id: str = "agent-analysis",
    output_path: Optional[str] = None,
    auto_open: bool = False
) -> Dict[str, Any]:
    """Extract and annotate text using langextract for interactive visualization.
    
    Creates an interactive HTML document with highlighted extractions that can be
    viewed in a browser. Useful for text analysis, entity extraction, and 
    document annotation workflows.
    """
    # Implementation details...
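One plausible core for langextract_extract, assuming each entry in `extractions` is a literal phrase to highlight (the issue does not say so explicitly), is to locate every occurrence of each phrase in `text` and record its character offsets. The helper name `find_extraction_spans` is hypothetical; this is a sketch, not the PR's implementation.

```python
from typing import List, Tuple


def find_extraction_spans(text: str, phrases: List[str]) -> List[Tuple[str, int, int]]:
    """Return (phrase, start_pos, end_pos) for each occurrence of each phrase.

    Hypothetical helper: assumes `phrases` are literal substrings to highlight.
    Matches of the same phrase do not overlap; different phrases may overlap.
    Offsets are half-open [start_pos, end_pos), matching Python slicing.
    """
    spans: List[Tuple[str, int, int]] = []
    for phrase in phrases:
        if not phrase:
            continue  # an empty string would "match" at every position
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            spans.append((phrase, start, end))
            start = text.find(phrase, end)  # resume search after this match
    # Sort by position so highlights appear in document order
    spans.sort(key=lambda s: (s[1], s[2]))
    return spans
```

Because the offsets are half-open, each tuple maps directly onto lx.data.CharInterval(start_pos=..., end_pos=...) from the API Compliance list, and `text[start:end]` recovers the highlighted phrase.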

2. Langextract Render File Tool

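# Assumption: require_approval is the approval decorator from praisonaiagents
from praisonaiagents.approval import require_approval

@tool
@require_approval(risk_level="high")
def langextract_render_file(
    file_path: str,
    extractions: Optional[List[str]] = None,
    output_path: Optional[str] = None,
    auto_open: bool = False
) -> Dict[str, Any]:
    """Read a text file and create an interactive langextract visualization."""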
    # Implementation details...
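Before langextract_render_file does any annotation, it has to read the file safely. A minimal sketch, assuming UTF-8 text input and an illustrative 5 MB size cap (neither is specified in the issue); `read_text_for_render` is a hypothetical helper name:

```python
from pathlib import Path


def read_text_for_render(file_path: str, max_bytes: int = 5_000_000) -> str:
    """Validate and read a UTF-8 text file for visualization.

    Hypothetical helper; the 5 MB default cap is an illustrative assumption.
    Raises ValueError with a readable message instead of leaking raw OS errors.
    """
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise ValueError(f"Not a readable file: {path}")
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"File too large ({size} bytes > {max_bytes})")
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"File is not valid UTF-8 text: {path}") from exc
```

langextract_render_file could then pass the returned string to the same logic as langextract_extract, which keeps the approval-gated file I/O separate from the annotation step.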

Technical Requirements

Dependencies

  • langextract (optional dependency)
  • Lazy imports with graceful degradation
  • Clear error messages when not installed
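The three dependency requirements above can be met with a lazy import that runs only when a tool is called. This is a sketch; the function name `load_langextract` and the `success`/`error` keys of the returned dict are assumptions, not an existing praisonai-tools convention.

```python
import importlib
from typing import Any, Dict, Optional, Tuple

INSTALL_HINT = 'pip install "praisonai-tools[langextract]"'


def load_langextract() -> Tuple[Optional[Any], Optional[Dict[str, Any]]]:
    """Import langextract lazily, only when a tool is actually invoked.

    Returns (module, None) on success, or (None, error_dict) when the optional
    dependency is missing, so a tool can return a clear error instead of crashing
    at import time of praisonai_tools.
    """
    try:
        lx = importlib.import_module("langextract")
    except ImportError:
        return None, {
            "success": False,
            "error": f"langextract is not installed. Install it with: {INSTALL_HINT}",
        }
    return lx, None
```

For unit tests of the degradation path, setting `sys.modules["langextract"] = None` makes the import raise `ModuleNotFoundError` (a subclass of `ImportError`), so the missing-dependency branch can be exercised even on machines where langextract is installed.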

API Compliance

  • Use the correct langextract API:
    • lx.data.AnnotatedDocument
    • lx.data.CharInterval(start_pos=X, end_pos=Y)
    • lx.io.save_annotated_documents() + lx.visualize()
  • Follow praisonai-tools patterns (BaseTool, @tool decorator)
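A sketch of how the listed calls fit together, based on langextract's documented usage: build one `lx.data.Extraction` per span (an additional data class not named in the list above) with an `lx.data.CharInterval`, wrap them in an `lx.data.AnnotatedDocument`, save to JSONL, then render with `lx.visualize()`. The function name `render_annotated_html`, the span tuple format, and the output file naming are hypothetical; verify the keyword names against the installed langextract version.

```python
from pathlib import Path
from typing import List, Tuple


def render_annotated_html(
    text: str,
    spans: List[Tuple[str, int, int]],
    document_id: str,
    output_dir: str,
) -> Path:
    """Build an AnnotatedDocument from character spans and write an HTML view.

    Sketch only: `spans` are (label, start_pos, end_pos) tuples. Using the label
    as extraction_class is an illustrative choice, not a langextract requirement.
    """
    import langextract as lx  # lazy import: only needed when rendering

    extractions = [
        lx.data.Extraction(
            extraction_class=label,
            extraction_text=text[start:end],
            char_interval=lx.data.CharInterval(start_pos=start, end_pos=end),
        )
        for label, start, end in spans
    ]
    doc = lx.data.AnnotatedDocument(
        document_id=document_id, text=text, extractions=extractions
    )

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jsonl_name = f"{document_id}.jsonl"  # assumes document_id is filename-safe
    lx.io.save_annotated_documents([doc], output_dir=out_dir, output_name=jsonl_name)

    html = lx.visualize(str(out_dir / jsonl_name))
    # In notebooks visualize() may return an IPython HTML object exposing .data
    html_text = html.data if hasattr(html, "data") else html
    html_path = out_dir / f"{document_id}.html"
    html_path.write_text(html_text, encoding="utf-8")
    return html_path
```

The default document_id of "agent-analysis" is safe as a filename, but caller-supplied IDs would need sanitizing before being used in paths.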

Security

  • File operations require approval (@require_approval)
  • Input validation for text and file paths
  • Cross-platform file URI handling (Path.as_uri())
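The text-validation and file-URI points above might look like the following sketch. The 1,000,000-character limit and the helper names `validate_text`, `html_file_uri`, and `maybe_open` are illustrative assumptions; `Path.as_uri()` and `webbrowser.open()` are standard library calls.

```python
import webbrowser
from pathlib import Path

MAX_TEXT_CHARS = 1_000_000  # illustrative limit, not specified by the issue


def validate_text(text: str) -> str:
    """Reject non-string, blank, or oversized input before processing."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    return text


def html_file_uri(output_path: str) -> str:
    """Return a file:// URI that works on Windows, macOS, and Linux.

    Path.as_uri() raises ValueError on relative paths, so resolve() first;
    it also handles drive letters and percent-encodes characters like spaces.
    """
    return Path(output_path).resolve().as_uri()


def maybe_open(output_path: str, auto_open: bool) -> str:
    """Return the file URI, opening it in the default browser only if asked."""
    uri = html_file_uri(output_path)
    if auto_open:
        webbrowser.open(uri)
    return uri
```

Building the URI with `Path.as_uri()` instead of string concatenation avoids malformed links such as `file://C:\...` on Windows.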

Integration with PraisonAI

Agent Usage

from praisonaiagents import Agent
from praisonai_tools import langextract_extract, langextract_render_file

agent = Agent(
    name="text_analyzer",
    instructions="Analyze text and create interactive visualizations",
    tools=[langextract_extract, langextract_render_file]
)

response = agent.start("Analyze this contract text and highlight key terms")

Installation

pip install "praisonai-tools[langextract]"
# or
pip install praisonai-tools langextract

Files to Create

  • praisonai_tools/tools/langextract_tool.py - Main implementation
  • tests/test_langextract_tool.py - Unit tests with langextract installed
  • Update praisonai_tools/tools/__init__.py - Export tools
  • Update pyproject.toml - Add langextract optional dependency
  • examples/langextract_example.py - Usage example

Success Criteria

  1. ✅ Tools work with PraisonAI agents
  2. ✅ Graceful degradation without langextract installed
  3. ✅ Interactive HTML generation with extractions highlighting
  4. ✅ File I/O with security approval
  5. ✅ Cross-platform compatibility
  6. ✅ Unit tests plus real agentic tests
  7. ✅ Documentation and examples

Reference Implementation

The current implementation in MervinPraison/PraisonAI#1424 can be used as a starting point, but needs:

  • API fixes (correct langextract usage)
  • Proper approval decorator usage
  • Architecture compliance (move to praisonai-tools)

Fixes #1421 (PraisonAI follow-up 3 - langextract tools)


Priority: High - Blocks MervinPraison/PraisonAI#1424 architectural compliance
Assignee: Please assign to someone familiar with praisonai-tools patterns

Metadata

Assignees: No one assigned
Labels: claude (Auto-trigger Claude analysis)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests