LLM inline step suggestions #76

@hawkeyexl

Description

Static Documentation Analysis for Action Step Generation

Overview

Implement a static analysis capability in doc-detective/resolver that automatically extracts Doc Detective action steps from documentation paragraphs. The feature will use the Vercel AI SDK to support multiple LLM providers (Anthropic, Google Gemini, and OpenAI-compatible services) and will prioritize high recall: extract all plausible actions, even at the cost of some false positives.

Goals

  1. Enable automated action extraction from documentation without requiring browser context
  2. Support multiple LLM providers through a unified interface
  3. Optimize for high recall to ensure comprehensive action coverage
  4. Provide reviewable output that users can filter and refine
  5. Handle complex patterns including conditionals, multi-step actions, and code blocks

Non-Goals (Future Phases)

  • Interactive analysis with browser context
  • Real-time action execution
  • Action validation against live applications
  • UI for reviewing/editing generated actions
  • Integration with Doc Detective's test runner

Technical Requirements

Dependencies

{
  "dependencies": {
    "ai": "^3.0.0",
    "@ai-sdk/anthropic": "^0.0.x",
    "@ai-sdk/google": "^0.0.x",
    "@ai-sdk/openai": "^0.0.x"
  }
}

Architecture

doc-detective/resolver/
├── src/
│   ├── analyzer/
│   │   ├── index.ts              # Main analyzer entry point
│   │   ├── prompt-builder.ts     # Constructs prompts for LLM
│   │   ├── document-parser.ts    # Splits documents into segments
│   │   └── post-processor.ts     # Adds defensive actions, validation
│   ├── llm/
│   │   ├── provider.ts           # LLM provider abstraction
│   │   └── config.ts             # Provider configuration
│   ├── schemas/
│   │   ├── actions/              # Individual action JSON schemas
│   │   │   ├── click.json
│   │   │   ├── typeKeys.json
│   │   │   ├── goTo.json
│   │   │   ├── find.json
│   │   │   ├── httpRequest.json
│   │   │   ├── runShell.json
│   │   │   ├── conditional.json
│   │   │   └── index.ts
│   │   └── step.json             # Complete step schema
│   ├── types/
│   │   └── index.ts              # TypeScript types
│   └── index.ts                  # Public API
├── tests/
│   ├── analyzer.test.ts
│   ├── prompt-builder.test.ts
│   ├── document-parser.test.ts
│   └── fixtures/
│       └── sample-docs/          # Test documentation samples
└── README.md

Implementation Details

1. Core Types

// types/index.ts

/**
 * Configuration for the static analyzer
 */
export interface AnalyzerConfig {
  provider: 'anthropic' | 'google' | 'openai';
  apiKey: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
}

/**
 * A segment of documentation to analyze
 */
export interface DocumentSegment {
  type: 'text' | 'code';
  content: string;
  language?: string;
  lineNumber: number;
}

/**
 * Result of analyzing a single segment
 */
export interface SegmentAnalysisResult {
  actions: ActionStep[];
  segment: DocumentSegment;
  metadata: {
    promptTokens: number;
    completionTokens: number;
    latencyMs: number;
  };
}

/**
 * Complete analysis result for a document
 */
export interface DocumentAnalysisResult {
  actions: ActionStep[];
  segments: SegmentAnalysisResult[];
  summary: {
    totalActions: number;
    totalSegments: number;
    analyzedSegments: number;
    skippedSegments: number;
    totalTokens: number;
    totalLatencyMs: number;
  };
}

/**
 * Base action step structure
 * (Extend with specific action types from schemas)
 */
export interface ActionStep {
  action: string;
  description: string;
  _source?: {
    type: 'text' | 'code';
    content: string;
    line: number;
  };
  _generated?: boolean;
  note?: string;
  confidence?: 'high' | 'medium' | 'low';
}

2. Document Parser

// analyzer/document-parser.ts

/**
 * Splits a document into analyzable segments while preserving
 * code blocks intact. Code blocks should not be analyzed as
 * instructions unless they contain shell commands.
 */
export function parseDocument(document: string): DocumentSegment[] {
  // Implementation requirements:
  // - Split on paragraph boundaries (double newlines)
  // - Preserve markdown code blocks (```language...```)
  // - Track line numbers for source attribution
  // - Handle nested structures (lists, blockquotes)
  // - Identify code blocks by language (bash/shell = analyze, others = skip)
}

/**
 * Determines if a code block contains executable instructions
 * that should be analyzed (e.g., shell commands).
 */
export function isAnalyzableCode(segment: DocumentSegment): boolean {
  // Return true for bash, sh, shell, zsh, fish
  // Return false for other languages
}
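
A sketch of one possible implementation of these stubs follows. It only handles top-level fenced code blocks and blank-line paragraph breaks; nested structures (lists, blockquotes) and indented fences are left to the implementer.

import type { DocumentSegment } from '../types';

const ANALYZABLE_LANGUAGES = new Set(['bash', 'sh', 'shell', 'zsh', 'fish']);

export function isAnalyzableCode(segment: DocumentSegment): boolean {
  return (
    segment.type === 'code' &&
    ANALYZABLE_LANGUAGES.has((segment.language ?? '').toLowerCase())
  );
}

export function parseDocument(document: string): DocumentSegment[] {
  const segments: DocumentSegment[] = [];
  const lines = document.split('\n');
  let buffer: string[] = [];
  let bufferStart = 1; // 1-based line number of the first buffered line
  let inCode = false;
  let codeLanguage = '';

  const flush = (type: 'text' | 'code', language?: string) => {
    const content = buffer.join('\n').trim();
    if (content) {
      segments.push({ type, content, language, lineNumber: bufferStart });
    }
    buffer = [];
  };

  lines.forEach((line, index) => {
    const fence = line.match(/^```(\w*)/);
    if (fence && !inCode) {
      // Entering a code block: emit any pending prose first.
      flush('text');
      inCode = true;
      codeLanguage = fence[1];
    } else if (fence && inCode) {
      // Leaving a code block.
      flush('code', codeLanguage || undefined);
      inCode = false;
    } else if (!inCode && line.trim() === '') {
      // Paragraph boundary in prose.
      flush('text');
    } else {
      if (buffer.length === 0) bufferStart = index + 1;
      buffer.push(line);
    }
  });
  flush(inCode ? 'code' : 'text', inCode ? codeLanguage || undefined : undefined);

  return segments;
}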

3. Prompt Builder

// analyzer/prompt-builder.ts

/**
 * Builds the core analysis prompt with high-recall bias
 */
export function buildCorePrompt(): string {
  // Return the CORE_ANALYSIS_PROMPT from the design
  // Include:
  // - Task definition
  // - Extraction philosophy (5 principles)
  // - Action decomposition examples
  // - Conditional logic handling
  // - Common patterns to watch for
  // - Output format requirements
}

/**
 * Builds static mode enhancement prompt
 */
export function buildStaticModePrompt(): string {
  // Return the STATIC_MODE_PROMPT from the design
  // Include guidance on:
  // - Aggressive inference strategies
  // - Placeholder variable usage
  // - Handling ambiguity
  // - Confidence scoring
}

/**
 * Gets relevant action schemas based on paragraph content
 */
export function getRelevantSchemas(
  paragraph: string,
  allSchemas: Record<string, object>
): string {
  // Detect likely action types using regex patterns
  // Return formatted schema documentation for detected types
  // Always include 'find' and 'conditional' schemas
}

/**
 * Builds the complete prompt for a paragraph
 */
export function buildPrompt(
  segment: DocumentSegment,
  schemas: Record<string, object>
): string {
  // Combine:
  // 1. Core prompt
  // 2. Static mode prompt
  // 3. Relevant schemas
  // 4. The segment content
  // 5. Output format reminder
}
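
As an illustration, getRelevantSchemas could use simple keyword patterns to pick schemas, and buildPrompt could assemble the pieces in order. The keyword lists, section labels, and wording below are assumptions, not a fixed specification.

import type { DocumentSegment } from '../types';

// Naive keyword-based detection of likely action types.
const ACTION_PATTERNS: Record<string, RegExp> = {
  click: /\b(click|press|select|choose|tap)\b/i,
  typeKeys: /\b(type|enter|fill in|input)\b/i,
  goTo: /\b(navigate|go to|open|visit|browse)\b/i,
  httpRequest: /\b(request|endpoint|curl|GET|POST|PUT|DELETE)\b/,
  runShell: /\b(run|execute|command|terminal|shell)\b/i,
};

export function getRelevantSchemas(
  paragraph: string,
  allSchemas: Record<string, object>
): string {
  // Always include 'find' and 'conditional', then add detected types.
  const detected = new Set<string>(['find', 'conditional']);
  for (const [action, pattern] of Object.entries(ACTION_PATTERNS)) {
    if (pattern.test(paragraph)) detected.add(action);
  }
  return [...detected]
    .filter((name) => name in allSchemas)
    .map((name) => `### ${name}\n${JSON.stringify(allSchemas[name], null, 2)}`)
    .join('\n\n');
}

export function buildPrompt(
  segment: DocumentSegment,
  schemas: Record<string, object>
): string {
  return [
    buildCorePrompt(),
    buildStaticModePrompt(),
    'Available action schemas:',
    getRelevantSchemas(segment.content, schemas),
    `Documentation segment (${segment.type}, line ${segment.lineNumber}):`,
    segment.content,
    'Respond with a JSON array of action step objects and nothing else.',
  ].join('\n\n');
}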

4. LLM Provider

// llm/provider.ts

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';
import type { AnalyzerConfig, ActionStep, DocumentSegment } from '../types';

/**
 * Creates an LLM provider instance based on configuration
 */
export function createProvider(config: AnalyzerConfig) {
  switch (config.provider) {
    case 'anthropic':
      return anthropic(config.model || 'claude-sonnet-4-20250514');
    case 'google':
      return google(config.model || 'gemini-2.0-flash-exp');
    case 'openai':
      return openai(config.model || 'gpt-4o');
    default:
      throw new Error(`Unsupported provider: ${config.provider}`);
  }
}

/**
 * Generates action steps for a segment using the configured LLM
 */
export async function analyzeSegment(
  segment: DocumentSegment,
  prompt: string,
  config: AnalyzerConfig
): Promise<{ actions: ActionStep[]; metadata: any }> {
  const startTime = Date.now();
  
  const model = createProvider(config);
  
  const result = await generateText({
    model,
    prompt,
    temperature: config.temperature ?? 0.3,
    maxTokens: config.maxTokens ?? 4000,
  });
  
  const latencyMs = Date.now() - startTime;
  
  // Parse JSON response
  let actions: ActionStep[] = [];
  try {
    actions = JSON.parse(result.text);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed to parse LLM response: ${message}\nResponse: ${result.text}`);
  }
  
  return {
    actions,
    metadata: {
      promptTokens: result.usage?.promptTokens ?? 0,
      completionTokens: result.usage?.completionTokens ?? 0,
      latencyMs,
    },
  };
}

5. Post-Processor

// analyzer/post-processor.ts

/**
 * Adds defensive find actions before click/typeKeys actions
 * to increase reliability and recall.
 */
export function addDefensiveActions(actions: ActionStep[]): ActionStep[] {
  // For each click action without preceding find:
  //   Insert find action with same selector
  // For each typeKeys action without preceding find:
  //   Insert find action with same selector
  // After significant actions (submit, save, login):
  //   Add verification find action
}

/**
 * Tags actions with source attribution for traceability
 */
export function tagActionsWithSource(
  actions: ActionStep[],
  segment: DocumentSegment
): ActionStep[] {
  // Add _source field to each action containing:
  // - segment type
  // - segment content
  // - line number
}

/**
 * Validates that generated actions conform to schemas
 */
export function validateActions(
  actions: ActionStep[],
  schemas: Record<string, object>
): { valid: ActionStep[]; invalid: any[] } {
  // Use JSON schema validation
  // Return valid actions and array of validation errors
}
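
A rough sketch of the first two helpers follows. Copying the selector from the interaction step into the inserted find action is omitted here because it depends on the final step schema, and validateActions is expected to delegate to a JSON Schema validator such as Ajv.

import type { ActionStep, DocumentSegment } from '../types';

export function addDefensiveActions(actions: ActionStep[]): ActionStep[] {
  const enhanced: ActionStep[] = [];
  for (const action of actions) {
    const previous = enhanced[enhanced.length - 1];
    const needsFind =
      (action.action === 'click' || action.action === 'typeKeys') &&
      previous?.action !== 'find';
    if (needsFind) {
      // Defensive find inserted ahead of the interaction; marked as generated
      // so reviewers can filter it out.
      enhanced.push({
        action: 'find',
        description: `Verify the target element exists before ${action.action}`,
        _generated: true,
      });
    }
    enhanced.push(action);
  }
  return enhanced;
}

export function tagActionsWithSource(
  actions: ActionStep[],
  segment: DocumentSegment
): ActionStep[] {
  return actions.map((action) => ({
    ...action,
    _source: {
      type: segment.type,
      content: segment.content,
      line: segment.lineNumber,
    },
  }));
}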

6. Main Analyzer

// analyzer/index.ts

import { parseDocument, isAnalyzableCode } from './document-parser';
import { buildPrompt } from './prompt-builder';
import { analyzeSegment } from '../llm/provider';
import { addDefensiveActions, tagActionsWithSource, validateActions } from './post-processor';
import type {
  AnalyzerConfig,
  ActionStep,
  DocumentAnalysisResult,
  SegmentAnalysisResult,
} from '../types';

/**
 * Analyzes a complete document and returns extracted actions
 */
export async function analyzeDocument(
  document: string,
  config: AnalyzerConfig,
  schemas: Record<string, object>
): Promise<DocumentAnalysisResult> {
  // 1. Parse document into segments
  const segments = parseDocument(document);
  
  // 2. Analyze each segment
  const results: SegmentAnalysisResult[] = [];
  const allActions: ActionStep[] = [];
  
  for (const segment of segments) {
    // Skip non-analyzable code blocks
    if (segment.type === 'code' && !isAnalyzableCode(segment)) {
      continue;
    }
    
    // Skip empty segments
    if (!segment.content.trim()) {
      continue;
    }
    
    // Build prompt
    const prompt = buildPrompt(segment, schemas);
    
    // Call LLM
    const { actions, metadata } = await analyzeSegment(segment, prompt, config);
    
    // Tag actions with source
    const taggedActions = tagActionsWithSource(actions, segment);
    
    results.push({
      actions: taggedActions,
      segment,
      metadata,
    });
    
    allActions.push(...taggedActions);
  }
  
  // 3. Post-process actions
  const enhancedActions = addDefensiveActions(allActions);
  
  // 4. Validate actions
  const { valid, invalid } = validateActions(enhancedActions, schemas);
  
  if (invalid.length > 0) {
    console.warn(`${invalid.length} actions failed validation:`, invalid);
  }
  
  // 5. Build summary
  const summary = {
    totalActions: valid.length,
    totalSegments: segments.length,
    analyzedSegments: results.length,
    skippedSegments: segments.length - results.length,
    totalTokens: results.reduce((sum, r) => sum + r.metadata.promptTokens + r.metadata.completionTokens, 0),
    totalLatencyMs: results.reduce((sum, r) => sum + r.metadata.latencyMs, 0),
  };
  
  return {
    actions: valid,
    segments: results,
    summary,
  };
}

7. Public API

// index.ts

import { analyzeDocument } from './analyzer';
import * as schemas from './schemas';
import type { AnalyzerConfig, DocumentAnalysisResult } from './types';

/**
 * Main export: Static documentation analyzer
 */
export async function analyze(
  document: string,
  config: AnalyzerConfig
): Promise<DocumentAnalysisResult> {
  // Load schemas
  const actionSchemas = schemas.loadActionSchemas();
  
  // Run analysis
  return analyzeDocument(document, config, actionSchemas);
}

// Re-export types
export * from './types';

8. Schema Loading

// schemas/index.ts

/**
 * Loads all action schemas from JSON files
 */
export function loadActionSchemas(): Record<string, object> {
  return {
    click: require('./actions/click.json'),
    typeKeys: require('./actions/typeKeys.json'),
    goTo: require('./actions/goTo.json'),
    find: require('./actions/find.json'),
    httpRequest: require('./actions/httpRequest.json'),
    runShell: require('./actions/runShell.json'),
    conditional: require('./actions/conditional.json'),
    // Add other action types as needed
  };
}

Configuration

Environment Variables

# Choose one provider
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=...
OPENAI_API_KEY=sk-...

# Optional: Override default models
RESOLVER_MODEL=claude-sonnet-4-20250514
RESOLVER_TEMPERATURE=0.3
RESOLVER_MAX_TOKENS=4000
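
A minimal sketch of how llm/config.ts could translate these variables into an AnalyzerConfig; the provider precedence when multiple keys are set is an assumption.

// llm/config.ts (sketch)
import type { AnalyzerConfig } from '../types';

export function configFromEnv(env: NodeJS.ProcessEnv = process.env): AnalyzerConfig {
  // Pick the first provider whose API key is present.
  let provider: AnalyzerConfig['provider'];
  let apiKey: string;
  if (env.ANTHROPIC_API_KEY) {
    provider = 'anthropic';
    apiKey = env.ANTHROPIC_API_KEY;
  } else if (env.GOOGLE_GENERATIVE_AI_API_KEY) {
    provider = 'google';
    apiKey = env.GOOGLE_GENERATIVE_AI_API_KEY;
  } else if (env.OPENAI_API_KEY) {
    provider = 'openai';
    apiKey = env.OPENAI_API_KEY;
  } else {
    throw new Error('No LLM provider API key found in environment');
  }

  return {
    provider,
    apiKey,
    model: env.RESOLVER_MODEL,
    temperature: env.RESOLVER_TEMPERATURE ? Number(env.RESOLVER_TEMPERATURE) : undefined,
    maxTokens: env.RESOLVER_MAX_TOKENS ? Number(env.RESOLVER_MAX_TOKENS) : undefined,
  };
}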

Usage Example

import { analyze } from '@doc-detective/resolver';

const documentation = `
Navigate to https://example.com and log in with your credentials.
Click the Settings button in the top navigation bar.
`;

const result = await analyze(documentation, {
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

console.log(`Extracted ${result.summary.totalActions} actions`);
console.log(JSON.stringify(result.actions, null, 2));

Testing Requirements

Unit Tests

  1. Document Parser

    • Splits paragraphs correctly
    • Preserves code blocks
    • Tracks line numbers accurately
    • Handles edge cases (empty lines, nested structures); a sample test sketch follows this list
  2. Prompt Builder

    • Generates complete prompts
    • Detects relevant action types
    • Includes appropriate schemas
    • Formats prompts consistently
  3. Post-Processor

    • Adds find actions before click/typeKeys
    • Tags actions with source info
    • Validates action schemas
    • Handles empty action arrays
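
For reference, a document parser unit test might look like the following; this assumes a Jest-style runner and an illustrative fixture.

// tests/document-parser.test.ts (sketch)
import { parseDocument, isAnalyzableCode } from '../src/analyzer/document-parser';

describe('parseDocument', () => {
  it('splits prose paragraphs and preserves code blocks', () => {
    const doc = [
      'Install the CLI.',
      '',
      '```bash',
      'npm install -g example-cli',
      '```',
    ].join('\n');

    const segments = parseDocument(doc);

    expect(segments).toHaveLength(2);
    expect(segments[0]).toMatchObject({ type: 'text', lineNumber: 1 });
    expect(segments[1]).toMatchObject({ type: 'code', language: 'bash' });
    expect(isAnalyzableCode(segments[1])).toBe(true);
  });
});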

Integration Tests

  1. End-to-End Analysis

    • Sample documentation → expected actions
    • Test with multiple LLM providers
    • Verify high recall (catches all actions)
    • Ensure valid JSON output
  2. Complex Scenarios

    • Conditional logic extraction
    • Multi-step implicit actions (login flow)
    • Mixed content (text + code blocks)
    • Edge cases (optional steps, ambiguous language)

Test Fixtures

Create sample documentation covering:

  • Simple single actions
  • Multi-step sequences
  • Conditional logic
  • API documentation
  • CLI commands
  • UI interactions
  • Mixed text and code

Error Handling

  1. LLM API Failures

    • Retry with exponential backoff (max 3 attempts); a retry helper sketch follows this list
    • Log errors with context
    • Continue processing remaining segments
  2. Parse Failures

    • Log unparseable LLM responses
    • Return empty actions for failed segment
    • Include error in metadata
  3. Validation Failures

    • Log invalid actions
    • Exclude from final output
    • Include validation errors in result metadata
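
A possible retry wrapper around analyzeSegment is sketched below; the delay schedule (1s, 2s, 4s) is an assumption.

// Sketch: generic retry helper with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = 1000 * 2 ** (attempt - 1);
        console.warn(`Segment analysis failed (attempt ${attempt}/${maxAttempts}), retrying in ${delayMs}ms`);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage inside analyzeDocument (sketch):
// const { actions, metadata } = await withRetry(() => analyzeSegment(segment, prompt, config));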

Performance Considerations

  1. Token Optimization

    • Only include relevant schemas
    • Limit segment size (split large paragraphs)
    • Use lower temperature for consistency
  2. Parallelization

    • Process independent segments in parallel (future enhancement; see the concurrency sketch after this list)
    • Respect rate limits per provider
  3. Caching

    • Cache schema loading
    • Consider caching prompt templates (future enhancement)
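
If parallelization is picked up, a bounded-concurrency helper along these lines could be used; the limit of 3 is an assumption and should be tuned to each provider's rate limits.

// Sketch: process segments with a fixed concurrency limit.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  });
  await Promise.all(runners);
  return results;
}

// Usage (sketch): analyze up to 3 segments at a time.
// const segmentResults = await mapWithConcurrency(segments, 3, (segment) =>
//   analyzeSegment(segment, buildPrompt(segment, schemas), config)
// );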

Documentation

README.md

Include:

  • Installation instructions
  • Quick start example
  • Configuration options
  • Supported LLM providers
  • Action schema documentation
  • Troubleshooting guide

API Documentation

Generate TypeDoc documentation for:

  • All exported functions
  • Configuration interfaces
  • Return types
  • Error types

Success Metrics

  1. High Recall: Captures >95% of actual actions in test documentation
  2. Valid Output: >98% of generated actions pass schema validation
  3. Performance: <5s average analysis time per 1000 words
  4. Provider Agnostic: Works consistently across all supported LLM providers

Future Enhancements (Out of Scope)

  • Interactive analysis mode with browser context
  • Action execution and validation
  • Confidence scoring refinement
  • User feedback loop for improving extraction
  • Custom action type definitions
  • Batch processing API
  • Web UI for reviewing/editing actions

Acceptance Criteria

  • All unit tests pass with >90% coverage
  • Integration tests pass for all three LLM providers
  • Extracts actions from all test fixture documents
  • Generated actions validate against schemas
  • Documentation is complete and accurate
  • Error handling works for common failure scenarios
  • Performance meets benchmarks on sample documents
  • Public API is clean and easy to use
  • TypeScript types are complete and accurate

Implementation Notes for Autonomous Agent

  • Use the Vercel AI SDK consistently for all LLM interactions
  • Follow the TypeScript style in the existing doc-detective codebase
  • Add comprehensive JSDoc comments to all functions
  • Use descriptive variable names that explain intent
  • Write tests alongside implementation (TDD approach)
  • Keep functions focused and single-purpose
  • Handle errors gracefully with meaningful messages
  • Log important operations for debugging
  • Use async/await consistently for async operations
  • Validate inputs at API boundaries

Estimated Effort: 3-5 days for full implementation and testing
Priority: High
Labels: enhancement, ai, static-analysis
