LLM inline step suggestions #76

@hawkeyexl

Description

Static Documentation Analysis for Action Step Generation

Overview

Implement a static analysis capability in doc-detective/resolver that automatically extracts Doc Detective action steps from documentation paragraphs. The feature will use the Vercel AI SDK to support multiple LLM providers (Anthropic, Google Gemini, and OpenAI-compatible services) and will prioritize high recall: extract all plausible actions, even at the cost of some false positives.

Goals

  1. Enable automated action extraction from documentation without requiring browser context
  2. Support multiple LLM providers through a unified interface
  3. Optimize for high recall to ensure comprehensive action coverage
  4. Provide reviewable output that users can filter and refine
  5. Handle complex patterns including conditionals, multi-step actions, and code blocks

Non-Goals (Future Phases)

  • Interactive analysis with browser context
  • Real-time action execution
  • Action validation against live applications
  • UI for reviewing/editing generated actions
  • Integration with Doc Detective's test runner

Technical Requirements

Dependencies

{
  "dependencies": {
    "ai": "^3.0.0",
    "@ai-sdk/anthropic": "^0.0.x",
    "@ai-sdk/google": "^0.0.x",
    "@ai-sdk/openai": "^0.0.x"
  }
}

Architecture

doc-detective/resolver/
├── src/
│   ├── analyzer/
│   │   ├── index.ts              # Main analyzer entry point
│   │   ├── prompt-builder.ts     # Constructs prompts for LLM
│   │   ├── document-parser.ts    # Splits documents into segments
│   │   └── post-processor.ts     # Adds defensive actions, validation
│   ├── llm/
│   │   ├── provider.ts           # LLM provider abstraction
│   │   └── config.ts             # Provider configuration
│   ├── schemas/
│   │   ├── actions/              # Individual action JSON schemas
│   │   │   ├── click.json
│   │   │   ├── typeKeys.json
│   │   │   ├── goTo.json
│   │   │   ├── find.json
│   │   │   ├── httpRequest.json
│   │   │   ├── runShell.json
│   │   │   ├── conditional.json
│   │   │   └── index.ts
│   │   └── step.json             # Complete step schema
│   ├── types/
│   │   └── index.ts              # TypeScript types
│   └── index.ts                  # Public API
├── tests/
│   ├── analyzer.test.ts
│   ├── prompt-builder.test.ts
│   ├── document-parser.test.ts
│   └── fixtures/
│       └── sample-docs/          # Test documentation samples
└── README.md

Implementation Details

1. Core Types

// types/index.ts

/**
 * Configuration for the static analyzer
 */
export interface AnalyzerConfig {
  provider: 'anthropic' | 'google' | 'openai';
  apiKey: string;
  model?: string;
  temperature?: number;
  maxTokens?: number;
}

/**
 * A segment of documentation to analyze
 */
export interface DocumentSegment {
  type: 'text' | 'code';
  content: string;
  language?: string;
  lineNumber: number;
}

/**
 * Result of analyzing a single segment
 */
export interface SegmentAnalysisResult {
  actions: ActionStep[];
  segment: DocumentSegment;
  metadata: {
    promptTokens: number;
    completionTokens: number;
    latencyMs: number;
  };
}

/**
 * Complete analysis result for a document
 */
export interface DocumentAnalysisResult {
  actions: ActionStep[];
  segments: SegmentAnalysisResult[];
  summary: {
    totalActions: number;
    totalSegments: number;
    analyzedSegments: number;
    skippedSegments: number;
    totalTokens: number;
    totalLatencyMs: number;
  };
}

/**
 * Base action step structure
 * (Extend with specific action types from schemas)
 */
export interface ActionStep {
  action: string;
  description: string;
  _source?: {
    type: 'text' | 'code';
    content: string;
    line: number;
  };
  _generated?: boolean;
  note?: string;
  confidence?: 'high' | 'medium' | 'low';
}

2. Document Parser

// analyzer/document-parser.ts

/**
 * Splits a document into analyzable segments while preserving
 * code blocks intact. Code blocks should not be analyzed as
 * instructions unless they contain shell commands.
 */
export function parseDocument(document: string): DocumentSegment[] {
  // Implementation requirements:
  // - Split on paragraph boundaries (double newlines)
  // - Preserve markdown code blocks (```language...```)
  // - Track line numbers for source attribution
  // - Handle nested structures (lists, blockquotes)
  // - Identify code blocks by language (bash/shell = analyze, others = skip)
}

/**
 * Determines if a code block contains executable instructions
 * that should be analyzed (e.g., shell commands).
 */
export function isAnalyzableCode(segment: DocumentSegment): boolean {
  // Return true for bash, sh, shell, zsh, fish
  // Return false for other languages
}
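
A sketch of one possible implementation of these stubs follows. It only handles top-level fenced code blocks and blank-line paragraph breaks; nested structures (lists, blockquotes) and indented fences are left to the implementer.

import type { DocumentSegment } from '../types';

const ANALYZABLE_LANGUAGES = new Set(['bash', 'sh', 'shell', 'zsh', 'fish']);

export function isAnalyzableCode(segment: DocumentSegment): boolean {
  return (
    segment.type === 'code' &&
    ANALYZABLE_LANGUAGES.has((segment.language ?? '').toLowerCase())
  );
}

export function parseDocument(document: string): DocumentSegment[] {
  const segments: DocumentSegment[] = [];
  const lines = document.split('\n');
  let buffer: string[] = [];
  let bufferStart = 1; // 1-based line number of the first buffered line
  let inCode = false;
  let codeLanguage = '';

  const flush = (type: 'text' | 'code', language?: string) => {
    const content = buffer.join('\n').trim();
    if (content) {
      segments.push({ type, content, language, lineNumber: bufferStart });
    }
    buffer = [];
  };

  lines.forEach((line, index) => {
    const fence = line.match(/^```(\w*)/);
    if (fence && !inCode) {
      // Entering a code block: emit any pending prose first.
      flush('text');
      inCode = true;
      codeLanguage = fence[1];
    } else if (fence && inCode) {
      // Leaving a code block.
      flush('code', codeLanguage || undefined);
      inCode = false;
    } else if (!inCode && line.trim() === '') {
      // Paragraph boundary in prose.
      flush('text');
    } else {
      if (buffer.length === 0) bufferStart = index + 1;
      buffer.push(line);
    }
  });
  flush(inCode ? 'code' : 'text', inCode ? codeLanguage || undefined : undefined);

  return segments;
}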

3. Prompt Builder

// analyzer/prompt-builder.ts

/**
 * Builds the core analysis prompt with high-recall bias
 */
export function buildCorePrompt(): string {
  // Return the CORE_ANALYSIS_PROMPT from the design
  // Include:
  // - Task definition
  // - Extraction philosophy (5 principles)
  // - Action decomposition examples
  // - Conditional logic handling
  // - Common patterns to watch for
  // - Output format requirements
}

/**
 * Builds static mode enhancement prompt
 */
export function buildStaticModePrompt(): string {
  // Return the STATIC_MODE_PROMPT from the design
  // Include guidance on:
  // - Aggressive inference strategies
  // - Placeholder variable usage
  // - Handling ambiguity
  // - Confidence scoring
}

/**
 * Gets relevant action schemas based on paragraph content
 */
export function getRelevantSchemas(
  paragraph: string,
  allSchemas: Record<string, object>
): string {
  // Detect likely action types using regex patterns
  // Return formatted schema documentation for detected types
  // Always include 'find' and 'conditional' schemas
}

/**
 * Builds the complete prompt for a paragraph
 */
export function buildPrompt(
  segment: DocumentSegment,
  schemas: Record<string, object>
): string {
  // Combine:
  // 1. Core prompt
  // 2. Static mode prompt
  // 3. Relevant schemas
  // 4. The segment content
  // 5. Output format reminder
}
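
As an illustration, getRelevantSchemas could use simple keyword patterns to pick schemas, and buildPrompt could assemble the pieces in order. The keyword lists, section labels, and wording below are assumptions, not a fixed specification.

import type { DocumentSegment } from '../types';

// Naive keyword-based detection of likely action types.
const ACTION_PATTERNS: Record<string, RegExp> = {
  click: /\b(click|press|select|choose|tap)\b/i,
  typeKeys: /\b(type|enter|fill in|input)\b/i,
  goTo: /\b(navigate|go to|open|visit|browse)\b/i,
  httpRequest: /\b(request|endpoint|curl|GET|POST|PUT|DELETE)\b/,
  runShell: /\b(run|execute|command|terminal|shell)\b/i,
};

export function getRelevantSchemas(
  paragraph: string,
  allSchemas: Record<string, object>
): string {
  // Always include 'find' and 'conditional', then add detected types.
  const detected = new Set<string>(['find', 'conditional']);
  for (const [action, pattern] of Object.entries(ACTION_PATTERNS)) {
    if (pattern.test(paragraph)) detected.add(action);
  }
  return [...detected]
    .filter((name) => name in allSchemas)
    .map((name) => `### ${name}\n${JSON.stringify(allSchemas[name], null, 2)}`)
    .join('\n\n');
}

export function buildPrompt(
  segment: DocumentSegment,
  schemas: Record<string, object>
): string {
  return [
    buildCorePrompt(),
    buildStaticModePrompt(),
    'Available action schemas:',
    getRelevantSchemas(segment.content, schemas),
    `Documentation segment (${segment.type}, line ${segment.lineNumber}):`,
    segment.content,
    'Respond with a JSON array of action step objects and nothing else.',
  ].join('\n\n');
}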

4. LLM Provider

// llm/provider.ts

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';
import type { AnalyzerConfig, ActionStep, DocumentSegment } from '../types';

/**
 * Creates an LLM provider instance based on configuration
 */
export function createProvider(config: AnalyzerConfig) {
  switch (config.provider) {
    case 'anthropic':
      return anthropic(config.model || 'claude-sonnet-4-20250514');
    case 'google':
      return google(config.model || 'gemini-2.0-flash-exp');
    case 'openai':
      return openai(config.model || 'gpt-4o');
    default:
      throw new Error(`Unsupported provider: ${config.provider}`);
  }
}

/**
 * Generates action steps for a segment using the configured LLM
 */
export async function analyzeSegment(
  segment: DocumentSegment,
  prompt: string,
  config: AnalyzerConfig
): Promise<{ actions: ActionStep[]; metadata: any }> {
  const startTime = Date.now();
  
  const model = createProvider(config);
  
  const result = await generateText({
    model,
    prompt,
    temperature: config.temperature ?? 0.3,
    maxTokens: config.maxTokens ?? 4000,
  });
  
  const latencyMs = Date.now() - startTime;
  
  // Parse JSON response
  let actions: ActionStep[] = [];
  try {
    actions = JSON.parse(result.text);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed to parse LLM response: ${message}\nResponse: ${result.text}`);
  }
  
  return {
    actions,
    metadata: {
      promptTokens: result.usage?.promptTokens ?? 0,
      completionTokens: result.usage?.completionTokens ?? 0,
      latencyMs,
    },
  };
}

5. Post-Processor

// analyzer/post-processor.ts

/**
 * Adds defensive find actions before click/typeKeys actions
 * to increase reliability and recall.
 */
export function addDefensiveActions(actions: ActionStep[]): ActionStep[] {
  // For each click action without preceding find:
  //   Insert find action with same selector
  // For each typeKeys action without preceding find:
  //   Insert find action with same selector
  // After significant actions (submit, save, login):
  //   Add verification find action
}

/**
 * Tags actions with source attribution for traceability
 */
export function tagActionsWithSource(
  actions: ActionStep[],
  segment: DocumentSegment
): ActionStep[] {
  // Add _source field to each action containing:
  // - segment type
  // - segment content
  // - line number
}

/**
 * Validates that generated actions conform to schemas
 */
export function validateActions(
  actions: ActionStep[],
  schemas: Record<string, object>
): { valid: ActionStep[]; invalid: any[] } {
  // Use JSON schema validation
  // Return valid actions and array of validation errors
}
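
A rough sketch of the first two helpers follows. Copying the selector from the interaction step into the inserted find action is omitted here because it depends on the final step schema, and validateActions is expected to delegate to a JSON Schema validator such as Ajv.

import type { ActionStep, DocumentSegment } from '../types';

export function addDefensiveActions(actions: ActionStep[]): ActionStep[] {
  const enhanced: ActionStep[] = [];
  for (const action of actions) {
    const previous = enhanced[enhanced.length - 1];
    const needsFind =
      (action.action === 'click' || action.action === 'typeKeys') &&
      previous?.action !== 'find';
    if (needsFind) {
      // Defensive find inserted ahead of the interaction; marked as generated
      // so reviewers can filter it out.
      enhanced.push({
        action: 'find',
        description: `Verify the target element exists before ${action.action}`,
        _generated: true,
      });
    }
    enhanced.push(action);
  }
  return enhanced;
}

export function tagActionsWithSource(
  actions: ActionStep[],
  segment: DocumentSegment
): ActionStep[] {
  return actions.map((action) => ({
    ...action,
    _source: {
      type: segment.type,
      content: segment.content,
      line: segment.lineNumber,
    },
  }));
}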

6. Main Analyzer

// analyzer/index.ts

import { parseDocument, isAnalyzableCode } from './document-parser';
import { buildPrompt } from './prompt-builder';
import { analyzeSegment } from '../llm/provider';
import { addDefensiveActions, tagActionsWithSource, validateActions } from './post-processor';
import type {
  AnalyzerConfig,
  ActionStep,
  DocumentAnalysisResult,
  SegmentAnalysisResult,
} from '../types';

/**
 * Analyzes a complete document and returns extracted actions
 */
export async function analyzeDocument(
  document: string,
  config: AnalyzerConfig,
  schemas: Record<string, object>
): Promise<DocumentAnalysisResult> {
  // 1. Parse document into segments
  const segments = parseDocument(document);
  
  // 2. Analyze each segment
  const results: SegmentAnalysisResult[] = [];
  const allActions: ActionStep[] = [];
  
  for (const segment of segments) {
    // Skip non-analyzable code blocks
    if (segment.type === 'code' && !isAnalyzableCode(segment)) {
      continue;
    }
    
    // Skip empty segments
    if (!segment.content.trim()) {
      continue;
    }
    
    // Build prompt
    const prompt = buildPrompt(segment, schemas);
    
    // Call LLM
    const { actions, metadata } = await analyzeSegment(segment, prompt, config);
    
    // Tag actions with source
    const taggedActions = tagActionsWithSource(actions, segment);
    
    results.push({
      actions: taggedActions,
      segment,
      metadata,
    });
    
    allActions.push(...taggedActions);
  }
  
  // 3. Post-process actions
  const enhancedActions = addDefensiveActions(allActions);
  
  // 4. Validate actions
  const { valid, invalid } = validateActions(enhancedActions, schemas);
  
  if (invalid.length > 0) {
    console.warn(`${invalid.length} actions failed validation:`, invalid);
  }
  
  // 5. Build summary
  const summary = {
    totalActions: valid.length,
    totalSegments: segments.length,
    analyzedSegments: results.length,
    skippedSegments: segments.length - results.length,
    totalTokens: results.reduce((sum, r) => sum + r.metadata.promptTokens + r.metadata.completionTokens, 0),
    totalLatencyMs: results.reduce((sum, r) => sum + r.metadata.latencyMs, 0),
  };
  
  return {
    actions: valid,
    segments: results,
    summary,
  };
}

7. Public API

// index.ts

import { analyzeDocument } from './analyzer';
import * as schemas from './schemas';
import type { AnalyzerConfig, DocumentAnalysisResult } from './types';

/**
 * Main export: Static documentation analyzer
 */
export async function analyze(
  document: string,
  config: AnalyzerConfig
): Promise<DocumentAnalysisResult> {
  // Load schemas
  const actionSchemas = schemas.loadActionSchemas();
  
  // Run analysis
  return analyzeDocument(document, config, actionSchemas);
}

// Re-export types
export * from './types';

8. Schema Loading

// schemas/index.ts

/**
 * Loads all action schemas from JSON files
 */
export function loadActionSchemas(): Record<string, object> {
  return {
    click: require('./actions/click.json'),
    typeKeys: require('./actions/typeKeys.json'),
    goTo: require('./actions/goTo.json'),
    find: require('./actions/find.json'),
    httpRequest: require('./actions/httpRequest.json'),
    runShell: require('./actions/runShell.json'),
    conditional: require('./actions/conditional.json'),
    // Add other action types as needed
  };
}

Configuration

Environment Variables

# Choose one provider
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=...
OPENAI_API_KEY=sk-...

# Optional: Override default models
RESOLVER_MODEL=claude-sonnet-4-20250514
RESOLVER_TEMPERATURE=0.3
RESOLVER_MAX_TOKENS=4000
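
A minimal sketch of how llm/config.ts could translate these variables into an AnalyzerConfig; the provider precedence when multiple keys are set is an assumption.

// llm/config.ts (sketch)
import type { AnalyzerConfig } from '../types';

export function configFromEnv(env: NodeJS.ProcessEnv = process.env): AnalyzerConfig {
  // Pick the first provider whose API key is present.
  let provider: AnalyzerConfig['provider'];
  let apiKey: string;
  if (env.ANTHROPIC_API_KEY) {
    provider = 'anthropic';
    apiKey = env.ANTHROPIC_API_KEY;
  } else if (env.GOOGLE_GENERATIVE_AI_API_KEY) {
    provider = 'google';
    apiKey = env.GOOGLE_GENERATIVE_AI_API_KEY;
  } else if (env.OPENAI_API_KEY) {
    provider = 'openai';
    apiKey = env.OPENAI_API_KEY;
  } else {
    throw new Error('No LLM provider API key found in environment');
  }

  return {
    provider,
    apiKey,
    model: env.RESOLVER_MODEL,
    temperature: env.RESOLVER_TEMPERATURE ? Number(env.RESOLVER_TEMPERATURE) : undefined,
    maxTokens: env.RESOLVER_MAX_TOKENS ? Number(env.RESOLVER_MAX_TOKENS) : undefined,
  };
}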

Usage Example

import { analyze } from '@doc-detective/resolver';

const documentation = `
Navigate to https://example.com and log in with your credentials.
Click the Settings button in the top navigation bar.
`;

const result = await analyze(documentation, {
  provider: 'anthropic',
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

console.log(`Extracted ${result.summary.totalActions} actions`);
console.log(JSON.stringify(result.actions, null, 2));

Testing Requirements

Unit Tests

  1. Document Parser

    • Splits paragraphs correctly
    • Preserves code blocks
    • Tracks line numbers accurately
    • Handles edge cases (empty lines, nested structures); a sample test sketch follows this list
  2. Prompt Builder

    • Generates complete prompts
    • Detects relevant action types
    • Includes appropriate schemas
    • Formats prompts consistently
  3. Post-Processor

    • Adds find actions before click/typeKeys
    • Tags actions with source info
    • Validates action schemas
    • Handles empty action arrays
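
For reference, a document parser unit test might look like the following; this assumes a Jest-style runner and an illustrative fixture.

// tests/document-parser.test.ts (sketch)
import { parseDocument, isAnalyzableCode } from '../src/analyzer/document-parser';

describe('parseDocument', () => {
  it('splits prose paragraphs and preserves code blocks', () => {
    const doc = [
      'Install the CLI.',
      '',
      '```bash',
      'npm install -g example-cli',
      '```',
    ].join('\n');

    const segments = parseDocument(doc);

    expect(segments).toHaveLength(2);
    expect(segments[0]).toMatchObject({ type: 'text', lineNumber: 1 });
    expect(segments[1]).toMatchObject({ type: 'code', language: 'bash' });
    expect(isAnalyzableCode(segments[1])).toBe(true);
  });
});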

Integration Tests

  1. End-to-End Analysis

    • Sample documentation → expected actions
    • Test with multiple LLM providers
    • Verify high recall (catches all actions)
    • Ensure valid JSON output
  2. Complex Scenarios

    • Conditional logic extraction
    • Multi-step implicit actions (login flow)
    • Mixed content (text + code blocks)
    • Edge cases (optional steps, ambiguous language)

Test Fixtures

Create sample documentation covering:

  • Simple single actions
  • Multi-step sequences
  • Conditional logic
  • API documentation
  • CLI commands
  • UI interactions
  • Mixed text and code

Error Handling

  1. LLM API Failures

    • Retry with exponential backoff (max 3 attempts); a retry helper sketch follows this list
    • Log errors with context
    • Continue processing remaining segments
  2. Parse Failures

    • Log unparseable LLM responses
    • Return empty actions for failed segment
    • Include error in metadata
  3. Validation Failures

    • Log invalid actions
    • Exclude from final output
    • Include validation errors in result metadata
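
A possible retry wrapper around analyzeSegment is sketched below; the delay schedule (1s, 2s, 4s) is an assumption.

// Sketch: generic retry helper with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = 1000 * 2 ** (attempt - 1);
        console.warn(`Segment analysis failed (attempt ${attempt}/${maxAttempts}), retrying in ${delayMs}ms`);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage inside analyzeDocument (sketch):
// const { actions, metadata } = await withRetry(() => analyzeSegment(segment, prompt, config));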

Performance Considerations

  1. Token Optimization

    • Only include relevant schemas
    • Limit segment size (split large paragraphs)
    • Use lower temperature for consistency
  2. Parallelization

    • Process independent segments in parallel (future enhancement; see the concurrency sketch after this list)
    • Respect rate limits per provider
  3. Caching

    • Cache schema loading
    • Consider caching prompt templates (future enhancement)
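
If parallelization is picked up, a bounded-concurrency helper along these lines could be used; the limit of 3 is an assumption and should be tuned to each provider's rate limits.

// Sketch: process segments with a fixed concurrency limit.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  });
  await Promise.all(runners);
  return results;
}

// Usage (sketch): analyze up to 3 segments at a time.
// const segmentResults = await mapWithConcurrency(segments, 3, (segment) =>
//   analyzeSegment(segment, buildPrompt(segment, schemas), config)
// );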

Documentation

README.md

Include:

  • Installation instructions
  • Quick start example
  • Configuration options
  • Supported LLM providers
  • Action schema documentation
  • Troubleshooting guide

API Documentation

Generate TypeDoc documentation for:

  • All exported functions
  • Configuration interfaces
  • Return types
  • Error types

Success Metrics

  1. High Recall: Captures >95% of actual actions in test documentation
  2. Valid Output: >98% of generated actions pass schema validation
  3. Performance: <5s average analysis time per 1000 words
  4. Provider Agnostic: Works consistently across all supported LLM providers

Future Enhancements (Out of Scope)

  • Interactive analysis mode with browser context
  • Action execution and validation
  • Confidence scoring refinement
  • User feedback loop for improving extraction
  • Custom action type definitions
  • Batch processing API
  • Web UI for reviewing/editing actions

Acceptance Criteria

  • All unit tests pass with >90% coverage
  • Integration tests pass for all three LLM providers
  • Extracts actions from all test fixture documents
  • Generated actions validate against schemas
  • Documentation is complete and accurate
  • Error handling works for common failure scenarios
  • Performance meets benchmarks on sample documents
  • Public API is clean and easy to use
  • TypeScript types are complete and accurate

Implementation Notes for Autonomous Agent

  • Use the Vercel AI SDK consistently for all LLM interactions
  • Follow the TypeScript style in the existing doc-detective codebase
  • Add comprehensive JSDoc comments to all functions
  • Use descriptive variable names that explain intent
  • Write tests alongside implementation (TDD approach)
  • Keep functions focused and single-purpose
  • Handle errors gracefully with meaningful messages
  • Log important operations for debugging
  • Use async/await consistently for async operations
  • Validate inputs at API boundaries

Estimated Effort: 3-5 days for full implementation and testing
Priority: High
Labels: enhancement, ai, static-analysis
