Static Documentation Analysis for Action Step Generation
Overview
Implement a static analysis capability in doc-detective/resolver to automatically extract Doc Detective action steps from documentation paragraphs. This feature will use the Vercel AI SDK to support multiple LLM providers (Anthropic, Google Gemini, OpenAI-compatible services) and will prioritize high recall: extracting all possible actions, even at the cost of some false positives.
Goals
- Enable automated action extraction from documentation without requiring browser context
- Support multiple LLM providers through a unified interface
- Optimize for high recall to ensure comprehensive action coverage
- Provide reviewable output that users can filter and refine
- Handle complex patterns including conditionals, multi-step actions, and code blocks
Non-Goals (Future Phases)
- Interactive analysis with browser context
- Real-time action execution
- Action validation against live applications
- UI for reviewing/editing generated actions
- Integration with Doc Detective's test runner
Technical Requirements
Dependencies
{
"dependencies": {
"ai": "^3.0.0",
"@ai-sdk/anthropic": "^0.0.x",
"@ai-sdk/google": "^0.0.x",
"@ai-sdk/openai": "^0.0.x"
}
}
Architecture
doc-detective/resolver/
├── src/
│ ├── analyzer/
│ │ ├── index.ts # Main analyzer entry point
│ │ ├── prompt-builder.ts # Constructs prompts for LLM
│ │ ├── document-parser.ts # Splits documents into segments
│ │ └── post-processor.ts # Adds defensive actions, validation
│ ├── llm/
│ │ ├── provider.ts # LLM provider abstraction
│ │ └── config.ts # Provider configuration
│ ├── schemas/
│ │ ├── actions/ # Individual action JSON schemas
│ │ │ ├── click.json
│ │ │ ├── typeKeys.json
│ │ │ ├── goTo.json
│ │ │ ├── find.json
│ │ │ ├── httpRequest.json
│ │ │ ├── runShell.json
│ │ │ ├── conditional.json
│ │ │ └── index.ts
│ │ └── step.json # Complete step schema
│ ├── types/
│ │ └── index.ts # TypeScript types
│ └── index.ts # Public API
├── tests/
│ ├── analyzer.test.ts
│ ├── prompt-builder.test.ts
│ ├── document-parser.test.ts
│ └── fixtures/
│ └── sample-docs/ # Test documentation samples
└── README.md
Implementation Details
1. Core Types
// types/index.ts
/**
* Configuration for the static analyzer
*/
export interface AnalyzerConfig {
provider: 'anthropic' | 'google' | 'openai';
apiKey: string;
model?: string;
temperature?: number;
maxTokens?: number;
}
/**
* A segment of documentation to analyze
*/
export interface DocumentSegment {
type: 'text' | 'code';
content: string;
language?: string;
lineNumber: number;
}
/**
* Result of analyzing a single segment
*/
export interface SegmentAnalysisResult {
actions: ActionStep[];
segment: DocumentSegment;
metadata: {
promptTokens: number;
completionTokens: number;
latencyMs: number;
};
}
/**
* Complete analysis result for a document
*/
export interface DocumentAnalysisResult {
actions: ActionStep[];
segments: SegmentAnalysisResult[];
summary: {
totalActions: number;
totalSegments: number;
analyzedSegments: number;
skippedSegments: number;
totalTokens: number;
totalLatencyMs: number;
};
}
/**
* Base action step structure
* (Extend with specific action types from schemas)
*/
export interface ActionStep {
action: string;
description: string;
_source?: {
type: 'text' | 'code';
content: string;
line: number;
};
_generated?: boolean;
note?: string;
confidence?: 'high' | 'medium' | 'low';
}
2. Document Parser
// analyzer/document-parser.ts
/**
* Splits a document into analyzable segments while preserving
* code blocks intact. Code blocks should not be analyzed as
* instructions unless they contain shell commands.
*/
export function parseDocument(document: string): DocumentSegment[] {
// Implementation requirements:
// - Split on paragraph boundaries (double newlines)
// - Preserve markdown code blocks (```language...```)
// - Track line numbers for source attribution
// - Handle nested structures (lists, blockquotes)
// - Identify code blocks by language (bash/shell = analyze, others = skip)
}
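For reference, a minimal parsing sketch that satisfies the requirements above, assuming markdown-style triple-backtick fences and blank-line paragraph boundaries (nested lists and blockquotes are omitted; the name parseDocumentSketch is illustrative):
// Illustrative sketch only, not the final implementation.
export function parseDocumentSketch(document: string): DocumentSegment[] {
  const segments: DocumentSegment[] = [];
  const lines = document.split('\n');
  let buffer: string[] = [];
  let bufferStart = 1; // 1-indexed line number where the current segment begins
  let inCode = false;
  let codeLanguage: string | undefined;

  const flush = (type: 'text' | 'code') => {
    const content = buffer.join('\n');
    if (content.trim()) {
      segments.push({ type, content, language: codeLanguage, lineNumber: bufferStart });
    }
    buffer = [];
    codeLanguage = undefined;
  };

  lines.forEach((line, index) => {
    const fence = line.match(/^```(\w*)/);
    if (fence) {
      if (!inCode) {
        flush('text'); // close any open paragraph
        inCode = true;
        codeLanguage = fence[1] || undefined;
      } else {
        flush('code'); // closing fence ends the code block
        inCode = false;
      }
      bufferStart = index + 2; // content resumes on the next line
      return;
    }
    if (!inCode && line.trim() === '') {
      flush('text'); // blank line ends a paragraph
      bufferStart = index + 2;
      return;
    }
    buffer.push(line);
  });
  flush(inCode ? 'code' : 'text'); // handle trailing text or an unterminated block
  return segments;
}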
/**
* Determines if a code block contains executable instructions
* that should be analyzed (e.g., shell commands).
*/
export function isAnalyzableCode(segment: DocumentSegment): boolean {
// Return true for bash, sh, shell, zsh, fish
// Return false for other languages
}
3. Prompt Builder
// analyzer/prompt-builder.ts
/**
* Builds the core analysis prompt with high-recall bias
*/
export function buildCorePrompt(): string {
// Return the CORE_ANALYSIS_PROMPT from the design
// Include:
// - Task definition
// - Extraction philosophy (5 principles)
// - Action decomposition examples
// - Conditional logic handling
// - Common patterns to watch for
// - Output format requirements
}
/**
* Builds static mode enhancement prompt
*/
export function buildStaticModePrompt(): string {
// Return the STATIC_MODE_PROMPT from the design
// Include guidance on:
// - Aggressive inference strategies
// - Placeholder variable usage
// - Handling ambiguity
// - Confidence scoring
}
/**
* Gets relevant action schemas based on paragraph content
*/
export function getRelevantSchemas(
paragraph: string,
allSchemas: Record<string, object>
): string {
// Detect likely action types using regex patterns
// Return formatted schema documentation for detected types
// Always include 'find' and 'conditional' schemas
}
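A possible keyword-based sketch (the regex hints and the formatting of the returned schema text are illustrative assumptions, not part of the design):
// Illustrative sketch only; tune the patterns against real documentation.
const ACTION_HINTS: Record<string, RegExp> = {
  click: /\b(click|press|select|tap|choose)\b/i,
  typeKeys: /\b(type|enter|fill in|input)\b/i,
  goTo: /\b(navigate|go to|open|visit)\b|https?:\/\//i,
  httpRequest: /\b(request|endpoint|GET|POST|PUT|DELETE|curl)\b/i,
  runShell: /\b(run|execute|command|terminal|shell)\b/i,
};

export function getRelevantSchemasSketch(
  paragraph: string,
  allSchemas: Record<string, object>
): string {
  const selected = new Set<string>(['find', 'conditional']); // always included
  for (const [action, pattern] of Object.entries(ACTION_HINTS)) {
    if (pattern.test(paragraph)) selected.add(action);
  }
  return [...selected]
    .filter((name) => name in allSchemas)
    .map((name) => `### ${name}\n${JSON.stringify(allSchemas[name], null, 2)}`)
    .join('\n\n');
}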
/**
* Builds the complete prompt for a paragraph
*/
export function buildPrompt(
segment: DocumentSegment,
schemas: Record<string, object>
): string {
// Combine:
// 1. Core prompt
// 2. Static mode prompt
// 3. Relevant schemas
// 4. The segment content
// 5. Output format reminder
}
4. LLM Provider
// llm/provider.ts
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';
/**
* Creates an LLM provider instance based on configuration
*/
export function createProvider(config: AnalyzerConfig) {
switch (config.provider) {
case 'anthropic':
return anthropic(config.model || 'claude-sonnet-4-20250514');
case 'google':
return google(config.model || 'gemini-2.0-flash-exp');
case 'openai':
return openai(config.model || 'gpt-4o');
default:
throw new Error(`Unsupported provider: ${config.provider}`);
}
}
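Note that the default anthropic / google / openai instances read API keys from environment variables, so config.apiKey is not used in the snippet above. If the configured key should take precedence, the provider factory functions can be used instead; an illustrative variant:
// Illustrative variant that passes config.apiKey explicitly.
import { createAnthropic } from '@ai-sdk/anthropic';
import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { createOpenAI } from '@ai-sdk/openai';

export function createProviderWithKey(config: AnalyzerConfig) {
  switch (config.provider) {
    case 'anthropic':
      return createAnthropic({ apiKey: config.apiKey })(config.model || 'claude-sonnet-4-20250514');
    case 'google':
      return createGoogleGenerativeAI({ apiKey: config.apiKey })(config.model || 'gemini-2.0-flash-exp');
    case 'openai':
      return createOpenAI({ apiKey: config.apiKey })(config.model || 'gpt-4o');
    default:
      throw new Error(`Unsupported provider: ${config.provider}`);
  }
}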
/**
* Generates action steps for a segment using the configured LLM
*/
export async function analyzeSegment(
segment: DocumentSegment,
prompt: string,
config: AnalyzerConfig
): Promise<{ actions: ActionStep[]; metadata: any }> {
const startTime = Date.now();
const model = createProvider(config);
const result = await generateText({
model,
prompt,
temperature: config.temperature ?? 0.3,
maxTokens: config.maxTokens ?? 4000,
});
const latencyMs = Date.now() - startTime;
// Parse JSON response
let actions: ActionStep[] = [];
try {
actions = JSON.parse(result.text);
} catch (error) {
throw new Error(`Failed to parse LLM response: ${error instanceof Error ? error.message : String(error)}\nResponse: ${result.text}`);
}
return {
actions,
metadata: {
promptTokens: result.usage?.promptTokens ?? 0,
completionTokens: result.usage?.completionTokens ?? 0,
latencyMs,
},
};
}
5. Post-Processor
// analyzer/post-processor.ts
/**
* Adds defensive find actions before click/typeKeys actions
* to increase reliability and recall.
*/
export function addDefensiveActions(actions: ActionStep[]): ActionStep[] {
// For each click action without preceding find:
// Insert find action with same selector
// For each typeKeys action without preceding find:
// Insert find action with same selector
// After significant actions (submit, save, login):
// Add verification find action
}
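A minimal sketch of the first rule (the selector field name is assumed here for illustration; real field names come from the action schemas):
// Illustrative sketch only.
export function addDefensiveFindsSketch(actions: ActionStep[]): ActionStep[] {
  const result: ActionStep[] = [];
  for (const action of actions) {
    const needsFind = action.action === 'click' || action.action === 'typeKeys';
    const selector = (action as any).selector; // assumed field name
    const previous = result[result.length - 1];
    const alreadyFound =
      previous?.action === 'find' && (previous as any).selector === selector;
    if (needsFind && selector && !alreadyFound) {
      result.push({
        action: 'find',
        selector,
        description: `Verify element exists before ${action.action}`,
        _generated: true,
      } as ActionStep);
    }
    result.push(action);
  }
  return result;
}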
/**
* Tags actions with source attribution for traceability
*/
export function tagActionsWithSource(
actions: ActionStep[],
segment: DocumentSegment
): ActionStep[] {
// Add _source field to each action containing:
// - segment type
// - segment content
// - line number
}
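A minimal sketch based on the ActionStep._source shape defined in the types above:
// Illustrative sketch only.
export function tagActionsWithSourceSketch(
  actions: ActionStep[],
  segment: DocumentSegment
): ActionStep[] {
  return actions.map((action) => ({
    ...action,
    _source: {
      type: segment.type,
      content: segment.content,
      line: segment.lineNumber,
    },
  }));
}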
/**
* Validates that generated actions conform to schemas
*/
export function validateActions(
actions: ActionStep[],
schemas: Record<string, object>
): { valid: ActionStep[]; invalid: any[] } {
// Use JSON schema validation
// Return valid actions and array of validation errors
}
6. Main Analyzer
// analyzer/index.ts
/**
* Analyzes a complete document and returns extracted actions
*/
export async function analyzeDocument(
document: string,
config: AnalyzerConfig,
schemas: Record<string, object>
): Promise<DocumentAnalysisResult> {
// 1. Parse document into segments
const segments = parseDocument(document);
// 2. Analyze each segment
const results: SegmentAnalysisResult[] = [];
const allActions: ActionStep[] = [];
for (const segment of segments) {
// Skip non-analyzable code blocks
if (segment.type === 'code' && !isAnalyzableCode(segment)) {
continue;
}
// Skip empty segments
if (!segment.content.trim()) {
continue;
}
// Build prompt
const prompt = buildPrompt(segment, schemas);
// Call LLM
const { actions, metadata } = await analyzeSegment(segment, prompt, config);
// Tag actions with source
const taggedActions = tagActionsWithSource(actions, segment);
results.push({
actions: taggedActions,
segment,
metadata,
});
allActions.push(...taggedActions);
}
// 3. Post-process actions
const enhancedActions = addDefensiveActions(allActions);
// 4. Validate actions
const { valid, invalid } = validateActions(enhancedActions, schemas);
if (invalid.length > 0) {
console.warn(`${invalid.length} actions failed validation:`, invalid);
}
// 5. Build summary
const summary = {
totalActions: valid.length,
totalSegments: segments.length,
analyzedSegments: results.length,
skippedSegments: segments.length - results.length,
totalTokens: results.reduce((sum, r) => sum + r.metadata.promptTokens + r.metadata.completionTokens, 0),
totalLatencyMs: results.reduce((sum, r) => sum + r.metadata.latencyMs, 0),
};
return {
actions: valid,
segments: results,
summary,
};
}
7. Public API
// index.ts
import { analyzeDocument } from './analyzer';
import * as schemas from './schemas';
/**
* Main export: Static documentation analyzer
*/
export async function analyze(
document: string,
config: AnalyzerConfig
): Promise<DocumentAnalysisResult> {
// Load schemas
const actionSchemas = schemas.loadActionSchemas();
// Run analysis
return analyzeDocument(document, config, actionSchemas);
}
// Re-export types
export * from './types';
8. Schema Loading
// schemas/index.ts
/**
* Loads all action schemas from JSON files
*/
export function loadActionSchemas(): Record<string, object> {
return {
click: require('./actions/click.json'),
typeKeys: require('./actions/typeKeys.json'),
goTo: require('./actions/goTo.json'),
find: require('./actions/find.json'),
httpRequest: require('./actions/httpRequest.json'),
runShell: require('./actions/runShell.json'),
conditional: require('./actions/conditional.json'),
// Add other action types as needed
};
}
Configuration
Environment Variables
# Choose one provider
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_GENERATIVE_AI_API_KEY=...
OPENAI_API_KEY=sk-...
# Optional: Override default models
RESOLVER_MODEL=claude-sonnet-4-20250514
RESOLVER_TEMPERATURE=0.3
RESOLVER_MAX_TOKENS=4000
Usage Example
import { analyze } from '@doc-detective/resolver';
const documentation = `
Navigate to https://example.com and log in with your credentials.
Click the Settings button in the top navigation bar.
`;
const result = await analyze(documentation, {
provider: 'anthropic',
apiKey: process.env.ANTHROPIC_API_KEY!,
});
console.log(`Extracted ${result.summary.totalActions} actions`);
console.log(JSON.stringify(result.actions, null, 2));
Testing Requirements
Unit Tests
- Document Parser
  - Splits paragraphs correctly
  - Preserves code blocks
  - Tracks line numbers accurately
  - Handles edge cases (empty lines, nested structures)
- Prompt Builder
  - Generates complete prompts
  - Detects relevant action types
  - Includes appropriate schemas
  - Formats prompts consistently
- Post-Processor
  - Adds find actions before click/typeKeys
  - Tags actions with source info
  - Validates action schemas
  - Handles empty action arrays
Integration Tests
- End-to-End Analysis
  - Sample documentation → expected actions
  - Test with multiple LLM providers
  - Verify high recall (catches all actions)
  - Ensure valid JSON output
- Complex Scenarios
  - Conditional logic extraction
  - Multi-step implicit actions (login flow)
  - Mixed content (text + code blocks)
  - Edge cases (optional steps, ambiguous language)
Test Fixtures
Create sample documentation covering:
- Simple single actions
- Multi-step sequences
- Conditional logic
- API documentation
- CLI commands
- UI interactions
- Mixed text and code
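As an illustration, a parser unit test against a small fixture might look like the following sketch (a Vitest-style runner is assumed; adapt to the repo's actual test setup):
// tests/document-parser.test.ts (illustrative sketch only)
import { describe, it, expect } from 'vitest';
import { parseDocument } from '../src/analyzer/document-parser';

describe('parseDocument', () => {
  it('keeps a fenced code block as a single code segment', () => {
    const doc = 'Install the CLI:\n\n```bash\nnpm install -g example-cli\n```\n';
    const segments = parseDocument(doc);
    expect(segments).toHaveLength(2);
    expect(segments[0].type).toBe('text');
    expect(segments[1].type).toBe('code');
    expect(segments[1].language).toBe('bash');
  });
});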
Error Handling
- LLM API Failures
  - Retry with exponential backoff (max 3 attempts); see the sketch after this list
  - Log errors with context
  - Continue processing remaining segments
- Parse Failures
  - Log unparseable LLM responses
  - Return empty actions for the failed segment
  - Include the error in metadata
- Validation Failures
  - Log invalid actions
  - Exclude them from the final output
  - Include validation errors in result metadata
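A sketch of the retry wrapper referenced above (attempt count and delay values are placeholders):
// Illustrative sketch only.
export async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      console.warn(`Attempt ${attempt}/${maxAttempts} failed`, error);
      if (attempt < maxAttempts) {
        // Exponential backoff: 1s, 2s, 4s, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}

// Usage: const result = await withRetry(() => analyzeSegment(segment, prompt, config));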
Performance Considerations
- Token Optimization
  - Only include relevant schemas
  - Limit segment size (split large paragraphs)
  - Use lower temperature for consistency
- Parallelization
  - Process independent segments in parallel (future enhancement)
  - Respect rate limits per provider
- Caching
  - Cache schema loading
  - Consider caching prompt templates (future enhancement)
Documentation
README.md
Include:
- Installation instructions
- Quick start example
- Configuration options
- Supported LLM providers
- Action schema documentation
- Troubleshooting guide
API Documentation
Generate TypeDoc documentation for:
- All exported functions
- Configuration interfaces
- Return types
- Error types
Success Metrics
- High Recall: Captures >95% of actual actions in test documentation
- Valid Output: >98% of generated actions pass schema validation
- Performance: <5s average analysis time per 1000 words
- Provider Agnostic: Works consistently across all supported LLM providers
Future Enhancements (Out of Scope)
- Interactive analysis mode with browser context
- Action execution and validation
- Confidence scoring refinement
- User feedback loop for improving extraction
- Custom action type definitions
- Batch processing API
- Web UI for reviewing/editing actions
Acceptance Criteria
- All unit tests pass with >90% coverage
- Integration tests pass for all three LLM providers
- Extracts actions from all test fixture documents
- Generated actions validate against schemas
- Documentation is complete and accurate
- Error handling works for common failure scenarios
- Performance meets benchmarks on sample documents
- Public API is clean and easy to use
- TypeScript types are complete and accurate
Implementation Notes for Autonomous Agent
- Use the Vercel AI SDK consistently for all LLM interactions
- Follow the TypeScript style in the existing doc-detective codebase
- Add comprehensive JSDoc comments to all functions
- Use descriptive variable names that explain intent
- Write tests alongside implementation (TDD approach)
- Keep functions focused and single-purpose
- Handle errors gracefully with meaningful messages
- Log important operations for debugging
- Use async/await consistently for async operations
- Validate inputs at API boundaries
Estimated Effort: 3-5 days for full implementation and testing
Priority: High
Labels: enhancement, ai, static-analysis