diff --git a/codegen-on-oss/codegen_on_oss/analyzers/README.md b/codegen-on-oss/codegen_on_oss/analyzers/README.md index e268fbd32..756a5b5e0 100644 --- a/codegen-on-oss/codegen_on_oss/analyzers/README.md +++ b/codegen-on-oss/codegen_on_oss/analyzers/README.md @@ -1,248 +1,123 @@ -# CodeGen Analyzer +# Codegen Analyzers -The CodeGen Analyzer module provides comprehensive static analysis capabilities for codebases, focusing on code quality, dependencies, structure, and visualization. It serves as a backend API that can be used by frontend applications to analyze repositories. +This directory contains the code analysis modules for the Codegen project. These analyzers provide comprehensive static code analysis, quality checking, dependency analysis, and PR validation capabilities. -## Architecture +## Modules -The analyzer system is built with a modular plugin-based architecture: +### Core Analyzers -``` -analyzers/ -├── api.py # Main API endpoints for frontend integration -├── analyzer.py # Plugin-based analyzer system -├── issues.py # Issue tracking and management -├── code_quality.py # Code quality analysis -├── dependencies.py # Dependency analysis -├── models/ -│ └── analysis_result.py # Data models for analysis results -├── context/ # Code context management -├── visualization/ # Visualization support -└── resolution/ # Issue resolution tools -``` - -## Core Components - -### 1. API Interface (`api.py`) - -The main entry point for frontend applications. Provides REST-like endpoints for: -- Codebase analysis -- PR analysis -- Dependency visualization -- Issue reporting -- Code quality assessment - -### 2. Analyzer System (`analyzer.py`) - -Plugin-based system that coordinates different types of analysis: -- Code quality analysis (complexity, maintainability) -- Dependency analysis (imports, cycles, coupling) -- PR impact analysis -- Type checking and error detection - -### 3. Issue Tracking (`issues.py`) +- **analyzer.py**: Modern analyzer architecture with plugin system +- **base_analyzer.py**: Base class for all code analyzers +- **codebase_analyzer.py**: Comprehensive codebase analysis +- **code_quality.py**: Code quality analysis +- **dependencies.py**: Dependency analysis +- **error_analyzer.py**: Error detection and analysis +- **parser.py**: Code parsing and AST generation for multiple languages -Comprehensive issue model with: -- Severity levels (critical, error, warning, info) -- Categories (dead code, complexity, dependency, etc.) -- Location information and suggestions -- Filtering and grouping capabilities +### Support Modules -### 4. Dependency Analysis (`dependencies.py`) +- **api.py**: API interface for analyzers +- **analyzer_manager.py**: Manages analyzer plugins +- **codebase_context.py**: Provides context for codebase analysis +- **codebase_visualizer.py**: Visualization tools for codebases +- **issue_analyzer.py**: Issue detection and analysis +- **issue_types.py**: Definitions for issue types +- **issues.py**: Issue tracking system -Analysis of codebase dependencies: -- Import dependencies between modules -- Circular dependency detection -- Module coupling analysis -- External dependencies tracking -- Call graphs and class hierarchies +## Parser Module -### 5. Code Quality Analysis (`code_quality.py`) +The `parser.py` module provides specialized parsing functionality for code analysis, including abstract syntax tree (AST) generation and traversal for multiple programming languages. It serves as a foundation for various code analyzers in the system. -Analysis of code quality aspects: -- Dead code detection (unused functions, variables) -- Complexity metrics (cyclomatic, cognitive) -- Parameter checking (types, usage) -- Style issues and maintainability +### Key Features -## Using the API +- Abstract syntax tree (AST) generation and traversal +- Support for multiple programming languages (Python, JavaScript, TypeScript) +- Symbol extraction (functions, classes, variables) +- Dependency analysis (imports, requires) +- Error handling and reporting -### Setup +### Usage Examples -```python -from codegen_on_oss.analyzers.api import CodegenAnalyzerAPI - -# Create API instance with repository -api = CodegenAnalyzerAPI(repo_path="/path/to/repo") -# OR -api = CodegenAnalyzerAPI(repo_url="https://github.com/owner/repo") -``` - -### Analyzing a Codebase +#### Basic Parsing ```python -# Run comprehensive analysis -results = api.analyze_codebase() +from codegen_on_oss.analyzers.parser import parse_file, parse_code -# Run specific analysis types -results = api.analyze_codebase(analysis_types=["code_quality", "dependency"]) +# Parse a file +ast = parse_file("path/to/file.py") -# Force refresh of cached analysis -results = api.analyze_codebase(force_refresh=True) +# Parse code directly +code = "def hello(): print('Hello, World!')" +ast = parse_code(code, "python") ``` -### Analyzing a PR +#### Language-Specific Parsing ```python -# Analyze a specific PR -pr_results = api.analyze_pr(pr_number=123) +from codegen_on_oss.analyzers.parser import PythonParser, JavaScriptParser, TypeScriptParser -# Get PR impact visualization -impact_viz = api.get_pr_impact(pr_number=123, format="json") -``` - -### Getting Issues - -```python -# Get all issues -all_issues = api.get_issues() +# Python parsing +python_parser = PythonParser() +python_ast = python_parser.parse_file("script.py") -# Get issues by severity -critical_issues = api.get_issues(severity="critical") -error_issues = api.get_issues(severity="error") +# JavaScript parsing +js_parser = JavaScriptParser() +js_ast = js_parser.parse_file("app.js") -# Get issues by category -dependency_issues = api.get_issues(category="dependency_cycle") +# TypeScript parsing +ts_parser = TypeScriptParser() +ts_ast = ts_parser.parse_file("component.ts") ``` -### Getting Visualizations +#### Symbol and Dependency Extraction ```python -# Get module dependency graph -module_deps = api.get_module_dependencies(format="json") - -# Get function call graph -call_graph = api.get_function_call_graph( - function_name="main", - depth=3, - format="json" -) - -# Export visualization to file -api.export_visualization(call_graph, format="html", filename="call_graph.html") -``` +from codegen_on_oss.analyzers.parser import parse_file, create_parser -### Common Analysis Patterns +# Parse a file +ast = parse_file("path/to/file.py") -```python -# Find dead code -api.analyze_codebase(analysis_types=["code_quality"]) -dead_code = api.get_issues(category="dead_code") +# Create a parser for the language +parser = create_parser("python") -# Find circular dependencies -api.analyze_codebase(analysis_types=["dependency"]) -circular_deps = api.get_circular_dependencies() +# Extract symbols (functions, classes, variables) +symbols = parser.get_symbols(ast) +for symbol in symbols: + print(f"{symbol['type']}: {symbol['name']}") -# Find parameter issues -api.analyze_codebase(analysis_types=["code_quality"]) -param_issues = api.get_parameter_issues() +# Extract dependencies (imports, requires) +dependencies = parser.get_dependencies(ast) +for dep in dependencies: + if dep["type"] == "import": + print(f"import {dep['module']}") + elif dep["type"] == "from_import": + print(f"from {dep['module']} import {dep['name']}") ``` -## REST API Endpoints +## Integration with Other Analyzers -The analyzer can be exposed as REST API endpoints for integration with frontend applications: +The analyzers in this directory work together to provide comprehensive code analysis capabilities. The typical workflow is: -### Codebase Analysis +1. Parse the code using `parser.py` +2. Analyze the code quality using `code_quality.py` +3. Analyze dependencies using `dependencies.py` +4. Detect errors using `error_analyzer.py` +5. Generate reports and visualizations -``` -POST /api/analyze/codebase -{ - "repo_path": "/path/to/repo", - "analysis_types": ["code_quality", "dependency"] -} -``` +## API Usage -### PR Analysis +The `api.py` module provides a high-level interface for using the analyzers: -``` -POST /api/analyze/pr -{ - "repo_path": "/path/to/repo", - "pr_number": 123 -} -``` +```python +from codegen_on_oss.analyzers.api import create_api, api_analyze_codebase -### Visualization +# Create API instance +api = create_api() -``` -POST /api/visualize -{ - "repo_path": "/path/to/repo", - "viz_type": "module_dependencies", - "params": { - "layout": "hierarchical", - "format": "json" - } -} -``` - -### Issues +# Analyze a codebase +result = api_analyze_codebase(repo_url="https://github.com/user/repo") +# Access analysis results +print(f"Issues found: {len(result.issues)}") +print(f"Code quality score: {result.quality_score}") ``` -GET /api/issues?severity=error&category=dependency_cycle -``` - -## Implementation Example - -For a web application exposing these endpoints with Flask: - -```python -from flask import Flask, request, jsonify -from codegen_on_oss.analyzers.api import ( - api_analyze_codebase, - api_analyze_pr, - api_get_visualization, - api_get_static_errors -) - -app = Flask(__name__) - -@app.route("/api/analyze/codebase", methods=["POST"]) -def analyze_codebase(): - data = request.json - result = api_analyze_codebase( - repo_path=data.get("repo_path"), - analysis_types=data.get("analysis_types") - ) - return jsonify(result) - -@app.route("/api/analyze/pr", methods=["POST"]) -def analyze_pr(): - data = request.json - result = api_analyze_pr( - repo_path=data.get("repo_path"), - pr_number=data.get("pr_number") - ) - return jsonify(result) - -@app.route("/api/visualize", methods=["POST"]) -def visualize(): - data = request.json - result = api_get_visualization( - repo_path=data.get("repo_path"), - viz_type=data.get("viz_type"), - params=data.get("params", {}) - ) - return jsonify(result) - -@app.route("/api/issues", methods=["GET"]) -def get_issues(): - repo_path = request.args.get("repo_path") - severity = request.args.get("severity") - category = request.args.get("category") - - api = create_api(repo_path=repo_path) - return jsonify(api.get_issues(severity=severity, category=category)) - -if __name__ == "__main__": - app.run(debug=True) -``` \ No newline at end of file diff --git a/codegen-on-oss/codegen_on_oss/analyzers/__init__.py b/codegen-on-oss/codegen_on_oss/analyzers/__init__.py index f1ef5c5b4..a5262bffd 100644 --- a/codegen-on-oss/codegen_on_oss/analyzers/__init__.py +++ b/codegen-on-oss/codegen_on_oss/analyzers/__init__.py @@ -6,33 +6,40 @@ as an API backend for frontend applications. """ -# Main API interface -from codegen_on_oss.analyzers.api import ( - CodegenAnalyzerAPI, - create_api, - api_analyze_codebase, - api_analyze_pr, - api_get_visualization, - api_get_static_errors -) - # Modern analyzer architecture from codegen_on_oss.analyzers.analyzer import ( AnalyzerManager, AnalyzerPlugin, AnalyzerRegistry, CodeQualityPlugin, - DependencyPlugin + DependencyPlugin, +) +from codegen_on_oss.analyzers.api import ( + CodegenAnalyzerAPI, + api_analyze_codebase, + api_analyze_pr, + api_get_static_errors, + api_get_visualization, + create_api, ) +# Legacy analyzer interfaces (for backward compatibility) +from codegen_on_oss.analyzers.base_analyzer import BaseCodeAnalyzer + +# Core analysis modules +from codegen_on_oss.analyzers.code_quality import CodeQualityAnalyzer +from codegen_on_oss.analyzers.codebase_analyzer import CodebaseAnalyzer +from codegen_on_oss.analyzers.dependencies import DependencyAnalyzer +from codegen_on_oss.analyzers.error_analyzer import CodebaseAnalyzer as ErrorAnalyzer + # Issue tracking system from codegen_on_oss.analyzers.issues import ( + AnalysisType, + CodeLocation, Issue, + IssueCategory, IssueCollection, IssueSeverity, - AnalysisType, - IssueCategory, - CodeLocation ) # Analysis result models @@ -40,54 +47,60 @@ AnalysisResult, CodeQualityResult, DependencyResult, - PrAnalysisResult + PrAnalysisResult, +) +from codegen_on_oss.analyzers.parser import ( + ASTNode, + BaseParser, + CodegenParser, + JavaScriptParser, + PythonParser, + TypeScriptParser, + create_parser, + parse_code, + parse_file, ) - -# Core analysis modules -from codegen_on_oss.analyzers.code_quality import CodeQualityAnalyzer -from codegen_on_oss.analyzers.dependencies import DependencyAnalyzer - -# Legacy analyzer interfaces (for backward compatibility) -from codegen_on_oss.analyzers.base_analyzer import BaseCodeAnalyzer -from codegen_on_oss.analyzers.codebase_analyzer import CodebaseAnalyzer -from codegen_on_oss.analyzers.error_analyzer import CodebaseAnalyzer as ErrorAnalyzer __all__ = [ - # Main API - 'CodegenAnalyzerAPI', - 'create_api', - 'api_analyze_codebase', - 'api_analyze_pr', - 'api_get_visualization', - 'api_get_static_errors', - - # Modern architecture - 'AnalyzerManager', - 'AnalyzerPlugin', - 'AnalyzerRegistry', - 'CodeQualityPlugin', - 'DependencyPlugin', - - # Issue tracking - 'Issue', - 'IssueCollection', - 'IssueSeverity', - 'AnalysisType', - 'IssueCategory', - 'CodeLocation', - + "ASTNode", # Analysis results - 'AnalysisResult', - 'CodeQualityResult', - 'DependencyResult', - 'PrAnalysisResult', - - # Core analyzers - 'CodeQualityAnalyzer', - 'DependencyAnalyzer', - + "AnalysisResult", + "AnalysisType", + # Modern architecture + "AnalyzerManager", + "AnalyzerPlugin", + "AnalyzerRegistry", # Legacy interfaces (for backward compatibility) - 'BaseCodeAnalyzer', - 'CodebaseAnalyzer', - 'ErrorAnalyzer', -] \ No newline at end of file + "BaseCodeAnalyzer", + "BaseParser", + "CodeLocation", + # Core analyzers + "CodeQualityAnalyzer", + "CodeQualityPlugin", + "CodeQualityResult", + "CodebaseAnalyzer", + # Main API + "CodegenAnalyzerAPI", + "CodegenParser", + "DependencyAnalyzer", + "DependencyPlugin", + "DependencyResult", + "ErrorAnalyzer", + # Issue tracking + "Issue", + "IssueCategory", + "IssueCollection", + "IssueSeverity", + "JavaScriptParser", + "PrAnalysisResult", + "PythonParser", + "TypeScriptParser", + "api_analyze_codebase", + "api_analyze_pr", + "api_get_static_errors", + "api_get_visualization", + "create_api", + "create_parser", + "parse_code", + "parse_file", +] diff --git a/codegen-on-oss/codegen_on_oss/analyzers/parser.py b/codegen-on-oss/codegen_on_oss/analyzers/parser.py new file mode 100644 index 000000000..354979902 --- /dev/null +++ b/codegen-on-oss/codegen_on_oss/analyzers/parser.py @@ -0,0 +1,529 @@ +#!/usr/bin/env python3 +""" +Code Parser Module for Analyzers + +This module provides specialized parsing functionality for code analysis, +including abstract syntax tree (AST) generation and traversal for multiple +programming languages. It serves as a foundation for various code analyzers +in the system. +""" + +import importlib.util +import logging +import sys +from enum import Enum +from typing import Any, Optional, TypeVar, Dict, List, Tuple, Union, Protocol, runtime_checkable, cast, Type, Callable + +# Check if required modules are available +if importlib.util.find_spec("codegen.sdk") is None: + print("Codegen SDK not found.") + sys.exit(1) + +# Configure logging +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + handlers=[logging.StreamHandler()], +) +logger = logging.getLogger(__name__) + +# Type variable for generic parser implementations +T = TypeVar("T") + + +class ParserType(Enum): + """Enum defining the types of parsers available.""" + + PYTHON = "python" + JAVASCRIPT = "javascript" + TYPESCRIPT = "typescript" + GENERIC = "generic" + + +class ParseError(Exception): + """Exception raised for errors during parsing.""" + + pass + + +class ASTNode: + """ + Base class representing a node in an Abstract Syntax Tree. + + This provides a common interface for working with AST nodes + regardless of the underlying parser implementation. + """ + + def __init__( + self, + node_type: str, + value: str | None = None, + children: list["ASTNode"] | None = None, + parent: Optional["ASTNode"] = None, + start_position: tuple[int, int] | None = None, + end_position: tuple[int, int] | None = None, + metadata: dict[str, Any] | None = None, + ): + """ + Initialize an AST node. + + Args: + node_type: Type of the node (e.g., 'function', 'class', 'variable') + value: Optional value associated with the node + children: List of child nodes + parent: Parent node + start_position: Tuple of (line, column) for the start position + end_position: Tuple of (line, column) for the end position + metadata: Additional metadata for the node + """ + self.node_type = node_type + self.value = value + self.children = children or [] + self.parent = parent + self.start_position = start_position + self.end_position = end_position + self.metadata = metadata or {} + + def add_child(self, child: "ASTNode") -> None: + """ + Add a child node to this node. + + Args: + child: Child node to add + """ + self.children.append(child) + child.parent = self + + def find_nodes_by_type(self, node_type: str) -> list["ASTNode"]: + """ + Find all descendant nodes of a specific type. + + Args: + node_type: Type of nodes to find + + Returns: + List of matching nodes + """ + result = [] + if self.node_type == node_type: + result.append(self) + + for child in self.children: + result.extend(child.find_nodes_by_type(node_type)) + + return result + + def to_dict(self) -> dict[str, Any]: + """ + Convert the node to a dictionary representation. + + Returns: + Dictionary representation of the node + """ + return { + "type": self.node_type, + "value": self.value, + "start_position": self.start_position, + "end_position": self.end_position, + "metadata": self.metadata, + "children": [child.to_dict() for child in self.children], + } + + def __repr__(self) -> str: + """String representation of the node.""" + return f"ASTNode({self.node_type}, value={self.value}, children={len(self.children)})" + + +class BaseParser: + """ + Abstract base class for all parsers. + + This defines the interface that all parsers must implement. + """ + + def parse_file(self, file_path: str) -> ASTNode: + """ + Parse a file and return an AST. + + Args: + file_path: Path to the file to parse + + Returns: + AST node representing the parsed file + + Raises: + ParseError: If there is an error parsing the file + """ + raise NotImplementedError("Subclasses must implement parse_file") + + def parse_code(self, code: str, filename: str = "") -> ASTNode: + """ + Parse code directly and return an AST. + + Args: + code: Code to parse + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + + Raises: + ParseError: If there is an error parsing the code + """ + raise NotImplementedError("Subclasses must implement parse_code") + + def get_symbols(self, ast: ASTNode) -> List[Dict[str, Any]]: + """ + Extract symbols (functions, classes, variables) from an AST. + + Args: + ast: AST to extract symbols from + + Returns: + List of symbols with their metadata + """ + raise NotImplementedError("Subclasses must implement get_symbols") + + def get_dependencies(self, ast: ASTNode) -> List[Dict[str, Any]]: + """ + Extract dependencies (imports, requires) from an AST. + + Args: + ast: AST to extract dependencies from + + Returns: + List of dependencies with their metadata + """ + raise NotImplementedError("Subclasses must implement get_dependencies") + + +class CodegenParser(BaseParser): + """ + Parser implementation using Codegen SDK. + + This parser uses the Codegen SDK to parse code and generate ASTs. + """ + + def __init__(self) -> None: + """Initialize the parser.""" + super().__init__() + # Import Codegen SDK here to avoid circular imports + try: + from codegen.sdk.codebase import codebase_analysis + self.codebase_analysis = codebase_analysis + except ImportError: + logger.error("Failed to import Codegen SDK. Make sure it's installed.") + raise ImportError("Codegen SDK is required for CodegenParser") + + def parse_file(self, file_path: str) -> ASTNode: + """ + Parse a file using Codegen SDK. + + Args: + file_path: Path to the file to parse + + Returns: + AST node representing the parsed file + """ + try: + # This is a placeholder for actual SDK implementation + # In a real implementation, we would use the SDK to parse the file + with open(file_path, "r", encoding="utf-8") as f: + code = f.read() + return self.parse_code(code, file_path) + except Exception as e: + logger.error(f"Error parsing file {file_path}: {e}") + raise ParseError(f"Error parsing file {file_path}: {e}") + + def parse_code(self, code: str, filename: str = "") -> ASTNode: + """ + Parse code using Codegen SDK. + + Args: + code: Code to parse + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + """ + try: + # This is a placeholder for actual SDK implementation + # In a real implementation, we would use the SDK to parse the code + root = ASTNode("file", value=filename) + # Add some basic structure based on simple parsing + lines = code.split("\n") + for i, line in enumerate(lines): + line = line.strip() + if line.startswith("def "): + # Simple function detection + func_name = line[4:].split("(")[0].strip() + func_node = ASTNode( + "function", + value=func_name, + start_position=(i, 0), + end_position=(i, len(line)), + metadata={"line": i} + ) + root.add_child(func_node) + elif line.startswith("class "): + # Simple class detection + class_name = line[6:].split("(")[0].split(":")[0].strip() + class_node = ASTNode( + "class", + value=class_name, + start_position=(i, 0), + end_position=(i, len(line)), + metadata={"line": i} + ) + root.add_child(class_node) + elif line.startswith("import ") or line.startswith("from "): + # Simple import detection + import_node = ASTNode( + "import", + value=line, + start_position=(i, 0), + end_position=(i, len(line)), + metadata={"line": i} + ) + root.add_child(import_node) + return root + except Exception as e: + logger.error(f"Error parsing code: {e}") + raise ParseError(f"Error parsing code: {e}") + + def get_symbols(self, ast: ASTNode) -> List[Dict[str, Any]]: + """ + Extract symbols from an AST. + + Args: + ast: AST to extract symbols from + + Returns: + List of symbols with their metadata + """ + symbols = [] + + # Find function nodes + for func_node in ast.find_nodes_by_type("function"): + symbols.append({ + "type": "function", + "name": func_node.value or "", + "line": func_node.metadata.get("line", 0), + "start_position": func_node.start_position, + "end_position": func_node.end_position + }) + + # Find class nodes + for class_node in ast.find_nodes_by_type("class"): + methods = [] + for method_node in class_node.find_nodes_by_type("function"): + methods.append(method_node.value or "") + + symbols.append({ + "type": "class", + "name": class_node.value or "", + "methods": methods, + "line": class_node.metadata.get("line", 0), + "start_position": class_node.start_position, + "end_position": class_node.end_position + }) + + return symbols + + def get_dependencies(self, ast: ASTNode) -> List[Dict[str, Any]]: + """ + Extract dependencies from an AST. + + Args: + ast: AST to extract dependencies from + + Returns: + List of dependencies with their metadata + """ + dependencies = [] + + # Find import nodes + for import_node in ast.find_nodes_by_type("import"): + if import_node.value: + if import_node.value.startswith("import "): + module = import_node.value[7:].strip() + dependencies.append({ + "type": "import", + "module": module, + "line": import_node.metadata.get("line", 0) + }) + elif import_node.value.startswith("from "): + parts = import_node.value.split(" import ") + if len(parts) == 2: + module = parts[0][5:].strip() + names = [n.strip() for n in parts[1].split(",")] + for name in names: + dependencies.append({ + "type": "from_import", + "module": module, + "name": name, + "line": import_node.metadata.get("line", 0) + }) + + return dependencies + + +class PythonParser(CodegenParser): + """ + Parser for Python code. + + This parser specializes in parsing Python code and extracting Python-specific + symbols and dependencies. + """ + + def parse_code(self, code: str, filename: str = "") -> ASTNode: + """ + Parse Python code. + + Args: + code: Python code to parse + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + """ + try: + # In a real implementation, we would use Python's ast module + # or a more sophisticated parser + return super().parse_code(code, filename) + except Exception as e: + logger.error(f"Error parsing Python code: {e}") + raise ParseError(f"Error parsing Python code: {e}") + + +class JavaScriptParser(CodegenParser): + """ + Parser for JavaScript code. + + This parser specializes in parsing JavaScript code and extracting JavaScript-specific + symbols and dependencies. + """ + + def parse_code(self, code: str, filename: str = "") -> ASTNode: + """ + Parse JavaScript code. + + Args: + code: JavaScript code to parse + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + """ + try: + # In a real implementation, we would use a JavaScript parser + # like esprima or acorn + return super().parse_code(code, filename) + except Exception as e: + logger.error(f"Error parsing JavaScript code: {e}") + raise ParseError(f"Error parsing JavaScript code: {e}") + + +class TypeScriptParser(CodegenParser): + """ + Parser for TypeScript code. + + This parser specializes in parsing TypeScript code and extracting TypeScript-specific + symbols and dependencies. + """ + + def parse_code(self, code: str, filename: str = "") -> ASTNode: + """ + Parse TypeScript code. + + Args: + code: TypeScript code to parse + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + """ + try: + # In a real implementation, we would use a TypeScript parser + # like typescript-eslint or ts-morph + return super().parse_code(code, filename) + except Exception as e: + logger.error(f"Error parsing TypeScript code: {e}") + raise ParseError(f"Error parsing TypeScript code: {e}") + + +def create_parser(language: str) -> BaseParser: + """ + Create a parser for the specified language. + + Args: + language: Language to create a parser for (python, javascript, typescript) + + Returns: + Parser for the specified language + + Raises: + ValueError: If the language is not supported + """ + language = language.lower() + if language == "python": + return PythonParser() + elif language == "javascript": + return JavaScriptParser() + elif language == "typescript": + return TypeScriptParser() + else: + logger.warning(f"Unsupported language: {language}, using generic parser") + return CodegenParser() + + +def parse_file(file_path: str) -> ASTNode: + """ + Parse a file and return an AST. + + This is a convenience function that creates a parser based on the file extension + and uses it to parse the file. + + Args: + file_path: Path to the file to parse + + Returns: + AST node representing the parsed file + + Raises: + ParseError: If there is an error parsing the file + """ + # Determine language from file extension + if file_path.endswith(".py"): + language = "python" + elif file_path.endswith(".js"): + language = "javascript" + elif file_path.endswith(".ts"): + language = "typescript" + else: + language = "generic" + + parser = create_parser(language) + return parser.parse_file(file_path) + + +def parse_code(code: str, language: str, filename: str = "") -> ASTNode: + """ + Parse code directly and return an AST. + + This is a convenience function that creates a parser for the specified language + and uses it to parse the code. + + Args: + code: Code to parse + language: Language of the code (python, javascript, typescript) + filename: Optional filename for error reporting + + Returns: + AST node representing the parsed code + + Raises: + ParseError: If there is an error parsing the code + """ + parser = create_parser(language) + return parser.parse_code(code, filename) diff --git a/codegen-on-oss/examples/parser_example.py b/codegen-on-oss/examples/parser_example.py new file mode 100644 index 000000000..6f8fffaba --- /dev/null +++ b/codegen-on-oss/examples/parser_example.py @@ -0,0 +1,237 @@ +#!/usr/bin/env python3 +""" +Example script demonstrating how to use the analyzers.parser module. +""" + +import os +import sys +from pathlib import Path + +# Add the parent directory to the path so we can import the module +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) + +from codegen_on_oss.analyzers.parser import ( + parse_file, + parse_code, + create_parser, + PythonParser, + JavaScriptParser, + TypeScriptParser +) + +def parse_file_example(): + """Example of parsing a file.""" + # Create a sample Python file + sample_file = Path("sample_code.py") + with open(sample_file, "w") as f: + f.write(""" +import os +import sys +from pathlib import Path + +def hello_world(): + print("Hello, World!") + return True + +class ExampleClass: + def __init__(self, name): + self.name = name + + def greet(self): + print(f"Hello, {self.name}!") + return self.name +""") + + try: + # Parse the file + print(f"Parsing file: {sample_file}") + ast = parse_file(sample_file) + + # Get symbols + parser = create_parser("python") + symbols = parser.get_symbols(ast) + + print(f"\nSymbols found ({len(symbols)}):") + for symbol in symbols: + if symbol["type"] == "class": + print(f" Class: {symbol['name']} with methods: {', '.join(symbol['methods'])}") + elif symbol["type"] == "function": + print(f" Function: {symbol['name']}") + elif symbol["type"] == "variable": + print(f" Variable: {symbol['name']}") + + # Get dependencies + dependencies = parser.get_dependencies(ast) + + print(f"\nDependencies found ({len(dependencies)}):") + for dep in dependencies: + if dep["type"] == "import": + if "alias" in dep: + print(f" import {dep['module']} as {dep['alias']}") + else: + print(f" import {dep['module']}") + elif dep["type"] == "from_import": + print(f" from {dep['module']} import {dep['name']}") + + finally: + # Clean up + if sample_file.exists(): + sample_file.unlink() + +def parse_code_example(): + """Example of parsing code directly.""" + # Sample JavaScript code + js_code = """ +import { useState } from 'react'; +import axios from 'axios'; + +function FetchData() { + const [data, setData] = useState(null); + const [loading, setLoading] = useState(false); + const [error, setError] = useState(null); + + const fetchData = async (url) => { + try { + setLoading(true); + const response = await axios.get(url); + setData(response.data); + setError(null); + } catch (err) { + setError(err.message); + setData(null); + } finally { + setLoading(false); + } + }; + + return { data, loading, error, fetchData }; +} + +class DataProvider { + constructor(baseUrl) { + this.baseUrl = baseUrl; + this.client = axios.create({ + baseURL: baseUrl + }); + } + + async get(endpoint) { + return await this.client.get(endpoint); + } +} + +export { FetchData, DataProvider }; +""" + + # Parse the code + print("\nParsing JavaScript code:") + ast = parse_code(js_code, "javascript", "example.js") + + # Get symbols + parser = create_parser("javascript") + symbols = parser.get_symbols(ast) + + print(f"\nSymbols found ({len(symbols)}):") + for symbol in symbols: + if symbol["type"] == "class": + print(f" Class: {symbol['name']} with methods: {', '.join(symbol['methods'])}") + elif symbol["type"] == "function": + print(f" Function: {symbol['name']}") + elif symbol["type"] == "variable": + print(f" Variable: {symbol['name']}") + + # Get dependencies + dependencies = parser.get_dependencies(ast) + + print(f"\nDependencies found ({len(dependencies)}):") + for dep in dependencies: + if dep["type"] == "import": + if "alias" in dep: + print(f" import {dep['module']} as {dep['alias']}") + else: + print(f" import {dep['module']}") + elif dep["type"] == "from_import": + print(f" from {dep['module']} import {dep['name']}") + +def language_specific_parsers_example(): + """Example of using language-specific parsers.""" + # Sample TypeScript code + ts_code = """ +import { Component } from '@angular/core'; +import { HttpClient } from '@angular/common/http'; +import { Observable } from 'rxjs'; + +interface User { + id: number; + name: string; + email: string; +} + +@Component({ + selector: 'app-user-list', + templateUrl: './user-list.component.html' +}) +export class UserListComponent { + users: User[] = []; + loading: boolean = false; + + constructor(private http: HttpClient) {} + + ngOnInit(): void { + this.getUsers(); + } + + getUsers(): void { + this.loading = true; + this.http.get('/api/users') + .subscribe({ + next: (data) => { + this.users = data; + this.loading = false; + }, + error: (err) => { + console.error('Error fetching users', err); + this.loading = false; + } + }); + } +} +""" + + # Parse with TypeScript parser + print("\nParsing TypeScript code with TypeScriptParser:") + parser = TypeScriptParser() + ast = parser.parse_code(ts_code, "example.ts") + + # Get symbols + symbols = parser.get_symbols(ast) + + print(f"\nSymbols found ({len(symbols)}):") + for symbol in symbols: + if symbol["type"] == "class": + print(f" Class: {symbol['name']} with methods: {', '.join(symbol['methods'])}") + elif symbol["type"] == "function": + print(f" Function: {symbol['name']}") + elif symbol["type"] == "variable": + print(f" Variable: {symbol['name']}") + + # Get dependencies + dependencies = parser.get_dependencies(ast) + + print(f"\nDependencies found ({len(dependencies)}):") + for dep in dependencies: + if dep["type"] == "import": + if "alias" in dep: + print(f" import {dep['module']} as {dep['alias']}") + else: + print(f" import {dep['module']}") + elif dep["type"] == "from_import": + print(f" from {dep['module']} import {dep['name']}") + +if __name__ == "__main__": + print("=== Parser Examples ===") + parse_file_example() + parse_code_example() + language_specific_parsers_example() + print("\nAll examples completed successfully!") + diff --git a/codegen-on-oss/tests/test_analyzers_parser.py b/codegen-on-oss/tests/test_analyzers_parser.py new file mode 100644 index 000000000..5e054d4f4 --- /dev/null +++ b/codegen-on-oss/tests/test_analyzers_parser.py @@ -0,0 +1,374 @@ +#!/usr/bin/env python3 +""" +Tests for the analyzers.parser module. +""" + +import os +import sys +import unittest +from pathlib import Path +from unittest.mock import MagicMock, patch + +# Add the parent directory to the path so we can import the module +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) + +from codegen_on_oss.analyzers.parser import ( + ASTNode, + BaseParser, + CodegenParser, + PythonParser, + JavaScriptParser, + TypeScriptParser, + create_parser, + parse_file, + parse_code, + ParseError +) + +class TestASTNode(unittest.TestCase): + """Tests for the ASTNode class.""" + + def test_init(self): + """Test initialization of ASTNode.""" + node = ASTNode( + node_type="function", + value="test_func", + start_position=(1, 1), + end_position=(10, 10), + metadata={"test": "value"} + ) + + self.assertEqual(node.node_type, "function") + self.assertEqual(node.value, "test_func") + self.assertEqual(node.start_position, (1, 1)) + self.assertEqual(node.end_position, (10, 10)) + self.assertEqual(node.metadata, {"test": "value"}) + self.assertEqual(node.children, []) + self.assertIsNone(node.parent) + + def test_add_child(self): + """Test adding a child to a node.""" + parent = ASTNode(node_type="class", value="TestClass") + child = ASTNode(node_type="method", value="test_method") + + parent.add_child(child) + + self.assertEqual(len(parent.children), 1) + self.assertEqual(parent.children[0], child) + self.assertEqual(child.parent, parent) + + def test_find_nodes_by_type(self): + """Test finding nodes by type.""" + root = ASTNode(node_type="file", value="test.py") + class_node = ASTNode(node_type="class", value="TestClass") + method1 = ASTNode(node_type="method", value="test_method1") + method2 = ASTNode(node_type="method", value="test_method2") + + root.add_child(class_node) + class_node.add_child(method1) + class_node.add_child(method2) + + # Find all method nodes + methods = root.find_nodes_by_type("method") + self.assertEqual(len(methods), 2) + self.assertEqual(methods[0].value, "test_method1") + self.assertEqual(methods[1].value, "test_method2") + + # Find all class nodes + classes = root.find_nodes_by_type("class") + self.assertEqual(len(classes), 1) + self.assertEqual(classes[0].value, "TestClass") + + def test_to_dict(self): + """Test converting a node to a dictionary.""" + node = ASTNode( + node_type="function", + value="test_func", + start_position=(1, 1), + end_position=(10, 10), + metadata={"test": "value"} + ) + + node_dict = node.to_dict() + + self.assertEqual(node_dict["type"], "function") + self.assertEqual(node_dict["value"], "test_func") + self.assertEqual(node_dict["start_position"], (1, 1)) + self.assertEqual(node_dict["end_position"], (10, 10)) + self.assertEqual(node_dict["metadata"], {"test": "value"}) + self.assertEqual(node_dict["children"], []) + +class TestCodegenParser(unittest.TestCase): + """Tests for the CodegenParser class.""" + + def setUp(self): + """Set up test fixtures.""" + self.mock_codebase = MagicMock() + self.parser = CodegenParser(language="python", codebase=self.mock_codebase) + + @patch('builtins.open', new_callable=unittest.mock.mock_open, read_data="def test_func():\n pass\n") + def test_parse_file(self, mock_open): + """Test parsing a file.""" + # Mock the parse_code method to avoid actual parsing + self.parser.parse_code = MagicMock(return_value=ASTNode(node_type="file", value="test.py")) + + result = self.parser.parse_file("test.py") + + # Verify that parse_code was called with the file content + self.parser.parse_code.assert_called_once() + self.assertEqual(result.node_type, "file") + self.assertEqual(result.value, "test.py") + + def test_parse_code_simple(self): + """Test parsing a simple code snippet.""" + code = """ +def test_func(): + x = 1 + return x + +class TestClass: + def __init__(self): + self.value = 0 + + def test_method(self): + return self.value +""" + + result = self.parser.parse_code(code, "test.py") + + # Verify the basic structure + self.assertEqual(result.node_type, "file") + self.assertEqual(result.value, "test.py") + + # Find all functions + functions = result.find_nodes_by_type("function") + self.assertEqual(len(functions), 1) + self.assertEqual(functions[0].value, "test_func") + + # Find all classes + classes = result.find_nodes_by_type("class") + self.assertEqual(len(classes), 1) + self.assertEqual(classes[0].value, "TestClass") + + # Find all methods + methods = result.find_nodes_by_type("method") + self.assertEqual(len(methods), 2) + self.assertEqual(methods[0].value, "__init__") + self.assertEqual(methods[1].value, "test_method") + + def test_get_symbols(self): + """Test extracting symbols from an AST.""" + # Create a simple AST + root = ASTNode(node_type="file", value="test.py") + + class_node = ASTNode( + node_type="class", + value="TestClass", + start_position=(5, 1), + end_position=(15, 1), + metadata={"indentation": 0} + ) + + method_node = ASTNode( + node_type="method", + value="test_method", + start_position=(7, 5), + end_position=(9, 5), + metadata={"indentation": 4, "class": "TestClass"} + ) + + func_node = ASTNode( + node_type="function", + value="test_func", + start_position=(1, 1), + end_position=(3, 1), + metadata={"indentation": 0} + ) + + var_node = ASTNode( + node_type="variable", + value="test_var", + start_position=(17, 1), + end_position=(17, 10), + metadata={} + ) + + root.add_child(func_node) + root.add_child(class_node) + class_node.add_child(method_node) + root.add_child(var_node) + + # Get symbols + symbols = self.parser.get_symbols(root) + + # Verify symbols + self.assertEqual(len(symbols), 3) # 1 class, 1 function, 1 variable + + # Check class symbol + class_symbol = next(s for s in symbols if s["type"] == "class") + self.assertEqual(class_symbol["name"], "TestClass") + self.assertEqual(class_symbol["start_line"], 5) + self.assertEqual(class_symbol["end_line"], 15) + self.assertEqual(class_symbol["methods"], ["test_method"]) + + # Check function symbol + func_symbol = next(s for s in symbols if s["type"] == "function") + self.assertEqual(func_symbol["name"], "test_func") + self.assertEqual(func_symbol["start_line"], 1) + self.assertEqual(func_symbol["end_line"], 3) + + # Check variable symbol + var_symbol = next(s for s in symbols if s["type"] == "variable") + self.assertEqual(var_symbol["name"], "test_var") + self.assertEqual(var_symbol["line"], 17) + + def test_get_dependencies(self): + """Test extracting dependencies from an AST.""" + # Create a simple AST with imports + root = ASTNode(node_type="file", value="test.py") + + import1 = ASTNode( + node_type="import", + value="import os", + start_position=(1, 1), + end_position=(1, 9), + metadata={} + ) + + import2 = ASTNode( + node_type="import", + value="import sys as system", + start_position=(2, 1), + end_position=(2, 20), + metadata={} + ) + + import3 = ASTNode( + node_type="import", + value="from pathlib import Path", + start_position=(3, 1), + end_position=(3, 25), + metadata={} + ) + + root.add_child(import1) + root.add_child(import2) + root.add_child(import3) + + # Get dependencies + dependencies = self.parser.get_dependencies(root) + + # Verify dependencies + self.assertEqual(len(dependencies), 3) + + # Check simple import + os_import = next(d for d in dependencies if d.get("module") == "os") + self.assertEqual(os_import["type"], "import") + self.assertEqual(os_import["line"], 1) + + # Check import with alias + sys_import = next(d for d in dependencies if d.get("module") == "sys") + self.assertEqual(sys_import["type"], "import") + self.assertEqual(sys_import["alias"], "system") + self.assertEqual(sys_import["line"], 2) + + # Check from import + path_import = next(d for d in dependencies if d.get("module") == "pathlib") + self.assertEqual(path_import["type"], "from_import") + self.assertEqual(path_import["name"], "Path") + self.assertEqual(path_import["line"], 3) + +class TestLanguageSpecificParsers(unittest.TestCase): + """Tests for language-specific parsers.""" + + def test_python_parser(self): + """Test PythonParser initialization.""" + parser = PythonParser() + self.assertEqual(parser.language, "python") + + def test_javascript_parser(self): + """Test JavaScriptParser initialization.""" + parser = JavaScriptParser() + self.assertEqual(parser.language, "javascript") + + def test_typescript_parser(self): + """Test TypeScriptParser initialization.""" + parser = TypeScriptParser() + self.assertEqual(parser.language, "typescript") + + def test_create_parser(self): + """Test create_parser factory function.""" + python_parser = create_parser("python") + self.assertIsInstance(python_parser, PythonParser) + + js_parser = create_parser("javascript") + self.assertIsInstance(js_parser, JavaScriptParser) + + ts_parser = create_parser("typescript") + self.assertIsInstance(ts_parser, TypeScriptParser) + + # Test case insensitivity + py_parser = create_parser("PYTHON") + self.assertIsInstance(py_parser, PythonParser) + + # Test unknown language + generic_parser = create_parser("unknown") + self.assertIsInstance(generic_parser, CodegenParser) + self.assertEqual(generic_parser.language, "unknown") + +class TestParserUtilityFunctions(unittest.TestCase): + """Tests for parser utility functions.""" + + @patch('codegen_on_oss.analyzers.parser.create_parser') + def test_parse_file(self, mock_create_parser): + """Test parse_file utility function.""" + # Setup mock parser + mock_parser = MagicMock() + mock_parser.parse_file.return_value = ASTNode(node_type="file", value="test.py") + mock_create_parser.return_value = mock_parser + + # Call parse_file + result = parse_file("test.py", "python") + + # Verify parser creation and method calls + mock_create_parser.assert_called_once_with("python", None, None) + mock_parser.parse_file.assert_called_once() + self.assertEqual(result.node_type, "file") + self.assertEqual(result.value, "test.py") + + @patch('codegen_on_oss.analyzers.parser.create_parser') + def test_parse_code(self, mock_create_parser): + """Test parse_code utility function.""" + # Setup mock parser + mock_parser = MagicMock() + mock_parser.parse_code.return_value = ASTNode(node_type="file", value="test.py") + mock_create_parser.return_value = mock_parser + + # Call parse_code + code = "def test(): pass" + result = parse_code(code, "python", "test.py") + + # Verify parser creation and method calls + mock_create_parser.assert_called_once_with("python", None, None) + mock_parser.parse_code.assert_called_once_with(code, "test.py") + self.assertEqual(result.node_type, "file") + self.assertEqual(result.value, "test.py") + + @patch('codegen_on_oss.analyzers.parser.create_parser') + def test_parse_file_auto_language_detection(self, mock_create_parser): + """Test auto language detection in parse_file.""" + # Setup mock parser + mock_parser = MagicMock() + mock_parser.parse_file.return_value = ASTNode(node_type="file", value="test.py") + mock_create_parser.return_value = mock_parser + + # Call parse_file with no language specified + result = parse_file("test.py") + + # Verify parser creation with auto-detected language + mock_create_parser.assert_called_once_with("python", None, None) + mock_parser.parse_file.assert_called_once() + +if __name__ == '__main__': + unittest.main() +