# ABCoder - Language Parser Introduction

ABCoder currently implements its parsers on top of the [LSP](https://microsoft.github.io/language-server-protocol/) (Language Server Protocol). This gives precise dependency collection and makes it easier to support more languages in the future.

## Code Structure

The parser lives under the [lang](/lang) package, which contains:

- uniast: Golang definitions of the unified AST (UniAST) structure
- lsp: an LSP client that provides interfaces for file parsing, reference lookup, syntax tree parsing, definition lookup, etc., as well as the **generic language specification interface, LanguageSpec**
- collect: the core computation logic, responsible for collecting LSP symbols and exporting them as UniAST
- {language}: implements the lsp#LanguageSpec interface for the corresponding {language}, plus any calling logic specific to that language's LSP server

## Operation Process

1. Identify the language from the command-line arguments, start the corresponding LSP server, and pass it the initialization parameters
2. Traverse the repository files and call `textDocument/documentSymbol` to get all symbols in each file. For each symbol:
   1. Call `textDocument/semanticTokens/range` to get the tokens inside the symbol's code
   2. Identify valid entity tokens and call `textDocument/definition` on each one to jump to the corresponding symbol location, thus establishing dependency relationships between nodes
3. Repeat step 2 until all files have been processed. Finally, convert the collected LSP symbols to UniAST format and output the result
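
The loop in steps 2 and 3 can be sketched as follows. This is a minimal, self-contained illustration, not ABCoder's collector: `Symbol`, `Token`, the `Client` interface and its method names are hypothetical stand-ins for the three LSP requests, and `isEntityToken` plays the role of `LanguageSpec.IsEntityToken` with an assumed rule (only `function` and `type` tokens count). An in-memory fake client replaces a real LSP server.

```go
package main

import (
	"fmt"
	"sort"
)

// Symbol is a simplified stand-in for an LSP DocumentSymbol (hypothetical).
type Symbol struct {
	Name string
	File string
}

// Token is a simplified semantic token: its text and LSP token type.
type Token struct {
	Text string
	Type string // e.g. "function", "type", "keyword"
}

// Client abstracts the LSP requests used by the collector. The method names
// are hypothetical; a real client would send textDocument/documentSymbol,
// textDocument/semanticTokens/range and textDocument/definition.
type Client interface {
	DocumentSymbols(file string) []Symbol
	SemanticTokens(sym Symbol) []Token
	Definition(sym Symbol, tok Token) (Symbol, bool)
}

// isEntityToken is an assumed stand-in for LanguageSpec.IsEntityToken.
func isEntityToken(tok Token) bool {
	return tok.Type == "function" || tok.Type == "type"
}

// Collect walks every file, scans each symbol's tokens and resolves entity
// tokens to their definitions, returning dependency edges (symbol -> deps).
func Collect(c Client, files []string) map[string][]string {
	deps := make(map[string][]string)
	for _, f := range files {
		for _, sym := range c.DocumentSymbols(f) {
			for _, tok := range c.SemanticTokens(sym) {
				if !isEntityToken(tok) {
					continue
				}
				// Skip tokens that resolve back to the symbol itself.
				if def, ok := c.Definition(sym, tok); ok && def.Name != sym.Name {
					deps[sym.Name] = append(deps[sym.Name], def.Name)
				}
			}
		}
	}
	return deps
}

// fakeClient is an in-memory Client used only for this demonstration.
type fakeClient struct {
	symbols map[string][]Symbol
	tokens  map[string][]Token
	defs    map[string]Symbol
}

func (f fakeClient) DocumentSymbols(file string) []Symbol { return f.symbols[file] }
func (f fakeClient) SemanticTokens(sym Symbol) []Token    { return f.tokens[sym.Name] }
func (f fakeClient) Definition(_ Symbol, tok Token) (Symbol, bool) {
	d, ok := f.defs[tok.Text]
	return d, ok
}

func main() {
	c := fakeClient{
		symbols: map[string][]Symbol{
			"main.go": {{Name: "main", File: "main.go"}},
			"util.go": {{Name: "Helper", File: "util.go"}, {Name: "Config", File: "util.go"}},
		},
		tokens: map[string][]Token{
			"main":   {{"func", "keyword"}, {"main", "function"}, {"Helper", "function"}},
			"Helper": {{"func", "keyword"}, {"Helper", "function"}, {"Config", "type"}},
		},
		defs: map[string]Symbol{
			"main":   {Name: "main", File: "main.go"},
			"Helper": {Name: "Helper", File: "util.go"},
			"Config": {Name: "Config", File: "util.go"},
		},
	}
	deps := Collect(c, []string{"main.go", "util.go"})
	keys := make([]string, 0, len(deps))
	for k := range deps {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, "->", deps[k]) // Helper -> [Config], then main -> [Helper]
	}
}
```

The final conversion to UniAST is omitted; the sketch stops once the dependency edges between symbols are known.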

## Extending Other Language Implementations

Since UniAST is not fully equivalent to LSP, some language-specific behavior interfaces must be implemented to perform the conversion. Using the lang/rust package as a reference, the following capabilities generally need to be implemented:

- GetDefaultLSP(): maps the user-supplied language to a specific lsp.Language and the name of the corresponding LSP server
- CheckRepo(): checks the state of the user's repository, handles toolchain issues according to the language's conventions, and returns the first file to open by default (used to trigger the LSP server) together with how long to wait for server initialization (determined by repository size)
- **LanguageSpec interface**: the core module for handling generic syntax information that LSP does not provide, such as determining whether a token is a standard-library symbol, parsing function signatures, etc.
- ModulePatcher: an optional post-processing module for collecting language-specific information, e.g. Rust `use` symbols, which LSP does not collect. It can be left unimplemented

### LanguageSpec

```go
// Detailed implementation used to collect LSP symbols and transform them into UniAST
type LanguageSpec interface {
	// initializes a root workspace and returns all modules [modulename=>abs-path] inside it
	WorkSpace(root string) (map[string]string, error)

	// given an absolute file path, returns its module name and package path
	// external paths should also be supported
	// FIXME: some languages (like Rust) may have sub-mods inside a file, but we still treat the file as a single mod here
	NameSpace(path string) (string, string, error)

	// tells if a file should be skipped because it does not belong to the language AST
	ShouldSkip(path string) bool

	// FileImports parses a file's code to get its imports
	FileImports(content []byte) ([]uniast.Import, error)

	// returns the index of the first declaration token of a symbol, as Type-Name
	DeclareTokenOfSymbol(sym DocumentSymbol) int

	// tells if a token is an AST entity
	IsEntityToken(tok Token) bool

	// tells if a token is a standard-library token
	IsStdToken(tok Token) bool

	// returns the SymbolKind of a token
	TokenKind(tok Token) SymbolKind

	// tells if a symbol is a main function
	IsMainFunction(sym DocumentSymbol) bool

	// tells if a symbol is a language symbol (func, type, variable, etc.) in the workspace
	IsEntitySymbol(sym DocumentSymbol) bool

	// tells if a symbol is public in the workspace
	IsPublicSymbol(sym DocumentSymbol) bool

	// declares whether the language has impl symbols
	// if it returns true, ImplSymbol() will be called
	HasImplSymbol() bool
	// if a symbol is an impl symbol, returns the token indexes of the interface type, the receiver type and the first-method start (-1 means not found)
	// otherwise the collector uses FunctionSymbol() to get the receiver type token index (-1 means not found)
	ImplSymbol(sym DocumentSymbol) (int, int, int)

	// if a symbol is a Function or Method symbol, returns the token index of the Receiver (-1 means not found), and the token indexes of TypeParameters, InputParameters and Outputs
	FunctionSymbol(sym DocumentSymbol) (int, []int, []int, []int)
}
```

- Rust parser implementation: [RustSpec](/lang/rust/spec.go)
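
To make a few of the predicates concrete, here is a toy implementation for a Go-like language. Everything in it is assumed for illustration: `Token`, `Location` and `DocumentSymbol` are simplified local types rather than the real lsp types, the standard library is identified by a hypothetical `goroot` path prefix on the definition URI, and skipping `_test.go` files is an arbitrary policy. Public-ness follows Go's exported-identifier rule (leading upper-case letter), and main functions are recognized by the LSP `SymbolKind` value for Function (12).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode"
	"unicode/utf8"
)

// Simplified, hypothetical versions of the types LanguageSpec works with.
type Location struct{ URI string }

type Token struct {
	Text     string
	Type     string   // LSP semantic token type, e.g. "function", "keyword"
	Location Location // where the token's definition lives (assumed field)
}

type DocumentSymbol struct {
	Name string
	Kind int // LSP SymbolKind number
}

const symbolKindFunction = 12 // SymbolKind.Function in the LSP specification

// goLikeSpec implements a few LanguageSpec-style predicates.
type goLikeSpec struct {
	goroot string // hypothetical: directory containing the standard library
}

// ShouldSkip skips non-.go files and (by assumption) test files.
func (s goLikeSpec) ShouldSkip(path string) bool {
	return filepath.Ext(path) != ".go" || strings.HasSuffix(path, "_test.go")
}

// IsEntityToken treats functions, methods and type-like tokens as entities.
func (s goLikeSpec) IsEntityToken(tok Token) bool {
	switch tok.Type {
	case "function", "method", "type", "struct", "interface":
		return true
	}
	return false
}

// IsStdToken reports whether the token's definition lies under goroot.
func (s goLikeSpec) IsStdToken(tok Token) bool {
	return strings.HasPrefix(tok.Location.URI, "file://"+s.goroot+"/")
}

// IsPublicSymbol applies Go's exported-identifier rule.
func (s goLikeSpec) IsPublicSymbol(sym DocumentSymbol) bool {
	r, _ := utf8.DecodeRuneInString(sym.Name)
	return unicode.IsUpper(r)
}

// IsMainFunction matches a top-level function named "main".
func (s goLikeSpec) IsMainFunction(sym DocumentSymbol) bool {
	return sym.Kind == symbolKindFunction && sym.Name == "main"
}

func main() {
	s := goLikeSpec{goroot: "/usr/local/go/src"}
	tok := Token{Text: "Println", Type: "function",
		Location: Location{URI: "file:///usr/local/go/src/fmt/print.go"}}
	fmt.Println(s.IsEntityToken(tok), s.IsStdToken(tok))                                  // true true
	fmt.Println(s.ShouldSkip("pkg/a_test.go"), s.IsPublicSymbol(DocumentSymbol{Name: "helper"})) // true false
}
```

Signature-oriented methods such as FunctionSymbol and ImplSymbol need the symbol's token stream and source text, so they are left out of this sketch.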

### ModulePatcher

```go
// ModulePatcher supplements a module with additional information
type ModulePatcher interface {
	// Patch is called after all symbols have been collected
	Patch(ast *parse.Module)
}
```

- Rust parser implementation: [RustModulePatcher](/lang/rust/patch.go)
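
The sketch below shows the shape of a patcher that records Rust `use` paths, the kind of information the LSP pass does not report. It does not reproduce RustModulePatcher: `File` and `Module` are hypothetical, heavily simplified stand-ins for the real module type, source text is assumed to be preloaded into a map, and the line-based scan only handles single-line `use` items (no multi-line or nested-brace forms).

```go
package main

import (
	"fmt"
	"strings"
)

// File and Module are hypothetical, simplified stand-ins for UniAST types.
type File struct {
	Path    string
	Imports []string
}

type Module struct {
	Name  string
	Files map[string]*File
}

// ModulePatcher mirrors the interface above, using the simplified Module.
type ModulePatcher interface {
	Patch(ast *Module)
}

// useCollector is a toy patcher that appends single-line Rust `use` paths
// to each file's Imports. It runs after symbol collection, as Patch would.
type useCollector struct {
	sources map[string]string // file path -> source text (assumed preloaded)
}

func (u useCollector) Patch(ast *Module) {
	for path, f := range ast.Files {
		for _, line := range strings.Split(u.sources[path], "\n") {
			line = strings.TrimPrefix(strings.TrimSpace(line), "pub ")
			if !strings.HasPrefix(line, "use ") {
				continue
			}
			item := strings.TrimSuffix(strings.TrimPrefix(line, "use "), ";")
			f.Imports = append(f.Imports, strings.TrimSpace(item))
		}
	}
}

func main() {
	m := &Module{Name: "demo", Files: map[string]*File{"src/lib.rs": {Path: "src/lib.rs"}}}
	var p ModulePatcher = useCollector{sources: map[string]string{
		"src/lib.rs": "use std::collections::HashMap;\npub use crate::util::helper;\nfn main() {}\n",
	}}
	p.Patch(m)
	fmt.Println(m.Files["src/lib.rs"].Imports) // [std::collections::HashMap crate::util::helper]
}
```

Keeping this logic in a separate post-processing step means the generic LSP collector stays language-agnostic, and languages that need no extra information can simply omit a patcher.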