Commit afd0d4f

feat: indexer (#5)

* Replaces the `index` method by the `Indexer` class
* Adds a lot of tests
* Better readme

1 parent b0fddeb, commit afd0d4f

12 files changed, +417 -38 lines changed


README.md

Lines changed: 82 additions & 12 deletions
````diff
@@ -2,57 +2,93 @@
 
 Opinionated Python bindings for the [tree-sitter-stack-graphs](https://github.com/github/stack-graphs) rust library.
 
-It exposes very few, easy to use functions to index files and query references.
+It exposes a minimal, opinionated API to leverage the stack-graphs library for reference resolution in source code.
 
-This is a proof of concept draft, to test scripting utilities using stack-graphs easily.
+The rust bindings are built using [PyO3](https://pyo3.rs) and [maturin](https://maturin.rs).
 
-It uses pyo3 and maturin to generate the bindings.
+Note that this is a work in progress, and the API is subject to change. This project is not affiliated with GitHub.
 
 ## Installation & Usage
 
 ```bash
-pip install stack-graphs-python-bindings # or poetry, ...
+pip install stack-graphs-python-bindings
 ```
 
+### Example
+
+Given the following directory structure:
+
+```bash
+tests/js_sample
+├── index.js
+└── module.js
+```
+
+`index.js`:
+
+```javascript
+import { foo } from "./module"
+const baz = foo
+```
+
+`module.js`:
+
+```javascript
+export const foo = "bar"
+```
+
+The following Python script:
+
 ```python
 import os
-from stack_graphs_python import index, Querier, Position, Language
+from stack_graphs_python import Indexer, Querier, Position, Language
 
 db_path = os.path.abspath("./db.sqlite")
 dir = os.path.abspath("./tests/js_sample")
 
 # Index the directory (creates stack-graphs database)
-index([dir], db_path, language=Language.JavaScript)
+indexer = Indexer(db_path, [Language.JavaScript])
+indexer.index_all([dir])
 
 # Instantiate a querier
 querier = Querier(db_path)
 
-# Query a reference at a given position (0-indexed line and column):
+# Query a reference at a given position (0-indexed line and column):
 # foo in: const baz = foo
 source_reference = Position(path=dir + "/index.js", line=2, column=12)
 results = querier.definitions(source_reference)
 
 for r in results:
-    print(f"{r.path}, l:{r.line}, c: {r.column}")
+    print(r)
 ```
 
-Will result in:
+Will output:
 
 ```bash
-[...]/stack-graphs-python-bindings/tests/js_sample/index.js, l:0, c: 9
-[...]/stack-graphs-python-bindings/tests/js_sample/module.js, l:0, c: 13
+Position(path="[...]/tests/js_sample/index.js", line=0, column=9)
+Position(path="[...]/tests/js_sample/module.js", line=0, column=13)
 ```
 
 That translates to:
 
 ```javascript
 // index.js
 import { foo } from "./module"
+//       ^ line 0, column 9
 
 // module.js
 export const foo = "bar"
+//           ^ line 0, column 13
 ```
 
+> **Note**: All the paths are absolute, and line and column numbers are 0-indexed (first line is 0, first column is 0).
+
+## Known stack-graphs / tree-sitter issues
+
+- Python: module resolution / imports seems to be broken: <https://github.com/github/stack-graphs/issues/430>
+- Typescript: module resolution doesn't work with file extensions (eg. `import { foo } from "./module"` is ok, but `import { foo } from "./module.ts"` is not). **An issue should be opened on the stack-graphs repo**. See: `tests/ts_ok_test.py`
+- Typescript: tree-sitter-typescript fails when passing a generic type to a decorator: <https://github.com/tree-sitter/tree-sitter-typescript/issues/283>
+
 ## Development
 
 ### Ressources
@@ -67,7 +103,7 @@ https://pyo3.rs/v0.21.2/getting-started
 ### Setup
 
 ```bash
-# Setup venv and install maturin through pip
+# Setup venv and install dev dependencies
 make setup
 ```
 
@@ -76,3 +112,37 @@ make setup
 ```bash
 make test
 ```
+
+### Manual testing
+
+```bash
+# build the package
+make develop
+# activate the venv
+. venv/bin/activate
+```
+
+### Roadmap
+
+Before releasing 0.1.0, which I expect to be a first stable API, the following needs to be done:
+
+- [ ] Add more testing, especially:
+  - [ ] Test all supported languages (Java, ~~Python~~, ~~TypeScript~~, ~~JavaScript~~)
+  - [ ] Test failing cases, eg. files that cannot be indexed
+- [ ] Add options to the classes:
+  - [ ] Verbosity
+  - [ ] Force for the Indexer
+  - [ ] Fail on error for the Indexer, or continue indexing
+- [ ] Handle the storage (database) in a dedicated class, and pass it to the Indexer and Querier
+- [ ] Add methods to query the indexing status (eg. which files have been indexed, which failed, etc.)
+- [ ] Rely on the main branch of stack-graphs, and update the bindings accordingly
+- [ ] Better error handling, return clear errors, test them and add them to the `.pyi` interface
+- [ ] Lint and format the rust code
+- [ ] CI/CD for the rust code
+- [ ] Lint and format the python code
+- [ ] Propper changelog, starting in 0.1.0
+
+I'd also like to add the following features, after 0.1.0:
+
+- [ ] Expose the exact, lower-level API of stack-graphs, for more flexibility, in a separate module (eg. `stack_graphs_python.core`)
+- [ ] Benchmark performance
````
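The README stresses that `Position` takes 0-indexed line and column numbers, so editor-style 1-based coordinates cannot be passed as-is. The sketch below shows one way to compute the 0-indexed `(line, column)` pair for the n-th occurrence of a symbol in a file's text. `find_position` is a hypothetical helper, not part of `stack_graphs_python`; it counts columns in characters, and it is an assumption whether the bindings expect characters or bytes for non-ASCII sources. The result depends on the exact file contents, including any blank lines.

```python
def find_position(text: str, needle: str, occurrence: int = 0) -> tuple[int, int]:
    """Return the 0-indexed (line, column) of the n-th occurrence of `needle`.

    Hypothetical helper: columns are counted in characters, which equals
    byte offsets only for ASCII sources.
    """
    start = -1
    for _ in range(occurrence + 1):
        # Search again just past the previous match.
        start = text.find(needle, start + 1)
        if start == -1:
            raise ValueError(f"{needle!r} occurrence {occurrence} not found")
    # Line = number of newlines before the match.
    line = text.count("\n", 0, start)
    # Column = distance from the first character of that line.
    column = start - (text.rfind("\n", 0, start) + 1)
    return line, column
```

For the two-line `index.js` exactly as printed in the README, the second `foo` sits at `(1, 12)` and the imported `foo` at `(0, 9)`; the returned pair can then be passed as `line=` and `column=` when building a `Position`.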

src/classes.rs

Lines changed: 38 additions & 3 deletions
````diff
@@ -2,10 +2,11 @@ use std::fmt::Display;
 
 use pyo3::prelude::*;
 
-use stack_graphs::storage::SQLiteReader;
+use stack_graphs::storage::{SQLiteReader, SQLiteWriter};
 use tree_sitter_stack_graphs::cli::util::{SourcePosition, SourceSpan};
+use tree_sitter_stack_graphs::loader::Loader;
 
-use crate::stack_graphs_wrapper::query_definition;
+use crate::stack_graphs_wrapper::{index_all, new_loader, query_definition};
 
 #[pyclass]
 #[derive(Clone)]
@@ -62,7 +63,41 @@ impl Querier {
     }
 }
 
-// TODO(@nohehf): Indexer class
+#[pyclass]
+pub struct Indexer {
+    db_writer: SQLiteWriter,
+    db_path: String,
+    loader: Loader,
+}
+
+#[pymethods]
+impl Indexer {
+    #[new]
+    pub fn new(db_path: String, languages: Vec<Language>) -> Self {
+        Indexer {
+            db_writer: SQLiteWriter::open(db_path.clone()).unwrap(),
+            db_path: db_path,
+            loader: new_loader(languages),
+        }
+    }
+
+    pub fn index_all(&mut self, paths: Vec<String>) -> PyResult<()> {
+        let paths: Vec<std::path::PathBuf> =
+            paths.iter().map(|p| std::path::PathBuf::from(p)).collect();
+
+        match index_all(paths, &mut self.loader, &mut self.db_writer) {
+            Ok(_) => Ok(()),
+            Err(e) => Err(e.into()),
+        }
+    }
+
+    // @TODO: Add a method to retrieve the status of the files (indexed, failed, etc.)
+    // This might be done on a separate class (Database / Storage), as it is tied to the storage, not a specific indexer
+
+    fn __repr__(&self) -> String {
+        format!("Indexer(db_path=\"{}\")", self.db_path)
+    }
+}
 
 #[pymethods]
 impl Position {
````
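The new `Indexer` opens its `SQLiteWriter` and builds its `Loader` once, in the constructor, then reuses both across `index_all` calls; `db_path` is kept only for `__repr__`. The pure-Python sketch below mirrors that lifecycle with the standard `sqlite3` module, so the pattern can be read without a Rust toolchain. Everything in it is hypothetical: `SketchIndexer`, its `files` table and the extension-to-language mapping are illustrative assumptions, and it records file paths instead of building stack graphs.

```python
import os
import sqlite3
from typing import Iterator

# Hypothetical extension mapping; the real Indexer relies on the tree-sitter-stack-graphs Loader.
EXTENSIONS = {
    "Python": (".py",),
    "JavaScript": (".js",),
    "TypeScript": (".ts",),
    "Java": (".java",),
}


class SketchIndexer:
    """Pure-Python stand-in mirroring the Rust Indexer's lifecycle (illustrative only)."""

    def __init__(self, db_path: str, languages: list[str]) -> None:
        self.db_path = db_path
        # Opened once, like SQLiteWriter::open in Indexer::new, then reused by every call.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY)")
        # Resolved once, like new_loader(languages).
        self.suffixes = tuple(s for lang in languages for s in EXTENSIONS[lang])

    def _candidates(self, root: str) -> Iterator[str]:
        # A path may be a single file or a directory walked recursively.
        if os.path.isfile(root):
            yield root
            return
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield os.path.join(dirpath, name)

    def index_all(self, paths: list[str]) -> None:
        # Record only files that one of the configured languages handles.
        for root in paths:
            for path in self._candidates(root):
                if path.endswith(self.suffixes):
                    self.conn.execute(
                        "INSERT OR REPLACE INTO files VALUES (?)", (os.path.abspath(path),)
                    )
        self.conn.commit()

    def indexed_files(self) -> list[str]:
        return [row[0] for row in self.conn.execute("SELECT path FROM files ORDER BY path")]

    def __repr__(self) -> str:
        # Same shape as the Rust __repr__: ClassName(db_path="...").
        return f'{type(self).__name__}(db_path="{self.db_path}")'
```

One behavioral difference is worth keeping in mind: the Rust constructor calls `.unwrap()` on `SQLiteWriter::open` (and `new_loader` unwraps the `Loader`), and PyO3 surfaces a Rust panic in Python as a `PanicException` rather than an ordinary error, which the roadmap's "Better error handling" item may address.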

src/lib.rs

Lines changed: 4 additions & 3 deletions
````diff
@@ -3,7 +3,7 @@ use pyo3::prelude::*;
 mod classes;
 mod stack_graphs_wrapper;
 
-use classes::{Language, Position, Querier};
+use classes::{Indexer, Language, Position, Querier};
 
 /// Formats the sum of two numbers as string.
 #[pyfunction]
@@ -20,10 +20,10 @@ fn index(paths: Vec<String>, db_path: String, language: Language) -> PyResult<()
     let paths: Vec<std::path::PathBuf> =
         paths.iter().map(|p| std::path::PathBuf::from(p)).collect();
 
-    Ok(stack_graphs_wrapper::index(
+    Ok(stack_graphs_wrapper::index_legacy(
         paths,
         &db_path,
-        language.into(),
+        &language.into(),
     )?)
 }
 
@@ -35,5 +35,6 @@ fn stack_graphs_python(_py: Python, m: &PyModule) -> PyResult<()> {
     m.add_class::<Position>()?;
     m.add_class::<Language>()?;
     m.add_class::<Querier>()?;
+    m.add_class::<Indexer>()?;
     Ok(())
 }
````

src/stack_graphs_wrapper/mod.rs

Lines changed: 37 additions & 3 deletions
````diff
@@ -21,7 +21,7 @@ impl std::convert::From<StackGraphsError> for PyErr {
     }
 }
 
-fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
+pub fn get_langauge_configuration(lang: &Language) -> LanguageConfiguration {
     match lang {
         Language::Python => {
             tree_sitter_stack_graphs_python::language_configuration(&NoCancellation)
@@ -36,10 +36,10 @@ fn get_langauge_configuration(lang: Language) -> LanguageConfiguration {
     }
 }
 
-pub fn index(
+pub fn index_legacy(
     paths: Vec<PathBuf>,
     db_path: &str,
-    language: Language,
+    language: &Language,
 ) -> Result<(), StackGraphsError> {
     let configurations = vec![get_langauge_configuration(language)];
 
@@ -81,6 +81,40 @@ pub fn index(
     }
 }
 
+pub fn new_loader(languages: Vec<Language>) -> Loader {
+    let configurations = languages
+        .iter()
+        .map(|l| get_langauge_configuration(l))
+        .collect();
+
+    Loader::from_language_configurations(configurations, None).unwrap()
+}
+
+pub fn index_all(
+    paths: Vec<PathBuf>,
+    loader: &mut Loader,
+    db_writer: &mut SQLiteWriter,
+) -> Result<(), StackGraphsError> {
+    let reporter = ConsoleReporter::none();
+
+    let mut indexer = Indexer::new(db_writer, loader, &reporter);
+
+    // For now, force reindexing
+    indexer.force = true;
+
+    let paths = canonicalize_paths(paths);
+
+    // https://github.com/github/stack-graphs/blob/7db914c01b35ce024f6767e02dd1ad97022a6bc1/tree-sitter-stack-graphs/src/cli/index.rs#L107
+    let continue_from_none: Option<PathBuf> = None;
+
+    match indexer.index_all(paths, continue_from_none, &NoCancellation) {
+        Ok(_) => Ok(()),
+        Err(e) => Err(StackGraphsError {
+            message: format!("Failed to index: {}", e),
+        }),
+    }
+}
+
 pub fn query_definition(
     reference: SourcePosition,
     db_reader: &mut SQLiteReader,
````
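The wrapper's `index_all` sets `indexer.force = true` with the comment "For now, force reindexing", and the README roadmap lists "Force for the Indexer" as a future option. To illustrate what such an option typically controls, the sketch below skips files whose content hash has not changed since the previous run unless `force` is set. This is a generic incremental-indexing technique written under assumptions: the commit does not show how tree-sitter-stack-graphs decides whether a file is up to date, and `IncrementalIndex` with its in-memory `seen` map is hypothetical. Paths are resolved to absolute form in the spirit of `canonicalize_paths`, whose exact behavior is not shown here; the sketch also expects file paths rather than directories.

```python
import hashlib
from pathlib import Path


class IncrementalIndex:
    """Hypothetical sketch: re-index a file only when its content changed, unless forced."""

    def __init__(self) -> None:
        # Absolute path -> SHA-1 of the content that was last indexed.
        self.seen: dict[str, str] = {}

    def index_all(self, paths: list[str], force: bool = False) -> list[str]:
        indexed = []
        for p in paths:
            path = Path(p).resolve()  # absolute path, matching the README's path convention
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            if not force and self.seen.get(str(path)) == digest:
                continue  # unchanged since the last run: skip it
            # A real indexer would build and store the file's stack graph here.
            self.seen[str(path)] = digest
            indexed.append(str(path))
        return indexed
```

With `force=True`, as the wrapper currently hardcodes, every run re-processes every file; the hash check is what a non-forced mode would save.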

stack_graphs_python.pyi

Lines changed: 39 additions & 2 deletions
````diff
@@ -7,6 +7,13 @@ class Language(Enum):
     Java = 3
 
 class Position:
+    """
+    A position in a given file:
+    - path: the path to the file
+    - line: the line number (0-indexed)
+    - column: the column number (0-indexed)
+    """
+
     path: str
     line: int
     column: int
@@ -16,8 +23,38 @@ class Position:
     def __repr__(self) -> str: ...
 
 class Querier:
+    """
+    A class to query the stack graphs database
+    - db_path: the path to the database
+
+    Usage: see Querier.definitions
+    """
     def __init__(self, db_path: str) -> None: ...
-    def definitions(self, reference: Position) -> list[Position]: ...
+    def definitions(self, reference: Position) -> list[Position]:
+        """
+        Get the definitions of a given reference
+        - reference: the position of the reference
+        - returns: a list of positions of the definitions
+        """
+        ...
+    def __repr__(self) -> str: ...
+
+class Indexer:
+    """
+    A class to build the stack graphs of a given set of files
+    - db_path: the path to the database
+    - languages: the list of languages to index
+    """
+    def __init__(self, db_path: str, languages: list[Language]) -> None: ...
+    def index_all(self, paths: list[str]) -> None:
+        """
+        Index all the files in the given paths, recursively
+        """
+        ...
     def __repr__(self) -> str: ...
 
-def index(paths: list[str], db_path: str, language: Language) -> None: ...
+def index(paths: list[str], db_path: str, language: Language) -> None:
+    """
+    DeprecationWarning: The 'index' function is deprecated. Use 'Indexer' instead.
+    """
+    ...
````

tests/helpers/virtual_files.py

Lines changed: 3 additions & 3 deletions
````diff
@@ -46,7 +46,7 @@ def _get_positions_in_file(file_path: str, contents: str) -> dict[str, Position]
 
 
 @contextlib.contextmanager
-def string_to_virtual_repo(
+def string_to_virtual_files(
     string: str,
 ) -> Iterator[tuple[str, dict[str, Position]]]:
     """
@@ -62,7 +62,7 @@ def string_to_virtual_repo(
     ^{pos2}
     \"""
 
-    with string_to_virtual_repo(string) as (repo_path, positions):
+    with string_to_virtual_files(string) as (repo_path, positions):
         ...
     ```
 
@@ -104,7 +104,7 @@ def string_to_virtual_repo(
 
     When parsed via:
     ```py
-    with string_to_virtual_repo(string) as (repo_path, positions):
+    with string_to_virtual_files(string) as (repo_path, positions):
         ...
     ```
 
````
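The renamed `string_to_virtual_files` test helper writes source text to disk and yields a directory plus named positions; the docstring excerpts show markers such as `^{pos2}`, but the full input format lies outside the displayed hunks. The sketch below is a simplified, hypothetical variant of the same idea, not the helper itself: it takes a `{relative_path: contents}` dict instead of one string, treats a line made only of `^{name}` markers as annotating the line directly above it (the caret's column becomes the position's column), and yields `(path, line, column)` tuples instead of `Position` objects.

```python
import contextlib
import os
import re
import tempfile
from typing import Iterator

MARKER = re.compile(r"\^\{(\w+)\}")


@contextlib.contextmanager
def files_with_markers(
    files: dict[str, str],
) -> Iterator[tuple[str, dict[str, tuple[str, int, int]]]]:
    """Write `files` to a temp dir; `^{name}` lines mark a 0-indexed column on the line above."""
    positions: dict[str, tuple[str, int, int]] = {}
    with tempfile.TemporaryDirectory() as root:
        for rel_path, contents in files.items():
            abs_path = os.path.join(root, rel_path)
            os.makedirs(os.path.dirname(abs_path), exist_ok=True)
            kept: list[str] = []
            for line in contents.splitlines():
                # A marker line holds only markers and whitespace; it is not written to disk.
                if MARKER.search(line) and not MARKER.sub("", line).strip():
                    if not kept:
                        raise ValueError(f"marker before any code line in {rel_path}")
                    for m in MARKER.finditer(line):
                        positions[m.group(1)] = (abs_path, len(kept) - 1, m.start())
                else:
                    kept.append(line)
            with open(abs_path, "w", encoding="utf-8") as f:
                f.write("\n".join(kept) + "\n")
        # Files exist only inside the `with` block; the directory is deleted afterwards.
        yield root, positions
```

Marking positions with carets under the code keeps test fixtures readable and avoids hand-counting columns, which is easy to get wrong with 0-indexed coordinates.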
