Commit 522d468

docs: improve documentation of COMMIT_STATS and COMMIT_FILE_STATS
Closes #884
Closes #885

- Adds documentation for version function along with the returned format.
- Adds examples for COMMIT_STATS and COMMIT_FILE_STATS.
- Adds section about how to use COMMIT_STATS and COMMIT_FILE_STATS and their output shape.

Signed-off-by: Miguel Molina <[email protected]>
1 parent b1f3ca3 commit 522d468

2 files changed: +185 -3 lines changed

docs/using-gitbase/examples.md

Lines changed: 64 additions & 1 deletion
@@ -76,7 +76,7 @@ GROUP BY committer_email,
repo_id;
```

## Report of line count per file from HEAD references

```sql
SELECT
@@ -141,6 +141,69 @@ CREATE INDEX files_lang_idx ON files USING pilosa (language(file_path, blob_cont
DROP INDEX files_lang_idx ON files;
```

## Calculating code line changes in the last commit

This query reports how many lines of actual code (only code, not comments, blank lines or text) changed in the last commit of each repository.

```sql
SELECT
    repo,
    JSON_EXTRACT(stats, '$.Code.Additions') AS code_lines_added,
    JSON_EXTRACT(stats, '$.Code.Deletions') AS code_lines_removed
FROM (
    SELECT
        repository_id AS repo,
        COMMIT_STATS(repository_id, commit_hash) AS stats
    FROM refs
    WHERE ref_name = 'HEAD'
) t;
```

The output will be similar to this:

```
+-----------------+------------------+--------------------+
| repo            | code_lines_added | code_lines_removed |
+-----------------+------------------+--------------------+
| salty-wombat    | 56               | 2                  |
| sugar-boogaloo  | 11               | 1                  |
+-----------------+------------------+--------------------+
```

## Calculating code line changes for files in the last commit

This query reports how many lines of actual code (only code, not comments, blank lines or text) changed in each file of the last commit of each repository. It's similar to the previous example: `COMMIT_STATS` is, so to speak, an aggregation over the result of `COMMIT_FILE_STATS`.
We will only report those files whose language has been identified.

```sql
SELECT
    repo,
    JSON_UNQUOTE(JSON_EXTRACT(stats, '$.Path')) AS file_path,
    JSON_UNQUOTE(JSON_EXTRACT(stats, '$.Language')) AS file_language,
    JSON_EXTRACT(stats, '$.Code.Additions') AS code_lines_added,
    JSON_EXTRACT(stats, '$.Code.Deletions') AS code_lines_removed
FROM (
    SELECT
        repository_id AS repo,
        EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
    FROM refs
    WHERE ref_name = 'HEAD'
) t
WHERE file_language <> '';
```

The output will be similar to this:

```
+-----------------+--------------------------------------+---------------+------------------+--------------------+
| repo            | file_path                            | file_language | code_lines_added | code_lines_removed |
+-----------------+--------------------------------------+---------------+------------------+--------------------+
| salty-wombat    | main.py                              | Python        | 40               | 0                  |
| salty-wombat    | __init__.py                          | Python        | 16               | 2                  |
| sugar-boogaloo  | server.go                            | Go            | 11               | 1                  |
+-----------------+--------------------------------------+---------------+------------------+--------------------+
```
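
To illustrate the relationship between the two sample outputs, here is a minimal Python sketch that sums the per-file rows for each repository. It is not part of gitbase: the helper name `aggregate_code_changes` is hypothetical, and the rows are copied by hand from the sample per-file output. In real data the sums can differ from `COMMIT_STATS`, because the per-file query filters out files whose language was not identified.

```python
from collections import defaultdict

# Rows copied from the sample per-file output:
# (repo, file_path, file_language, code_lines_added, code_lines_removed)
file_rows = [
    ("salty-wombat", "main.py", "Python", 40, 0),
    ("salty-wombat", "__init__.py", "Python", 16, 2),
    ("sugar-boogaloo", "server.go", "Go", 11, 1),
]


def aggregate_code_changes(rows):
    """Sum per-file code additions and deletions for each repository."""
    totals = defaultdict(lambda: [0, 0])
    for repo, _path, _language, added, removed in rows:
        totals[repo][0] += added
        totals[repo][1] += removed
    return {repo: tuple(counts) for repo, counts in totals.items()}


repo_totals = aggregate_code_changes(file_rows)
for repo, (added, removed) in sorted(repo_totals.items()):
    print(repo, added, removed)
```

For this sample data the totals are 56 added and 2 removed for `salty-wombat`, and 11 added and 1 removed for `sugar-boogaloo`, which matches the per-repository sample output of the previous example.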

# UAST UDFs Examples

First of all, you should check out the [bblfsh documentation](https://docs.sourced.tech/babelfish) to get yourself familiar with UAST concepts.

docs/using-gitbase/functions.md

Lines changed: 121 additions & 2 deletions
@@ -6,8 +6,8 @@ To make some common tasks easier for the user, there are some functions to inter

| Name | Description |
|:-------------|:-------------------------------------------------------------------------------------------------------------------------------|
|`commit_stats(repository_id, [from_commit_hash], to_commit_hash) json`|returns the stats between two commits for a repository. If `from_commit_hash` is not given, `to_commit_hash` is compared with its parent commit. Stats for vendored files are not included in the result of this function. This function is explained in more detail later in this document.|
|`commit_file_stats(repository_id, [from_commit_hash], to_commit_hash) json array`|returns an array with the stats of each file in `to_commit_hash` since the given `from_commit_hash`. If `from_commit_hash` is not given, the parent commit will be used. Stats for vendored files are not included in the result of this function. This function is explained in more detail later in this document.|
|`is_remote(reference_name) bool`| checks if the given reference name is a remote one |
|`is_tag(reference_name) bool`| checks if the given reference name is a tag |
|`is_vendor(file_path) bool`| checks if the given file name is a vendored file |
@@ -18,6 +18,7 @@ To make some common tasks easier for the user, there are some functions to inter
|`uast_extract(blob, key) text array`| extracts information identified by the given key from the uast nodes |
|`uast_children(blob) blob`| returns a flattened array of the children UAST nodes from each one of the UAST nodes in the given array |
|`loc(path, blob) json`| returns a JSON map, containing the lines of code of a file, separated in three categories: Code, Blank and Comment lines |
|`version() text`| returns the gitbase version in the following format `8.0.11-{GITBASE_VERSION}` for compatibility with MySQL versioning |
## Standard functions

These are all functions that are available because they are implemented in `go-mysql-server`, used by gitbase.
@@ -165,3 +166,121 @@ Nodes that have no value for the requested property will not be present in any w
Also, if you want to retrieve values from a non-common property, you can pass it directly

> uast_extract(nodes_column, 'some-property')

## How to use `commit_file_stats`

`commit_file_stats` returns statistics about the line changes in each file in the given range of commits, classifying them into four categories: code, comments, blank lines and other.

It can be used in two ways:
- To get the statistics of files in a specific commit: `COMMIT_FILE_STATS(repository_id, commit_hash)`
- To get the statistics of files in a commit range: `COMMIT_FILE_STATS(repository_id, from_commit, to_commit)`

The result of this function is an array of JSON documents with the following shape:

```
{
    "Path": file path,
    "Language": file language,
    "Code": {
        "Additions": number of code additions in this file,
        "Deletions": number of code deletions in this file
    },
    "Comment": {
        "Additions": number of comment line additions in this file,
        "Deletions": number of comment line deletions in this file
    },
    "Blank": {
        "Additions": number of blank line additions in this file,
        "Deletions": number of blank line deletions in this file
    },
    "Other": {
        "Additions": number of other additions in this file,
        "Deletions": number of other deletions in this file
    },
    "Total": {
        "Additions": number of total additions in this file,
        "Deletions": number of total deletions in this file
    }
}
```

**NOTE:** Vendored files are ignored when computing these statistics. Note that `.gitignore` is considered a vendored file.
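
To make the shape concrete, the following Python sketch parses one hypothetical element of the array and prints its counters. Every value in the document is made up for illustration, and the choice of a `Total` equal to the sum of the four categories is an assumption of this example, not something the documentation states.

```python
import json

# Hypothetical element of the COMMIT_FILE_STATS array; all values are made up.
raw = """
{
  "Path": "main.py",
  "Language": "Python",
  "Code": {"Additions": 40, "Deletions": 0},
  "Comment": {"Additions": 3, "Deletions": 1},
  "Blank": {"Additions": 5, "Deletions": 0},
  "Other": {"Additions": 0, "Deletions": 0},
  "Total": {"Additions": 48, "Deletions": 1}
}
"""

stats = json.loads(raw)
categories = ["Code", "Comment", "Blank", "Other", "Total"]

print(stats["Path"], stats["Language"])
for name in categories:
    # Each category holds the same two counters.
    print(name, stats[name]["Additions"], stats[name]["Deletions"])
```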

Because the result of this function is an array of JSON documents, we will need two functions to make use of its data effectively:
- `EXPLODE`, which gives each element in the array its own row
- `JSON_EXTRACT`, to get data from inside the documents

For example, to get the stats of the HEAD commits:
```sql
SELECT
    repository_id,
    EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
FROM refs
WHERE ref_name = 'HEAD'
```

`EXPLODE` here makes sure a separate row is returned for each element returned by `COMMIT_FILE_STATS`, instead of a single row holding an array with all of them.
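
The row-per-element behaviour described here can be sketched in plain Python. This is only an analogy for what `EXPLODE` does to a result set, not gitbase code: the `explode` helper and the sample rows are hypothetical, and how gitbase treats empty arrays is not covered by this documentation (the sketch simply emits no row for them).

```python
def explode(rows, column):
    """Yield one output row per element of the list stored in row[column]."""
    for row in rows:
        for element in row[column]:
            exploded = dict(row)        # copy the other columns unchanged
            exploded[column] = element  # replace the array with one element
            yield exploded


# Hypothetical rows: one per repository, each holding an array of file stats.
rows = [
    {"repository_id": "salty-wombat",
     "stats": [{"Path": "main.py"}, {"Path": "__init__.py"}]},
    {"repository_id": "sugar-boogaloo",
     "stats": [{"Path": "server.go"}]},
]

exploded_rows = list(explode(rows, "stats"))
for row in exploded_rows:
    print(row["repository_id"], row["stats"]["Path"])
```

Two input rows become three output rows, one per file, with `repository_id` repeated on each.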

Then, to extract code additions from this:

```sql
SELECT
    repository_id,
    JSON_EXTRACT(stats, '$.Code.Additions')
FROM (
    SELECT
        repository_id,
        EXPLODE(COMMIT_FILE_STATS(repository_id, commit_hash)) AS stats
    FROM refs
    WHERE ref_name = 'HEAD'
) t
```

**NOTE:** When extracting `Path` or `Language` using `JSON_EXTRACT`, the result will be quoted because of the way that function works (e.g. `"Python"` instead of `Python`). For that reason, for these two string fields `JSON_EXTRACT` should be combined with `JSON_UNQUOTE`, like `JSON_UNQUOTE(JSON_EXTRACT(stats, '$.Path'))`.
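
The difference between the quoted and unquoted forms can be seen with Python's standard `json` module. This is only an analogy: `json.dumps` stands in for the JSON text that `JSON_EXTRACT` yields for a string, and `json.loads` stands in for `JSON_UNQUOTE`. It does not call gitbase.

```python
import json

language = "Python"
extracted = json.dumps(language)  # JSON text for a string keeps its quotes
unquoted = json.loads(extracted)  # decoding gives back the bare string

additions = 40
extracted_number = json.dumps(additions)  # numbers carry no quotes at all

print(extracted)         # "Python"
print(unquoted)          # Python
print(extracted_number)  # 40
```

This is why only the string fields (`Path` and `Language`) need unquoting, while the numeric counters can be used as extracted.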

## How to use `commit_stats`

`commit_stats` returns statistics about the line changes in the given range of commits, classifying them into four categories: code, comments, blank lines and other.

It can be used in two ways:
- To get the statistics of a specific commit: `COMMIT_STATS(repository_id, commit_hash)`
- To get the statistics of the diff of a commit range: `COMMIT_STATS(repository_id, from_commit, to_commit)`

`commit_stats` is essentially an aggregation of the result of `commit_file_stats`. While `commit_file_stats` has the stats for each file in a commit, `commit_stats` has the global stats of all files in the commit. As a result, it outputs a single structure instead of an array of them.

The shape of the result returned by this function is the following:

```
{
    "Files": number of files changed in this commit,
    "Code": {
        "Additions": number of code additions in this commit,
        "Deletions": number of code deletions in this commit
    },
    "Comment": {
        "Additions": number of comment line additions in this commit,
        "Deletions": number of comment line deletions in this commit
    },
    "Blank": {
        "Additions": number of blank line additions in this commit,
        "Deletions": number of blank line deletions in this commit
    },
    "Other": {
        "Additions": number of other additions in this commit,
        "Deletions": number of other deletions in this commit
    },
    "Total": {
        "Additions": number of total additions in this commit,
        "Deletions": number of total deletions in this commit
    }
}
```

**NOTE:** Vendored files are ignored when computing these statistics. Note that `.gitignore` is considered a vendored file.

The result returned by this function is a JSON document, so `JSON_EXTRACT` is needed to access its fields.

For example, code additions would be accessed like this:
```sql
JSON_EXTRACT(COMMIT_STATS(repository_id, commit_hash), '$.Code.Additions')
```
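
To show how a path such as `'$.Code.Additions'` walks into the document, here is a small Python sketch of a path resolver. It is a simplified stand-in, not the implementation used by gitbase or MySQL: the `json_extract` helper is hypothetical, it only supports dotted object keys (no array indexes or wildcards), and the sample document uses made-up values.

```python
def json_extract(document, path):
    """Resolve a simple '$.Key.SubKey' path; return None if a key is missing."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value = document
    for key in path[1:].split("."):
        if not key:
            continue  # skip the empty piece before the first dot
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical COMMIT_STATS result; all values are made up.
commit_stats = {
    "Files": 2,
    "Code": {"Additions": 56, "Deletions": 2},
    "Comment": {"Additions": 3, "Deletions": 1},
    "Blank": {"Additions": 5, "Deletions": 0},
    "Other": {"Additions": 0, "Deletions": 0},
    "Total": {"Additions": 64, "Deletions": 3},
}

print(json_extract(commit_stats, "$.Code.Additions"))
print(json_extract(commit_stats, "$.Files"))
```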
