Commit de0b22e

author
jguerreiro
committed
chore(readme): split README and CONTRIBUTING
1 parent ecdcc1a commit de0b22e

2 files changed

+79
-64
lines changed

CONTRIBUTING.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
# Contributing
2+
3+
## Architecture
4+
5+
### Main overview
6+
7+
All the git information we need is found in commits, which are stored in git repositories.
8+
The main steps are the following:
9+
10+
- Collect all repository URLs from an object (org, user, group).
11+
- Clone them with the appropriate authentication.
12+
- Run git commands to extract the information we need on each repository.
13+
- Gather the data and store it in a JSON file.
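
The steps above can be sketched as a small Go program. This is a minimal illustration under stated assumptions, not the project's actual code: `Provider`, `Extractor`, `Record`, `gather`, and `run` are hypothetical names, and the clone and git-command steps are folded into one stubbed interface so the example runs without network access.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Provider lists the repository URLs of an object (org, user, group).
// Hypothetical interface, for illustration only.
type Provider interface {
	Collect(object string) ([]string, error)
}

// Extractor stands in for the clone and git-command steps: it returns
// the items extracted from one repository. Hypothetical interface.
type Extractor interface {
	Extract(repoURL string) ([]string, error)
}

// Record is one entry of the output.
type Record struct {
	Repository string `json:"repository"`
	Item       string `json:"item"`
}

// gather collects the URLs, then extracts the items of each repository.
func gather(p Provider, e Extractor, object string) ([]Record, error) {
	urls, err := p.Collect(object)
	if err != nil {
		return nil, fmt.Errorf("collect: %w", err)
	}
	records := []Record{}
	for _, u := range urls {
		items, err := e.Extract(u)
		if err != nil {
			return nil, fmt.Errorf("extract %s: %w", u, err)
		}
		for _, it := range items {
			records = append(records, Record{Repository: u, Item: it})
		}
	}
	return records, nil
}

// run gathers the records and stores them as a single JSON document.
func run(p Provider, e Extractor, object string, out io.Writer) error {
	records, err := gather(p, e, object)
	if err != nil {
		return err
	}
	return json.NewEncoder(out).Encode(records)
}

// Stubs so the sketch runs offline.
type staticProvider []string

func (s staticProvider) Collect(string) ([]string, error) { return s, nil }

type stubExtractor struct{}

func (stubExtractor) Extract(string) ([]string, error) { return []string{"README.md"}, nil }

func main() {
	p := staticProvider{"https://example.com/org/repo.git"}
	if err := run(p, stubExtractor{}, "org", os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the provider and the extractor behind interfaces is one way to let each VCS backend plug into the same flow.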
14+
15+
### Implementation
16+
17+
The root package is the abstract implementation of the extractor.
18+
19+
It contains a Pipeline that extracts git information for every git artifact
20+
(currently a git file, though commits could also be supported) of every repository of an organization.
21+
22+
The cmd/src-fingerprint package contains the binary code.
23+
It reads the configuration from the CLI and the environment, then runs the Pipeline on an organization.
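
As a rough sketch of how a binary might read its configuration from both sources, the example below uses the standard library `flag` package. The flag names (`provider`, `provider-url`, `object`) and the `VCS_TOKEN` variable come from this document. `Config`, `parseConfig`, the required-flag check, and the choice of `flag` itself are assumptions; the real binary may use a different CLI library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Config holds the settings read from the CLI and the environment.
// Hypothetical type, for illustration only.
type Config struct {
	Provider    string
	ProviderURL string
	Object      string
	Token       string
}

// parseConfig reads flags from args and the token through getenv.
// Passing getenv as a parameter keeps the function easy to test.
func parseConfig(args []string, getenv func(string) string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("src-fingerprint", flag.ContinueOnError)
	fs.StringVar(&cfg.Provider, "provider", "", "VCS provider, e.g. github or gitlab")
	fs.StringVar(&cfg.ProviderURL, "provider-url", "", "base URL of a self-hosted provider")
	fs.StringVar(&cfg.Object, "object", "", "org, user, or group to scan")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	// Assumption: this sketch treats both flags as mandatory.
	if cfg.Provider == "" || cfg.Object == "" {
		return Config{}, errors.New("both --provider and --object are required")
	}
	cfg.Token = getenv("VCS_TOKEN")
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("provider=%s object=%s token set=%t\n", cfg.Provider, cfg.Object, cfg.Token != "")
}
```

The `flag` package accepts both `-provider` and `--provider`, so the invocations shown in this document would parse with this sketch.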
24+
25+
## Development build and testing
26+
27+
- Build binary
28+
29+
```sh
30+
go build ./cmd/src-fingerprint
31+
```
32+
33+
- Set env var `VCS_TOKEN` to the GitHub Token or GitLab Token
34+
35+
```sh
36+
export VCS_TOKEN="<token>"
37+
```
38+
39+
- Run and read doc
40+
41+
```sh
42+
./src-fingerprint
43+
```
44+
45+
- Run on a given user/group
46+
```sh
47+
./src-fingerprint --provider github --object Uber
48+
./src-fingerprint --provider-url http://gitlab.example.com --provider gitlab --object Groupe
49+
```
50+
51+
## Performance considerations
52+
53+
Streaming is preferred here to avoid accumulating objects in memory.
54+
55+
What we have done for now to improve performance:
56+
57+
- Write the output object by object, using the jsonl format by default.
58+
- Clone using the native git executable. Git libraries written in Go tend to hold the clone
59+
in memory at some point.
60+
61+
### To consider
62+
63+
- Limiting the number of Go channels
64+
65+
## Libraries we use
66+
67+
#### Providers
68+
69+
- GitHub wrapper: "github.com/google/go-github/v18/github"
70+
- GitLab Go wrapper: "github.com/xanzy/go-gitlab"
71+
- Bitbucket wrapper: "github.com/suhaibmujahid/go-bitbucket-server/bitbucket"
72+
- Repository: None
73+
74+
#### Cloning
75+
76+
- native wrapped git command
77+
78+
Using go-git resulted in in-memory cloning (stream to memory and then to directory).
79+
This caused memory peaks that were too high for small VMs.

README.md

Lines changed: 0 additions & 64 deletions
@@ -77,67 +77,3 @@ src-fingerprint -p repository -o 'https://user:[email protected]/GitGuardian/g
7777
```sh
7878
src-fingerprint -p repository -o 'https://github.com/GitGuardian/gg-shield.git'
7979
```
80-
81-
## Architecture
82-
83-
### Main overview
84-
85-
All the git information can be found inside commit that are located inside git repositories
86-
Our tree element step are the following:
87-
88-
- Collect all repositories URL from the company.
89-
- Clone them with the appropriate authentication.
90-
- Run git commands to extract the information we need on each repository.
91-
- Gather data and store this information in a json file.
92-
93-
### Implementation
94-
95-
The root package is the abstract implementation of the extractor. It contains a Cloner, that clones a git repository.
96-
It contains a Pipeline that extracts git information for every git artifact (currently a git file but we could support commit), of every repository of an organization.
97-
98-
The github package contains the implementation of the Github Provider.
99-
The gitlab package contains the implementation of the Gitlab Provider.
100-
101-
The cmd/src-fingerprint package contains the binary code. It reads from CLI and environment the configuration and run the Pipeline on an organization.
102-
103-
## Development build and testing
104-
105-
- Build binary
106-
107-
```sh
108-
go build ./cmd/src-fingerprint
109-
```
110-
111-
- Set env var `VCS_TOKEN` to the GitHub Token or GitLab Token
112-
113-
```sh
114-
export VCS_TOKEN="<token>"
115-
```
116-
117-
- Run and read doc
118-
119-
```sh
120-
./src-fingerprint
121-
```
122-
123-
- Run on a given user/group
124-
```sh
125-
./src-fingerprint --provider github --object Uber
126-
./src-fingerprint --provider-url http://gitlab.example.com --provider gitlab --object Groupe
127-
```
128-
129-
### Libraries we use
130-
131-
#### Providers
132-
133-
- GitHub wrapper: "github.com/google/go-github/v18/github"
134-
- Gitlab go wrapper: "github.com/xanzy/go-gitlab"
135-
- Bitbucket wrapper: "github.com/suhaibmujahid/go-bitbucket-server/bitbucket"
136-
- Repository: None
137-
138-
#### Cloning
139-
140-
- native wrapped git command
141-
142-
Using go-git resulted in in-memory cloning (stream to memory and then to directory).
143-
This caused too high peaks of memory unsuitable for small VMs.
