| 1 | +# Contributing |
| 2 | + |
| 3 | +## Architecture |
| 4 | + |
| 5 | +### Main overview |
| 6 | + |
| 7 | +All the Git information we need lives in commits, which are stored in Git repositories. |
| 8 | +The extraction proceeds in the following steps: |
| 9 | + |
| 10 | +- Collect all repository URLs from an object (org, user, group). |
| 11 | +- Clone them with the appropriate authentication. |
| 12 | +- Run git commands to extract the information we need on each repository. |
| 13 | +- Gather the data and store it in a JSON file. |
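The four steps above can be sketched as a small Go program. This is only an illustration under stated assumptions: `Record`, `Provider`, `Cloner`, `Extractor`, `Gather` and the fake implementations are hypothetical names invented for this example and are not the project's actual types. The fakes stand in for the network and git calls so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Record is a hypothetical output row; the real schema may differ.
type Record struct {
	Repository string `json:"repository"`
	Info       string `json:"info"`
}

// Provider, Cloner and Extractor are hypothetical interfaces, one per step.
type Provider interface {
	RepositoryURLs(object string) ([]string, error)
}

type Cloner interface {
	Clone(url string) (dir string, err error)
}

type Extractor interface {
	Extract(url, dir string) ([]Record, error)
}

// Gather chains steps 1 to 3 and returns everything that was extracted.
// It accumulates records in memory for clarity only.
func Gather(p Provider, c Cloner, e Extractor, object string) ([]Record, error) {
	urls, err := p.RepositoryURLs(object) // step 1: collect repository URLs
	if err != nil {
		return nil, fmt.Errorf("collect URLs: %w", err)
	}
	var records []Record
	for _, url := range urls {
		dir, err := c.Clone(url) // step 2: clone
		if err != nil {
			return nil, fmt.Errorf("clone %s: %w", url, err)
		}
		recs, err := e.Extract(url, dir) // step 3: extract information
		if err != nil {
			return nil, fmt.Errorf("extract %s: %w", url, err)
		}
		records = append(records, recs...)
	}
	return records, nil
}

// Fake implementations so the sketch runs without network access or git.
type fakeProvider struct{}

func (fakeProvider) RepositoryURLs(object string) ([]string, error) {
	base := "https://example.com/" + object
	return []string{base + "/a.git", base + "/b.git"}, nil
}

type fakeCloner struct{}

func (fakeCloner) Clone(url string) (string, error) { return "/tmp/fake-clone", nil }

type fakeExtractor struct{}

func (fakeExtractor) Extract(url, dir string) ([]Record, error) {
	return []Record{{Repository: url, Info: "placeholder"}}, nil
}

func main() {
	records, err := Gather(fakeProvider{}, fakeCloner{}, fakeExtractor{}, "acme")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Step 4: store the gathered data as JSON (stdout here instead of a file).
	if err := json.NewEncoder(os.Stdout).Encode(records); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping each step behind its own interface makes it possible to swap the provider or the cloning strategy without touching the rest of the chain.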
| 14 | + |
| 15 | +### Implementation |
| 16 | + |
| 17 | +The root package is the abstract implementation of the extractor. |
| 18 | + |
| 19 | +It contains a Pipeline that extracts Git information for every Git artifact |
| 20 | +(currently files, though commits could be supported) of every repository in an organization. |
| 21 | + |
| 22 | +The cmd/src-fingerprint package contains the code of the binary. |
| 23 | +It reads the configuration from the CLI and the environment, then runs the Pipeline on an organization. |
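As a rough illustration of reading configuration from both the CLI and the environment, here is a minimal sketch using only the standard `flag` package. The flag names and the `VCS_TOKEN` variable are taken from the usage examples in this document. `Config` and `parseConfig` are hypothetical names, the validation rule is an assumption, and the real binary may use a different CLI library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Config is a hypothetical container for the settings the binary needs.
type Config struct {
	Provider    string
	ProviderURL string
	Object      string
	Token       string
}

// parseConfig reads flags from args and the token from the environment.
// getenv is injected so the function can be exercised without touching
// the real environment.
func parseConfig(args []string, getenv func(string) string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("src-fingerprint", flag.ContinueOnError)
	fs.StringVar(&cfg.Provider, "provider", "", "VCS provider, for example github or gitlab")
	fs.StringVar(&cfg.ProviderURL, "provider-url", "", "base URL of a self-hosted provider")
	fs.StringVar(&cfg.Object, "object", "", "org, user or group to process")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	cfg.Token = getenv("VCS_TOKEN")
	// Assumption: provider and object are treated as mandatory in this sketch.
	if cfg.Provider == "" || cfg.Object == "" {
		return Config{}, errors.New("both --provider and --object are required")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("would run the pipeline on %s object %q\n", cfg.Provider, cfg.Object)
}
```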
| 24 | + |
| 25 | +## Development build and testing |
| 26 | + |
| 27 | +- Build binary |
| 28 | + |
| 29 | + ```sh |
| 30 | + go build ./cmd/src-fingerprint |
| 31 | + ``` |
| 32 | + |
| 33 | +- Set the `VCS_TOKEN` environment variable to your GitHub or GitLab token |
| 34 | + |
| 35 | + ```sh |
| 36 | + export VCS_TOKEN="<token>" |
| 37 | + ``` |
| 38 | + |
| 39 | +- Run the binary and read its documentation |
| 40 | + |
| 41 | + ```sh |
| 42 | + ./src-fingerprint |
| 43 | + ``` |
| 44 | + |
| 45 | +- Run on a given user/group |
| 46 | + ```sh |
| 47 | + ./src-fingerprint --provider github --object Uber |
| 48 | + ./src-fingerprint --provider-url http://gitlab.example.com --provider gitlab --object Groupe |
| 49 | + ``` |
| 50 | + |
| 51 | +## Performance considerations |
| 52 | + |
| 53 | +Streaming is preferred in this scenario to avoid accumulating objects in memory. |
| 54 | + |
| 55 | +What we have done so far to improve performance: |
| 56 | + |
| 57 | +- Write objects one at a time to the output or file, using the jsonl format by default. |
| 58 | +- Clone using the native git executable. Libraries written natively in Go, such as go-git, |
| 59 | + tend to clone in memory at some point. |
| 60 | + |
| 61 | +### To consider |
| 62 | + |
| 63 | +- Limiting the number of Go channels |
| 64 | + |
| 65 | +## Libraries we use |
| 66 | + |
| 67 | +### Providers |
| 68 | + |
| 69 | +- GitHub wrapper: "github.com/google/go-github/v18/github" |
| 70 | +- GitLab wrapper: "github.com/xanzy/go-gitlab" |
| 71 | +- Bitbucket wrapper: "github.com/suhaibmujahid/go-bitbucket-server/bitbucket" |
| 72 | +- Repository: None |
| 73 | + |
| 74 | +### Cloning |
| 75 | + |
| 76 | +- Native git command, wrapped |
| 77 | + |
| 78 | +Using go-git resulted in in-memory cloning (stream to memory, then to a directory). |
| 79 | +This caused memory peaks that were too high for small VMs. |