Commit a984aa0

Merge pull request #163 from the-mama-ai/update-docs
feat(github): add documentation
2 parents e4c45e4 + 6bdd8a9 commit a984aa0

7 files changed

+281
-3
lines changed

docs/source/auth-providers.md

Lines changed: 41 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -23,6 +23,7 @@ Giftless provides the following authentication and authorization modules by defa
2323

2424
* `giftless.auth.jwt:JWTAuthenticator` - uses [JWT tokens](https://jwt.io/) to both identify
2525
the user and grant permissions based on scopes embedded in the token payload.
26+
* `giftless.auth.github:GithubAuthenticator` - uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) to both identify the user and grant permissions based on those for a GitHub repository of the same organization/name.
2627
* `giftless.auth.allow_anon:read_only` - grants read-only permissions on everything to every
2728
request; Typically, this is only useful in testing environments or in very limited
2829
deployments.
@@ -75,7 +76,7 @@ Basic HTTP authentication.
7576

7677
You can disable this functionality or change the expected username using the `basic_auth_user` configuration option.
7778

78-
### Configuration Options
79+
### `giftless.auth.jwt` Configuration Options
7980
The following options are available for the `jwt` auth module:
8081

8182
* `algorithm` (`str`): JWT algorithm to use, e.g. `HS256` (default) or `RS256`. Must match the algorithm
@@ -191,6 +192,37 @@ The `leeway` parameter allows for providing a leeway / grace time to be
191192
considered when checking expiry times, to cover for clock skew between
192193
servers.
193194

195+
## GitHub Authenticator
196+
This authenticator lets you provide a frictionless LFS backend for existing GitHub repositories. It plays nicely with `git` credential helpers and allows you to use GitHub as the single authentication & authorization provider.
197+
198+
### Details
199+
The authenticator uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), the same ones used for cloning a GitHub repo over HTTPS. The provided token is used in a couple of GitHub API calls that identify the token's owner and [their permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) for the GitHub organization & repository. The token must be passed as the password part of `Basic` HTTP auth (the username is ignored). `Bearer` token HTTP auth is also supported, although git clients are unlikely to use it.
200+
201+
For the authenticator to work properly, the token must have the `read:org` scope if it is a "classic" token, or the `metadata:read` permission if it is a fine-grained one.
202+
203+
Note: Authentication via SSH that could be used to verify the user is [not possible with GitHub at the time of writing](https://github.com/datopian/giftless/issues/128#issuecomment-2037190728).
204+
205+
The GitHub repository permissions are mapped to [Giftless permissions](#permissions) directly: identities able to write to the repository can write, and those able to read can read; invalid tokens or identities with no repository access are rejected.
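
As an illustration only, the mapping described above could be sketched as follows. The `Permission` enum here is a local stand-in, `map_github_permission` is a hypothetical helper name, and the input values assume the `permission` field (`admin`, `write`, `read`, `none`) returned by GitHub's repository permissions endpoint; the actual Giftless implementation may differ.

```python
from enum import Enum


class Permission(Enum):
    # Local stand-in for Giftless' Permission enum; only the members
    # needed for this illustration are reproduced here.
    READ = "read"
    READ_META = "read-meta"
    WRITE = "write"


def map_github_permission(github_permission: str) -> set[Permission]:
    """Translate a GitHub permission level into a set of permissions (hypothetical helper)."""
    if github_permission in ("admin", "write"):
        # Writers can do everything readers can, plus write.
        return {Permission.READ, Permission.READ_META, Permission.WRITE}
    if github_permission == "read":
        return {Permission.READ, Permission.READ_META}
    # "none" or anything unrecognized: no access, the request gets rejected.
    return set()
```

An empty set here stands for the "rejected" outcome mentioned in the text.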
206+
207+
To minimize the traffic to GitHub for each LFS action, most of the auth data is temporarily cached in memory. This improves performance, but it also means that changes to an identity's permissions do not take effect until the cached entry expires.
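
To make the trade-off concrete, here is a minimal sketch of a time-limited authorization cache. It is not Giftless' actual code: the class name `AuthCache` and its methods are hypothetical, the two lifetimes merely reuse the defaults documented for `auth_write_ttl` and `auth_other_ttl`, and the size limit (LRU eviction) is left out for brevity.

```python
import time
from typing import Callable, Optional


class AuthCache:
    """Toy cache of one user's org/repo authorizations (illustration only)."""

    def __init__(
        self,
        write_ttl: float = 15 * 60,
        other_ttl: float = 30,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._write_ttl = write_ttl
        self._other_ttl = other_ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[tuple[str, str], tuple[bool, float]] = {}

    def put(self, org: str, repo: str, can_write: bool) -> None:
        # Writers are remembered longer than readers or rejected users.
        ttl = self._write_ttl if can_write else self._other_ttl
        self._entries[(org, repo)] = (can_write, self._clock() + ttl)

    def get(self, org: str, repo: str) -> Optional[bool]:
        """Return the cached write flag, or None if absent or expired."""
        entry = self._entries.get((org, repo))
        if entry is None:
            return None
        can_write, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller re-authorizes against GitHub.
            del self._entries[(org, repo)]
            return None
        return can_write
```

A `None` result means the caller has to ask GitHub again, which is the point at which a changed permission would be noticed.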
208+
209+
### GitHub Auth Flow
210+
Here's a description of the authentication & authorization flow. If any of these steps fails, the request gets rejected.
211+
212+
1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used (as usual) to determine which GitHub organization and repository is being targeted (e.g. `https://<server>/<org>/<repo>.git/info/lfs/...`). The request's `Authorization` header is also searched for the required GitHub personal access token.
213+
2. The token is then used in a [`/user`](https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#get-the-authenticated-user) GitHub API call to get its identity data.
214+
3. Next, the GitHub API is asked for the [user's permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) to the org/repo in question.
215+
4. Based on the information above, the user will be granted or denied access.
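
Step 1 of the flow above can be sketched roughly like this. The helper names `parse_org_repo` and `extract_token` are hypothetical, the path pattern assumes the simple `/<org>/<repo>.git/info/lfs/...` shape from the example, and the real authenticator may handle more cases (such as organization paths containing slashes).

```python
import base64
import re
from typing import Optional

# Assumed URI shape, taken from the example in step 1.
_LFS_PATH = re.compile(r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)\.git/info/lfs(?:/|$)")


def parse_org_repo(path: str) -> Optional[tuple[str, str]]:
    """Return (org, repo) from an LFS request path, or None if it doesn't fit."""
    match = _LFS_PATH.match(path)
    return (match["org"], match["repo"]) if match else None


def extract_token(authorization: Optional[str]) -> Optional[str]:
    """Pull the token out of an Authorization header value."""
    if not authorization:
        return None
    scheme, _, value = authorization.partition(" ")
    value = value.strip()
    if scheme.lower() == "bearer":
        return value or None
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8")
        except ValueError:  # covers malformed base64 and invalid UTF-8
            return None
        # The username is ignored; the token travels in the password part.
        _username, sep, password = decoded.partition(":")
        return password if sep and password else None
    return None
```

If either helper returns `None`, the request would be rejected before any GitHub API call is made.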
216+
217+
### `giftless.auth.github` Configuration Options
218+
* `api_url` (`str` = `"https://api.github.com"`): Base URL for the GitHub API (enterprise servers have API at `"https://<custom-hostname>/api/v3/"`).
219+
* `api_version` (`str | None` = `"2022-11-28"`): Target GitHub API version; set to `None` to use GitHub's latest (rather experimental).
220+
* `cache` (`dict`): Cache configuration section
221+
* `token_max_size` (`int` = `32`): Max number of entries in the token -> user LRU cache. This cache holds the authentication data for a token. Evicted tokens will need to be re-authenticated.
222+
* `auth_max_size` (`int` = `32`): Max number of entries in each user's TTL (LRU) cache of [un]authorized org/repos. Evicted repos will need to get re-authorized.
223+
* `auth_write_ttl` (`float` = `15 * 60`): Max age [seconds] of user's org/repo authorizations able to `WRITE`. A repo writer will also need to be re-authorized after this period.
224+
* `auth_other_ttl` (`float` = `30`): Max age [seconds] of user's org/repo authorizations **not** able to `WRITE`. A repo reader or a rejected user will get a chance for a permission upgrade after this period.
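
To show how `api_url` and `api_version` might come into play, here is a hedged sketch that assembles the repository permissions request. The function name `build_permission_request` is hypothetical; the endpoint path and the `X-GitHub-Api-Version` header follow GitHub's public REST API documentation, but whether Giftless builds its requests exactly this way is an assumption.

```python
from typing import Optional


def build_permission_request(
    token: str,
    org: str,
    repo: str,
    username: str,
    api_url: str = "https://api.github.com",
    api_version: Optional[str] = "2022-11-28",
) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for a 'get repository permissions for a user' call."""
    # Tolerate a trailing slash, as in the enterprise example ".../api/v3/".
    base = api_url.rstrip("/")
    url = f"{base}/repos/{org}/{repo}/collaborators/{username}/permission"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
    if api_version is not None:
        # Omitting the header lets GitHub pick its own default version.
        headers["X-GitHub-Api-Version"] = api_version
    return url, headers
```

The returned pair can be handed to any HTTP client; no request is sent by the sketch itself.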
225+
194226
## Understanding Authentication and Authorization Providers
195227

196228
This part is more abstract, and will help you understand how Giftless handles
@@ -220,6 +252,10 @@ Very simply, an `Identity` object encapsulates information about the current use
220252
request, and is expected to have the following interface:
221253

222254
```python
255+
from typing import Optional
256+
from giftless.auth.identity import Permission
257+
258+
223259
class Identity:
224260
name: Optional[str] = None
225261
id: Optional[str] = None
@@ -244,9 +280,12 @@ Authorizer classes may use the default built-in `DefaultIdentity`, or implement
244280
subclass of their own.
245281

246282
#### Permissions
247-
Giftless defines the following permissions on entites:
283+
Giftless defines the following permissions on entities:
248284

249285
```python
286+
from enum import Enum
287+
288+
250289
class Permission(Enum):
251290
READ = "read"
252291
READ_META = "read-meta"

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
# documentation root, use os.path.abspath to make it absolute, like shown here.
1212
#
1313
import os
14-
import importlib
14+
import importlib.metadata
1515

1616
from recommonmark.transform import AutoStructify
1717

docs/source/configuration.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -126,5 +126,17 @@ clients using these URLs. By default, the JWT auth provider is used here.
126126

127127
There is typically no need to override the default behavior.
128128

129+
#### `LEGACY_ENDPOINTS`
130+
This is a `bool` flag, default `true` (deprecated, use `false` where possible), that affects the base URI of all the service endpoints. Previously, the endpoints didn't adhere to the rules for [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md), so additional routing or client configuration was required.
131+
132+
The default base URI for all giftless endpoints is now `/<org_path>/<repo>.git/info/lfs` while the legacy one is `/<org>/<repo>`.
133+
* `<org>` is a simple organization name not containing slashes (common for GitHub)
134+
* `<org_path>` is a more versatile organization path which can contain slashes (common for GitLab)
135+
* `<repo>` is a simple repository name not containing slashes
136+
137+
With `LEGACY_ENDPOINTS` set to `true`, **both the current and legacy** endpoints work simultaneously. When using the `basic_streaming` transfer adapter, the **legacy URI** is used for the object URLs in the batch API responses, for backward compatibility.
138+
139+
Setting `LEGACY_ENDPOINTS` to `false` makes everything use the current base URI; requests to the legacy URIs will get rejected.
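
As a rough illustration of the two URI shapes, the following sketch checks whether a request path uses the current base URI. The pattern is copied from the route in `examples/github-lfs/compose.yaml`; the function name `uses_current_base_uri` is hypothetical, and full-match semantics are assumed.

```python
import re

# Same pattern as the example Envoy route in examples/github-lfs/compose.yaml:
# two or more path segments, then ".git/info/lfs", then nothing or a sub-path.
CURRENT_BASE = re.compile(r"(?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)")


def uses_current_base_uri(path: str) -> bool:
    """True if the whole path follows /<org_path>/<repo>.git/info/lfs[/...]."""
    return CURRENT_BASE.fullmatch(path) is not None
```

A path such as `/org/repo.git/info/lfs/objects/batch` matches, while the legacy `/org/repo/objects/batch` does not.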
140+
129141
#### `DEBUG`
130142
If set to `true`, enables more verbose debugging output in logs.

docs/source/github-lfs.md

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
Shadowing GitHub LFS
2+
====================
3+
4+
This guide shows how to use Giftless as the LFS server for an existing GitHub repository (not using GitHub LFS). Thanks to a handful of tricks it also acts as a full remote HTTPS-based `git` repository, making this a zero client configuration setup.
5+
6+
This guide uses `docker compose`, so you need to [install it](https://docs.docker.com/compose/install/). It also relies on you using HTTPS for cloning GitHub repos. The SSH way is not supported.
7+
8+
### Running docker containers
9+
To run the setup, `git clone https://github.com/datopian/giftless`, step into the `examples/github-lfs` directory and run `docker compose up`.
10+
11+
This will run two containers:
12+
- `giftless`: Locally built Giftless server configured to use solely the [GitHub authentication provider](auth-providers.md#github-authenticator) and a local docker compose volume as the storage backend.
13+
- `proxy`: An [Envoy reverse proxy](https://www.envoyproxy.io/) which acts as the frontend listening on local port 5000, configured to route LFS traffic to `giftless` and pretty much anything else to `[api.]github.com`. **The proxy listens on unencrypted HTTP**; setting it up to provide TLS termination is very much possible, but isn't covered yet (your turn, thanks for the contribution!).
14+
15+
Feel free to explore the `compose.yaml`, which contains all the details.
16+
17+
### Cloning a GitHub repository via proxy
18+
The frontend proxy forwards the usual `git` traffic to GitHub, so go there, pick or create a testing repository where you have write access, and clone it via the proxy hostname (just replace `github.com` with wherever you host the proxy):
19+
```shell
20+
git clone http://localhost:5000/$YOUR_ORG/$YOUR_REPO
21+
```
22+
If you don't use a credential helper, you might get asked a few times for the same credentials before the call gets through. [Make sure to get one](https://git-scm.com/doc/credential-helpers) before it drives you insane.
23+
24+
Thanks to the [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md) this is all you should need to become LFS-enabled!
25+
26+
### Pushing binary blobs
27+
Let's try pushing some binary blobs then! See also [Quickstart](quickstart.md#create-a-local-repository-and-push-some-file).
28+
```shell
29+
# create some blob
30+
dd if=/dev/urandom of=blob.bin bs=1M count=1
31+
# make it tracked by LFS
32+
git lfs track blob.bin
33+
# the LFS tracking is written in .gitattributes, which you also want committed
34+
git add .gitattributes blob.bin
35+
git commit -m 'Hello LFS!'
36+
# push it, assuming the local branch is main
37+
# this might fail the first time, when git automatically runs 'git config lfs.locksverify false'
38+
git push -u origin main
39+
```
40+
41+
This should eventually succeed, and you will find the LFS digest in place of the blob on GitHub and the binary blob on your local storage:
42+
```shell
43+
docker compose exec -it giftless find /lfs-storage
44+
/lfs-storage
45+
/lfs-storage/$YOUR_ORG
46+
/lfs-storage/$YOUR_ORG/$YOUR_REPO
47+
/lfs-storage/$YOUR_ORG/$YOUR_REPO/deadbeefb10bb10bad40beaa8c68c4863e8b00b7e929efbc6dcdb547084b01
48+
```
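
The file name in the listing is the object's LFS OID, which Git LFS defines as the SHA-256 digest of the blob's content. As a sketch, assuming the `<root>/<org>/<repo>/<oid>` layout suggested by the listing above (the helper name `lfs_object_path` is hypothetical), the expected location of a blob could be computed like this:

```python
import hashlib
from pathlib import PurePosixPath


def lfs_object_path(root: str, org: str, repo: str, content: bytes) -> PurePosixPath:
    """Compute where a blob would be expected under the assumed storage layout."""
    # Git LFS identifies objects by the hex SHA-256 of their content.
    oid = hashlib.sha256(content).hexdigest()
    return PurePosixPath(root) / org / repo / oid
```

Comparing the result for your local `blob.bin` with the `find` output is a quick way to confirm the upload landed where expected.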
49+
50+
Next time anyone clones the repo (via the proxy), the binary blob will get properly downloaded. Failing to use the proxy hostname will make `git` use GitHub's own LFS, which is a paid service you are obviously trying to avoid.
51+
52+
### Service teardown
53+
54+
Finally, to shut down your containers, break (`^C`) the current compose run and clean up dead containers with:
55+
```shell
56+
docker compose down [--volumes]
57+
```
58+
Using `--volumes` tears down the `lfs-storage` volume too, so make sure it's what you wanted.

docs/source/guides.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,3 +9,4 @@ This section includes several how-to guides designed to get you started with Gif
99
quickstart
1010
using-gcs
1111
jwt-auth-guide
12+
github-lfs

examples/github-lfs/.env

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
# listening (proxy) port on the host
2+
SERVICE_PORT=5000
3+
# inner port giftless listens on
4+
GIFTLESS_PORT=5000
5+
# inner port the reverse proxy listens on
6+
PROXY_PORT=8080

examples/github-lfs/compose.yaml

Lines changed: 162 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
name: github-lfs
2+
3+
volumes:
4+
lfs-storage: {}
5+
6+
services:
7+
giftless:
8+
image: docker.io/datopian/giftless:latest
9+
volumes:
10+
- lfs-storage:/lfs-storage
11+
environment:
12+
GIFTLESS_DEBUG: "1"
13+
GIFTLESS_CONFIG_STR: |
14+
# use endpoints at /<org>/<repo>.git/info/lfs/ only
15+
LEGACY_ENDPOINTS: false
16+
AUTH_PROVIDERS:
17+
- factory: giftless.auth.github:factory
18+
TRANSFER_ADAPTERS:
19+
basic:
20+
factory: giftless.transfer.basic_streaming:factory
21+
options:
22+
# use the lfs-storage volume as local storage
23+
storage_class: giftless.storage.local_storage:LocalStorage
24+
storage_options:
25+
path: /lfs-storage
26+
# disable the default JWT pre-auth provider; object uploads/downloads are also authorized via GitHub
27+
PRE_AUTHORIZED_ACTION_PROVIDER: null
28+
command: "--http=0.0.0.0:$GIFTLESS_PORT -M -T --threads 2 -p 2 --manage-script-name --callable app"
29+
pull_policy: never # prefer local build
30+
build:
31+
cache_from:
32+
- docker.io/datopian/giftless:latest
33+
context: ../..
34+
35+
proxy:
36+
image: docker.io/envoyproxy/envoy:v1.30-latest
37+
configs:
38+
- source: envoy
39+
target: /etc/envoy/envoy.yaml
40+
command: "/usr/local/bin/envoy -c /etc/envoy/envoy.yaml"
41+
ports:
42+
- "$SERVICE_PORT:$PROXY_PORT"
43+
depends_on:
44+
giftless:
45+
condition: service_started
46+
47+
configs:
48+
envoy:
49+
content: |
50+
static_resources:
51+
listeners:
52+
- address:
53+
socket_address:
54+
address: 0.0.0.0
55+
port_value: $PROXY_PORT # proxy port
56+
filter_chains:
57+
- filters:
58+
- name: envoy.filters.network.http_connection_manager
59+
typed_config:
60+
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
61+
stat_prefix: ingress_http
62+
http_filters:
63+
- name: envoy.filters.http.router
64+
typed_config:
65+
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
66+
suppress_envoy_headers: true
67+
access_log:
68+
- name: envoy.access_loggers.file
69+
typed_config:
70+
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
71+
path: /dev/stdout
72+
generate_request_id: false
73+
preserve_external_request_id: true
74+
route_config:
75+
name: ingress_route
76+
virtual_hosts:
77+
- name: giftless
78+
domains:
79+
- "*"
80+
routes:
81+
- name: giftless
82+
# Only this goes to the giftless service
83+
match:
84+
safe_regex:
85+
regex: (?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)
86+
route:
87+
timeout: 0s # don't break long-running downloads
88+
cluster: giftless
89+
- name: api_github_com
90+
# Routing 3rd party tools assuming this is a GitHub Enterprise URL /api/v#/X to public api.github.com/X
91+
match:
92+
safe_regex: &api_regex
93+
regex: /api/v\d(?:/(.*)|$)
94+
route:
95+
regex_rewrite:
96+
pattern: *api_regex
97+
substitution: /\1
98+
host_rewrite_literal: api.github.com
99+
timeout: 3600s
100+
cluster: api_github_com
101+
request_headers_to_remove:
102+
- x-forwarded-proto
103+
- name: github_com
104+
# Anything else is forwarded directly to GitHub
105+
match:
106+
prefix: "/"
107+
route:
108+
host_rewrite_literal: github.com
109+
timeout: 3600s
110+
cluster: github_com
111+
request_headers_to_remove:
112+
- x-forwarded-proto
113+
clusters:
114+
- name: giftless
115+
connect_timeout: 0.25s
116+
type: strict_dns
117+
lb_policy: round_robin
118+
load_assignment:
119+
cluster_name: giftless
120+
endpoints:
121+
- lb_endpoints:
122+
- endpoint:
123+
address:
124+
socket_address:
125+
address: giftless # inner giftless hostname
126+
port_value: $GIFTLESS_PORT # local giftless port
127+
- name: api_github_com
128+
type: logical_dns
129+
# Comment out the following line to test on v6 networks
130+
dns_lookup_family: v4_only
131+
load_assignment:
132+
cluster_name: api_github_com
133+
endpoints:
134+
- lb_endpoints:
135+
- endpoint:
136+
address:
137+
socket_address:
138+
address: api.github.com
139+
port_value: 443
140+
transport_socket:
141+
name: envoy.transport_sockets.tls
142+
typed_config:
143+
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
144+
sni: api.github.com
145+
- name: github_com
146+
type: logical_dns
147+
# Comment out the following line to test on v6 networks
148+
dns_lookup_family: v4_only
149+
load_assignment:
150+
cluster_name: github_com
151+
endpoints:
152+
- lb_endpoints:
153+
- endpoint:
154+
address:
155+
socket_address:
156+
address: github.com
157+
port_value: 443
158+
transport_socket:
159+
name: envoy.transport_sockets.tls
160+
typed_config:
161+
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
162+
sni: github.com
