
Commit 2be984a

Merge branch 'describe-architecture-with-hugo'
Now that the site is converted to be built with Hugo and Pagefind, let's reflect that status quo in the document describing the site's architecture.

Signed-off-by: Johannes Schindelin <[email protected]>
2 parents 52c286b + 29c426c commit 2be984a


1 file changed, +51 -115 lines changed


ARCHITECTURE.md

```diff
@@ -1,161 +1,97 @@
 # git-scm.com architecture

 This document describes the general setup and architecture that runs the
-git-scm.com site. The idea is to document all the moving parts that
-_aren't_ checked in to this repository. That may help new people joining
-the project to help out, as well provide some continuity in case the
-maintainer is hit by a bus.
+git-scm.com site.

 ## Content

-Though the site is a rails app, it can _mostly_ be thought of as serving
-static content. It's just that we suck in that static content and
-pre-process it using nightly scheduled jobs. We never write anything to
-the database on behalf of user requests.
+This site is served via GitHub Pages and is a [Hugo](https://gohugo.io/) site
+with the search implemented using [Pagefind](https://pagefind.app/).

 The content is a mix of:

-- actual static content in this repository
+- original content from this repository

 - community book content brought in from https://github.com/progit;
-  see the `lib/tasks/book2.rake` file.
+  see the `script/update-book2.rb` and `script/book.rb` files.

-- manpages from releases of the git project, imported and formatted
-  via asciidoctor; see the `lib/tasks/index.rake` task.
+  The content is pre-rendered and tracked in the `external/book/` directory
+  tree.

+- manual pages from releases of the git project, imported and formatted via
+  AsciiDoctor, and translated versions of the manual pages from
+  https://github.com/jnavila/git-manpages-l10n/ (which itself contains
+  pre-rendered pages from https://github.com/jnavila/git-manpages-l10n/); see
+  the `script/update-docs.rb` file.

-## Heroku
+  The pre-rendered pages are tracked in the `external/docs/` directory tree.

-The app itself is served by Heroku. The app name is `git-scm` (so you
-can visit it directly as https://git-scm.herokuapp.com). The site is
-owned by the git-scm.com team. If you want to be involved in managing
-uptime/deploys/etc, you'll need a Heroku account and request to be added
-to that team.
+To deploy to GitHub Pages, it is necessary to turn off the default setting to
+"publish from a branch" and instead change the setting to "publish with a
+custom GitHub Actions workflow":
+https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site#publishing-with-a-custom-github-actions-workflow
+With this change, the site can be tested in the fork by pushing to the
+`gh-pages` branch (which will trigger the `deploy.yml` workflow) and then
+navigating to https://git-scm.<user>.github.io/.

-We use a few Heroku add-ons:
+## Non-static parts

-- Bonsai elasticsearch (see below)
+While the site consists mostly of static content, there are a couple of
+parts that are sort of dynamic.

-- Heroku Postgres as the database
+The search is implemented client-side, via [Pagefind](https://pagefind.app/).

-- Heroku Redis for rails caching
+A few scheduled GitHub workflows keep the content up to date:

-- Heroku scheduler for cron jobs
+- `update-git-version-and-manual-pages` and `update-download-data` (pick
+  up newly released git versions)

-The nightly scheduled jobs are:
+- `update-translated-manual-pages` (fetch and format translated manual
+  pages from the jnavila/git-html-l10n repository)

-- `rake downloads` (pick up newly released git versions)
-
-- `rake preindex` (pull in and format manpages for released git
-  versions)
-
-- `rake remote_genbook2` (pull in and format progit2 book content,
+- `update-book` (fetch and format progit2 book content,
   including translations)

-It should be safe to run any of those jobs more frequently. E.g., if you
-know there's a new Git release out, then:
-
-    heroku run rake preindex
-    heroku run rake downloads
-
-will get it on the site without waiting for the nightly run.
-
-Merges to the `main` branch on GitHub auto-deploy to Heroku, so unless
-you're doing something tricky you generally shouldn't need to manually
-deploy.
-
-Note that some of the formatting of manpages and book content happens
-when they are imported by the rake tasks. So after fixing some
-formatting and deploying, the rake jobs may need to be re-run with a
-special flag to re-import (see the individual tasks for details).
-
-
-## Cloudflare
-
-We get enough requests that it's easy to overwhelm the single Heroku
-dyno. So we have Cloudflare sitting in front of it, aggressively caching
-everything. That also should make the site faster to serve to regions
-far away from Heroku's servers.
-
-The Cloudflare setup is mostly pretty simple:
+These workflows are also marked as `workflow_dispatch`, i.e. they can be run
+manually (e.g. to update the download links just after Git for Windows
+published a new release).

-- they serve DNS for the whole domain (that's where they insert the CDN
-  magic)
-
-- Cloudflare provides `https://` support to the user. Obviously the
-  site is totally open and doesn't have any sensitive data, so this is
-  really more about integrity. The certificate is generated by
-  Cloudflare (and requires SNI on the browser side).
-
-- the Cloudflare connection to Heroku is passed over TLS; they provide an
-  "internal" certificate that we ask Heroku to use, so the connection
-  is secured between the two (again, mostly for integrity)
-
-- the most exotic config is that we use "page rules" to mark the whole
-  site to be cached aggressively, regardless of any caching headers
-  sent from Heroku. This is a bit of a hack, but there's very little on
-  the site that can't be cached (which is perhaps a sign that the rails
-  setup needs to be tweaked to send more reasonable caching headers,
-  but this has been simple and effective so far).
-
-There are a few special page rules to lift this caching for cases
-where we do server-side logic (e.g.,
-https://github.com/git/git-scm.com/issues/1129#issuecomment-363067019"),
-but the long-term goal is to push that logic onto the client side as
-much as possible.
-
-Both domains (c.f., the section on [DNS](#DNS) below) are owned by a
-Cloudflare "Team", and membership of that team is required to
-administrate the domains. Similar to the Heroku setup, you can ask to
-join this team if you wish to help out. The information about the team
-setup is in escrow with the Git PLC at Software Freedom Conservancy.
-Cloudflare provides the project with enough credits that it doesn't cost
-anything (though we're not using very many features, so it's possible
-that a free account would be sufficient, too).
-
-## Bonsai Elasticsearch
-
-The search functionality on the site is served by an elasticsearch
-cluster. The index can be populated by running `rake search_index`
-(manpages) and `rake search_index_book` (book) on Heroku (we only index
-the manpages and book). This perhaps should be run nightly, or at least
-after pulling in new content, but it currently isn't done automatically.
-
-The elasticsearch cluster is provided by Bonsai via their Heroku plugin.
-Our needs are larger than their free tier provides, but we receive
-credits from them that provide the service for free.
+Merges to the `gh-pages` branch on GitHub auto-deploy to GitHub Pages via the
+`deploy` GitHub workflow.

+Note that some of the formatting of manual pages and book content happens
+when they are imported by the GitHub workflows. Therefore, whenever there are
+changes to the scripts/workflows/automation that affect formatting, these
+workflows may need to be triggered using the force-rebuild flag to be toggled
+(see the individual workflows for details).

 ## DNS

-The actual DNS service is provided by Cloudflare (see above). The domain
-itself is registered with Gandi, and is owned by the project via
-Software Freedom Conservancy. Funds for the registration are provided
-from the Git project's Conservancy funds, and both the Git PLC and
-Conservancy have credentials to modify the setup.
+The actual DNS service is provided by Cloudflare. The domain itself is
+registered with Gandi, and is owned by the project via Software Freedom
+Conservancy. Funds for the registration are provided from the Git project's
+Conservancy funds, and both the Git PLC and Conservancy have credentials to
+modify the setup.

 Note that we own both git-scm.com and git-scm.org; the latter redirects
 to the former.

-
 ## Manual Intervention

 The site mostly just runs without intervention:

-- code merged to `main` is auto-deployed
+- code merged to `gh-pages` is auto-deployed

-- new git versions are detected daily and manpages and download links
+- new git versions are detected daily and manual pages and download links
   updated

 - book updates (including translations) are picked up daily

 There are a few tasks that still need to be handled by a human:

-- new images added to the book have to be copied manually from
-  progit/progit2
-
 - new languages for book translations need to be added to
-  `lib/tasks/book2.rake`
+  `script/book.rb`

-- forced re-imports of content (e.g., a formatting fix to imported
-  manpages) must be triggered manually
+- forced re-imports of content (e.g., when fixing formatting in the
+  imported manual pages) must be triggered manually with `force-rebuild`
+  toggled
```
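The updated ARCHITECTURE.md says the content workflows are marked as `workflow_dispatch` and that forced re-imports need `force-rebuild` toggled. Besides the "Run workflow" button in the repository's Actions tab, GitHub's REST API can trigger such a run via `POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches`. The following is a minimal Python sketch that only builds that request with the standard library and does not send it. The workflow file name `update-book.yml`, the input name `force-rebuild`, and the choice of `gh-pages` as the ref are assumptions for illustration; check the actual files under `.github/workflows/` before relying on them.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) a POST request that triggers a workflow_dispatch run.

    GitHub's endpoint is POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches,
    where workflow_id may be the workflow's numeric ID or its file name.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    # The body names the branch/tag to run on and the workflow's declared inputs.
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
    )


# Hypothetical example: re-run the book import with a forced rebuild.
# "update-book.yml" and the "force-rebuild" input name are assumptions; the value
# is passed as a string, and how it is interpreted depends on the workflow's inputs.
req = build_dispatch_request(
    owner="git",
    repo="git-scm.com",
    workflow_file="update-book.yml",
    ref="gh-pages",
    inputs={"force-rebuild": "true"},
    token="<token>",
)

# To actually trigger the run (needs a token allowed to run Actions on the repo):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # GitHub normally answers 204 No Content on success
```

Building the request separately from sending it keeps the sketch testable offline; only the commented-out `urlopen` call would contact GitHub.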
