
Commit 9da6496

feat: document the changes
1 parent 6a7dd3e commit 9da6496

7 files changed: +40 -8 lines changed

Dockerfile

Lines changed: 10 additions & 2 deletions
@@ -1,5 +1,13 @@
-# docker build . -f Dockerfile -t distributed-wikipedia-mirror
-# docker run --rm -v $(pwd)/snapshots:/github/workspace/snapshots -v $(pwd)/tmp:/github/workspace/tmp distributed-wikipedia-mirror <mirrorzim.sh arguments>
+# This Dockerfile creates a self-contained image in which mirrorzim.sh can be executed
+#
+# You can build the image as follows (remember to use this repo as context for the build):
+# docker build . -f Dockerfile -t distributed-wikipedia-mirror
+#
+# You can then run the container anywhere as follows:
+# docker run --rm -v $(pwd)/snapshots:/github/workspace/snapshots -v $(pwd)/tmp:/github/workspace/tmp distributed-wikipedia-mirror <mirrorzim.sh arguments>
+# NOTE(s):
+# - the volume attached at /github/workspace/snapshots will contain the downloaded ZIM files after the run
+# - the volume attached at /github/workspace/tmp will contain the created website directories after the run
 
 FROM openzim/zim-tools:3.1.0 AS openzim
 
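For illustration, a full build-and-run cycle for the small `cu` wiki used in the README might look like the sketch below; the flag set follows the README's `cu` example and additional mirrorzim.sh options may be required.

```sh
# Build the image from the root of this repository
docker build . -f Dockerfile -t distributed-wikipedia-mirror

# Mirror the "cu" Wikipedia; snapshots/ receives the ZIM, tmp/ the unpacked website
docker run --rm \
  -v $(pwd)/snapshots:/github/workspace/snapshots \
  -v $(pwd)/tmp:/github/workspace/tmp \
  distributed-wikipedia-mirror \
  --languagecode=cu --wikitype=wikipedia
```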

README.md

Lines changed: 10 additions & 6 deletions
@@ -136,7 +136,7 @@ This step won't be necessary when automatic sharding lands in go-ipfs (wip).
 
 ### Step 3: Download the latest snapshot from kiwix.org
 
-Source of ZIM files is at https://download.kiwix.org/zim/wikipedia/ 
+Source of ZIM files is at https://download.kiwix.org/zim/wikipedia/
 Make sure you download `_all_maxi_` snapshots, as those include images.
 
 To automate this, you can also use the `getzim.sh` script:
@@ -164,8 +164,8 @@ $ zimdump dump ./snapshots/wikipedia_tr_all_maxi_2021-01.zim --dir ./tmp/wikiped
 
 > ### ℹ️ ZIM's main page
 >
-> Each ZIM file has "main page" attribute which defines the landing page set for the ZIM archive. 
-> It is often different than the "main page" of upstream Wikipedia. 
+> Each ZIM file has "main page" attribute which defines the landing page set for the ZIM archive.
+> It is often different than the "main page" of upstream Wikipedia.
 > Kiwix Main page needs to be passed in the next step, so until there is an automated way to determine "main page" of ZIM, you need to open ZIM in Kiwix reader and eyeball the name of the landing page.
 
 ### Step 5: Convert the unpacked zim directory to a website with mirror info
@@ -242,7 +242,7 @@ Make sure at least two full reliable copies exist before updating DNSLink.
 
 ## mirrorzim.sh
 
-It is possible to automate steps 3-6 via a wrapper script named `mirrorzim.sh`. 
+It is possible to automate steps 3-6 via a wrapper script named `mirrorzim.sh`.
 It will download the latest snapshot of specified language (if needed), unpack it, and add it to IPFS.
 
 To see how the script behaves try running it on one of the smallest wikis, such as `cu`:
@@ -253,9 +253,9 @@ $ ./mirrorzim.sh --languagecode=cu --wikitype=wikipedia --hostingdnsdomain=cu.wi
 
 ## Docker build
 
-A `Dockerfile` with all the software requirements is provided. 
+A `Dockerfile` with all the software requirements is provided.
 For now it is only a handy container for running the process on non-Linux
-systems or if you don't want to pollute your system with all the dependencies. 
+systems or if you don't want to pollute your system with all the dependencies.
 In the future it will be end-to-end blackbox that takes ZIM and spits out CID
 and repo.
 
@@ -340,3 +340,7 @@ We are working on improving deduplication between snapshots, but for now YMMV.
 ## Code
 
 If you would like to contribute more to this effort, look at the [issues](https://github.com/ipfs/distributed-wikipedia-mirror/issues) in this github repo. Especially check for [issues marked with the "wishlist" label](https://github.com/ipfs/distributed-wikipedia-mirror/labels/wishlist) and issues marked ["help wanted"](https://github.com/ipfs/distributed-wikipedia-mirror/labels/help%20wanted).
+
+## GitHub Actions Workflow
+
+The GitHub Actions workflow available in this repository takes information about the wiki website that you want to mirror, downloads its ZIM, unpacks it, converts it to a website, and uploads it to S3 as a publicly accessible tar.gz package.
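The workflow file itself is not part of this commit, so the following is only a rough sketch of the stages it describes, expressed with commands that appear elsewhere in the README; the snapshot and directory names are assumptions, and the step that injects mirror info into the website (Step 5) is omitted for brevity.

```sh
# Illustrative pipeline only - the real job lives under .github/workflows/
ZIM=./snapshots/wikipedia_cu_all_maxi_2021-01.zim     # downloaded ZIM (example name)
OUT=./tmp/wikipedia_cu_all_maxi_2021-01               # website directory (example name)

# Unpack the ZIM into a plain website directory (same zimdump invocation as the README)
zimdump dump "$ZIM" --dir "$OUT"

# Package the website and upload it to S3 as a publicly readable tar.gz
tar -czf wikipedia_cu_all_maxi_2021-01.tar.gz -C "$OUT" .
aws s3 cp wikipedia_cu_all_maxi_2021-01.tar.gz s3://wikipedia-on-ipfs/ --acl public-read
```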

packer/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+The Packer configuration that resides here creates an AMI in which:
+- the ipfs service is started on machine boot
+- `publish_website_from_s3.sh` is available
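Assuming a standard Packer HCL template in this directory and AWS credentials in the environment (both assumptions, since the template is described but not shown here), building the AMI would follow the usual Packer flow:

```sh
cd packer
packer init .     # fetch required plugins (only needed for HCL2 templates)
packer build .    # bake the AMI with ipfs and publish_website_from_s3.sh preinstalled
```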

terraform/README.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+The Terraform configuration that resides here creates:
+- an S3 bucket where website packages can be uploaded
+- an EC2 instance running ipfs, on which `publish_website_from_s3.sh` can be run to publish mirrors
+
+To run `terraform` here you have to export:
+- `TF_VAR_public_key` - the public key that will be used to give you SSH access to the EC2 instance
+- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for an AWS account with enough permissions to create the resources
+- `AWS_REGION` - the name of the region where the S3 bucket and EC2 instance should be created
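A minimal sketch of driving this configuration, assuming the usual init/apply flow; the key path, credentials, and region below are placeholders:

```sh
export TF_VAR_public_key="$(cat ~/.ssh/id_ed25519.pub)"   # key that grants SSH access to the instance
export AWS_ACCESS_KEY_ID=AKIA...                          # fill in real credentials
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1                               # example region

cd terraform
terraform init      # download providers and set up state
terraform apply     # create the S3 bucket and the EC2 instance
```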

tools/add_website_to_ipfs.sh

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 set -euo pipefail
 
+# This script adds the website that was created at <unpacked zim dir>
+# from <zim file name> to ipfs
+
 usage() {
   echo "USAGE:"
   echo "  $0 <zim file name> <unpacked zim dir> [<extra ipfs add flags>]";
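A usage sketch based on the README's Turkish snapshot example; the unpacked directory name and the extra flag are assumptions:

```sh
./tools/add_website_to_ipfs.sh \
  wikipedia_tr_all_maxi_2021-01.zim \
  ./tmp/wikipedia_tr_all_maxi_2021-01 \
  --cid-version=1    # example of an extra flag forwarded to `ipfs add`
```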

tools/publish_website_from_s3.sh

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 set -euo pipefail
 
+# This script downloads <website name> from s3://wikipedia-on-ipfs,
+# unpacks it and adds it to ipfs
+
 usage() {
   echo "USAGE:"
   echo "  $0 <website name>";
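For example, on the EC2 instance created by the Terraform configuration (the website name is a placeholder and must match a package previously uploaded to the bucket):

```sh
./tools/publish_website_from_s3.sh wikipedia_cu_all_maxi_2021-01
```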

tools/start_ipfs.sh

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 
 set -euo pipefail
 
+# This script starts the ipfs daemon.
+# If ipfs was not initialised before, this script also initialises it.
+
 if ! ipfs repo stat; then
   ipfs init -p server,local-discovery,flatfs,randomports --empty-repo
   ipfs config --json Experimental.AcceleratedDHTClient true
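A quick local smoke test, assuming the `ipfs` binary is on PATH and that the script ends by launching the daemon in the foreground (an assumption):

```sh
./tools/start_ipfs.sh &   # first run initialises the repo, then the daemon starts
sleep 5
ipfs id                   # prints this node's PeerID once the repo exists
```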
