
Commit 1794564

feat: wikipedia_uk_all_maxi_2022-03 and wikipedia_ru_all_maxi_2022-03 (#120)
* feat: wikipedia_uk_all_maxi_2022-03.zim
* feat: wikipedia_ru_all_maxi_2022-03.zim
1 parent be2f337 commit 1794564

File tree

3 files changed (+28 −13 lines)


README.md

Lines changed: 17 additions & 9 deletions
```diff
@@ -15,6 +15,7 @@ Putting Wikipedia Snapshots on IPFS and working towards making it fully read-wri
 - https://my.wikipedia-on-ipfs.org
 - https://ar.wikipedia-on-ipfs.org
 - https://zh.wikipedia-on-ipfs.org
+- https://uk.wikipedia-on-ipfs.org
 - https://ru.wikipedia-on-ipfs.org
 - https://fa.wikipedia-on-ipfs.org
```

````diff
@@ -115,24 +116,31 @@ It is advised to use separate IPFS node for this:
 
 ```console
 $ export IPFS_PATH=/path/to/IPFS_PATH_WIKIPEDIA_MIRROR
-$ ipfs init -p server,local-discovery,badgerds,randomports --empty-repo
+$ ipfs init -p server,local-discovery,flatfs,randomports --empty-repo
 ```
 
-#### Tune datastore for speed
+#### Tune DHT for speed
 
-Make sure repo is initialized with datastore backed by `badgerds` for improved performance, or if you choose to use slower `flatfs` at least use it with `sync` set to `false`.
+Wikipedia has a lot of blocks; to publish them as fast as possible,
+enable the [Accelerated DHT Client](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#accelerated-dht-client):
 
-**NOTE:** While badgerv1 datastore _is_ faster, one may choose to avoid using it with bigger builds like English because of [memory issues due to the number of files](https://github.com/ipfs/distributed-wikipedia-mirror/issues/85). Potential workaround is to use [`filestore`](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipfs-filestore) that avoids duplicating data and reuses unpacked files as-is.
+```console
+$ ipfs config --json Experimental.AcceleratedDHTClient true
+```
 
-#### Enable HAMT sharding
+#### Tune datastore for speed
 
-Configure your IPFS node to enable directory sharding
+Make sure the repo uses `flatfs` with `sync` set to `false`:
 
-```sh
-$ ipfs config --json 'Experimental.ShardingEnabled' true
+```console
+$ ipfs config --json 'Datastore.Spec.mounts' "$(ipfs config 'Datastore.Spec.mounts' | jq -c '.[0].child.sync=false')"
 ```
 
-This step won't be necessary when automatic sharding lands in go-ipfs (wip).
+**NOTE:** While the badgerv1 datastore is faster in some configurations, we choose to avoid using it with bigger builds like English because of [memory issues due to the number of files](https://github.com/ipfs/distributed-wikipedia-mirror/issues/85). A potential workaround is to use [`filestore`](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#ipfs-filestore), which avoids duplicating data and reuses unpacked files as-is.
+
+#### HAMT sharding
+
+Make sure you use go-ipfs 0.12 or later, which has automatic sharding of big directories.
 
 ### Step 3: Download the latest snapshot from kiwix.org
````
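The new `Tune datastore for speed` step uses `jq` to rewrite only the first datastore mount in place. A minimal sketch of what that expression does, run against a sample `Datastore.Spec.mounts` value (the JSON shape below is an assumption modeled on a stock go-ipfs config, not taken from this commit):

```shell
# Assumed sample of a go-ipfs 'Datastore.Spec.mounts' array, flatfs mount first:
mounts='[{"mountpoint":"/blocks","prefix":"flatfs.datastore","type":"measure","child":{"type":"flatfs","path":"blocks","shardFunc":"/repo/flatfs/shard/v2/next-to-last/2","sync":true}},{"mountpoint":"/","prefix":"leveldb.datastore","type":"measure","child":{"type":"levelds","path":"datastore","compression":"none"}}]'

# The commit's jq expression flips `sync` on the first (flatfs) mount only;
# the second (levelds) mount passes through untouched.
updated="$(printf '%s' "$mounts" | jq -c '.[0].child.sync=false')"

printf '%s\n' "$updated" | jq '.[0].child.sync'
```

Feeding the result back through `ipfs config --json`, as the README now does, persists the change without hand-editing the repo's config file.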

mirrorzim.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,7 +84,7 @@ fi
 
 printf "\nEnsure zimdump is present...\n"
 PATH=$PATH:$(realpath ./bin)
-which zimdump &> /dev/null || (curl --progress-bar -L https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-3.0.0.tar.gz | tar -xvz --strip-components=1 -C ./bin zim-tools_linux-x86_64-3.0.0/zimdump && chmod +x ./bin/zimdump)
+which zimdump &> /dev/null || (curl --progress-bar -L https://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64-3.1.0.tar.gz | tar -xvz --strip-components=1 -C ./bin zim-tools_linux-x86_64-3.1.0/zimdump && chmod +x ./bin/zimdump)
 
 printf "\nDownload and verify the zim file...\n"
 ZIM_FILE_SOURCE_URL="$(./tools/getzim.sh download $WIKI_TYPE $WIKI_TYPE $LANGUAGE_CODE all maxi latest | grep 'URL:' | cut -d' ' -f3)"
```
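The version bump above edits the same `3.0.0`/`3.1.0` string twice inside one long line. A hypothetical refactor (not part of this commit) that keeps the zim-tools version in a single variable, so future bumps touch one line:

```shell
# Hypothetical: parameterize the zim-tools release used by mirrorzim.sh.
ZIM_TOOLS_VERSION="3.1.0"
ZIM_TOOLS_BUILD="zim-tools_linux-x86_64-${ZIM_TOOLS_VERSION}"
ZIM_TOOLS_URL="https://download.openzim.org/release/zim-tools/${ZIM_TOOLS_BUILD}.tar.gz"

# Fetch only when zimdump is missing, same guard as the script uses:
#   which zimdump &> /dev/null || (curl --progress-bar -L "$ZIM_TOOLS_URL" \
#     | tar -xvz --strip-components=1 -C ./bin "${ZIM_TOOLS_BUILD}/zimdump" \
#     && chmod +x ./bin/zimdump)

echo "$ZIM_TOOLS_URL"
```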

snapshot-hashes.yml

Lines changed: 10 additions & 3 deletions
```diff
@@ -37,13 +37,20 @@ zh:
   date: 2021-03-16
   ipns:
   ipfs: https://dweb.link/ipfs/bafybeiazgazbrj6qprr4y5hx277u4g2r5nzgo3jnxkhqx56doxdqrzms6y
+uk:
+  name: Ukrainian
+  original: uk.wikipedia.org
+  source: wikipedia_uk_all_maxi_2022-03.zim
+  date: 2022-03-09
+  ipns:
+  ipfs: https://dweb.link/ipfs/bafybeibiqlrnmws6psog7rl5ofeci3ontraitllw6wyyswnhxbwdkmw4ka
 ru:
   name: Russian
   original: ru.wikipedia.org
-  source: wikipedia_ru_all_maxi_2021-03.zim
-  date: 2021-03-25
+  source: wikipedia_ru_all_maxi_2022-03.zim
+  date: 2022-03-12
   ipns:
-  ipfs: https://dweb.link/ipfs/bafybeieto6mcuvqlechv4iadoqvnffondeiwxc2bcfcewhvpsd2odvbmvm
+  ipfs: https://dweb.link/ipfs/bafybeiezqkklnjkqywshh4lg65xblaz2scbbdgzip4vkbrc4gn37horokq
 fa:
   name: Persian
   original: fa.wikipedia.org
```
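`snapshot-hashes.yml` keeps one flat mapping per language, so a gateway link can be pulled out with plain text tools. A minimal sketch, using a sample file that mirrors the entries added by this commit and an `awk` one-liner of our own (the repo itself does not ship this helper):

```shell
# Sample mirroring the uk/ru entries from this commit (empty ipns keys omitted).
cat > /tmp/snapshot-hashes-sample.yml <<'EOF'
uk:
  name: Ukrainian
  original: uk.wikipedia.org
  source: wikipedia_uk_all_maxi_2022-03.zim
  date: 2022-03-09
  ipfs: https://dweb.link/ipfs/bafybeibiqlrnmws6psog7rl5ofeci3ontraitllw6wyyswnhxbwdkmw4ka
ru:
  name: Russian
  original: ru.wikipedia.org
  source: wikipedia_ru_all_maxi_2022-03.zim
  date: 2022-03-12
  ipfs: https://dweb.link/ipfs/bafybeiezqkklnjkqywshh4lg65xblaz2scbbdgzip4vkbrc4gn37horokq
EOF

# Track whether we are inside the requested top-level key, then print its ipfs value.
lang="ru"
url="$(awk -v l="$lang" '$0 == l":" {hit=1; next} /^[a-z]/ {hit=0} hit && $1 == "ipfs:" {print $2}' /tmp/snapshot-hashes-sample.yml)"
echo "$url"
```

For anything beyond a quick lookup, a real YAML parser (e.g. `yq` or Python's `yaml` module) is the safer choice.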
