Closed
Description
Initial tests with #1164 show a number of problems:
- even with git commit-graph, the runs are slow. This may be because the repo starts out as a shallow clone of 5 commits but can then build up 6000 commits in a single run. So it's probably necessary to avoid wildcards; I will look into that today.
- the script sometimes trips over itself, errors with 'git in use' or 'expected only one file to be changed', and then fails
- if git is left with uncommitted changes, the next run will fail too.
- the script uses more and more memory during a run (several gigabytes)
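Two of the failure modes above (overlapping git operations, and a dirty working tree breaking the next run) could be caught before a run starts. This is only a minimal sketch and not the crawler's actual code: the function names and the lock file path are hypothetical, and it assumes `git` is on the PATH.

```python
import os
import subprocess
from contextlib import contextmanager


def has_uncommitted_changes(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed path and nothing when clean."""
    return any(line.strip() for line in porcelain_output.splitlines())


def working_tree_is_dirty(repo_path: str) -> bool:
    """Ask git whether the repo at repo_path has uncommitted changes."""
    result = subprocess.run(
        ["git", "-C", repo_path, "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_uncommitted_changes(result.stdout)


@contextmanager
def run_lock(lock_path: str):
    """Hold an exclusive lock file for one run (hypothetical helper).

    O_EXCL makes the open fail with FileExistsError if another run
    already created the file, so two runs cannot overlap.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A run would then wrap its work in `with run_lock(...)` and skip (or alert) when `working_tree_is_dirty(...)` returns true, rather than failing halfway. Note that a hard kill leaves the lock file behind, so a real setup would need some way to detect stale locks.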
To reduce the effect of a git repo getting into the wrong state, it would help to shard the snapshots and versions repos -> #1172
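As an illustration of what sharding could look like, here is a rough sketch that assigns each service to one of a fixed number of repos by hashing its name. The scheme actually chosen in #1172 may differ; the function names, the repo naming pattern, and the shard count are all assumptions made for the example.

```python
import hashlib


def shard_index(service_name: str, shard_count: int) -> int:
    """Map a service name to a shard number in [0, shard_count).

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process and so would not be stable between runs.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(service_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_repo_name(base_name: str, service_name: str, shard_count: int) -> str:
    """Build a hypothetical repo name such as 'snapshots-03' for a service."""
    return f"{base_name}-{shard_index(service_name, shard_count):02d}"
```

With this kind of mapping, a repo left in a bad state only blocks the services assigned to that shard, and each shard's history grows more slowly than a single combined repo would.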
Apart from sharding the repos, I also want to rethink the scheduling through a cronjob -> #1173
I think these two measures combined will make the crawler a lot more stable, and also allow us to run it on the same server as the rest of ToS;DR, so it will not only be more reliable, but also cheaper for us to host.