Add doc-redirects, a maintenance script for our RTD HTTP redirects#79
Add doc-redirects, a maintenance script for our RTD HTTP redirects#79
doc-redirects, a maintenance script for our RTD HTTP redirects#79Conversation
1193ffb to
6a250d6
Compare
awelzel
left a comment
There was a problem hiding this comment.
Only fly-by on requests.Session usage.
I was also initially wondering if this could be in zeek/doc/tools or zeek/tools/doc/ rather than in a separate repo, but I think this clashes with the proposed workflow.
devel-tools/doc-redirects
Outdated
| headers = {"Authorization": f"token {RTD_API_TOKEN}"} | ||
|
|
||
| try: | ||
| response = requests.request( |
There was a problem hiding this comment.
If you want to be a bit nicer on the Internet, re-use a single connection for all requests via requests.Session() ?
There was a problem hiding this comment.
That's a good thought, perhaps that also tames the rate-limiter a bit. Will do.
evantypanski
left a comment
There was a problem hiding this comment.
I added in some feature request for tracking the number of uploaded files and what changed, curious what you think. It stems from me being very nervous about changing production stuff, even if trivial
but: this looks awesome!
devel-tools/doc-redirects
Outdated
| return 1 | ||
|
|
||
| for rd_data in jdict["results"]: | ||
| # Not sure what "pk" stands for, but it identifies the specific redirect: |
| return 0 | ||
|
|
||
|
|
||
| def cmd_upload(args: argparse.Namespace) -> int: |
There was a problem hiding this comment.
I have a feature request: It seems like it clears out the existing redirects, doesn't tell you how many or what it did, then sets new ones. I would want:
- Somewhere in the file that keeps a total of "how many redirects" so I can see at a glance. This isn't super necessary (since I could just like
$ grep -c '^-' redirects.yml) but I think it'd be nice as a sanity check - When uploading, check the number of uploaded redirects and make sure it matches the number in the file. Then compare the uploaded redirects versus number of downloaded redirects when purging. This way we can say like "Added X redirects" or "Removed X redirects." It'd be nice for feedback I think, since every time I run this I will be nervous about breaking something :)
A bonus point 3 would be to track all changed redirects, whether added, removed, or location change. Then we could report that info and have a --dry-run. Again easing my fear of breaking things
though I haven't run it so I might be missing some output
There was a problem hiding this comment.
Good thoughts, will do!
Btw I was talking to RTD support because of some details of their API and the guy was shocked that we'd blow away the redirects and re-establish them. Maybe this is too simplistic, but this only takes them out for a few seconds at a time, and just seems easier than tracking individual redirects via their unique identifiers (primary key! :-) and reworking the resulting overall ordering for any edits. But we could do that, especially if we notice any trouble with this approach.
6a250d6 to
e6e94ad
Compare
|
I've pushed a couple of updates to add session use, add more detailed output during the upload, and to add a result check ( Perhaps simply try it out. The API key is in the usual place. I've been messing with the |
evantypanski
left a comment
There was a problem hiding this comment.
felt myself reaching for nits so I stopped, not really attached to my comments to take em or leave em
For the record I didn't run this yet but will in the future, then might just put in a PR if I have more comments
devel-tools/doc-redirects
Outdated
| msg(f"Weird: expected {name_from} to start with 'doc/'") | ||
|
|
||
| # Strip the "doc/" prefixes since the HTML files won't have it either. | ||
| renamings[name_to[4:]] = name_orig[4:] |
There was a problem hiding this comment.
Maybe a bit safer to use removeprefix? Then it doesn't need the comment imo (or really the startswith checks, but that's debatable I guess)
| renamings[name_to[4:]] = name_orig[4:] | |
| renamings[name_to.removeprefix("doc/")] = name_orig.removeprefix("doc/") |
| try: | ||
| response = self.sess.request(method.upper(), url, json=jdict) | ||
|
|
||
| if self.rate_limit(response): |
There was a problem hiding this comment.
Interesting, I hadn't seen that in Requests! I'll give that a try.
There was a problem hiding this comment.
I tried a few things and failed. :-( The 429 always immediately errors out instead of being absorbed and retried over. I don't want to spend more time on it right now since it's working as-is, but feel free to open iterations in future PRs.
There was a problem hiding this comment.
Out of curiosity I asked Gemini to make the change:
Can you adapt the rate-limiting logic in the file to using the Python request module's built-in retry logic?
It came up with nearly the exact changes I made, including moving the comment — pretty cool. The only real differences I spot is that it takes the Retry class out of urllib3 where I used requests.adapters.Retry, and it hardcodes status_forcelist and allowed_methods. I haven't tried it out yet, but it's a great result.
This script allows downloading & uploading of redirects in a RTD project, as well as identifying candidates for redirects from a git revision range. See docstring for details.
This adds more informative output and adds an optional validation mechanism that downloads the final redirects after their upload, then compares them. The script will exit with a corresponding error message if validation fails, and leave old and new redirects in YAML files in the local directory.
e6e94ad to
c508199
Compare
|
I did notice a bug in my validation logic — it wasn't correctly comparing the post-update live redirects to the content of the update itself, it instead compared them to what was there before the update. Since the whole point is that an update changes the redirects that didn't make much sense. :-) I tested a few more combinations on the |
|
Something weird happened here — Github still shows this as unmerged, but I did merge I'm closing this manually. |
Since we're about to move a bunch of our documentation around, I wrote an API client for the RTD API that can assist in the maintenance of our redirect file. It defines a new YAML format for the file that's a simple list of redirects, each with a set of attributes that mirrors what RTD supports. It supports uploading and downloading the redirects, and can also assist in identifying redirect candidates from files moved around in git.
See docstring for details and zeek/zeek#5195 for an example of the new redirects.yml (the output of
doc-redirects download -f path/to/zeek/doc/redirects.yml).I used the "zeek" docs project (labeled "placeholder/redirect" and not used in years) for experimentation & testing.