The wikifarm I use only offers full gzipped XML dumps of all revisions. I do a lot of pywikibot scripting, so I prefer to test offline to avoid constantly slamming the API. Because all the revisions are included in the dump, it takes a while to retrieve the current version of a page. While I was able to use the API with this library to generate a `--curonly` dump (thanks for making it so easy to set up and use!), I wonder if that procedure could be adapted to make a current-versions dump from an existing full local dump.

I started trying to script it myself, and it wasn't hard to write a little etree-based function that iterates through a single `<page>` node and removes all `<revision>` elements but the latest one, determined by `revision_id`. However, XML is pretty persnickety, and I've struggled with incrementally writing to a new file while preserving all the XML data outside of the `<page>` elements so that the pywikibot xmlreader can still interpret it.
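
For reference, here's roughly the shape of what I have so far. It's a minimal sketch, assuming lxml and the 0.10 export namespace (check the `xmlns` in your own dump); the function name `latest_only` and the simplified root tag are mine, and a real version would copy the full `<mediawiki>` attributes from the source dump. It streams with `iterparse`, re-serializes `<siteinfo>` untouched, prunes each `<page>` down to its highest revision `<id>`, and writes pages out one at a time so memory stays flat:

```python
import gzip
from lxml import etree

# Namespace of the MediaWiki export schema; check the xmlns in your dump.
NS = "http://www.mediawiki.org/xml/export-0.10/"

def latest_only(in_path, out_path):
    """Stream a full-history dump and write a current-revisions-only copy."""
    with gzip.open(in_path, "rb") as fin, gzip.open(out_path, "wb") as fout:
        # Write the XML declaration and root tag by hand; a real version
        # should copy the full <mediawiki> attributes from the source dump.
        fout.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write(f'<mediawiki xmlns="{NS}">\n'.encode("utf-8"))

        # Only materialize <siteinfo> and <page> subtrees; everything else
        # streams past without being built in memory.
        context = etree.iterparse(
            fin, events=("end",),
            tag=(f"{{{NS}}}siteinfo", f"{{{NS}}}page"),
        )
        for _event, elem in context:
            if elem.tag == f"{{{NS}}}page":
                revs = elem.findall(f"{{{NS}}}revision")
                # Keep only the revision with the highest <id>.
                latest = max(revs, key=lambda r: int(r.findtext(f"{{{NS}}}id")))
                for rev in revs:
                    if rev is not latest:
                        elem.remove(rev)
            fout.write(etree.tostring(elem))
            fout.write(b"\n")
            # Release the subtree we just wrote to keep memory flat.
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]

        fout.write(b"</mediawiki>\n")
```

One wrinkle: lxml repeats the `xmlns` declaration on each serialized element. That's redundant but still well-formed, so a namespace-aware reader should cope with it.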
I wouldn't mind working on a PR for a new arg like `--from-local` if folks could point me in the right direction in this code base. Relatedly, an ability to make a test dump with e.g. `--max-pages=3` would help me work on my code with a validly constructed compressed XML file; a sketch of that idea follows below.

For the test-dump idea, the same streaming skeleton above seems like it could be cut down to a truncation tool. A hypothetical sketch, reusing the `gzip`/`etree` imports and `NS` constant from the previous block (the `max_pages` parameter is mine, not an existing flag):

```python
def truncate_dump(in_path, out_path, max_pages=3):
    """Copy <siteinfo> plus the first max_pages <page> elements, then
    close the root so the output is still a valid (small) dump."""
    with gzip.open(in_path, "rb") as fin, gzip.open(out_path, "wb") as fout:
        fout.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
        fout.write(f'<mediawiki xmlns="{NS}">\n'.encode("utf-8"))
        context = etree.iterparse(
            fin, events=("end",),
            tag=(f"{{{NS}}}siteinfo", f"{{{NS}}}page"),
        )
        written = 0
        for _event, elem in context:
            if elem.tag == f"{{{NS}}}page":
                written += 1
            fout.write(etree.tostring(elem))
            fout.write(b"\n")
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
            if written >= max_pages:
                break  # stop early; the closing tag keeps it well-formed
        fout.write(b"</mediawiki>\n")
```