Set up a Python 3 environment with pip.
Install the requirements: pip install -r requirements.txt. Make sure that the
pip you use belongs to a Python 3 installation.
First, convert the EM (Explanatory Memorandum) PDF files to text (for example,
using pdftotext -layout to preserve the layout), then run the uksiem parser.
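The conversion step can be scripted. The following is a minimal sketch, not part of this project: it assumes the pdftotext command-line tool is installed and on the PATH, and the function names and folder names are hypothetical. It only covers the PDF-to-text step; running the uksiem parser on the output is not shown.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, out_dir):
    """Build the pdftotext command line for one PDF (does not run it)."""
    pdf_path = Path(pdf_path)
    # Output file keeps the PDF's base name, with a .txt extension.
    txt_path = Path(out_dir) / (pdf_path.stem + ".txt")
    # -layout asks pdftotext to keep the original physical layout.
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_folder(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir to a .txt file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = pdftotext_command(pdf, out)
        # check=True raises CalledProcessError if pdftotext fails.
        subprocess.run(cmd, check=True)
        converted.append(Path(cmd[-1]))
    return converted
```

For example, convert_folder("em-pdfs", "em-text") would write one text file per PDF into em-text; both folder names are placeholders.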
The sip-circulars folder has Statutory Instrument Practice (SIP) circulars relating to the schema for Explanatory Memoranda.
This project additionally hosts two recent strands of related work.
- em-tools: A collection of tools for building a complete repository of EM documents.
- text-parser: An alternative EM document parser to uksiem. Makes use of the pyparsing library.
Follow the links for the details.