Releases: seanpedrick-case/doc_redaction

v1.7.4

06 Feb 13:42
1da988a

What's Changed

  • Original redaction annotations are now removed from review files before the app replaces them, by @seanpedrick-case in #135

Full Changelog: v1.7.3...v1.7.4

v1.7.3

05 Feb 14:56
8bf927b

What's Changed

  • Updated the Lambda function that brings together DynamoDB logs
  • Updated logging for summarisation and deduplication in the main app to correct minor issues

by @seanpedrick-case in #133

Full Changelog: v1.7.2...v1.7.3

v1.7.2

05 Feb 09:16
a802316

What's Changed

  • Removed reference to model source in summarisation token counts by @seanpedrick-case in #129
  • Updated DynamoDB and S3 log load scripts.
  • Updated user guide.
  • Fixed review file loading when a '_redactions_for_review' file is loaded in.
  • Fixed the front page GUI when SHOW_COSTS is False
    by @seanpedrick-case in #131

Full Changelog: v1.7.1...v1.7.2

v1.7.1

04 Feb 14:11
58bd5e8

What's Changed

  • Package updates. Minor documentation updates. Fixes to review PDF load efficiency, OCR efficiency, and a minor deduplication bug by @seanpedrick-case in #127
  • Removed reference to model name in summarisation task

Full Changelog: v1.7.0...v1.7.1

v1.7.0

03 Feb 09:40
13caf87

What's Changed

  • Overall: Added LLM support for redaction and summarisation. GUI improvements with a 'Walkthrough' redaction process. Added an efficient OCR option with multithreading that splits text extraction between visual OCR and simple text extraction. Various bug fixes. by @seanpedrick-case in #120
  • Added LLM support for entity detection and document summarisation. Should work with AWS Bedrock, local transformers LLMs, or an inference server URL (e.g. for llama.cpp or vLLM). Relevant config variables in tools/config.py: SHOW_AWS_BEDROCK_LLM_MODELS, SHOW_TRANSFORMERS_LLM_PII_DETECTION_OPTIONS, SHOW_INFERENCE_SERVER_PII_OPTIONS, SHOW_SUMMARISATION
  • Added efficient redaction OCR option - falls back to simple text extraction for pages with selectable text, then uses visual OCR for the remainder. Relevant config variable in tools/config.py: EFFICIENT_OCR. by @seanpedrick-case in #123
  • Walkthrough 'simple' front page GUI tab added. Relevant config variable in tools/config.py: SHOW_QUICKSTART
  • Streamlined GUI for reviewing files and duplicate detection/summaries
  • Fix to the cli_redact function for passing in new parameters.
  • Various minor bug fixes throughout the code.
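
The efficient OCR behaviour described above can be pictured as a per-page routing decision. The following is a minimal sketch only, not the app's implementation: `plan_page_routes` and the `min_chars` threshold are hypothetical names and values chosen for illustration, and it assumes the selectable text for each page has already been extracted into a list of strings.

```python
from typing import List


def plan_page_routes(page_texts: List[str], min_chars: int = 20) -> List[str]:
    """Decide, page by page, whether to reuse the text layer or run visual OCR.

    page_texts: selectable text already extracted from each page (hypothetical input).
    min_chars: assumed threshold below which a page is treated as having no text layer.
    """
    routes = []
    for text in page_texts:
        # Pages with enough selectable text skip visual OCR entirely.
        has_text_layer = len(text.strip()) >= min_chars
        routes.append("text" if has_text_layer else "ocr")
    return routes


# One page with a text layer, one empty page, one whitespace-only page.
print(plan_page_routes(["This page has a selectable text layer.", "", "   \n"]))
# prints ['text', 'ocr', 'ocr']
```

Only the pages routed to "ocr" would then need the slower visual OCR step, which is where the efficiency gain would come from under these assumptions.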

Full Changelog: v1.6.7...v1.7.0

v1.6.7

05 Jan 12:01
3be9b9b

What's Changed

  • Corrected base href references and proxy headers for FastAPI implementation by @seanpedrick-case in #118

Full Changelog: v1.6.6...v1.6.7

v1.6.6

16 Dec 11:34
9ab85e0

What's Changed

  • Corrected the input image creation location so that redaction coordinates are placed correctly in output PDFs by @seanpedrick-case in #116

Full Changelog: V1.6.5...v1.6.6

V1.6.5

12 Dec 22:25
ba5f5dd

What's Changed

Full Changelog: v1.6.4...V1.6.5

v1.6.4

11 Dec 13:51
ffec20b

What's Changed

  • AWS Textract extraction now has an option to split punctuation at the end of words from the main body of the word. Corrected Textract bounding box image outputs by @seanpedrick-case in #112
  • Added Qwen VL 235B-A22B as a possible model option in Transformers
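
As an illustration of the punctuation-splitting option mentioned above, the sketch below separates trailing punctuation from the body of a single word. It is an assumption about the general idea, not the app's code: `split_trailing_punctuation` is a hypothetical helper, it only handles ASCII punctuation, and it does not attempt the corresponding split of the word's bounding box.

```python
import string
from typing import Tuple


def split_trailing_punctuation(word: str) -> Tuple[str, str]:
    """Return (body, trailing_punctuation) for one extracted word."""
    body = word.rstrip(string.punctuation)
    if not body:
        # The token is punctuation only, so leave it whole.
        return word, ""
    return body, word[len(body):]


print(split_trailing_punctuation("redacted."))  # prints ('redacted', '.')
print(split_trailing_punctuation("Smith),"))    # prints ('Smith', '),')
```

Splitting in this way would let an entity such as a name be matched without the full stop or comma that follows it.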

Full Changelog: v1.6.3...v1.6.4

v1.6.3

11 Dec 10:32
03b7757

What's Changed

  • CLI / direct mode should now accept S3-based paths as inputs for files. Minor fixes to test code file encoding by @seanpedrick-case in #110
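
To show what accepting S3-based paths as inputs involves, here is a minimal sketch of telling an `s3://` URI apart from a local path. `parse_s3_path` and the example bucket name are hypothetical and are not taken from the project's code; the sketch relies only on the standard library's `urllib.parse.urlparse`.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_s3_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for an s3:// URI, or None for any other path."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    # The key is the URI path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(parse_s3_path("s3://example-bucket/input/report.pdf"))
# prints ('example-bucket', 'input/report.pdf')
print(parse_s3_path("input/report.pdf"))
# prints None
```

A caller could download the object when a tuple is returned and fall back to normal local file handling when the result is None.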

Full Changelog: v1.6.2...v1.6.3