Releases · seanpedrick-case/doc_redaction

Removed reference to model source in summarisation token counts by @seanpedrick-case in #129
Updated DynamoDB and S3 log load scripts.
Updated user guide.
Fix to loading the review file when loading a '_redactions_for_review' file.
Front page GUI fix when SHOW_COSTS is False by @seanpedrick-case in #131

Full Changelog: v1.7.1...v1.7.2

Contributors

seanpedrick-case


04 Feb 14:11

seanpedrick-case

v1.7.1

58bd5e8

v1.7.1

What's Changed

Package updates. Minor documentation updates. Improved review PDF load efficiency and OCR efficiency, and fixed a minor deduplication bug by @seanpedrick-case in #127
Removed reference to model name in summarisation task

Full Changelog: v1.7.0...v1.7.1

Contributors

seanpedrick-case


03 Feb 09:40

seanpedrick-case

v1.7.0

13caf87

v.1.7.0

What's Changed

Overall: Added LLM support for redaction and summarisation. GUI improvements with a 'Walkthrough' redaction process. Added an efficient OCR option with multithreading that splits text extraction between visual OCR and simple text extraction. Various bug fixes by @seanpedrick-case in #120
Added LLM support for entity detection and document summarisation. Should work with AWS Bedrock, local Transformers LLMs, or an inference server URL (e.g. for llama.cpp or vLLM). Relevant config variables in tools/config.py: SHOW_AWS_BEDROCK_LLM_MODELS, SHOW_TRANSFORMERS_LLM_PII_DETECTION_OPTIONS, SHOW_INFERENCE_SERVER_PII_OPTIONS, SHOW_SUMMARISATION
Added an efficient redaction OCR option: uses simple text extraction for pages with selectable text, then visual OCR for the remaining pages. Relevant config variable in tools/config.py: EFFICIENT_OCR by @seanpedrick-case in #123
Walkthrough 'simple' front page GUI tab added. Relevant config variable in tools/config.py: SHOW_QUICKSTART
Streamlined GUI for reviewing files and duplicate detection/summaries
Fix to the cli_redact function for passing in new parameters.
Various minor bug fixes throughout the code.
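The efficient OCR option can be sketched as a per-page router: pages that already have selectable text keep it, and only the rest go to visual OCR. This is a minimal illustration, not the project's code. `efficient_extract`, `run_visual_ocr`, `min_chars`, `max_workers` and `PageResult` are hypothetical names, and running the OCR pages on a thread pool is an assumption based on the "multithread" wording in these notes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageResult:
    page_number: int  # 1-based page number
    method: str       # "text" (simple extraction) or "ocr" (visual OCR)
    text: str


def efficient_extract(
    pages_text: List[str],
    run_visual_ocr: Callable[[int], str],
    min_chars: int = 1,
    max_workers: int = 4,
) -> List[PageResult]:
    """Hypothetical per-page router: keep selectable text, OCR the rest.

    pages_text[i] is the text layer already pulled from page i + 1; an empty
    or whitespace-only string is treated as "no selectable text".
    """
    # Pages whose text layer is shorter than min_chars are sent to visual OCR.
    ocr_pages = [
        page_no
        for page_no, text in enumerate(pages_text, start=1)
        if len(text.strip()) < min_chars
    ]

    # Run visual OCR on those pages concurrently (assumption: a thread pool).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ocr_text = dict(zip(ocr_pages, pool.map(run_visual_ocr, ocr_pages)))

    # Reassemble results in page order.
    results = []
    for page_no, text in enumerate(pages_text, start=1):
        if page_no in ocr_text:
            results.append(PageResult(page_no, "ocr", ocr_text[page_no]))
        else:
            results.append(PageResult(page_no, "text", text))
    return results


if __name__ == "__main__":
    fake_ocr = lambda page_no: f"<OCR text for page {page_no}>"
    for r in efficient_extract(["Hello world", "   ", "Page three"], fake_ocr):
        print(r.page_number, r.method, r.text)
```

The benefit of this split is that the (slow) visual OCR step only runs on pages that need it, while text-layer pages are returned essentially for free.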

Full Changelog: v1.6.7...v1.7.0

Contributors

seanpedrick-case


05 Jan 12:01

seanpedrick-case

v1.6.7

3be9b9b

v1.6.7

What's Changed

Corrected base href references and proxy headers for FastAPI implementation by @seanpedrick-case in #118

Full Changelog: v1.6.6...v1.6.7

Contributors

seanpedrick-case


16 Dec 11:34

seanpedrick-case

v1.6.6

9ab85e0

v1.6.6

What's Changed

Corrected the input image creation location so that coordinates are placed correctly in output redaction PDFs by @seanpedrick-case in #116
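Why the source image matters for box placement can be illustrated with a simple pixel-to-point scaling. This is a hypothetical sketch, not the fix in #116: `image_box_to_pdf` is an illustrative name, and it assumes the image is an uncropped, unrotated rendering of the whole page with a top-left origin in both coordinate systems. If boxes are detected on an image whose size or origin does not match the page being redacted, the same scaling places them in the wrong spot.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def image_box_to_pdf(
    box: Box,
    image_size: Tuple[int, int],
    page_size: Tuple[float, float],
) -> Box:
    """Scale a box from rendered-image pixels to PDF page points.

    Assumes the image is an uncropped, unrotated rendering of the whole page
    and that both coordinate systems use a top-left origin.
    """
    img_w, img_h = image_size
    page_w, page_h = page_size
    sx, sy = page_w / img_w, page_h / img_h  # points per pixel on each axis
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


# An A4 page (595 x 842 pt) rendered at 2x gives a 1190 x 1684 px image.
print(image_box_to_pdf((100, 200, 300, 250), (1190, 1684), (595, 842)))
# -> (50.0, 100.0, 150.0, 125.0)
```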

Full Changelog: V1.6.5...v1.6.6

Contributors

seanpedrick-case


12 Dec 22:25

seanpedrick-case

V1.6.5

ba5f5dd

V1.6.5

What's Changed

Added a save-to-S3 option to cli_redact by @seanpedrick-case in #114

Full Changelog: v1.6.4...V1.6.5

Contributors

seanpedrick-case


11 Dec 13:51

seanpedrick-case

v1.6.4

ffec20b

v1.6.4

What's Changed

AWS Textract extraction now has an option to split punctuation at the end of words from the main body of the word. Corrected Textract bounding box image outputs by @seanpedrick-case in #112
Added Qwen VL 235B-A22B as a possible model option in Transformers
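Splitting trailing punctuation from a Textract word can be sketched as a regex split plus a division of the word's bounding box. This is an assumption-laden illustration, not the project's implementation: `split_trailing_punctuation` and `split_word_box` are hypothetical names, the punctuation set is a guess, and sharing the box width in proportion to character count is an assumed heuristic. (Textract's `Geometry.BoundingBox` values are ratios of the page width and height; the function works in whatever units the caller passes.)

```python
import re
from typing import List, Tuple

# Group 1: the word body (shortest possible); group 2: the trailing run of
# common punctuation characters that ends the token.
_TRAILING_PUNCT = re.compile(r"^(.*?)([.,;:!?)\]]+)$")


def split_trailing_punctuation(word: str) -> List[str]:
    """Split 'Smith,' into ['Smith', ',']; leave 'Smith' or '...' whole."""
    match = _TRAILING_PUNCT.match(word)
    if not match or not match.group(1):
        return [word]
    return [match.group(1), match.group(2)]


def split_word_box(
    word: str, left: float, width: float
) -> List[Tuple[str, float, float]]:
    """Divide a word's horizontal extent between body and punctuation.

    Assumption: width is shared in proportion to character count.
    Returns (text, left, width) tuples.
    """
    parts = split_trailing_punctuation(word)
    if len(parts) == 1:
        return [(word, left, width)]
    body, punct = parts
    body_width = width * len(body) / len(word)
    return [(body, left, body_width), (punct, left + body_width, width - body_width)]


print(split_word_box("Smith,", 100, 60))
```

A separate box for the punctuation means a redaction of "Smith" need not also cover the comma. Abbreviations such as "e.g." would also lose their final full stop under this rule, so a real implementation may need exceptions.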

Full Changelog: v1.6.3...v1.6.4

Contributors

seanpedrick-case


11 Dec 10:32

seanpedrick-case

v1.6.3

03b7757

v1.6.3

What's Changed

CLI/direct mode should now accept S3-based paths as input files. Minor fixes to test code file encoding by @seanpedrick-case in #110
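Accepting S3-based input paths comes down to recognising an `s3://bucket/key` path and splitting it before download. A minimal sketch using only the standard library; `parse_s3_path` is a hypothetical helper, not a function from this project.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def parse_s3_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for an 's3://bucket/key' path, else None.

    Note: urlparse treats '?' and '#' as query/fragment delimiters, so keys
    containing those characters would need extra handling.
    """
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        return None
    return parsed.netloc, parsed.path.lstrip("/")


for p in ["s3://my-bucket/input/report.pdf", "input/report.pdf"]:
    print(p, "->", parse_s3_path(p))
```

A local path returns `None` and can be opened as-is. For an S3 path, the bucket and key could then be passed to boto3's `client("s3").download_file(bucket, key, local_path)` before redaction; the same bucket/key split would apply to `upload_file` when saving outputs back to S3.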

Full Changelog: v1.6.2...v1.6.3

Contributors

seanpedrick-case
