Releases: seanpedrick-case/doc_redaction
v1.7.4
What's Changed
- Original redaction annotations are now removed from review files before the app replaces them, by @seanpedrick-case in #135
Full Changelog: v1.7.3...v1.7.4
v1.7.3
What's Changed
- Updated Lambda function for bringing together DynamoDB logs
- Updated logging for summarisation and deduplication in main app to correct minor issues
by @seanpedrick-case in #133
Full Changelog: v1.7.2...v1.7.3
v1.7.2
What's Changed
- Removed reference to model source in summarisation token counts by @seanpedrick-case in #129
- Updated DynamoDB and S3 log load scripts.
- Updated user guide.
- Fixed review file loading when a '_redactions_for_review' file is loaded.
- Front page GUI fix when SHOW_COSTS is False
by @seanpedrick-case in #131
Full Changelog: v1.7.1...v1.7.2
v1.7.1
What's Changed
- Package updates. Minor documentation updates. Fixes to review PDF load efficiency, OCR efficiency, and a minor deduplication bug by @seanpedrick-case in #127
- Removed reference to model name in summarisation task
Full Changelog: v1.7.0...v1.7.1
v1.7.0
What's Changed
- Overall: Added LLM support for redaction and summarisation. GUI improvements with a 'Walkthrough' redaction process. Added an efficient OCR option that uses multithreading and splits text extraction between visual OCR and simple text extraction. Various bug fixes. by @seanpedrick-case in #120
- Added LLM support for entity detection and document summarisation. Should work with AWS Bedrock, local transformers LLMs, or an inference server URL, e.g. for llama.cpp or vLLM. Relevant config variables in tools/config.py: SHOW_AWS_BEDROCK_LLM_MODELS, SHOW_TRANSFORMERS_LLM_PII_DETECTION_OPTIONS, SHOW_INFERENCE_SERVER_PII_OPTIONS, SHOW_SUMMARISATION
- Added efficient redaction OCR option: uses simple text extraction for pages with selectable text, then visual OCR for the remainder. Relevant config variable in tools/config.py: EFFICIENT_OCR. by @seanpedrick-case in #123
- Walkthrough 'simple' front page GUI tab added. Relevant config variable in tools/config.py: SHOW_QUICKSTART
- Streamlined GUI for reviewing files and duplicate detection/summaries
- Fix on cli_redact function for passing in new parameters.
- Various bug fixes throughout code to fix minor issues.
Full Changelog: v1.6.7...v1.7.0
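The efficient OCR option described in this release can be pictured with a minimal sketch. This is not the app's actual implementation: the `Page` class, the `extract_pages` function, and the `visual_ocr` callable are hypothetical names introduced for illustration, and the sketch assumes each page exposes whatever text its PDF text layer contains. Pages with selectable text are taken as-is; the remainder are sent to visual OCR on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Page:
    """Hypothetical page record: number plus any text found in the PDF text layer."""
    number: int
    embedded_text: str  # "" when the page has no selectable text


def extract_pages(
    pages: List[Page],
    visual_ocr: Callable[[Page], str],  # assumed OCR function, supplied by the caller
    max_workers: int = 4,
) -> Dict[int, Tuple[str, str]]:
    """Return {page number: (method used, extracted text)}."""
    results: Dict[int, Tuple[str, str]] = {}
    needs_ocr: List[Page] = []

    # First pass: cheap text-layer extraction where selectable text exists.
    for page in pages:
        text = page.embedded_text.strip()
        if text:
            results[page.number] = ("text_layer", text)
        else:
            needs_ocr.append(page)

    # Second pass: run visual OCR on the remaining pages in parallel threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for page, text in zip(needs_ocr, pool.map(visual_ocr, needs_ocr)):
            results[page.number] = ("visual_ocr", text)

    return results
```

In the app itself this behaviour is switched on through EFFICIENT_OCR in tools/config.py; the sketch only illustrates the page-routing idea.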
v1.6.7
What's Changed
- Corrected base href references and proxy headers for FastAPI implementation by @seanpedrick-case in #118
Full Changelog: v1.6.6...v1.6.7
v1.6.6
What's Changed
- Corrected input image creation location so that output redaction PDFs have coordinates correctly placed by @seanpedrick-case in #116
Full Changelog: V1.6.5...v1.6.6
V1.6.5
What's Changed
- Added save to S3 option to cli_redact by @seanpedrick-case in #114
Full Changelog: v1.6.4...V1.6.5
v1.6.4
What's Changed
- AWS Textract extraction now has an option to split punctuation at the end of words from the main body of the word. Corrected Textract bounding box image outputs by @seanpedrick-case in #112
- Added Qwen VL 235B-A22B as a possible model option in Transformers
Full Changelog: v1.6.3...v1.6.4
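As a rough illustration of the punctuation-splitting option in this release, the sketch below separates trailing punctuation from the body of a word. It is an assumption about the general idea, not the app's code: `split_trailing_punctuation` is a hypothetical helper, it works on text only, and it ignores the bounding box adjustments a real Textract post-processing step would need.

```python
import string
from typing import List


def split_trailing_punctuation(word: str) -> List[str]:
    """Split a token into [body, trailing punctuation] where both exist."""
    body = word.rstrip(string.punctuation)
    if not body:
        # Token is empty or entirely punctuation: leave it untouched.
        return [word]
    trailing = word[len(body):]
    return [body, trailing] if trailing else [body]
```

For example, `split_trailing_punctuation("Smith,")` yields the word and the comma as separate tokens, so an entity match on the name need not include the comma.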
v1.6.3
What's Changed
- CLI / direct mode should now accept S3-based paths as file inputs. Minor fixes to test code file encoding by @seanpedrick-case in #110
Full Changelog: v1.6.2...v1.6.3
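Accepting S3-based paths generally means telling them apart from local paths and splitting them into a bucket and a key. The sketch below shows one way to do that; `parse_s3_path` is a hypothetical helper, not a function from this repository, and it assumes inputs use the common `s3://bucket/key` form.

```python
from typing import Optional, Tuple


def parse_s3_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for an s3:// path, or None for anything else."""
    prefix = "s3://"
    if not path.lower().startswith(prefix):
        return None  # treat as a local path
    # Everything up to the first "/" is the bucket; the rest is the object key.
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        return None
    return bucket, key
```

A caller could use the `None` result to fall through to normal local file handling.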