
Commit 14b1fea

add changelog entry for clean_text_dump.py
1 parent 7ae5fde commit 14b1fea

1 file changed: +6 -0 lines changed


CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -12,7 +12,13 @@ For each PR made, an entry should be added to this changelog. It should contain
 - etc.

 ## Changelog
+### 3.1.??
+- 1232-process-the-full-text-dump
+- Description: Added a script, `/scripts/sde_dump_processing/clean_text_dump.py`, which cleans dumps from Sinequa. The Sinequa dump does not respect normal CSV newline formatting, so a dump of 1.8 million records becomes a CSV of 900 million lines. The script detects the headers and processes the dump for the three possible sources (TDAMM, SDE, and scripts) to create a final, clean CSV. It has a simple CLI for setting the input and output, the log verbosity, etc. Because the input files can be very large, the script streams them instead of holding them in memory.
+- Changes:
+- add file `/scripts/sde_dump_processing/clean_text_dump.py`

+### 3.1.0
 - 1209-bug-fix-document-type-creator-form
 - Description: The dropdown on the pattern creation form should default to multi, since the doc type creator form is used for multi-URL patterns in the majority of cases. This should be applied to doc types, division types, and titles as well.
 - Changes:
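
The changelog entry describes the approach only in prose, so here is a minimal sketch of how a streaming cleaner of this kind might work. It is not the contents of `clean_text_dump.py`. The record prefixes, the comma delimiter, the three-field layout, and the names `iter_records` and `clean_dump` are all assumptions made for illustration. The idea is that a line starting with a known source marker opens a new record, any other line is treated as a continuation of the previous record's text, and lines before the first marker (such as a header) are skipped.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Hypothetical record markers; the real dump's source prefixes may differ.
SOURCE_PREFIXES = ("/SDE/", "/TDAMM/", "/scripts/")


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one logical record at a time, merging continuation lines.

    Only the record currently being assembled is held in memory.
    """
    current: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        starts_record = line.startswith(SOURCE_PREFIXES)
        if starts_record and current:
            # A new marker closes the record collected so far.
            yield "\n".join(current)
            current = []
        if starts_record or current:
            # Lines seen before the first marker are ignored.
            current.append(line)
    if current:
        yield "\n".join(current)


def clean_dump(src: TextIO, dst: TextIO, delimiter: str = ",", n_fields: int = 3) -> int:
    """Stream src to dst as well-formed CSV and return the record count.

    Assumes the last field is free text that may contain the delimiter
    and newlines, so each record is split at most n_fields - 1 times.
    """
    writer = csv.writer(dst)
    count = 0
    for record in iter_records(src):
        writer.writerow(record.split(delimiter, n_fields - 1))
        count += 1
    return count
```

Because `csv.writer` quotes any field that contains a newline, the output can be read back with a standard CSV parser, one row per record. For real files, open both handles with `newline=""` as the `csv` module documentation recommends.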
