Easily convert and manipulate Concordance .DAT load files: perfect for legal e-discovery, metadata extraction, and bulk processing.
A powerful Python CLI tool designed to handle complex .DAT files with custom delimiters (þ, control characters), broken encodings, and Excel-incompatible data.
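To make the delimiter handling concrete, here is a minimal sketch of splitting one record into fields. It is not the tool's actual parser. It assumes the commonly used Concordance defaults: þ (U+00FE) as the text qualifier and the DC4 control character (ASCII 20) as the field separator. The name `split_dat_line` is hypothetical.

```python
FIELD_SEP = "\x14"   # assumed field separator (DC4 control character)
QUOTE = "\u00fe"     # assumed text qualifier (thorn)

def split_dat_line(line):
    """Split one logical DAT record into a list of field values."""
    line = line.rstrip("\r\n")
    # Drop the qualifier that opens the first field and closes the last one.
    if line.startswith(QUOTE):
        line = line[1:]
    if line.endswith(QUOTE):
        line = line[:-1]
    # Adjacent fields are joined by: qualifier + separator + qualifier.
    return line.split(QUOTE + FIELD_SEP + QUOTE)

record = QUOTE + "DOC001" + QUOTE + FIELD_SEP + QUOTE + "Jane Doe" + QUOTE
print(split_dat_line(record))  # ['DOC001', 'Jane Doe']
```

Files that use other delimiters would need different constants.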
This tool can:
- Convert .DAT to .CSV
- Compare two .DAT files (with optional field mapping)
- Replace or remap headers
- Merge multiple .DAT files intelligently
- Delete rows based on field values
- Extract and export selected fields
- Handles Concordance .DAT files with embedded line breaks
- Supports various encodings: UTF-8, UTF-16, Windows-1252, and more
- Robust parsing even with Excel's 32,767 character cell limit
- CLI-first design, ideal for automation and scripting
- Legal eDiscovery processing
- Metadata cleanup and normalization
- Custom conversions and field extraction
- Comparing vendor-delivered load files
Download EXE
git clone https://github.com/yourusername/dat-file-tool.git
cd dat-file-tool
This tool uses only built-in libraries, so no external packages are required!
- Convert .dat to .csv or keep as .dat
- Compare two .dat files (with optional header mapping)
- Delete specific rows from .dat using a value list
- Merge .dat files by common headers
- Auto-detect encoding (UTF-8, UTF-16, Windows-1252, Latin-1)
- Smart line reader handles embedded newlines and quoted fields
- Output directory support via -o DIR
- Excel field-length warning for long text fields (>32,767 chars)
- Select only specific fields from a DAT file using --select
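The "smart line reader" mentioned above can be sketched as follows. This is one possible approach, not necessarily the one the tool uses. It assumes þ (U+00FE) is the text qualifier and that field values never contain a literal þ. Under those assumptions, a record is complete once the buffered text ends with þ and contains an even number of them. `iter_records` is a hypothetical name.

```python
QUOTE = "\u00fe"  # assumed text qualifier (thorn)

def iter_records(lines):
    """Yield logical records, joining physical lines split by embedded newlines."""
    buffer = ""
    for physical in lines:
        buffer += physical
        candidate = buffer.rstrip("\r\n")
        # Heuristic: balanced qualifiers and a closing qualifier at the end.
        if candidate.endswith(QUOTE) and candidate.count(QUOTE) % 2 == 0:
            yield candidate
            buffer = ""
    if buffer.strip():
        # Leftover text is an unterminated record; hand it back unchanged.
        yield buffer.rstrip("\r\n")

sample = [
    QUOTE + "ID" + QUOTE + "\x14" + QUOTE + "Note" + QUOTE + "\n",
    QUOTE + "1" + QUOTE + "\x14" + QUOTE + "line one\n",
    "line two" + QUOTE + "\n",
]
records = list(iter_records(sample))
print(len(records))  # 2
```

The second record keeps its embedded newline, so a later step can decide whether to preserve or replace it.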
Feature | Description |
---|---|
--csv | Export DAT file to CSV format (comma-separated values) |
--tsv | Export DAT file to TSV format (tab-separated values) |
--dat | Export to DAT format (default if none specified) |
-c, --compare | Compare two DAT files line-by-line |
-r, --replace-header | Replace headers using a mapping file (old_header,new_header) |
--merge | Merge multiple DAT files grouped by matching headers |
--delete | Delete rows based on field values listed in a file |
--select | Export only selected fields from the DAT file |
-o DIR | Specify output directory for generated files |
python Main.py input.dat --csv
# Output: input_converted.csv
python Main.py input.dat --tsv
# Output: input_converted.tsv
You can also specify custom output paths:
python Main.py input.dat --csv output.csv
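For readers curious what the conversion amounts to, here is a rough standard-library sketch, not the code in Main.py. It assumes each record starts and ends with the þ qualifier and uses DC4 (ASCII 20) as the separator. `dat_records_to_csv` and `split_fields` are hypothetical names.

```python
import csv
import io

FIELD_SEP = "\x14"   # assumed field separator
QUOTE = "\u00fe"     # assumed text qualifier

def split_fields(record):
    """Assumes the record starts and ends with the qualifier."""
    return record[1:-1].split(QUOTE + FIELD_SEP + QUOTE)

def dat_records_to_csv(records):
    """Render logical DAT records as CSV text with every field quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for record in records:
        writer.writerow(split_fields(record))
    return out.getvalue()

rows = [
    QUOTE + "ID" + QUOTE + FIELD_SEP + QUOTE + "Name" + QUOTE,
    QUOTE + "1" + QUOTE + FIELD_SEP + QUOTE + "Doe, Jane" + QUOTE,
]
print(dat_records_to_csv(rows))
```

Quoting every field keeps commas and line breaks inside values from being read as structure by the CSV consumer.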
python Main.py file1.dat file2.dat -c
With optional header mapping:
python Main.py file1.dat file2.dat -c -m mapping.csv
Outputs differences to value_diff.csv (or .dat).
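A line-by-line comparison with an optional header mapping could look roughly like this. It is an illustrative sketch only: rows are assumed to be already parsed into dictionaries, and `diff_rows` and the tuple layout are hypothetical, not the format of value_diff.csv.

```python
def diff_rows(rows_a, rows_b, header_map=None):
    """Compare rows pairwise; return (row_number, field, value_a, value_b) tuples."""
    header_map = header_map or {}
    diffs = []
    # zip stops at the shorter input, so extra trailing rows are not reported here.
    for number, (row_a, row_b) in enumerate(zip(rows_a, rows_b), start=1):
        for field_a, value_a in row_a.items():
            # Translate the first file's header to the second file's name, if mapped.
            field_b = header_map.get(field_a, field_a)
            value_b = row_b.get(field_b)
            if value_a != value_b:
                diffs.append((number, field_a, value_a, value_b))
    return diffs

file1 = [{"ID": "1", "Name": "Ann"}]
file2 = [{"DocID": "1", "Name": "Anne"}]
print(diff_rows(file1, file2, {"ID": "DocID"}))  # [(1, 'Name', 'Ann', 'Anne')]
```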
Create a mapping file (mapping.csv) like:
OldHeader1,NewHeader1
OldHeader2,NewHeader2
Run:
python Main.py input.dat -r mapping.csv --csv
# Output: input_Replaced.csv
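The header replacement itself is a simple lookup. A minimal sketch, assuming the two-column mapping format shown above (`load_mapping` and `replace_headers` are hypothetical names, not functions from Main.py):

```python
import csv
import io

def load_mapping(text):
    """Parse 'old_header,new_header' lines into a dict, skipping short lines."""
    reader = csv.reader(io.StringIO(text))
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}

def replace_headers(headers, mapping):
    """Rename mapped headers and leave unmapped ones unchanged."""
    return [mapping.get(name, name) for name in headers]

mapping = load_mapping("OldHeader1,NewHeader1\nOldHeader2,NewHeader2\n")
print(replace_headers(["OldHeader1", "Custodian", "OldHeader2"], mapping))
```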
Create a merge list file (merge_list.csv) containing one file path per line:
file1.dat
file2.dat
file3.dat
Then run:
python Main.py --merge merge_list.csv
# Outputs: merged_group_1.dat, merged_group_2.dat, etc.
Also creates a log file: merged_group_log.csv.
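Grouping by matching headers, which is what produces the numbered merged_group files, can be sketched like this. The sketch assumes "matching" means identical header names in identical order, which may differ from the tool's rule. `group_by_header` is a hypothetical name.

```python
def group_by_header(files):
    """files: iterable of (path, headers, rows). Returns groups keyed by header tuple."""
    groups = {}
    for path, headers, rows in files:
        key = tuple(headers)  # assumption: same names in the same order
        group = groups.setdefault(key, {"paths": [], "rows": []})
        group["paths"].append(path)
        group["rows"].extend(rows)
    return groups

inputs = [
    ("file1.dat", ["ID", "Name"], [["1", "Ann"]]),
    ("file2.dat", ["ID", "Date"], [["2", "2021-01-01"]]),
    ("file3.dat", ["ID", "Name"], [["3", "Bob"]]),
]
for number, (headers, group) in enumerate(group_by_header(inputs).items(), start=1):
    print("merged_group_%d" % number, headers, group["paths"])
```

The list of source paths per group is the kind of information a merge log could record.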
Create a delete file (delete.csv) with the field name on the first line and values to delete below:
ID
1001
1003
1007
Run:
python Main.py input.dat --delete delete.csv --csv
# Outputs: input{kept}.csv and input{removed}.csv
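The kept/removed split can be illustrated with a short sketch. It assumes rows are already parsed into dictionaries; `parse_delete_list` and `split_rows` are hypothetical names.

```python
def parse_delete_list(text):
    """First non-empty line is the field name; the rest are values to delete."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[0], set(lines[1:])

def split_rows(rows, field, values):
    """Partition rows into (kept, removed) by whether row[field] is listed."""
    kept, removed = [], []
    for row in rows:
        target = removed if row.get(field) in values else kept
        target.append(row)
    return kept, removed

field, values = parse_delete_list("ID\n1001\n1003\n1007\n")
rows = [{"ID": "1001"}, {"ID": "1002"}, {"ID": "1003"}]
kept, removed = split_rows(rows, field, values)
print(len(kept), len(removed))  # 1 2
```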
Create a selection file (select.txt) with one header per line:
Name
Age
City
Run:
python Main.py input.dat --select select.txt --csv
# Output: input_selected.csv
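Field selection boils down to picking column indexes. A minimal sketch, assuming rows are lists aligned with the header list and that requested headers missing from the file are skipped (the tool may instead report an error); `select_fields` is a hypothetical name.

```python
def select_fields(headers, rows, wanted):
    """Return (headers, rows) reduced to the wanted columns, in the wanted order."""
    # Assumption: silently skip requested names that the file does not contain.
    indexes = [headers.index(name) for name in wanted if name in headers]
    new_headers = [headers[i] for i in indexes]
    new_rows = [[row[i] for i in indexes] for row in rows]
    return new_headers, new_rows

headers = ["ID", "Name", "Age", "City"]
rows = [["1", "Ann", "34", "Oslo"]]
print(select_fields(headers, rows, ["Name", "Age", "City"]))
```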
Flag | Description |
---|---|
-o DIR | Set output directory |
--help | Show help message |
- All exports go to the directory specified by -o, or default to the input file's folder.
- Output filenames include tags like {kept}, {removed}, or _Replaced.
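The naming rules above can be expressed as a small helper. This is a sketch of the described behavior, not code from the tool; `build_output_path` and its parameters are hypothetical.

```python
import os

def build_output_path(input_path, tag, extension, out_dir=None):
    """Place the output in out_dir if given, otherwise beside the input file."""
    folder = out_dir if out_dir else os.path.dirname(os.path.abspath(input_path))
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(folder, stem + tag + "." + extension)

print(build_output_path("data/input.dat", "_Replaced", "csv", out_dir="out"))
print(build_output_path("data/input.dat", "{kept}", "csv"))
```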
Handles:
- UTF-8 BOM
- UTF-16 LE/BE
- Windows-1252 (via printable category)
- Latin-1 fallback
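One way to detect the encodings listed above is to check for a byte-order mark and then try strict decoding. This sketch uses a trial-decode heuristic in place of the printable-category check mentioned for Windows-1252, so it is an approximation and not the tool's detection logic. `detect_encoding` is a hypothetical name.

```python
import codecs

def detect_encoding(raw):
    """Guess a codec name for raw bytes: BOM first, then trial decoding."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # Python's utf-16 codec reads the BOM to pick LE or BE
    for candidate in ("utf-8", "cp1252"):
        try:
            raw.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so it never fails.
    return "latin-1"

print(detect_encoding(b"\xfeID\xfe"))  # cp1252
```

A lone 0xFE byte is never valid UTF-8, which is why a single-byte þ qualifier rules that encoding out in the example.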
Warns if any field exceeds Excel's max cell limit (32,767 chars).
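The length check is straightforward to illustrate. A minimal sketch, assuming rows are lists aligned with the headers; `find_oversized_fields` is a hypothetical name and the tool's actual warning format may differ.

```python
EXCEL_CELL_LIMIT = 32767  # maximum characters Excel stores in one cell

def find_oversized_fields(headers, rows):
    """Return (row_number, header, length) for each value over the Excel limit."""
    oversized = []
    for row_number, row in enumerate(rows, start=1):
        for name, value in zip(headers, row):
            if len(value) > EXCEL_CELL_LIMIT:
                oversized.append((row_number, name, len(value)))
    return oversized

rows = [["1", "short"], ["2", "x" * 40000]]
for row_number, name, length in find_oversized_fields(["ID", "Text"], rows):
    print("Warning: row %d field %s has %d chars" % (row_number, name, length))
```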
- Python 3.7+
- No external libraries (uses standard library only)
Add the following configuration to .vscode/launch.json:
{
"name": "Debug Merge Example",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/Main.py",
"console": "integratedTerminal",
"args": [
"--merge", "File_list.csv", "--csv", "-o", "merged/"
]
}
Feel free to fork, enhance, or report issues! Contributions are welcome.
Md Ehsan Ahsan · MyGitHub · Built with love using Python
This tool is provided as-is without any warranties.
Use it at your own risk.
I am not responsible if it eats your files, breaks your computer, or ruins your spreadsheet. But hey, if it helps you automate the boring stuff, you're welcome!
This project is free to use under the MIT License.