Commit de5bca0

Merge pull request #39 from brootware/dev
Dev
2 parents d4bfebb + f8f2bfc commit de5bca0

16 files changed, +813 -261 lines changed

.flake8

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+[flake8]
+max-line-length = 300

.github/workflows/ci.yml

Lines changed: 13 additions & 8 deletions
@@ -5,6 +5,7 @@ on:
 branches: [dev]
 pull_request:
 branches: [main]
+types: [opened, synchronize, reopened]
 
 permissions:
 contents: read
@@ -17,6 +18,8 @@ jobs:
 python-version: ["3.8", "3.9", "3.10"]
 steps:
 - uses: actions/checkout@v2
+with:
+fetch-depth: 0 # Shallow clones should be disabled for a better relevancy of analysis
 
 - name: Set up Python ${{ matrix.python-version }}
 uses: actions/setup-python@v2
@@ -25,22 +28,24 @@ jobs:
 
 - name: Install dependencies
 run: |
-python -m pip install --upgrade pip
 python -m pip install --upgrade poetry
-pip install flake8 pytest nltk numpy
 poetry install
-python tools/install_nltk_popular.py
 
 - name: Lint with flake8
 run: |
 # stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
 # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
 
-- name: Test with pytest
-run: |
-pytest
+- name: Run tox with pytest
+run: poetry run tox -e py
+
+- name: SonarCloud Scan
+uses: SonarSource/sonarcloud-github-action@master
+env:
+GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # Needed to get PR information, if any
+SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
 
 # - name: Build and push Docker Image
 # uses: mr-smithers-excellent/docker-build-push@v5

README.md

Lines changed: 26 additions & 21 deletions
@@ -21,7 +21,6 @@
 
 Redacts and Unredacts the following from your text files. 📄 ✍️
 
-- names 👤
 - sg nric 🆔
 - credit cards 🏧
 - domain names 🌐
@@ -46,6 +45,12 @@ Quick install
 python -m pip install pyredactkit
 ```
 
+Redact from terminal
+
+```bash
+pyredactkit 'this is my ip:127.0.0.1. my email is [email protected]. secret link is github.com'
+```
+
 Redact a single file
 
 ```bash
@@ -55,10 +60,16 @@ pyredactkit test.txt
 Unredact the file
 
 ```bash
-pyredactkit redacted_test.txt -u .hashshadow_test.txt.json
+pyredactkit -f redacted_test.txt -u .hashshadow_test.txt.json
 ```
 
-Install nltk data for redacting names
+Redact using custom regex pattern
+
+```bash
+pyredactkit -f file -c custom.json
+```
+
+<!-- Install nltk data for redacting names
 
 ```bash
 python -c "import nltk
@@ -74,11 +85,11 @@ else:
 nltk.download('popular')"
 ```
 
-Redact names from text file
+Redact names from a text file
 
 ```bash
 pyredactkit test.txt -t name
-```
+``` -->
 
 ### Use from github source
 
@@ -190,12 +201,6 @@ e0b66cbd-6174-4491-b938-408a47d38fb9,Platinum,142000,CC90518
 24f31233-cba6-4f6a-a2d6-0ce49952b2cb,Premium,781000,CC66746
 ```
 
-To redact specific type of data. E.g (name)
-
-```bash
-poetry run pyredactkit test.txt -t name
-```
-
 Sample result:
 
 ```txt
@@ -217,7 +222,7 @@ My router is: 10.10.10.1
 71.159.188.33
 ```
 
-To redact multiple files from a directory and place it in a new directory
+To redact multiple files from a directory and place them in a new directory
 
 ```bash
 poetry run pyredactkit dir_test -d redacted_dir
@@ -226,28 +231,28 @@ poetry run pyredactkit dir_test -d redacted_dir
 ## Optional Help Menu as below
 
 ```bash
-usage: pyredactkit [-h] [-u UNREDACT] [-t REDACTIONTYPE] [-d DIROUT] [-r] [-e EXTENSION] file [file ...]
+usage: pyredactkit [-h] [-f FILE [FILE ...]] [-u UNREDACT] [-d DIROUT] [-r] [-e EXTENSION] [text ...]
 
-Read in a file or set of files, and return the result.
+Supply a sentence or paragraph to redact sensitive data from it. Or read in a file or set of files with -f , and return the result.
 
 positional arguments:
-file Path of a file or a directory of files. Usage: pyredactkit [file/filestoredact]
+text Redact sensitive data of a sentence from command prompt. (default: None)
 
 optional arguments:
 -h, --help show this help message and exit
+-f FILE [FILE ...], --file FILE [FILE ...]
+Path of a file or a directory of files. Usage: pyredactkit [file/filestoredact] (default: None)
 -u UNREDACT, --unredact UNREDACT
-Option to unredact masked data. Usage: pyredactkit [redacted_file] -u [.hashshadow.json] (default: None)
--t REDACTIONTYPE, --redactiontype REDACTIONTYPE
-Type of data to redact. names, nric, dns, emails, ipv4, ipv6, base64. Usage: pyredactkit [file/filestoredact] -t ip (default: None)
+Option to unredact masked data. Usage: pyredactkit -f [redacted_file] -u [.hashshadow.json] (default: None)
 -d DIROUT, --dirout DIROUT
-Output directory of the file. Usage: pyredactkit [file/filestoredact] -d [redacted_dir] (default: None)
+Output directory of the file. Usage: pyredactkit -f [file/filestoredact] -d [redacted_dir] (default: None)
 -r, --recursive Search through subfolders (default: True)
 -e EXTENSION, --extension EXTENSION
 File extension to filter by. (default: )
 ```
 
 ## Sample files
 
-- [All types of data](https://raw.githubusercontent.com/brootware/PyRedactKit/main/test/test.txt)
-- [itcont.txt - 4GB uncompressed](https://sanitizationbq.s3.ap-southeast-1.amazonaws.com/itcont.tar.gz)
+- [All types of data](./logdata/test.txt)
+- [Differnt log file types](./logdata/)
 - [test_sample2.txt - 10002 lines of IP addresses](https://sanitizationbq.s3.ap-southeast-1.amazonaws.com/test_sample2.txt)
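
Reviewer note: the README diff changes the CLI so that the positional argument is free text and files move behind `-f`. The sketch below shows one way an `argparse` parser could produce a help menu of that shape. It is not taken from the project's source: `build_parser` is a hypothetical name, the help strings are copied from the diff, and the handling of `-r` (a `store_true` flag that already defaults to `True`) is an assumption inferred from the "(default: True)" text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser approximating the updated help menu in the README diff."""
    parser = argparse.ArgumentParser(
        prog="pyredactkit",
        description=(
            "Supply a sentence or paragraph to redact sensitive data from it. "
            "Or read in a file or set of files with -f , and return the result."
        ),
        # Appends "(default: ...)" to every help string, as seen in the help menu.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Free text is now the positional argument; zero or more words are accepted.
    parser.add_argument(
        "text", nargs="*",
        help="Redact sensitive data of a sentence from command prompt.",
    )
    # Files moved from the positional slot to an explicit -f/--file option.
    parser.add_argument(
        "-f", "--file", nargs="+",
        help="Path of a file or a directory of files. Usage: pyredactkit [file/filestoredact]",
    )
    parser.add_argument(
        "-u", "--unredact",
        help="Option to unredact masked data. Usage: pyredactkit -f [redacted_file] -u [.hashshadow.json]",
    )
    parser.add_argument(
        "-d", "--dirout",
        help="Output directory of the file. Usage: pyredactkit -f [file/filestoredact] -d [redacted_dir]",
    )
    # Assumption: a flag that is already True by default, matching "(default: True)".
    parser.add_argument(
        "-r", "--recursive", action="store_true", default=True,
        help="Search through subfolders",
    )
    parser.add_argument("-e", "--extension", default="", help="File extension to filter by.")
    return parser


# Mirrors the README's unredact example: files via -f, hash map via -u.
args = build_parser().parse_args(["-f", "redacted_test.txt", "-u", ".hashshadow_test.txt.json"])
print(args.file, args.unredact)
```

Note that the README's new `-c custom.json` example uses a flag that does not appear in the help menu shown in this diff, so it is left out of the sketch.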

custom.json

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+[
+{
+"pattern": "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)",
+"type": ["ip", "ipv4"]
+},
+{
+"pattern": "([a-z0-9!#$%&'*+/=?^_`{|.}~-]+@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)",
+"type": ["email", "emails"]
+},
+{
+"pattern": "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$",
+"type": ["base64", "b64"]
+}
+]
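
Reviewer note: `custom.json` is a list of objects with a `pattern` string and a `type` list. How pyredactkit consumes the file is not shown in this diff, so the following is only a minimal sketch of applying such entries with Python's `re` module. `redact_with_patterns` and the `[REDACTED]` placeholder are hypothetical, and the IPv4 entry is copied as it appears above, including the bare `.` between octets (which matches any character, not just a literal dot).

```python
import json
import re

# First entry of custom.json, copied as it appears in the diff.
CUSTOM_JSON = """
[
  {
    "pattern": "(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)",
    "type": ["ip", "ipv4"]
  }
]
"""


def redact_with_patterns(text, entries, placeholder="[REDACTED]"):
    """Hypothetical helper: replace every match of every entry's pattern."""
    for entry in entries:
        text = re.sub(entry["pattern"], placeholder, text)
    return text


entries = json.loads(CUSTOM_JSON)

# A dotted IPv4 address is replaced by the placeholder.
print(redact_with_patterns("My router is: 10.10.10.1", entries))

# Because "." is unescaped in the pattern as shown, non-dot separators match too.
print(re.fullmatch(entries[0]["pattern"], "1a2b3c4") is not None)
```

If the separators are meant to be literal dots, the JSON string would need `\\.` in place of each bare `.`.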

logdata/emailsecrets.log

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+this is my IP: 102.23.5.1
+My router is : 10.10.10.1
+71.159.188.33
+81.141.167.45
+165.65.59.139
+64.248.67.225
+
+https://tech.gov.sg
+
+My emails are
+
+
+
+
+
+this is my IP: 102.23.5.1
+My router is: 10.10.10.1
+71.159.188.33
+81.141.167.45
+165.65.59.139
+64.248.67.225
+
+Base64 data
+QVBJX1RPS0VO
+UzNjcjN0UGFzc3dvcmQ=
+U3VwM3JTM2NyZXRQQHNzd29yZA==
27+
WwogICAgICAgIHsKICAgICAgICAgICAgInBhdHRlcm4iOiAiKFthLXowLTkhIyQlJicqK1wvPT9eX2B7fC59fi1dK0AoPzpbYS16MC05XSg/OlthLXowLTktXSpbYS16MC05XSk/XC4pK1thLXowLTldKD86W2EtejAtOS1dKlthLXowLTldKT8pIiwKICAgICAgICAgICAgInR5cGUiOiAiZW1haWxzIiwKICAgICAgICAgICAgInBvc2l0aW9uIjogImlkX29iamVjdC5yZWdleGVzWzBdWydwYXR0ZXJuJ10iCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAgICJwYXR0ZXJuIjogJyg/aSkoKD86aHR0cHM/Oi8vfHd3d1xkezAsM31bLl0pP1thLXowLTkuXC1dK1suXSg/Oig/OmludGVybmF0aW9uYWwpfCg/OmNvbnN0cnVjdGlvbil8KD86Y29udHJhY3RvcnMpfCg/OmVudGVycHJpc2VzKXwoPzpwaG90b2dyYXBoeSl8KD86aW1tb2JpbGllbil8KD86bWFuYWdlbWVudCl8KD86dGVjaG5vbG9neSl8KD86ZGlyZWN0b3J5KXwoPzplZHVjYXRpb24pfCg/OmVxdWlwbWVudCl8KD86aW5zdGl0dXRlKXwoPzptYXJrZXRpbmcpfCg/OnNvbHV0aW9ucyl8KD86YnVpbGRlcnMpfCg/OmNsb3RoaW5nKXwoPzpjb21wdXRlcil8KD86ZGVtb2NyYXQpfCg/OmRpYW1vbmRzKXwoPzpncmFwaGljcyl8KD86aG9sZGluZ3MpfCg/OmxpZ2h0aW5nKXwoPzpwbHVtYmluZyl8KD86dHJhaW5pbmcpfCg/OnZlbnR1cmVzKXwoPzphY2FkZW15KXwoPzpjYXJlZXJzKXwoPzpjb21wYW55KXwoPzpkb21haW5zKXwoPzpmbG9yaXN0KXwoPzpnYWxsZXJ5KXwoPzpndWl0YXJzKXwoPzpob2xpZGF5KXwoPzpraXRjaGVuKXwoPzpyZWNpcGVzKXwoPzpzaGlrc2hhKXwoPzpzaW5nbGVzKXwoPzpzdXBwb3J0KXwoPzpzeXN0ZW1zKXwoPzphZ2VuY3kpfCg/OmJlcmxpbil8KD86Y2FtZXJhKXwoPzpjZW50ZXIpfCg/OmNvZmZlZSl8KD86ZXN0YXRlKXwoPzprYXVmZW4pfCg/Omx1eHVyeSl8KD86bW9uYXNoKXwoPzptdXNldW0pfCg/OnBob3Rvcyl8KD86cmVwYWlyKXwoPzpzb2NpYWwpfCg/OnRhdHRvbyl8KD86dHJhdmVsKXwoPzp2aWFqZXMpfCg/OnZveWFnZSl8KD86YnVpbGQpfCg/OmNoZWFwKXwoPzpjb2Rlcyl8KD86ZGFuY2UpfCg/OmVtYWlsKXwoPzpnbGFzcyl8KD86aG91c2UpfCg/Om5pbmphKXwoPzpwaG90byl8KD86c2hvZXMpfCg/OnNvbGFyKXwoPzp0b2RheSl8KD86YWVybyl8KD86YXJwYSl8KD86YXNpYSl8KD86YmlrZSl8KD86YnV6eil8KD86Y2FtcCl8KD86Y2x1Yil8KD86Y29vcCl8KD86ZmFybSl8KD86Z2lmdCl8KD86Z3VydSl8KD86aW5mbyl8KD86am9icyl8KD86a2l3aSl8KD86bGFuZCl8KD86bGltbyl8KD86bGluayl8KD86bWVudSl8KD86bW9iaSl8KD86bW9kYSl8KD86bmFtZSl8KD86cGljcyl8KD86cGluayl8KD86cG9zdCl8KD86cmljaCl8KD86cnVocil8KD86c2V4eSl8KD86dGlwcyl8KD86d2FuZyl8KD86d2llbil8KD86em9uZSl8KD86Yml6KXwoPzpjYWIpfCg/OmNhdCl8KD86Y2VvKXwoPzpjb20pfCg/OmVkdSl8KD86Z292KXwoPzppbnQpfCg/
Om1pbCl8KD86bmV0KXwoPzpvbmwpfCg/Om9yZyl8KD86cHJvKXwoPzpyZWQpfCg/OnRlbCl8KD86dW5vKXwoPzp4eHgpfCg/OmFjKXwoPzphZCl8KD86YWUpfCg/OmFmKXwoPzphZyl8KD86YWkpfCg/OmFsKXwoPzphbSl8KD86YW4pfCg/OmFvKXwoPzphcSl8KD86YXIpfCg/OmFzKXwoPzphdCl8KD86YXUpfCg/OmF3KXwoPzpheCl8KD86YXopfCg/OmJhKXwoPzpiYil8KD86YmQpfCg/OmJlKXwoPzpiZil8KD86YmcpfCg/OmJoKXwoPzpiaSl8KD86YmopfCg/OmJtKXwoPzpibil8KD86Ym8pfCg/OmJyKXwoPzpicyl8KD86YnQpfCg/OmJ2KXwoPzpidyl8KD86YnkpfCg/OmJ6KXwoPzpjYSl8KD86Y2MpfCg/OmNkKXwoPzpjZil8KD86Y2cpfCg/OmNoKXwoPzpjaSl8KD86Y2spfCg/OmNsKXwoPzpjbSl8KD86Y24pfCg/OmNvKXwoPzpjcil8KD86Y3UpfCg/OmN2KXwoPzpjdyl8KD86Y3gpfCg/OmN5KXwoPzpjeil8KD86ZGUpfCg/OmRqKXwoPzpkayl8KD86ZG0pfCg/OmRvKXwoPzpkeil8KD86ZWMpfCg/OmVlKXwoPzplZyl8KD86ZXIpfCg/OmVzKXwoPzpldCl8KD86ZXUpfCg/OmZpKXwoPzpmail8KD86ZmspfCg/OmZtKXwoPzpmbyl8KD86ZnIpfCg/OmdhKXwoPzpnYil8KD86Z2QpfCg/OmdlKXwoPzpnZil8KD86Z2cpfCg/OmdoKXwoPzpnaSl8KD86Z2wpfCg/OmdtKXwoPzpnbil8KD86Z3ApfCg/OmdxKXwoPzpncil8KD86Z3MpfCg/Omd0KXwoPzpndSl8KD86Z3cpfCg/Omd5KXwoPzpoayl8KD86aG0pfCg/OmhuKXwoPzpocil8KD86aHQpfCg/Omh1KXwoPzppZCl8KD86aWUpfCg/OmlsKXwoPzppbSl8KD86aW4pfCg/OmlvKXwoPzppcSl8KD86aXIpfCg/OmlzKXwoPzppdCl8KD86amUpfCg/OmptKXwoPzpqbyl8KD86anApfCg/OmtlKXwoPzprZyl8KD86a2gpfCg/OmtpKXwoPzprbSl8KD86a24pfCg/OmtwKXwoPzprcil8KD86a3cpfCg/Omt5KXwoPzpreil8KD86bGEpfCg/OmxiKXwoPzpsYyl8KD86bGkpfCg/OmxrKXwoPzpscil8KD86bHMpfCg/Omx0KXwoPzpsdSl8KD86bHYpfCg/Omx5KXwoPzptYSl8KD86bWMpfCg/Om1kKXwoPzptZSl8KD86bWcpfCg/Om1oKXwoPzptayl8KD86bWwpfCg/Om1tKXwoPzptbil8KD86bW8pfCg/Om1wKXwoPzptcSl8KD86bXIpfCg/Om1zKXwoPzptdCl8KD86bXUpfCg/Om12KXwoPzptdyl8KD86bXgpfCg/Om15KXwoPzpteil8KD86bmEpfCg/Om5jKXwoPzpuZSl8KD86bmYpfCg/Om5nKXwoPzpuaSl8KD86bmwpfCg/Om5vKXwoPzpucCl8KD86bnIpfCg/Om51KXwoPzpueil8KD86b20pfCg/OnBhKXwoPzpwZSl8KD86cGYpfCg/OnBnKXwoPzpwaCl8KD86cGspfCg/OnBsKXwoPzpwbSl8KD86cG4pfCg/OnByKXwoPzpwcyl8KD86cHQpfCg/OnB3KXwoPzpweSl8KD86cWEpfCg/OnJlKXwoPzpybyl8KD86cnMpfCg/OnJ1KXwoPzpydyl8KD86c2EpfCg/OnNiKXwoPzpzYyl8KD86c2QpfCg/OnNlKXwoPzpzZyl8KD86c2gpfCg/OnNpKXwoPzpzail8KD86c2spfCg/OnNsKXwoPzpzbSl8KD86c24pfCg/
OnNvKXwoPzpzcil8KD86c3QpfCg/OnN1KXwoPzpzdil8KD86c3gpfCg/OnN5KXwoPzpzeil8KD86dGMpfCg/OnRkKXwoPzp0Zil8KD86dGcpfCg/OnRoKXwoPzp0ail8KD86dGspfCg/OnRsKXwoPzp0bSl8KD86dG4pfCg/OnRvKXwoPzp0cCl8KD86dHIpfCg/OnR0KXwoPzp0dil8KD86dHcpfCg/OnR6KXwoPzp1YSl8KD86dWcpfCg/OnVrKXwoPzp1cyl8KD86dXkpfCg/OnV6KXwoPzp2YSl8KD86dmMpfCg/OnZlKXwoPzp2Zyl8KD86dmkpfCg/OnZuKXwoPzp2dSl8KD86d2YpfCg/OndzKXwoPzp5ZSl8KD86eXQpfCg/OnphKXwoPzp6bSl8KD86encpKSg/Oi9bXlxzKCk8Pl0rW15cc2AhKClcW1xde307OlwnIi4sPD4/XHhhYlx4YmJcdTIwMWNcdTIwMWRcdTIwMThcdTIwMTldKT8pJywKICAgICAgICAgICAgInR5cGUiOiAiZG5zIiwKICAgICAgICAgICAgInBvc2l0aW9uIjogImlkX29iamVjdC5yZWdleGVzWzFdWydwYXR0ZXJuJ10iCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAgICJwYXR0ZXJuIjogIig/OjI1WzAtNV18MlswLTRdWzAtOV18WzAxXT9bMC05XVswLTldPylcLig/OjI1WzAtNV18MlswLTRdWzAtOV18WzAxXT9bMC05XVswLTldPylcLig/OjI1WzAtNV18MlswLTRdWzAtOV18WzAxXT9bMC05XVswLTldPylcLig/OjI1WzAtNV18MlswLTRdWzAtOV18WzAxXT9bMC05XVswLTldPykiLAogICAgICAgICAgICAidHlwZSI6ICJpcHY0IiwKICAgICAgICAgICAgInBvc2l0aW9uIjogImlkX29iamVjdC5yZWdleGVzWzJdWydwYXR0ZXJuJ10iCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAgICJwYXR0ZXJuIjogJygoPzooPzpcXGR7NH1bLSBdPyl7M31cXGR7NH18XFxkezE1LDE2fSkpKD8hW1xcZF0pJywKICAgICAgICAgICAgInR5cGUiOiAiY3JlZGl0X2NhcmRzIiwKICAgICAgICAgICAgInBvc2l0aW9uIjogImlkX29iamVjdC5yZWdleGVzWzNdWydwYXR0ZXJuJ10iCiAgICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAgICJwYXR0ZXJuIjogIltTVEZHXVxkezd9W0EtWl0iLAogICAgICAgICAgICAidHlwZSI6ICJTaW5nYXBvcmVfTlJJQyIsCiAgICAgICAgICAgICJwb3NpdGlvbiI6ICJpZF9vYmplY3QucmVnZXhlc1s0XVsncGF0dGVybiddIgogICAgICAgIH0sCiAgICAgICAgewogICAgICAgICAgICAicGF0dGVybiI6ICdccyooPyEuKjo6Lio6OikoPzooPyE6KXw6KD89OikpKD86WzAtOWEtZl17MCw0fSg/Oig/PD06Oil8KD88ITo6KTopKXs2fSg/OlswLTlhLWZdezAsNH0oPzooPzw9OjopfCg/PCE6Oik6KVswLTlhLWZdezAsNH0oPzooPzw9OjopfCg/PCE6KXwoPzw9OikoPzwhOjopOil8KD86MjVbMC00XXwyWzAtNF1cZHwxXGRcZHxbMS05XT9cZCkoPzpcLig/OjI1WzAtNF18MlswLTRdXGR8MVxkXGR8WzEtOV0/XGQpKXszfSlccyonLAogICAgICAgICAgICAidHlwZSI6ICJpcHY2IiwKICAgICAgICAgICAgInBvc2l0aW9uIjogImlkX29iamVjdC5yZWdleGVzWzVdWydwYXR0ZXJuJ10iCiAg
ICAgICAgfSwKICAgICAgICB7CiAgICAgICAgICAgICJwYXR0ZXJuIjogIi9eKD86W2EtekEtWjAtOStcL117NH0pKig/OnwoPzpbYS16QS1aMC05K1wvXXszfT0pfCg/OlthLXpBLVowLTkrXC9dezJ9PT0pfCg/OlthLXpBLVowLTkrXC9dezF9PT09KSkkLyIsCiAgICAgICAgICAgICJ0eXBlIjogImJhc2U2NCIsCiAgICAgICAgICAgICJwb3NpdGlvbiI6ICJpZF9vYmplY3QucmVnZXhlc1s2XVsncGF0dGVybiddIgogICAgICAgIH0KICAgIF0=
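
Reviewer note: the Base64 pattern in `custom.json` is anchored with `^` and `$`, so it can only match a whole string. A minimal sketch, assuming the log is processed one line at a time (the diff does not show how pyredactkit actually iterates over input), shows that the three short Base64 samples from `logdata/emailsecrets.log` match while ordinary lines do not. `find_base64_lines` is a hypothetical helper.

```python
import base64
import re

# Base64 pattern copied from custom.json in this commit.
BASE64_PATTERN = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)

# A few lines taken from logdata/emailsecrets.log.
SAMPLE_LINES = [
    "Base64 data",
    "QVBJX1RPS0VO",
    "UzNjcjN0UGFzc3dvcmQ=",
    "U3VwM3JTM2NyZXRQQHNzd29yZA==",
    "71.159.188.33",
]


def find_base64_lines(lines):
    """Hypothetical helper: keep non-empty lines that match the anchored pattern."""
    # The pattern also matches the empty string, so empty lines are skipped explicitly.
    return [line for line in lines if line and BASE64_PATTERN.match(line)]


for line in find_base64_lines(SAMPLE_LINES):
    # Decode each hit to show what the sample data hides.
    print(line, "->", base64.b64decode(line).decode("utf-8"))
```

Any plain word whose length is a multiple of four and that uses only Base64 characters (for example `test`) would also match, so a per-line check like this can over-redact.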
