Commit ea34fbe

Merge pull request #2 from Infinitode/feat/custom-profanity-lists
feat: Implement custom profanity lists and file loading
2 parents 2d7709d + 5066fda commit ea34fbe

File tree

4 files changed

+432
-47
lines changed

README.md

Lines changed: 111 additions & 7 deletions
@@ -113,25 +113,129 @@ Below is a complete list of all the available supported languages for ValX's pro
113113

114114
## Usage
115115

116-
### Detect Profanity
116+
### Profanity Detection and Removal
117+
118+
ValX allows for flexible profanity filtering using built-in language lists, custom word lists (provided as Python lists or loaded from files), or a combination of both.
119+
120+
**1. Basic Profanity Detection (Built-in Language)**
117121

118122
```python
119123
from valx import detect_profanity
120124

121-
# Detect profanity
125+
sample_text = ["This is some fuck and porn text."]
126+
# Detect profanity using the English list
122127
results = detect_profanity(sample_text, language='English')
123-
print("Profanity Evaluation Results", results)
128+
# results will be:
129+
# [
130+
# {'Line': 1, 'Column': 14, 'Word': 'fuck', 'Language': 'English'},
131+
# {'Line': 1, 'Column': 23, 'Word': 'porn', 'Language': 'English'}
132+
# ]
133+
print(results)
134+
```
135+
136+
**2. Profanity Detection with a Custom Word List (Python List)**
137+
138+
You can provide your own list of words to filter.
139+
140+
```python
141+
from valx import detect_profanity
142+
143+
sample_text = ["This contains custombadword1 and also asshole from English list."]
144+
my_custom_words = ["custombadword1", "anothercustom"]
145+
146+
# Option A: Custom list ONLY (language=None)
147+
results_custom_only = detect_profanity(sample_text, language=None, custom_words_list=my_custom_words)
148+
# results_custom_only will detect "custombadword1" with Language: "Custom"
149+
# [{'Line': 1, 'Column': 15, 'Word': 'custombadword1', 'Language': 'Custom'}]
150+
print(results_custom_only)
151+
152+
# Option B: Custom list COMBINED with a built-in language
153+
results_custom_plus_english = detect_profanity(sample_text, language="English", custom_words_list=my_custom_words)
154+
# results_custom_plus_english will detect "custombadword1" and "asshole"
155+
# Language will be "Custom + English"
156+
# [
157+
# {'Line': 1, 'Column': 15, 'Word': 'custombadword1', 'Language': 'Custom + English'},
158+
# {'Line': 1, 'Column': 39, 'Word': 'asshole', 'Language': 'Custom + English'}
159+
# ]
160+
print(results_custom_plus_english)
124161
```
125162

126-
### Remove Profanity
163+
**3. Loading Custom Profanity Words from a File**
164+
165+
ValX provides a helper function, `load_custom_profanity_from_file`, that loads words from a text file with one word per line; lines starting with `#` are treated as comments.
127166

128167
```python
129-
from valx import remove_profanity
168+
from valx import detect_profanity, load_custom_profanity_from_file
169+
170+
# Assume 'my_profanity_file.txt' contains:
171+
# customfileword1
172+
# # this is a comment
173+
# customfileword2
174+
175+
custom_words_from_file = load_custom_profanity_from_file("my_profanity_file.txt")
176+
# custom_words_from_file will be: ['customfileword1', 'customfileword2']
177+
178+
sample_text_for_file = ["Text with customfileword1 and built-in shit."]
179+
180+
# Use file-loaded list with English built-in list
181+
results_file_plus_english = detect_profanity(
182+
    sample_text_for_file,
183+
    language="English",
184+
    custom_words_list=custom_words_from_file
185+
)
186+
# Detects "customfileword1" and "shit", Language: "Custom + English"
187+
print(results_file_plus_english)
188+
189+
# Use file-loaded list ONLY
190+
results_file_only = detect_profanity(
191+
    sample_text_for_file,
192+
    language=None,  # Important: set language to None
193+
    custom_words_list=custom_words_from_file
194+
)
195+
# Detects only "customfileword1", Language: "Custom"
196+
print(results_file_only)
197+
```
198+
199+
**Output Format for `detect_profanity`**
200+
201+
The `detect_profanity` function returns a list of dictionaries. Each dictionary includes:
202+
- `"Line"`: The line number (1-indexed).
203+
- `"Column"`: The column number (1-indexed) where the profanity starts.
204+
- `"Word"`: The detected profanity word.
205+
- `"Language"`: Indicates the source of the word list:
206+
    - `<LanguageName>` (e.g., "English"): If only a built-in language list was used.
207+
    - `"Custom"`: If `language=None` and only a `custom_words_list` was used.
208+
    - `"Custom + <LanguageName>"` (e.g., "Custom + English"): If both a built-in list and `custom_words_list` were used.
209+
    - `"Custom + All"`: If `language='All'` and `custom_words_list` were used.
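
The output fields above can be illustrated with a small stand-alone sketch. It is not ValX's implementation: `source_label` and `find_words` are hypothetical helpers, and whole-word, case-insensitive matching is an assumption made for the example. It only shows how a 1-indexed `Line` and `Column` and the `Language` label could be derived.

```python
import re


def source_label(language, custom_words):
    """Hypothetical helper: build the "Language" value from the rules above."""
    if custom_words and language:
        return f"Custom + {language}"
    if custom_words:
        return "Custom"
    return language


def find_words(lines, words, label):
    """Hypothetical helper: one dict per match, with 1-indexed Line and Column."""
    if not words:
        return []
    # Assumption: whole-word, case-insensitive matching.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            findings.append({
                "Line": line_no,
                "Column": match.start() + 1,  # 0-indexed offset -> 1-indexed column
                "Word": match.group(0),
                "Language": label,
            })
    return findings


sample_text = ["This contains custombadword1 and also asshole from English list."]
my_custom_words = ["custombadword1", "anothercustom"]
print(find_words(sample_text, my_custom_words, source_label(None, my_custom_words)))
# [{'Line': 1, 'Column': 15, 'Word': 'custombadword1', 'Language': 'Custom'}]
```

The column is the match offset plus one, which agrees with the `'Column': 15` shown for `custombadword1` in the custom-list example.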
210+
130211

131-
# Remove profanity
132-
removed = remove_profanity(sample_text, "text_cleaned.txt", language="English")
212+
**4. Removing Profanity**
213+
214+
`remove_profanity` works similarly, accepting `language` and `custom_words_list` parameters.
215+
216+
```python
217+
from valx import remove_profanity, load_custom_profanity_from_file
218+
219+
sample_text = ["This is fuck, custombadword1, and text with customfileword1."]
220+
my_custom_words = ["custombadword1"]
221+
custom_words_from_file = load_custom_profanity_from_file("my_profanity_file.txt") # Assuming it contains 'customfileword1'
222+
223+
# Remove profanity using English built-in + my_custom_words + custom_words_from_file
224+
all_custom_words = list(set(my_custom_words + custom_words_from_file))  # Combine and de-duplicate (order is not preserved)
225+
226+
cleaned_text = remove_profanity(
227+
    sample_text,
228+
    output_file="cleaned_output.txt",  # Optional: saves to file
229+
    language="English",
230+
    custom_words_list=all_custom_words
231+
)
232+
# cleaned_text will have "fuck", "custombadword1", and "customfileword1" replaced with "bad word".
233+
# e.g., ["This is bad word, bad word, and text with bad word."]
234+
print(cleaned_text)
133235
```
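
For readers who want to see the replacement step in isolation, here is a minimal sketch that does not use ValX. `mask_words` is a hypothetical helper, and whole-word, case-insensitive matching is an assumption; only the `"bad word"` replacement string comes from the example above.

```python
import re


def mask_words(lines, words, replacement="bad word"):
    """Hypothetical helper: replace whole-word matches of `words` in each line."""
    if not words:
        return list(lines)
    # Longest entries first, so a longer entry wins over a shorter one that is its prefix.
    ordered = sorted(set(words), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [pattern.sub(replacement, line) for line in lines]


sample_text = ["This is fuck, custombadword1, and text with customfileword1."]
words = ["fuck", "custombadword1", "customfileword1"]
print(mask_words(sample_text, words))
# ['This is bad word, bad word, and text with bad word.']
```

Compiling one alternation pattern keeps the pass over each line to a single substitution; the real function may tokenize or match differently.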
134236

237+
The `load_profanity_words` function (used internally) also accepts `language` and `custom_words_list` if you need direct access to the word lists.
238+
135239
### Detect Sensitive Information
136240

137241
```python

custom_profanity.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# This is a custom profanity list for testing
2+
custombadword1
3+
supersecretcurse
4+
anotherone
5+
6+
# Test empty lines and comments below
7+
8+
#anothercomment
9+
testwordalpha
10+
testwordbeta
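
As a rough check of what this test file should yield under the format the README describes (one word per line, `#` for comments), the sketch below parses the same content from a string. `parse_word_list` is a hypothetical stand-in, not the `load_custom_profanity_from_file` added in this commit, and skipping blank lines is an assumption based on the file's own "Test empty lines" comment.

```python
def parse_word_list(text):
    """Hypothetical stand-in: keep stripped lines that are neither blank nor comments."""
    words = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        words.append(line)
    return words


sample_file = """\
# This is a custom profanity list for testing
custombadword1
supersecretcurse
anotherone

# Test empty lines and comments below

#anothercomment
testwordalpha
testwordbeta
"""

print(parse_word_list(sample_file))
# ['custombadword1', 'supersecretcurse', 'anotherone', 'testwordalpha', 'testwordbeta']
```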
