@@ -113,25 +113,129 @@ Below is a complete list of all the available supported languages for ValX's pro

  ## Usage

- ### Detect Profanity
+ ### Profanity Detection and Removal
+
+ ValX allows for flexible profanity filtering using built-in language lists, custom word lists (provided as Python lists or loaded from files), or a combination of both.
+
+ **1. Basic Profanity Detection (Built-in Language)**

  ```python
  from valx import detect_profanity

- # Detect profanity
+ sample_text = ["This is some fuck and porn text."]
+ # Detect profanity using the English list
  results = detect_profanity(sample_text, language='English')
- print("Profanity Evaluation Results", results)
+ # results will be:
+ # [
+ #     {'Line': 1, 'Column': 14, 'Word': 'fuck', 'Language': 'English'},
+ #     {'Line': 1, 'Column': 23, 'Word': 'porn', 'Language': 'English'}
+ # ]
+ print(results)
+ ```
+
+ **2. Profanity Detection with a Custom Word List (Python List)**
+
+ You can provide your own list of words to filter.
+
+ ```python
+ from valx import detect_profanity
+
+ sample_text = ["This contains custombadword1 and also asshole from English list."]
+ my_custom_words = ["custombadword1", "anothercustom"]
+
+ # Option A: Custom list ONLY (language=None)
+ results_custom_only = detect_profanity(sample_text, language=None, custom_words_list=my_custom_words)
+ # results_custom_only will detect "custombadword1" with Language: "Custom"
+ # [{'Line': 1, 'Column': 15, 'Word': 'custombadword1', 'Language': 'Custom'}]
+ print(results_custom_only)
+
+ # Option B: Custom list COMBINED with a built-in language
+ results_custom_plus_english = detect_profanity(sample_text, language="English", custom_words_list=my_custom_words)
+ # results_custom_plus_english will detect "custombadword1" and "asshole"
+ # Language will be "Custom + English"
+ # [
+ #     {'Line': 1, 'Column': 15, 'Word': 'custombadword1', 'Language': 'Custom + English'},
+ #     {'Line': 1, 'Column': 39, 'Word': 'asshole', 'Language': 'Custom + English'}
+ # ]
+ print(results_custom_plus_english)
  ```

- ### Remove Profanity
+ **3. Loading Custom Profanity Words from a File**
+
+ ValX provides a helper function to load words from a text file (one word per line, '#' for comments).

  ```python
- from valx import remove_profanity
+ from valx import detect_profanity, load_custom_profanity_from_file
+
+ # Assume 'my_profanity_file.txt' contains:
+ # customfileword1
+ # # this is a comment
+ # customfileword2
+
+ custom_words_from_file = load_custom_profanity_from_file("my_profanity_file.txt")
+ # custom_words_from_file will be: ['customfileword1', 'customfileword2']
+
+ sample_text_for_file = ["Text with customfileword1 and built-in shit."]
+
+ # Use file-loaded list with English built-in list
+ results_file_plus_english = detect_profanity(
+     sample_text_for_file,
+     language="English",
+     custom_words_list=custom_words_from_file
+ )
+ # Detects "customfileword1" and "shit", Language: "Custom + English"
+ print(results_file_plus_english)
+
+ # Use file-loaded list ONLY
+ results_file_only = detect_profanity(
+     sample_text_for_file,
+     language=None,  # Important: set language to None
+     custom_words_list=custom_words_from_file
+ )
+ # Detects only "customfileword1", Language: "Custom"
+ print(results_file_only)
+ ```
+
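To make the word-file format concrete (one word per line, `#` for comments), here is a minimal sketch of a reader for such a file. `read_word_file` is a hypothetical stand-in written for illustration; it is not ValX's `load_custom_profanity_from_file`, and details such as skipping blank lines, stripping surrounding whitespace, and treating only whole-line comments are assumptions.

```python
from pathlib import Path


def read_word_file(path):
    """Hypothetical reader: return the words in a one-word-per-line file."""
    words = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        word = raw_line.strip()
        # Assumption: blank lines and lines starting with '#' carry no words.
        if not word or word.startswith("#"):
            continue
        words.append(word)
    return words
```

A list built this way has the same shape as the `custom_words_list` argument used in the examples, so it could be passed in the same position.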
+ **Output Format for `detect_profanity`**
+
+ The `detect_profanity` function returns a list of dictionaries. Each dictionary includes:
+ - `"Line"`: The line number (1-indexed).
+ - `"Column"`: The column number (1-indexed) where the profanity starts.
+ - `"Word"`: The detected profanity word.
+ - `"Language"`: Indicates the source of the word list:
+     - `<LanguageName>` (e.g., "English"): If only a built-in language list was used.
+     - `"Custom"`: If `language=None` and only a `custom_words_list` was used.
+     - `"Custom + <LanguageName>"` (e.g., "Custom + English"): If both a built-in list and `custom_words_list` were used.
+     - `"Custom + All"`: If `language='All'` and `custom_words_list` were used.
+
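Because the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below groups hits by line for reporting. `group_hits_by_line` is a hypothetical helper, not part of ValX, and the sample data is copied from the English-only example in section 1 rather than produced by a live call.

```python
from collections import defaultdict


def group_hits_by_line(results):
    """Map each 1-indexed line number to its (column, word) hits, sorted by column."""
    grouped = defaultdict(list)
    for hit in results:
        grouped[hit["Line"]].append((hit["Column"], hit["Word"]))
    return {line: sorted(hits) for line, hits in sorted(grouped.items())}


# Sample data in the documented output format; no ValX call is made here.
sample_results = [
    {"Line": 1, "Column": 14, "Word": "fuck", "Language": "English"},
    {"Line": 1, "Column": 23, "Word": "porn", "Language": "English"},
]

for line, hits in group_hits_by_line(sample_results).items():
    for column, word in hits:
        print(f"line {line}, column {column}: {word}")
```

Only the `"Line"`, `"Column"`, and `"Word"` keys are read, so the helper works the same way whichever `"Language"` label the results carry.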

- # Remove profanity
- removed = remove_profanity(sample_text, "text_cleaned.txt", language="English")
+ **4. Removing Profanity**
+
+ `remove_profanity` works similarly, accepting `language` and `custom_words_list` parameters.
+
+ ```python
+ from valx import remove_profanity, load_custom_profanity_from_file
+
+ sample_text = ["This is fuck, custombadword1, and text with customfileword1."]
+ my_custom_words = ["custombadword1"]
+ custom_words_from_file = load_custom_profanity_from_file("my_profanity_file.txt")  # Assuming it contains 'customfileword1'
+
+ # Remove profanity using English built-in + my_custom_words + custom_words_from_file
+ all_custom_words = list(set(my_custom_words + custom_words_from_file))  # Combine and deduplicate
+
+ cleaned_text = remove_profanity(
+     sample_text,
+     output_file="cleaned_output.txt",  # Optional: saves to file
+     language="English",
+     custom_words_list=all_custom_words
+ )
+ # cleaned_text will have "fuck", "custombadword1", and "customfileword1" replaced with "bad word".
+ # e.g., ["This is bad word, bad word, and text with bad word."]
+ print(cleaned_text)
  ```

+ The `load_profanity_words` function (used internally) also accepts `language` and `custom_words_list` if you need direct access to the word lists.
+
  ### Detect Sensitive Information

  ```python