🧹 Data Cleaning & Preprocessing Toolkit (Bash Script)
A powerful command-line data cleaning toolkit written in Bash.
The script offers a 14-option menu (13 utilities plus Exit) to clean, preprocess, transform, and analyze text and CSV data.
Ideal for students, data analysts, and shell-scripting projects.
✨ Features Included
1. Basic Cleaning
- Trim extra spaces
- Convert text to lowercase
- Remove empty lines
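A minimal sketch of how these three steps could be chained in Bash. The function name `basic_clean` is hypothetical, and squeezing repeated inner spaces is an assumption about what "extra spaces" means here; the actual script may do this differently.

```bash
# Hypothetical helper (name not taken from the script).
# Reads text on stdin, writes the cleaned text to stdout.
basic_clean() {
  # 1) trim leading/trailing whitespace and squeeze inner runs to one space
  # 2) convert to lowercase
  # 3) delete lines that are now empty
  sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//; s/[[:space:]]+/ /g' \
    | tr '[:upper:]' '[:lower:]' \
    | sed '/^$/d'
}
```

It could be used as `basic_clean < input.txt > cleaned.txt`.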
2. Remove Duplicates & Sort
- Sort lines
- Remove duplicate entries
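Sorting and de-duplicating can be done in a single step. The sketch below assumes exact, case-sensitive duplicates; `sort_unique` is an illustrative name.

```bash
# Hypothetical helper: sort stdin and drop exact duplicate lines.
sort_unique() {
  # LC_ALL=C forces a byte-wise order that does not depend on the user's locale
  LC_ALL=C sort -u
}
```

In the C locale uppercase letters sort before lowercase ones, so `Apple` comes before `apple`.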
3. Remove Special Characters
- Keep only A–Z, a–z, 0–9, and spaces
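One possible way to express this filter with `tr`. Keeping newlines (so the line structure survives) is an assumption, and `strip_special` is a hypothetical name.

```bash
# Hypothetical helper: delete every byte that is not a letter, digit,
# space, or newline. LC_ALL=C keeps the A-Z / a-z ranges strictly ASCII.
strip_special() {
  LC_ALL=C tr -cd 'A-Za-z0-9 \n'
}
```

Because the filter works on bytes in the C locale, accented and other non-ASCII characters are removed as well.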
4. Remove Stopwords
- Remove common English stopwords using stopwords.txt
5. Clean a CSV Column
- Trim spaces and convert a specific column to lowercase
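A rough sketch of the column cleanup using `awk`. It assumes a plain comma-separated file with no quoted fields containing commas, and it treats every row the same way, including any header row; `clean_csv_column` is a hypothetical name.

```bash
# Hypothetical helper: trim and lowercase column number $1 of a CSV on stdin.
# Assumption: fields never contain embedded commas or quotes.
clean_csv_column() {
  local col=$1
  awk -F',' -v OFS=',' -v c="$col" '{
    if (c <= NF) {
      gsub(/^[ \t]+|[ \t]+$/, "", $c)   # trim leading/trailing blanks
      $c = tolower($c)                  # lowercase the chosen column only
    }
    print
  }'
}
```

For example, `clean_csv_column 2 < data.csv` would clean the second column and leave the others untouched.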
6. Show File Statistics
- Total lines
- Unique lines
- Total words
- Total characters
- Longest line length
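These five figures can be gathered with `wc`, `sort`, and `awk`. The sketch below is one way to do it; the function name and output labels are illustrative, and the character count includes newline characters.

```bash
# Hypothetical helper: print basic statistics for the file given as $1.
file_stats() {
  local file=$1
  # tr -d ' ' strips the padding some wc implementations add
  printf 'Total lines: %s\n'      "$(wc -l < "$file" | tr -d ' ')"
  printf 'Unique lines: %s\n'     "$(sort -u "$file" | wc -l | tr -d ' ')"
  printf 'Total words: %s\n'      "$(wc -w < "$file" | tr -d ' ')"
  printf 'Total characters: %s\n' "$(wc -m < "$file" | tr -d ' ')"
  # awk is used instead of wc -L, which is a GNU extension
  printf 'Longest line length: %s\n' \
    "$(awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$file")"
}
```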
7. Extract Numbers Only
- Extract all numeric values
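A small sketch of number extraction with `grep -o`. The pattern is an assumption: it matches unsigned integers and simple decimals, and ignores signs and exponents. `extract_numbers` is a hypothetical name.

```bash
# Hypothetical helper: print each number found on stdin on its own line.
# -o prints only the matched text, -E enables extended regular expressions.
extract_numbers() {
  grep -oE '[0-9]+(\.[0-9]+)?'
}
```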
8. Extract Emails Only
- Extract valid email addresses
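Email extraction is typically done with a pragmatic regular expression rather than full address validation. The pattern below is a common approximation and not necessarily the one the script uses; `extract_emails` is a hypothetical name.

```bash
# Hypothetical helper: print every email-like token found on stdin.
# The pattern requires local-part@domain.tld with a TLD of 2+ letters.
extract_emails() {
  grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
}
```

Strings without a dotted domain, such as `bad@host`, are not matched.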
9. Word Frequency Count
- Show each word with its frequency, sorted in descending order
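A frequency table like this is usually built with the classic `tr | sort | uniq -c | sort` pipeline. The sketch below assumes words are runs of letters and digits and that counting is case-insensitive; both are assumptions, and `word_freq` is a hypothetical name.

```bash
# Hypothetical helper: print "count word" pairs, most frequent first.
word_freq() {
  tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2
  # tr -cs: turn every run of non-alphanumerics into a single newline
  # uniq -c: prefix each distinct word with its count
  # final sort: count descending, then word ascending to break ties
}
```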
10. Replace a Word
- Replace a selected word with another word
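When the search and replacement words come from user input, characters such as `.`, `/`, or `&` can change how `sed` interprets them. The sketch below escapes both before substituting. It is an assumption about a safe approach, not a description of the script's code, and it replaces every literal occurrence, including those inside longer words. `replace_word` is a hypothetical name.

```bash
# Hypothetical helper: replace every literal occurrence of $1 with $2 on stdin.
replace_word() {
  local old=$1 new=$2 old_esc new_esc
  # Escape regex metacharacters and the / delimiter in the search text
  old_esc=$(printf '%s' "$old" | sed 's|[][\.*^$/]|\\&|g')
  # Escape \, / and & (which mean something in a sed replacement)
  new_esc=$(printf '%s' "$new" | sed 's|[\/&]|\\&|g')
  sed "s/${old_esc}/${new_esc}/g"
}
```

For example, `replace_word colour color < notes.txt` would rewrite every `colour` as `color`.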
11. Merge Two Files
- Combine two files in order
12. Show First N Lines
- Uses head
13. Show Last N Lines
- Uses tail
14. Exit
- End the tool
📁 Required Files in the Project