kaustubh-2007/KAIZEN-S-SPS-PROJECT

🧹 Data Cleaning & Preprocessing Toolkit (Bash Script)

A command-line data cleaning toolkit written in Bash.
The script offers 14 numbered options: 13 utilities to clean, preprocess, transform, and analyze text and CSV data, plus an Exit option.
It suits students, data analysts, and shell-scripting projects.


✨ Features Included

1. Basic Cleaning

  • Trim extra spaces

  • Convert text to lowercase

  • Remove empty lines
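The script's source is not reproduced in this README, so the following is only a minimal sketch of how these three steps could be chained with standard tools. The function name `basic_clean` is hypothetical, and reading from standard input is an assumption.

```bash
# Hypothetical helper: reads text on stdin, writes cleaned text to stdout.
basic_clean() {
  # 1. Collapse runs of whitespace to one space and strip both ends.
  # 2. Convert everything to lowercase.
  # 3. Drop lines that are empty after trimming.
  sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^$'
}
```

Possible usage, with placeholder file names: `basic_clean < raw.txt > cleaned.txt`.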

2. Remove Duplicates & Sort

  • Sort lines

  • Remove duplicate entries
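As a rough illustration, sorting and de-duplicating can be done in one pass with `sort -u`. The sketch below also shows an order-preserving variant in case the original line order matters; both function names are hypothetical and not taken from the script.

```bash
# Hypothetical helper: sort lines and keep one copy of each.
sort_unique() {
  sort -u
}

# Hypothetical alternative: drop repeated lines but keep first-seen order.
dedupe_keep_order() {
  # seen[$0]++ is 0 (false) the first time a line appears, so !0 prints it.
  awk '!seen[$0]++'
}
```

The first form matches the "sort, then remove duplicates" description above; the second is an optional variation, not something the README claims the tool does.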

3. Remove Special Characters

  • Keep only A–Z, a–z, 0–9, and spaces
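One way to express this filter, assuming the input is plain ASCII or that non-ASCII bytes should also be dropped, is `tr` with a complemented character set. The function name `strip_special` is hypothetical.

```bash
# Hypothetical helper: delete every byte that is not a letter, digit,
# space, or newline. The newline is kept so that lines do not get merged.
strip_special() {
  # LC_ALL=C makes tr work on bytes, so A-Z / a-z mean the ASCII ranges.
  LC_ALL=C tr -cd 'A-Za-z0-9 \n'
}
```

For example, `printf 'Hi, there! #42\n' | strip_special` prints `Hi there 42`.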

4. Remove Stopwords

  • Remove common English stopwords using stopwords.txt

5. Clean a CSV Column

  • Trim spaces and convert a specific column to lowercase
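A minimal sketch of column cleaning with `awk` follows. It assumes a simple comma-separated file with no quoted fields containing commas, and a 1-based column number; the function name `clean_csv_column` is hypothetical.

```bash
# Hypothetical helper.
# $1: 1-based index of the column to clean. CSV is read from stdin.
clean_csv_column() {
  awk -F',' -v OFS=',' -v col="$1" '
    {
      if (col <= NF) {
        # Trim leading and trailing blanks, then lowercase the field.
        gsub(/^[ \t]+|[ \t]+$/, "", $col)
        $col = tolower($col)
      }
      print
    }
  '
}
```

Possible usage: `clean_csv_column 2 < people.csv`. A header row, if present, is lowercased in that column as well, since this sketch treats every line the same way.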

6. Show File Statistics

  • Total lines

  • Unique lines

  • Total words

  • Total characters

  • Longest line length
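The five figures above can be gathered with `wc`, `sort`, and `awk`. This is a sketch under assumptions: the function name `file_stats` and the output labels are hypothetical, and `awk` is used for the longest line because `wc -L` is a GNU extension.

```bash
# Hypothetical helper.
# $1: path of the file to summarize.
file_stats() {
  # tr -d ' ' removes the padding some wc implementations add.
  lines=$(wc -l < "$1" | tr -d ' ')
  unique=$(sort -u "$1" | wc -l | tr -d ' ')
  words=$(wc -w < "$1" | tr -d ' ')
  chars=$(wc -m < "$1" | tr -d ' ')   # counts newlines too
  longest=$(awk '{ if (length($0) > max) max = length($0) }
                 END { print max + 0 }' "$1")

  printf 'Total lines: %s\n' "$lines"
  printf 'Unique lines: %s\n' "$unique"
  printf 'Total words: %s\n' "$words"
  printf 'Total characters: %s\n' "$chars"
  printf 'Longest line length: %s\n' "$longest"
}
```

Possible usage: `file_stats notes.txt`, where `notes.txt` is a placeholder name.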

7. Extract Numbers Only

  • Extract all numeric values
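What counts as a "numeric value" is not specified here. The sketch below assumes unsigned integers and simple decimals, printed one per line, and relies on `grep -o`, which GNU and BSD grep provide. The function name `extract_numbers` is hypothetical.

```bash
# Hypothetical helper: print each integer or decimal found on stdin,
# one per line. Signs and exponents are not handled in this sketch.
extract_numbers() {
  grep -oE '[0-9]+(\.[0-9]+)?'
}
```

For example, `printf 'a 12 b 3.5 c\n' | extract_numbers` prints `12` and `3.5` on separate lines.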

8. Extract Emails Only

  • Extract valid email addresses
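Email extraction is usually done with a pattern match. The regular expression below is a common approximation, not the script's confirmed pattern, and it does not fully validate addresses; the function name `extract_emails` is hypothetical.

```bash
# Hypothetical helper: print every substring of stdin that looks like
# local-part@domain.tld, one match per line.
extract_emails() {
  grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
}
```

Possible usage: `extract_emails < contacts.txt | sort -u` to list each address once.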

9. Word Frequency Count

  • Shows words with frequency (sorted descending)
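A frequency table of this kind is typically built with the `sort | uniq -c` pipeline. In the sketch below, lowercasing before counting and the `word count` output layout are assumptions, and the function name `word_freq` is hypothetical.

```bash
# Hypothetical helper: print "word count" pairs, most frequent first.
word_freq() {
  # Split on anything that is not a letter or digit, one word per line.
  tr -cs '[:alnum:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }' \
    | sort -k2,2nr -k1,1   # by count descending, then alphabetically
}
```

For example, `printf 'the cat the dog The\n' | word_freq` prints `the 3`, then `cat 1`, then `dog 1`.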

10. Replace a Word

  • Replace a selected word with another word

11. Merge Two Files

  • Combine two files in order

12. Show First N Lines

  • Uses head

13. Show Last N Lines

  • Uses tail

14. Exit

  • End the tool


📁 Required Files in the Project

About

This is a shell script for cleaning, formatting, and preparing raw text data using standard Linux command-line utilities.
