This repository contains Node.js scripts that clean Reddit submissions and comments, group them by day, and write the result as structured JSON.
- Cleans and structures Reddit submissions and comments.
- Groups data by date.
- Handles large datasets efficiently.
- Outputs well-formatted JSON files.
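The day-grouping step can be sketched as follows. This is a minimal illustration, not the code in `submissionDataCleanUp.js`. It assumes each record has a `created_utc` field holding Unix time in seconds, as in Reddit API and Pushshift exports, and it buckets records by UTC calendar day. The names `toUtcDay` and `groupByDay` are hypothetical.

```javascript
// Hypothetical sketch: group Reddit records by UTC calendar day.
// Assumes each record carries `created_utc` (Unix time in seconds);
// the real scripts may use a different field or time zone.

/** Convert a Unix timestamp in seconds to a "YYYY-MM-DD" string (UTC). */
function toUtcDay(createdUtc) {
  return new Date(Number(createdUtc) * 1000).toISOString().slice(0, 10);
}

/** Group records into { "YYYY-MM-DD": [record, ...] }, skipping undated ones. */
function groupByDay(records) {
  const groups = {};
  for (const record of records) {
    const ts = Number(record.created_utc);
    // Skip records with a missing or non-numeric timestamp.
    if (record.created_utc == null || !Number.isFinite(ts)) continue;
    const day = toUtcDay(ts);
    if (!groups[day]) groups[day] = [];
    groups[day].push(record);
  }
  return groups;
}
```

Grouping in UTC keeps the output independent of the time zone of the machine running the script.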
- Clone the repository:
git clone https://github.com/VISHNUDAS-tunerlabs/Reddis-cleaner.git
cd Reddis-cleaner
- Install dependencies:
npm install
- Place your Reddit submissions JSON file inside the `dataInput` folder.
- Run the following command:
node submissionDataCleanUp.js
- The processed JSON is saved in the `redditOutput` folder.
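A rough sketch of what the submission clean-up and per-day output could look like follows. The kept fields, the `YYYY-MM-DD.json` file naming, and the helper names `cleanSubmission`, `buildDailyFiles`, and `writeDailyFiles` are assumptions for illustration; `submissionDataCleanUp.js` may keep different fields or name its files differently.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical field selection: the fields the real script keeps are not documented.
function cleanSubmission(raw) {
  const text = (value) => (typeof value === "string" ? value.trim() : "");
  const body = text(raw.selftext);
  return {
    id: raw.id,
    title: text(raw.title),
    // Reddit replaces deleted/removed post bodies with these placeholders.
    selftext: body === "[deleted]" || body === "[removed]" ? "" : body,
    author: text(raw.author),
    subreddit: text(raw.subreddit),
    score: Number(raw.score) || 0,
    created_utc: Number(raw.created_utc),
  };
}

// Pure step: map "YYYY-MM-DD.json" -> array of cleaned submissions (UTC days).
function buildDailyFiles(rawSubmissions) {
  const files = {};
  for (const raw of rawSubmissions) {
    const post = cleanSubmission(raw);
    if (!Number.isFinite(post.created_utc)) continue; // skip undated posts
    const day = new Date(post.created_utc * 1000).toISOString().slice(0, 10);
    const name = `${day}.json`;
    if (!files[name]) files[name] = [];
    files[name].push(post);
  }
  return files;
}

// I/O step: write each day's array as pretty-printed JSON into outputDir.
function writeDailyFiles(files, outputDir = "redditOutput") {
  fs.mkdirSync(outputDir, { recursive: true });
  for (const [name, posts] of Object.entries(files)) {
    fs.writeFileSync(path.join(outputDir, name), JSON.stringify(posts, null, 2));
  }
}
```

Calling `writeDailyFiles(buildDailyFiles(rawSubmissions))` would write one file per day into `redditOutput`. Keeping the grouping pure and the file writing separate makes the clean-up logic easy to test without touching the disk.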
- Place your Reddit comments JSON file inside the `commentDataInput` folder.
- Run the following command:
node commentDataCleanUp.js
- The processed JSON is saved in the `redditCommentOutput` folder.
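Comment clean-up might look roughly like the sketch below. It is hypothetical: the fields `commentDataCleanUp.js` keeps are not documented here. It relies on two Reddit conventions: deleted or removed comments carry the body `[deleted]` or `[removed]`, and IDs such as `link_id` are "fullnames" with a type prefix (`t1_` for comments, `t3_` for submissions). The names `cleanComment`, `cleanComments`, and `stripFullnamePrefix` are illustrative.

```javascript
// Hypothetical sketch of comment cleaning; not the repository's actual code.
const PLACEHOLDER_BODIES = new Set(["[deleted]", "[removed]"]);

// Reddit fullnames carry a type prefix, e.g. "t3_abc" -> "abc".
function stripFullnamePrefix(fullname) {
  return typeof fullname === "string" ? fullname.replace(/^t[0-9]_/, "") : null;
}

// Return a cleaned comment, or null if it has no usable body.
function cleanComment(raw) {
  const body = typeof raw.body === "string" ? raw.body.trim() : "";
  if (!body || PLACEHOLDER_BODIES.has(body)) return null;
  return {
    id: raw.id,
    submission_id: stripFullnamePrefix(raw.link_id), // parent post, prefix removed
    parent_id: raw.parent_id || null, // kept as a fullname to tell post vs. comment parents apart
    author: typeof raw.author === "string" ? raw.author : null,
    body,
    created_utc: Number(raw.created_utc),
  };
}

// Clean a whole batch, dropping empty, deleted, and removed comments.
function cleanComments(rawComments) {
  return rawComments.map(cleanComment).filter((comment) => comment !== null);
}
```

The cleaned comments could then be grouped by `created_utc` per UTC day, in the same way as submissions.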
root
│── commentDataInput/ # Input folder for Reddit comments
│── dataInput/ # Input folder for Reddit submissions
│── redditOutput/ # Processed submissions output
│── redditCommentOutput/ # Processed comments output
│── commentDataCleanUp.js # Script to clean Reddit comments
│── submissionDataCleanUp.js # Script to clean Reddit submissions
│── node_modules/ # Dependencies (ignored by Git)
│── package.json # Project dependencies and scripts
│── .gitignore # Git ignore settings
│── README.md # Project documentation
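Both scripts read their input from a folder (`dataInput/` or `commentDataInput/`). A hypothetical loader for that step is sketched below. It assumes each `.json` file holds either a JSON array or newline-delimited JSON (one object per line), both common shapes for Reddit exports; which shape the real scripts expect is not stated here, and `loadInputFolder` is an illustrative name.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical loader: read every .json file in `dir` and return all records.
// Supports a top-level JSON array or newline-delimited JSON (NDJSON).
// A single pretty-printed object spanning several lines is NOT handled.
function loadInputFolder(dir) {
  const records = [];
  const files = fs.readdirSync(dir).filter((f) => f.endsWith(".json")).sort();
  for (const file of files) {
    const text = fs.readFileSync(path.join(dir, file), "utf8").trim();
    if (!text) continue; // ignore empty files
    if (text.startsWith("[")) {
      // Push one by one; spreading a huge array into push() can overflow the stack.
      for (const record of JSON.parse(text)) records.push(record);
    } else {
      for (const line of text.split(/\r?\n/)) {
        if (line.trim()) records.push(JSON.parse(line));
      }
    }
  }
  return records;
}
```

This version reads each file fully into memory. For very large dumps, reading line by line with Node's `readline` module over `fs.createReadStream` would keep memory use flat.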
Contributions are welcome! Feel free to fork this repository and submit a pull request.
This project is licensed under the MIT License. See LICENSE for details.