Commit c7b0225

README upd
1 parent 234f056 commit c7b0225

1 file changed

+57
-0
lines changed

README.md

Lines changed: 57 additions & 0 deletions
@@ -86,6 +86,63 @@ reddit-scraper -d month -s subreddits.json -l 50

Data is saved in JSON files under the `data/` directory, one file per subreddit.

### Output Format

The scraper writes each file as a JSON array of post objects, each holding its comments as a nested tree. An example of the output:

```json
[
  {
    "post_body": "This is the title of the post",
    "post_user": "username123",
    "post_time": "2023-04-15T14:30:45",
    "comments": [
      {
        "body": "This is a top-level comment",
        "user": "commenter456",
        "time": "2023-04-15T15:20:10",
        "replies": [
          {
            "body": "This is a reply to the comment",
            "user": "replier789",
            "time": "2023-04-15T16:05:30",
            "replies": []
          }
        ]
      }
    ]
  },
  {
    "post_body": "Another post title",
    "post_user": "anotheruser",
    "post_time": "2023-04-14T09:15:22",
    "comments": []
  }
]
```
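
A file in this shape can be read with Python's standard `json` module. The sketch below is only an illustration: `load_posts` and `load_all` are hypothetical helper names, and the assumption that every `*.json` file under `data/` is a subreddit file is ours, since the file naming scheme is not specified here.

```python
import json
from pathlib import Path


def load_posts(path):
    """Load one subreddit file and return its list of post objects."""
    with Path(path).open(encoding="utf-8") as f:
        posts = json.load(f)
    # The documented format is a top-level JSON array of posts.
    if not isinstance(posts, list):
        raise ValueError(f"{path}: expected a JSON array of posts")
    return posts


def load_all(data_dir="data"):
    """Map each file stem in data_dir to its posts.

    Assumption: every *.json file in the directory is a subreddit file.
    """
    files = sorted(Path(data_dir).glob("*.json"))
    return {p.stem: load_posts(p) for p in files}
```

Calling `load_all()` from the project root would then return a dictionary keyed by file name without the `.json` extension.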

Key features of the output format:

- **Posts**: Each post is an object with:
  - `post_body`: The title of the post
  - `post_user`: The username of the post author
  - `post_time`: ISO 8601 timestamp of when the post was created
  - `comments`: Array of top-level comments on the post

- **Comments**: Each comment is an object with:
  - `body`: The text content of the comment
  - `user`: The username of the comment author
  - `time`: ISO 8601 timestamp of when the comment was created
  - `replies`: Array of replies to the comment (nested comments)

- **Nested Structure**: Every reply is itself a comment object with its own `replies` array, so each post's comments form a tree that preserves the conversation flow
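
Because replies share the shape of comments, the whole tree can be traversed with one short recursive function. This is a minimal sketch rather than part of the scraper: `walk_comments` is a hypothetical name, and the sample post simply mirrors the example output shown earlier.

```python
def walk_comments(comments, depth=0):
    """Yield (depth, comment) for every comment, parents before their replies."""
    for comment in comments:
        yield depth, comment
        # Replies have the same shape as comments, so recurse one level deeper.
        yield from walk_comments(comment.get("replies", []), depth + 1)


# Sample data copied from the documented output format.
post = {
    "post_body": "This is the title of the post",
    "post_user": "username123",
    "post_time": "2023-04-15T14:30:45",
    "comments": [
        {
            "body": "This is a top-level comment",
            "user": "commenter456",
            "time": "2023-04-15T15:20:10",
            "replies": [
                {
                    "body": "This is a reply to the comment",
                    "user": "replier789",
                    "time": "2023-04-15T16:05:30",
                    "replies": [],
                }
            ],
        }
    ],
}

# Print the thread with two spaces of indentation per nesting level.
for depth, comment in walk_comments(post["comments"]):
    print("  " * depth + f'{comment["user"]}: {comment["body"]}')
```

A generator keeps the traversal lazy, so a large thread can be scanned without first building a flat list. Very deep threads could hit Python's recursion limit, in which case an explicit stack would be the safer choice.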

This format makes it easy to:

- Analyze post and comment content
- Track user activity
- Measure engagement over time
- Import into data analysis tools
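
As one illustration of the uses listed above, the sketch below tallies contributions per user and comments per calendar day using only the standard library. The `flatten` and `activity` helpers are hypothetical names, and the inline `posts` list is sample data in the documented shape, not real scraper output.

```python
from collections import Counter
from datetime import datetime


def flatten(comments):
    """Yield every comment and nested reply, using an explicit stack."""
    stack = list(comments)
    while stack:
        comment = stack.pop()
        yield comment
        stack.extend(comment["replies"])


def activity(posts):
    """Return (contributions per user, comments per calendar day)."""
    per_user = Counter(post["post_user"] for post in posts)
    per_day = Counter()
    for post in posts:
        for comment in flatten(post["comments"]):
            per_user[comment["user"]] += 1
            day = datetime.fromisoformat(comment["time"]).date().isoformat()
            per_day[day] += 1
    return per_user, per_day


# Sample data in the documented format.
posts = [
    {
        "post_body": "This is the title of the post",
        "post_user": "username123",
        "post_time": "2023-04-15T14:30:45",
        "comments": [
            {
                "body": "This is a top-level comment",
                "user": "commenter456",
                "time": "2023-04-15T15:20:10",
                "replies": [
                    {
                        "body": "This is a reply to the comment",
                        "user": "replier789",
                        "time": "2023-04-15T16:05:30",
                        "replies": [],
                    }
                ],
            }
        ],
    },
    {
        "post_body": "Another post title",
        "post_user": "anotheruser",
        "post_time": "2023-04-14T09:15:22",
        "comments": [],
    },
]

per_user, per_day = activity(posts)
print(per_user.most_common(3))
print(sorted(per_day.items()))
```

The same flat stream of comments could be passed to a data analysis tool as rows. The standard library is used here only to keep the example free of dependencies.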

## Development

### Project Structure
