docs(README): update supplement crawl instructions and content structure
- Added instructions for crawling pages tagged as 'supplement'
- Updated content structure to include multiple content types
- Clarified post-processing details for supplements
-There are two types of content downloaded- SCP Items and SCP Tales.
+There are multiple types of content downloaded (Items, Tales, GOI formats, and Supplements).

 All content (both SCP Items and Tales) contain the following:
@@ -66,6 +72,7 @@ The crawler generates a series of json files containing an array of objects repr
 | scp_titles.json | Main | Title | scp |
 | scp_hubs.json | Main | Hub | scp |
 | scp_tales.json | Main | Tale | scp |
+| scp_supplement.json | Main | Supplement | scp |
 | scp_int.json | International | Item | scp_int |
 | scp_int_titles.json | International | Title | scp_int |
 | scp_int_tales.json | International | Tale | scp_int |
@@ -76,7 +83,9 @@ To regenerate all files run `make fresh`.

 ## Post Processed Data

-The postproc system takes the Titles, Hubs, Items, and Tales and uses them to generate a comprehensive set of objects. It combines and cross references data and expands on the data already there.
+The postproc system takes Titles, Hubs, Items, Tales, GOI, and Supplements and uses them to generate a comprehensive set of objects. It combines and cross references data and expands on the data already there.
+
+Supplements are written to `data/processed/supplement/` and include additional fields like `parent_scp` and `parent_tale`.
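
As a rough illustration of how the `parent_scp` and `parent_tale` fields added here might be consumed, the sketch below groups supplement objects by their parent. It assumes each supplement is a JSON object (a Python dict once loaded) and that a missing parent is stored as `null`; the `link` field, the sample values, and the `group_by_parent` helper are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def group_by_parent(supplements, key="parent_scp"):
    """Group supplement objects by a parent field, skipping entries without one."""
    grouped = defaultdict(list)
    for item in supplements:
        parent = item.get(key)
        if parent:  # ignore supplements where the parent field is missing or null
            grouped[parent].append(item)
    return dict(grouped)


# Hypothetical sample records; real post-processed objects carry more fields.
sample = [
    {"link": "supplement-a", "parent_scp": "SCP-001", "parent_tale": None},
    {"link": "supplement-b", "parent_scp": "SCP-001", "parent_tale": None},
    {"link": "supplement-c", "parent_scp": None, "parent_tale": "some-tale"},
]

by_scp = group_by_parent(sample)
by_tale = group_by_parent(sample, key="parent_tale")
```

In practice the list would come from `json.load` on a file under `data/processed/supplement/`; the exact file names in that directory are not stated in this commit, so they are left out here.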