Open
Labels
L2: A task suitable for someone who is comfortable helping with implementing features.
Description
Currently, scrapers/profiles.go both scrapes and parses, which does not match our design of keeping scraping and parsing separate.
Here is what I am proposing:
- Update scrapers/profiles.go to save .html files, similar to scrapers/coursebook.go
  - Add /professors to outDir
  - Save profiles as {first}-{last}.html
- Create a parser/profiles.go
  - Copy all of the parsing logic into it, modified to use goquery instead of chromedp
- Update flags in main.go
- Bonus
  - Add resume support to the scraper
  - Add a unit test for the parser
- Side effects
  - parser.go uses utils.GetAllFilesWithExtension, which would create an issue if the proposed /professors is added, so we might consider scraping coursebook into outDir/coursebook/... instead.
Sample dir structure:
outDir (ie data)
├───coursebook
│   ├───24f
│   │   └───cp_acct
│   │           acct2301.001.24f.html
│   │           acct2301.002.24f.html
│   │           ...
│   │   ...
└───professors
        first-last.html
        ...
I haven't worked with the profiles scraper very much, but there does not seem to be any technical reason why this should not be possible.
If this is added as a task, I don't mind working on it, but if someone else is interested, feel free to take it.