Separate profile parsing from scrapers/profiles.go

Currently scrapers/profiles.go also does the parsing, which does not match our design.

Here is what I am proposing:

- Update scrapers/profiles.go to save .html files similar to scrapers/coursebook.go 
   - add /professors to outDir
   - save profiles as {first}-{last}.html
- Create a parser/profiles.go
   - Copy all of the parsing logic into here, modified to use goquery instead of chromedp
- Update flags in main.go 
- Bonus 
  - Add resume support to scraper
  - Add a unit test for the parser
- Side effects
  - parser.go uses `utils.GetAllFilesWithExtension`, which would create an issue if the proposed `/professors` directory is added, so we might consider scraping coursebook into `outDir/coursebook/...` instead.
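
To make the scraper side of this concrete, here is a rough sketch of the save step with resume support. It is not the existing code: `profileFileName` and `saveProfile` are hypothetical helpers, lowercasing the names is my assumption, and resuming is done simply by skipping any profile whose `.html` file already exists.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// profileFileName builds the proposed "{first}-{last}.html" name.
// Trimming and lowercasing are assumptions; the proposal does not specify them.
func profileFileName(first, last string) string {
	clean := func(s string) string {
		return strings.ToLower(strings.TrimSpace(s))
	}
	return clean(first) + "-" + clean(last) + ".html"
}

// saveProfile writes html to outDir/professors/{first}-{last}.html.
// It returns false without writing when the file already exists, so an
// interrupted scrape can be rerun without fetching the same profile twice.
func saveProfile(outDir, first, last string, html []byte) (bool, error) {
	dir := filepath.Join(outDir, "professors")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return false, err
	}
	path := filepath.Join(dir, profileFileName(first, last))
	if _, err := os.Stat(path); err == nil {
		return false, nil // already saved on a previous run
	} else if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}
	if err := os.WriteFile(path, html, 0o644); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Use a throwaway directory in place of the real outDir.
	outDir, err := os.MkdirTemp("", "data")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(outDir)

	// The second call finds the file from the first call and skips it.
	for i := 0; i < 2; i++ {
		wrote, err := saveProfile(outDir, "Jane", "Doe", []byte("<html></html>"))
		if err != nil {
			panic(err)
		}
		fmt.Println(profileFileName("Jane", "Doe"), "written:", wrote)
	}
}
```

In the real scraper the skip check would probably need to happen before the page is fetched rather than after, otherwise resuming saves no time.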

```
Sample dir structure: 

 outDir (ie data)
    ├───coursebook
    │   ├───24f
    │   │   └───cp_acct
    │   │           acct2301.001.24f.html
    │   │           acct2301.002.24f.html
    │   │           ...
    │   │    ...
    └───professors
            first-last.html
            ...

```
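
With that layout, the coursebook parser could limit its file search to `outDir/coursebook` so profile pages are never picked up. The sketch below is only an illustration: `filesWithExtension` is a hypothetical stand-in, and I have not checked how `utils.GetAllFilesWithExtension` is actually implemented.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// hasExtension reports whether name ends in ext, ignoring case.
func hasExtension(name, ext string) bool {
	return strings.EqualFold(filepath.Ext(name), ext)
}

// filesWithExtension walks root recursively and returns every regular
// entry whose name ends in ext. Hypothetical stand-in for the utils helper.
func filesWithExtension(root, ext string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // includes the case where root itself is missing
		}
		if !d.IsDir() && hasExtension(d.Name(), ext) {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	// Build a tiny copy of the sample layout in a throwaway directory.
	outDir, err := os.MkdirTemp("", "data")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(outDir)

	samples := []string{
		filepath.Join(outDir, "coursebook", "24f", "cp_acct", "acct2301.001.24f.html"),
		filepath.Join(outDir, "professors", "first-last.html"),
	}
	for _, p := range samples {
		if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
			panic(err)
		}
		if err := os.WriteFile(p, []byte("<html></html>"), 0o644); err != nil {
			panic(err)
		}
	}

	// Searching only under coursebook leaves the professors file out.
	files, err := filesWithExtension(filepath.Join(outDir, "coursebook"), ".html")
	if err != nil {
		panic(err)
	}
	fmt.Println("coursebook files found:", len(files))
}
```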


I haven't worked with the profiles scraper very much, but there does not seem to be any technical reason why this should not be possible.

If this is added as a task, I don't mind working on it, but if someone else is interested, feel free to take it.

Separate profile parsing from scrapers/profiles.go #81