-
Notifications
You must be signed in to change notification settings - Fork 11
architecture planning
There will be four components to GenderTracker:
Responsible for handling the import of data. It should have the following methods:
- fetch (gets the data from the source)
- parse (parses the data into the appropriate form.)
There will be some basic importers:
- URLImporter
- FileImporter
- StringImporter
Note this doesn't do any actual parsing, it just passes on the elements it extracts back to whoever instantiated it so it can be passed to a parser.
Converts the data that comes from the Importers into Article.
For each source, a new parser will need to be created. I'm envisioning them for:
- NYTimesParser
- GlobeParser
- GuardianParser etc...
An Article is a generic structured version of input text content. It will have some set properties like:
- title
- body
- author
- _original_text (prior to stripping any markup and so on.)
Any additional properties can be added to an Article, like (publication date, publication name...?)
The MetricRunner is responsible for knowing:
- What metrics should be computed on an article
- What metrics are currently running (maintain a job queue)
- Where each metric should report its result (or what should happen with the result.)
MetricRunner needs to report the results of every job. These results either:
- Get written to a database
- Get stored in memory
- Get compiled into a report of some form.
- Not sure if we should separate storage from the reporting aspect... Not even sure the reporting aspect is necessary.
A metric takes in a single article. A metric is a unit of computation that returns the result for how gender biased a specific article is (although it really doesn't have to be this specific, if we want to make it as such.)
A Metric has:
- A type ("name", "pronoun" etc.)
- scores
- male [0,1] (does it need to be -1?)
- female [0,1] (does it need to be -1?)
- threshold (how high does the result need to be to return a positive indication in either direction?)
- compute method (performs the actual computation.
- What about metrics that need to know the metrics of their other articles? For example, if we build a classifier? I think this assumes you had to have built it outside the context of this system.
Responsible for
TODO: Translate images below to actual plan...
