hrybacki edited this page Mar 21, 2013 · 5 revisions

CitationEngine

  • Add logging to detect unknown errors, reference types, etc. -> parsers.py
  • Write pretty-print repr for Document
  • Resolve given names -> .-delimited initials
  • Resolve journal title abbreviations
  • Improve document merging / conflict resolution
  • Need to record who, when, and from which batch, i.e. user.datetime.query.pointers to all documents/raw data collected.
    • local storage?
    • database?
    • used in conjunction with merges/conflict resolutions
  • Consider Bloom filter vs hash for DB queries
  • controllers.py -- Store meta-collection data, i.e. the query used, the source it was obtained from, and a timestamp
  • Think about an optimistic insert or random ID
  • Think about saving all captured information to disk -- json?
  • Document controller -- unsure about resolutionToken lifespan
  • Determine the first date articles were added to OAI and modify the default DATE_FROM in controllers.py to match
  • Should db.add_or_update() return the objectID of the document inserted into the DB?
  • db.py should take param=DB, where DB is the database the controller should interact with
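
As a starting point for the "Resolve given names -> .-delimited initials" item above, here is a minimal sketch. The function name `to_initials` is hypothetical and not part of the existing code; it assumes given names arrive as a single string with parts separated by whitespace and/or periods. Hyphenated names (e.g. "Jean-Paul") and run-together initials (e.g. "JRR") are treated as a single part, which may or may not be what we want.

```python
import re


def to_initials(given_names):
    """Collapse a given-names string into dot-delimited initials.

    Hypothetical helper: splits on whitespace and periods, keeps the
    first character of each part, upper-cases it, and appends a period.
    """
    # Split on any run of whitespace or periods; drop empty fragments
    # left over from trailing periods or leading/trailing spaces.
    parts = [p for p in re.split(r"[\s.]+", given_names.strip()) if p]
    # One upper-cased initial per part, each followed by a period.
    return "".join(p[0].upper() + "." for p in parts)


if __name__ == "__main__":
    for name in ["John Ronald Reuel", "J. R. R.", "harry"]:
        print(repr(name), "->", to_initials(name))
```

Normalizing both full names and already-abbreviated names to the same form should make author comparison during document merging simpler, though it loses information, so the original string probably needs to be kept alongside it.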