If we autolearn, we need to gather as many identifiers as possible to see whether the incoming pub matches something already in the database.
This almost certainly belongs at a slightly lower level, alongside the normalization, massaging, dup-checking, etc. that we do before inserting new pubs into the database.
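
As a rough illustration of the idea, here is a minimal Python sketch of gathering identifiers, normalizing them, and checking for a match before inserting. Everything in it is an assumption made for the example: the identifier kinds (`doi`, `pmid`, `isbn`), the function names, and the plain dict standing in for the database are hypothetical and are not taken from the actual code.

```python
from typing import Optional

# Hypothetical identifier kinds; the note does not say which ones are used.
ID_KINDS = ("doi", "pmid", "isbn")


def normalize_id(kind: str, value: str) -> str:
    """Normalize one identifier so equivalent spellings compare equal."""
    value = value.strip().lower()
    if kind == "doi":
        # Strip common resolver/URI prefixes so only the bare DOI remains.
        for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
            if value.startswith(prefix):
                value = value[len(prefix):]
                break
    elif kind == "isbn":
        # Hyphens and spaces in ISBNs are presentation only.
        value = value.replace("-", "").replace(" ", "")
    return value


def gather_ids(pub: dict) -> set:
    """Collect every (kind, normalized value) pair present on a pub."""
    return {
        (kind, normalize_id(kind, str(pub[kind])))
        for kind in ID_KINDS
        if pub.get(kind)
    }


def find_existing(pub: dict, index: dict) -> Optional[int]:
    """Return the id of a stored pub sharing any identifier, else None."""
    for key in gather_ids(pub):
        if key in index:
            return index[key]
    return None


def insert_if_new(pub: dict, index: dict, next_id: int) -> int:
    """Dup-check first; register the pub's identifiers only if nothing matches.

    `index` maps (kind, normalized value) -> pub id and stands in for the
    real database lookup.
    """
    existing = find_existing(pub, index)
    if existing is not None:
        return existing
    for key in gather_ids(pub):
        index[key] = next_id
    return next_id
```

Keeping the normalization and lookup inside `insert_if_new` means every caller, autolearn or otherwise, goes through the same dup-check before anything is written.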