TorfinnOlsen
Collaborator

  • spReputationAnalysis.py

Queries MongoDB and gives you a high-level, cursory analysis: similarities between collections, the list of collections and which ones are unique across the different DBs, and samples from each so you can see at a glance what's in these DBs. Run this first and you've got a good starting point for what you're looking at. It also reports the most recent date a record was added, which gives you a sense of when each one was last updated.

  • spReputationDateAnalysis.py

Queries the live MongoDB instance and does a more in-depth analysis of the date ranges of records, so you know what period is covered in each of the DBs you're looking at.

  • spReputationExport.py

Exports all of the DBs in full to CSVs. They're pretty small (less than a gig), but that's still a lot to try to load into a Google Sheet.

  • load_csvs.py

Assumes you've installed pandas. Loads everything into DataFrames so you can run one cleaning pass across all the data as a single set and start analyzing dates of occurrences, frequency, etc. I think I can bang out a handful of scripts that do this high-level analysis here.
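For a rough idea of what the date-range pass described for spReputationDateAnalysis.py might look like, here is a minimal sketch. It is not taken from the actual scripts: the function name `summarize_date_ranges`, the `date` field name, and the sample records are all hypothetical, and it assumes each record is a dict whose timestamp is already a `datetime`.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Mapping, Optional, Tuple


def summarize_date_ranges(
    collections: Mapping[str, Iterable[Mapping[str, Any]]],
    date_field: str = "date",  # hypothetical field name, adjust to the real schema
) -> Dict[str, Optional[Tuple[datetime, datetime, int]]]:
    """Return (earliest, latest, dated_count) per collection, or None if nothing is dated."""
    summary: Dict[str, Optional[Tuple[datetime, datetime, int]]] = {}
    for name, records in collections.items():
        earliest = latest = None
        count = 0
        for record in records:
            value = record.get(date_field)
            if not isinstance(value, datetime):
                continue  # skip records with a missing or non-datetime value
            count += 1
            if earliest is None or value < earliest:
                earliest = value
            if latest is None or value > latest:
                latest = value
        summary[name] = (earliest, latest, count) if count else None
    return summary


# Made-up sample data standing in for documents read from the database.
sample = {
    "collection_a": [
        {"date": datetime(2021, 3, 1)},
        {"date": datetime(2021, 5, 17)},
        {"note": "no date on this one"},
    ],
    "collection_b": [],
}

result = summarize_date_ranges(sample)
for name, info in result.items():
    print(name, info)
```

Since the function only needs iterables of dicts, a PyMongo cursor from `find()` could presumably be passed in place of the sample lists, though that depends on how the real collections store their dates.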

@willscott
Collaborator

Your PR doesn't include the contents of these scripts yet
