
Commit ebc0417

Merge remote-tracking branch 'origin/master' into 512-salesforce-automation
# Conflicts:
#   src/server/requirements.txt
2 parents a39642e + 845898f

21 files changed (+270 -172 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -23,3 +23,4 @@ start_env.sh
 .mypy_cache/
 *secrets*
 *kustomization*
+src/.venv/

README.md

Lines changed: 19 additions & 47 deletions
@@ -12,61 +12,33 @@ animal care programs), administration and development efforts are coordinated by
 
 ## [The Data Pipeline](https://codeforphilly.org/projects/paws_data_pipeline)
 
-This project seeks to provide PAWS with an easy-to-use and easy-to-support tool to extract
-data from multiple source systems, confirm accuracy and appropriateness,
-clean/validate data where necessary (a data hygiene and wrangling step),
-and then load relevant data into one or more repositories to facilitate
-(1) a highly-accurate and rich 360-degree view of PAWS constituents
-(Salesforce is a likely candidate target system; already in use at PAWS) and
-(2) flexible ongoing data analysis and insights discovery (e.g. a data lake / data warehouse).
-
 Through all of its operational and service activities, PAWS accumulates data regarding donations,
 adoptions, fosters, volunteers, merchandise sales, event attendees (to name a few),
-each in their own system and/or manual (Google Sheet) tally. This vital data that can
+each in their own system and/or manual tally. This vital data that can
 drive insights remains siloed and is usually difficult to extract, manipulate, and analyze.
-Taking all of this data, making it readily available, and drawing inferences through analysis
-can drive many benefits:
-
-- PAWS operations can be better informed and use data-driven decisions to guide programs
-and maximize effectiveness;
-- Supporters can be further engaged by suggesting additional opportunities for involvement
-based upon pattern analysis;
-- Multi-dimensional supporters can be consistently (and accurately) acknowledged for all
-the ways they support PAWS (i.e. a volunteer who donates and also fosters kittens),
-not to mention opportunities to further tap the potential of these enthusiastic supporters.
-
-## [Code of Conduct](https://codeforphilly.org/pages/code_of_conduct)
-
-This is a Code for Philly project operating under their code of conduct.
-
-## Getting started
-see [Getting Started](GettingStarted.md) to run the app locally
 
-## Project Plan
+This project provides PAWS with an easy-to-use and easy-to-support tool to extract
+constituent data from multiple source systems, standardize extracted data, match constituents across data sources,
+load relevant data into Salesforce, and run an automation in Salesforce to produce an RFM score.
+Through these processes, the PAWS data pipeline has laid the groundwork for facilitating an up-to-date 360-degree view of PAWS constituents, and
+flexible ongoing data analysis and insights discovery.
 
-### Phase 1 (now - Jan 15 2020)
+## Uses
 
-**Goal**: Create a central storage of data where
+- The pipeline can inform the PAWS development team of new constituents through volunteer or foster engagement
+- Instead of manually matching constituents from volunteering, donations, and foster/adoptions, PAWS staff only need to upload the volunteer dataset into the pipeline, and the pipeline handles the matching
+- Volunteer and foster data are automatically loaded into the constituent's Salesforce profile
+- An RFM score is calculated for each constituent using the most recent data
+- Data analyses can use the output of the PDP matching logic to join datasets from different sources; PAWS can benefit from such analyses in the following ways:
+  - PAWS operations can be better informed and use data-driven decisions to guide programs and maximize effectiveness;
+  - Supporters can be further engaged by suggesting additional opportunities for involvement based upon pattern analysis;
+  - Multi-dimensional supporters can be consistently (and accurately) acknowledged for all the ways they support PAWS (e.g. a volunteer who donates and also fosters kittens), not to mention opportunities to further tap the potential of these enthusiastic supporters.
 
-1. Datasets from top 3 relevant sources can be uploaded as csvs to a central system: a) Donors, b) Volunteers,
-c) Adopters
-2. All datasets in the central system can be linked to each other on an ongoing basis
-3. Notifications can be sent out to relevant parties when inconsistencies need to be handled by a human
-4. Comprehensive report on a person's interactions with PAWS can be pulled via a simple UI (must include full known history)
-
-### Phase 2 (Jan 15 - May 15 2020)
-
-**Goal**: Expand above features to include all relevant datasets and further automate data uploads
-Datasets from all other relevant sources can be uploaded as csvs to a central system ( a) Adoption and Foster applicants,
-b) Foster Parents, c) Attendees, d) Clinic Clients e) Champions, f) Friends)
-Where APIs exist, create automated calls to those APIs to pull data
-
-### Phase 3 (May 15 - Sept 15 2020)
+## [Code of Conduct](https://codeforphilly.org/pages/code_of_conduct)
 
-**Goal**: Create more customizable analytics reports and features (eg noshow rates in clinicHQ)
+This is a Code for Philly project operating under their code of conduct.
 
 ## Links
 
-[Slack Channel](https://codeforphilly.org/chat?channel=paws_data_pipeline)
-
-[Google Drive](https://drive.google.com/open?id=1O8oPWLT5oDL8q_Tm4a0Gt8XCYYxEIcjiPJYHm33lXII)
+[Slack Channel](https://codeforphilly.org/chat?channel=paws_data_pipeline)
+[Wiki](https://github.com/CodeForPhilly/paws-data-pipeline/wiki)
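
Note: the README above references an RFM (Recency, Frequency, Monetary) score produced in Salesforce. As a rough, hypothetical illustration of the scoring idea only — the bin edges, the 1-4 scale, and the function names below are placeholders, not the project's actual values (the real pipeline reads its edges from the database via `read_rfm_edges()` in `admin_api.py` further down):

```python
from datetime import date

# Hypothetical bin edges -- the real pipeline loads its edges from the DB.
RECENCY_EDGES = [30, 90, 365]           # days since last gift
FREQUENCY_EDGES = [1, 3, 10]            # number of gifts
MONETARY_EDGES = [25.0, 100.0, 500.0]   # total dollars given


def bin_score(value, edges, reverse=False):
    """Map a raw value onto a 1..len(edges)+1 scale using bin edges."""
    score = sum(1 for edge in edges if value > edge) + 1
    return (len(edges) + 2 - score) if reverse else score


def rfm_score(last_gift, gift_count, total_given):
    days_since = (date.today() - last_gift).days
    r = bin_score(days_since, RECENCY_EDGES, reverse=True)  # recent gift = high score
    f = bin_score(gift_count, FREQUENCY_EDGES)
    m = bin_score(total_given, MONETARY_EDGES)
    return "%d%d%d" % (r, f, m)  # three digits, e.g. "431"


print(rfm_score(date(2021, 11, 1), 5, 250.0))
```

This matches the model in `generate_rfm_mapping.py` (next file), which treats an RFM score as a three-character combination.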

src/server/alembic/generate_rfm_mapping.py

Lines changed: 3 additions & 2 deletions
@@ -1,5 +1,6 @@
 import itertools
-
+import structlog
+logger = structlog.get_logger()
 
 def get_all_combinations(chars):
     yield from itertools.product(*([chars] * 3))
@@ -71,7 +72,7 @@ def start():
        f.write("%s\n" % item)
 
 
-print('done')
+logger.debug('Completed generate_rfm_mapping')
 
 
 start()
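
The `import structlog` / `logger = structlog.get_logger()` pair added here recurs in every file below; it is this commit's replacement for bare `print()` and `current_app.logger` calls. `get_logger()` returns a lazy proxy, so no configuration is required at import time, and with recent structlog defaults a sketch like this renders levelled, timestamped lines to stdout (defaults shown are structlog's own, not a project-specific config):

```python
import structlog

logger = structlog.get_logger()

# With structlog's out-of-the-box defaults these print to stdout with a
# timestamp and level -- no handler setup or configure() call required.
logger.debug("Completed generate_rfm_mapping")
logger.info("csv_stored", path="/shelterluv/people.csv", rows=1200)
```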

src/server/api/API_ingest/dropbox_handler.py

Lines changed: 4 additions & 2 deletions
@@ -1,18 +1,20 @@
 import dropbox
+import structlog
+logger = structlog.get_logger()
 
 try:
     from secrets_dict import DROPBOX_APP
 except ImportError:
     # Not running locally
-    print("Couldn't get DROPBOX_APP from file, trying environment **********")
+    logger.debug("Couldn't get DROPBOX_APP from file, trying environment **********")
     from os import environ
 
     try:
         DROPBOX_APP = environ['DROPBOX_APP']
     except KeyError:
         # Not in environment
         # You're SOL for now
-        print("Couldn't get DROPBOX_APP from file or environment")
+        logger.error("Couldn't get DROPBOX_APP from file or environment")
 
 
 class TransferData:
Lines changed: 4 additions & 2 deletions
@@ -1,7 +1,9 @@
 from api.API_ingest import shelterluv_api_handler
+import structlog
+logger = structlog.get_logger()
 
 def start(conn):
-    print("Start Fetching raw data from different API sources")
+    logger.debug("Start Fetching raw data from different API sources")
     #Run each source to store the output in dropbox and in the container as a CSV
     shelterluv_api_handler.store_shelterluv_people_all(conn)
-    print("Finish Fetching raw data from different API sources")
+    logger.debug("Finish Fetching raw data from different API sources")

src/server/api/API_ingest/shelterluv_api_handler.py

Lines changed: 11 additions & 9 deletions
@@ -7,20 +7,22 @@
 from api.API_ingest.dropbox_handler import upload_file_to_dropbox
 from constants import RAW_DATA_PATH
 from models import ShelterluvPeople
+import structlog
+logger = structlog.get_logger()
 
 try:
     from secrets_dict import SHELTERLUV_SECRET_TOKEN
 except ImportError:
     # Not running locally
-    print("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
+    logger.debug("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
     from os import environ
 
     try:
         SHELTERLUV_SECRET_TOKEN = environ['SHELTERLUV_SECRET_TOKEN']
     except KeyError:
         # Not in environment
         # You're SOL for now
-        print("Couldn't get SHELTERLUV_SECRET_TOKEN from file or environment")
+        logger.error("Couldn't get SHELTERLUV_SECRET_TOKEN from file or environment")
 
 
 def write_csv(json_data):
@@ -68,7 +70,7 @@ def store_shelterluv_people_all(conn):
     has_more = True
     shelterluv_people = []
 
-    print("Start getting shelterluv contacts from people table")
+    logger.debug("Start getting shelterluv contacts from people table")
 
     while has_more:
         r = requests.get("http://shelterluv.com/api/v1/people?limit={}&offset={}".format(LIMIT, offset),
@@ -78,9 +80,9 @@ def store_shelterluv_people_all(conn):
         has_more = response["has_more"]
         offset += 100
 
-    print("Finish getting shelterluv contacts from people table")
+    logger.debug("Finish getting shelterluv contacts from people table")
 
-    print("Start storing latest shelterluvpeople results to container")
+    logger.debug("Start storing latest shelterluvpeople results to container")
     if os.listdir(RAW_DATA_PATH):
         for file_name in os.listdir(RAW_DATA_PATH):
             file_path = os.path.join(RAW_DATA_PATH, file_name)
@@ -90,11 +92,11 @@ def store_shelterluv_people_all(conn):
             os.remove(file_path)
 
     file_path = write_csv(shelterluv_people)
-    print("Finish storing latest shelterluvpeople results to container")
+    logger.debug("Finish storing latest shelterluvpeople results to container")
 
-    print("Start storing " + '/shelterluv/' + "results to dropbox")
+    logger.debug("Start storing " + '/shelterluv/' + "results to dropbox")
     upload_file_to_dropbox(file_path, '/shelterluv/' + file_path.split('/')[-1])
-    print("Finish storing " + '/shelterluv/' + "results to dropbox")
+    logger.debug("Finish storing " + '/shelterluv/' + "results to dropbox")
 
-    print("Uploading shelterluvpeople csv to database")
+    logger.debug("Uploading shelterluvpeople csv to database")
     ShelterluvPeople.insert_from_df(pd.read_csv(file_path, dtype="string"), conn)
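
For orientation, the loop being edited above pages through Shelterluv's people endpoint until the API reports `has_more` is false. A condensed sketch of that pagination — the `x-api-key` header name, the `people` payload key, and the `LIMIT` value are assumptions; the diff only shows `offset += 100`:

```python
import requests

LIMIT = 100  # assumed page size


def fetch_all_people(token):
    """Page through the Shelterluv people endpoint, accumulating results."""
    people, offset, has_more = [], 0, True
    while has_more:
        r = requests.get(
            "http://shelterluv.com/api/v1/people?limit={}&offset={}".format(LIMIT, offset),
            headers={"x-api-key": token},  # assumed header name
        )
        response = r.json()
        people += response["people"]       # assumed payload key
        has_more = response["has_more"]
        offset += LIMIT
    return people
```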

src/server/api/admin_api.py

Lines changed: 22 additions & 18 deletions
@@ -17,6 +17,10 @@
 from config import RAW_DATA_PATH
 from api.API_ingest.salesforce_api_handler import ingest_data
 
+import structlog
+logger = structlog.get_logger()
+
+
 ALLOWED_EXTENSIONS = {"csv", "xlsx"}
 
 
@@ -33,7 +37,7 @@ def upload_csv():
     try:
         validate_and_arrange_upload(file)
     except Exception as e:
-        current_app.logger.exception(e)
+        logger.exception(e)
     finally:
         file.close()
 
@@ -45,7 +49,7 @@ def upload_csv():
 def list_current_files():
     result = None
 
-    current_app.logger.info("Start returning file list")
+    logger.info("Start returning file list")
     file_list_result = os.listdir(RAW_DATA_PATH)
 
     if len(file_list_result) > 0:
@@ -57,9 +61,9 @@ def list_current_files():
 @admin_api.route("/api/execute", methods=["POST"])
 @jwt_ops.admin_required
 def execute():
-    current_app.logger.info("Execute flow")
+    logger.info("Execute flow")
     job_outcome = flow_script.start_flow()  # 'busy', 'completed', or 'nothing to do'
-    current_app.logger.info("Job outcome: " + str(job_outcome))
+    logger.info("Job outcome: %s", str(job_outcome))
 
 
     # -------- Skip update if 'busy' or 'nothing to do' as nothing changed ? ------
@@ -88,8 +92,8 @@ def execute():
         try:
             connection.execute(upsert)
         except Exception as e:
-            current_app.logger.error("Insert/Update failed on Last Execution stats")
-            current_app.logger.exception(e)
+            logger.error("Insert/Update failed on Last Execution stats")
+            logger.error(e)
     # -------------------------------------------------------------------------------
 
     if job_outcome == 'busy':
@@ -128,7 +132,7 @@ def get_statistics():
 @jwt_ops.admin_required
 def list_statistics():
     """ Pull Last Execution stats from DB. """
-    current_app.logger.info("list_statistics() request")
+    logger.info("list_statistics() request")
     last_execution_details = '{}'  # Empty but valid JSON
 
     engine.dispose()  # we don't want other process's conn pool
@@ -144,7 +148,7 @@ def list_statistics():
             last_execution_details = result.fetchone()[0]
 
         except Exception as e:
-            current_app.logger.error("Failure reading Last Execution stats from DB - OK on first run")
+            logger.error("Failure reading Last Execution stats from DB - OK on first run")
             # Will happen on first run, shouldn't after
 
     return last_execution_details
@@ -221,10 +225,10 @@ def start_job():
 
     if running_job :
         # There was a running job already
-        current_app.logger.info("Request to start job, but job_id " + str(running_job) + " already executing")
+        logger.warn("Request to start job, but job_id " + str(running_job) + " already executing")
         return None
     else:
-        current_app.logger.info("Assigned job_id " + job_id )
+        logger.info("Assigned job_id %s", str(job_id))
         return job_id
 
 
@@ -270,7 +274,7 @@ def import_rfm_csv():
     with open('C:\\Projects\\paws-stuff\\score_tuples.csv', 'r') as csvfile:
         reader = csv.reader(csvfile, delimiter=',')
         hdr = next(reader)
-        print('Skipping header: ', hdr)
+        logger.debug('Skipping header: %s', hdr)
         for row in reader:
             score_list.append(row)
 
@@ -303,14 +307,14 @@ def write_rfm_edges(rfm_dict : dict) :
         try:
             connection.execute(upsert)
         except Exception as e:
-            current_app.logger.error("Insert/Update failed on rfm edge ")
-            current_app.logger.exception(e)
+            logger.error("Insert/Update failed on rfm edge ")
+            logger.error(e)
             return None
 
         return 0
 
     else :  # Malformed dict
-        current_app.logger.error("Received rfm_edge dictionary with " + str(len(rfm_dict)) + " entries - expected 3")
+        logger.error("Received rfm_edge dictionary with %s entries - expected 3", str(len(rfm_dict)))
        return None
 
 
@@ -322,14 +326,14 @@ def read_rfm_edges() :
     with engine.begin() as connection:  # BEGIN TRANSACTION
         q_result = connection.execute(q)
         if q_result.rowcount == 0:
-            current_app.logger.error("No rfm_edge entry found in DB")
+            logger.error("No rfm_edge entry found in DB")
             return None
         else:
             edge_string = q_result.fetchone()[0]
             try:
                 edge_dict = json.loads(edge_string)  # Convert stored string to dict
             except json.decoder.JSONDecodeError:
-                current_app.logger.error("rfm_edge entry found in DB was malformed")
+                logger.error("rfm_edge entry found in DB was malformed")
                 return None
 
         return edge_dict
@@ -381,9 +385,9 @@ def generate_dummy_rfm_scores():
 
     # return jsonify(sfd_list)  # enable if using endpoint, but it returns a lot of data
 
-    current_app.logger.debug("Inserting dummy scores...")
+    logger.debug("Inserting dummy scores...")
     count = insert_rfm_scores(dummy_scores)
-    current_app.logger.debug("Finished inserting")
+    logger.debug("Finished inserting")
 
 
     return count
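
Several rewritten calls above use printf-style positional arguments (e.g. `logger.info("Job outcome: %s", str(job_outcome))`). Plain structlog does not interpolate those on its own; that is a processor's job, so a configuration along these lines is presumably in place elsewhere in the app — a sketch under that assumption, not the project's confirmed config:

```python
import structlog

structlog.configure(
    processors=[
        # Interpolates positional args into the event string,
        # stdlib-logging style: logger.info("count: %s", n)
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.KeyValueRenderer(),
    ]
)

logger = structlog.get_logger()
logger.info("Job outcome: %s", "completed")  # -> event='Job outcome: completed'
```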

src/server/api/common_api.py

Lines changed: 8 additions & 5 deletions
@@ -6,19 +6,22 @@
 import time
 from datetime import datetime
 
+import structlog
+logger = structlog.get_logger()
+
+
 from api.fake_data import sl_mock_data
 
 try:
     from secrets_dict import SHELTERLUV_SECRET_TOKEN
 except ImportError:
     # Not running locally
-    print("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
+    logger.debug("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
     from os import getenv
 
     SHELTERLUV_SECRET_TOKEN = getenv('SHELTERLUV_SECRET_TOKEN')
     if not SHELTERLUV_SECRET_TOKEN:
-        print("Couldn't get secrets from file or environment",
-              "Defaulting to Fake Data")
+        logger.warn("Couldn't get secrets from file or environment - defaulting to Fake Data")
 
 from api import jwt_ops
 
@@ -262,7 +265,7 @@ def get_support_oview(matching_id):
         if row['source_id'].isalnum():
             id_list.append(row['source_id'])
         else:
-            current_app.logger.warn("salesforcecontacts source_id " + row['source_id'] + "has non-alphanumeric characters; will not be used")
+            logger.warn("salesforcecontacts source_id %s has non-alphanumeric characters; will not be used", str(row['source_id']))
 
     if len(id_list) == 0:  # No ids to query
         oview_fields['number_of_gifts'] = 0  # Marker for no support data
@@ -379,7 +382,7 @@ def get_support_oview(matching_id):
 
 
     else:  # len(rows) == 0
-        current_app.logger.debug('No SF contact IDs found for matching_id ' + str(matching_id))
+        logger.warn('No SF contact IDs found for matching_id %s', str(matching_id))
         oview_fields['number_of_gifts'] = 0  # Marker for no data
         return jsonify(oview_fields)