
Commit ebc0417

Merge remote-tracking branch 'origin/master' into 512-salesforce-automation
# Conflicts:
#   src/server/requirements.txt
2 parents a39642e + 845898f

21 files changed (+270 -172 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -23,3 +23,4 @@ start_env.sh
 .mypy_cache/
 *secrets*
 *kustomization*
+src/.venv/

README.md

Lines changed: 19 additions & 47 deletions
@@ -12,61 +12,33 @@ animal care programs), administration and development efforts are coordinated by
 
 ## [The Data Pipeline](https://codeforphilly.org/projects/paws_data_pipeline)
 
-This project seeks to provide PAWS with an easy-to-use and easy-to-support tool to extract
-data from multiple source systems, confirm accuracy and appropriateness,
-clean/validate data where necessary (a data hygiene and wrangling step),
-and then load relevant data into one or more repositories to facilitate
-(1) a highly-accurate and rich 360-degree view of PAWS constituents
-(Salesforce is a likely candidate target system; already in use at PAWS) and
-(2) flexible ongoing data analysis and insights discovery (e.g. a data lake / data warehouse).
-
 Through all of its operational and service activities, PAWS accumulates data regarding donations,
 adoptions, fosters, volunteers, merchandise sales, event attendees (to name a few),
-each in their own system and/or manual (Google Sheet) tally. This vital data that can
+each in their own system and/or manual tally. This vital data that can
 drive insights remains siloed and is usually difficult to extract, manipulate, and analyze.
-Taking all of this data, making it readily available, and drawing inferences through analysis
-can drive many benefits:
-
-- PAWS operations can be better informed and use data-driven decisions to guide programs
-and maximize effectiveness;
-- Supporters can be further engaged by suggesting additional opportunities for involvement
-based upon pattern analysis;
-- Multi-dimensional supporters can be consistently (and accurately) acknowledged for all
-the ways they support PAWS (i.e. a volunteer who donates and also fosters kittens),
-not to mention opportunities to further tap the potential of these enthusiastic supporters.
-
-## [Code of Conduct](https://codeforphilly.org/pages/code_of_conduct)
-
-This is a Code for Philly project operating under their code of conduct.
-
-## Getting started
-see [Getting Started](GettingStarted.md) to run the app locally
 
-## Project Plan
+This project provides PAWS with an easy-to-use and easy-to-support tool to extract
+constituent data from multiple source systems, standardize extracted data, match constituents across data sources,
+load relevant data into Salesforce, and run an automation in Salesforce to produce an RFM score.
+Through these processes, the PAWS data pipeline has laid the groundwork for facilitating an up-to-date 360-degree view of PAWS constituents, and
+flexible ongoing data analysis and insights discovery.
 
-### Phase 1 (now - Jan 15 2020)
+## Uses
 
-**Goal**: Create a central storage of data where
+- The pipeline can inform the PAWS development team of new constituents through volunteer or foster engagement
+- Instead of manually matching constituents from volunteering, donations, and foster/adoptions, PAWS staff only need to upload the volunteer dataset into the pipeline, and the pipeline handles the matching
+- Volunteer and foster data are automatically loaded into the constituent's Salesforce profile
+- An RFM score is calculated for each constituent using the most recent data
+- Data analyses can use the output of the PDP matching logic to join datasets from different sources; PAWS can benefit from such analyses in the following ways:
+  - PAWS operations can be better informed and use data-driven decisions to guide programs and maximize effectiveness;
+  - Supporters can be further engaged by suggesting additional opportunities for involvement based upon pattern analysis;
+  - Multi-dimensional supporters can be consistently (and accurately) acknowledged for all the ways they support PAWS (e.g. a volunteer who donates and also fosters kittens), not to mention opportunities to further tap the potential of these enthusiastic supporters.
 
-1. Datasets from top 3 relevant sources can be uploaded as csvs to a central system: a) Donors, b) Volunteers,
-c) Adopters
-2. All datasets in the central system can be linked to each other on an ongoing basis
-3. Notifications can be sent out to relevant parties when inconsistencies need to be handled by a human
-4. Comprehensive report on a person's interactions with PAWS can be pulled via a simple UI (must include full known history)
-
-### Phase 2 (Jan 15 - May 15 2020)
-
-**Goal**: Expand above features to include all relevant datasets and further automate data uploads
-Datasets from all other relevant sources can be uploaded as csvs to a central system ( a) Adoption and Foster applicants,
-b) Foster Parents, c) Attendees, d) Clinic Clients e) Champions, f) Friends)
-Where APIs exist, create automated calls to those APIs to pull data
-
-### Phase 3 (May 15 - Sept 15 2020)
+## [Code of Conduct](https://codeforphilly.org/pages/code_of_conduct)
 
-**Goal**: Create more customizable analytics reports and features (eg noshow rates in clinicHQ)
+This is a Code for Philly project operating under their code of conduct.
 
 ## Links
 
-[Slack Channel](https://codeforphilly.org/chat?channel=paws_data_pipeline)
-
-[Google Drive](https://drive.google.com/open?id=1O8oPWLT5oDL8q_Tm4a0Gt8XCYYxEIcjiPJYHm33lXII)
+[Slack Channel](https://codeforphilly.org/chat?channel=paws_data_pipeline)
+[Wiki](https://github.com/CodeForPhilly/paws-data-pipeline/wiki)
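
Note: the README above references an RFM (Recency, Frequency, Monetary) score produced in Salesforce. As a rough, hypothetical illustration of the scoring idea only — the bin edges, the 1-4 scale, and the function names below are placeholders, not the project's actual values (the real pipeline reads its edges from the database via `read_rfm_edges()` in `admin_api.py` further down):

```python
from datetime import date

# Hypothetical bin edges -- the real pipeline loads its edges from the DB.
RECENCY_EDGES = [30, 90, 365]           # days since last gift
FREQUENCY_EDGES = [1, 3, 10]            # number of gifts
MONETARY_EDGES = [25.0, 100.0, 500.0]   # total dollars given


def bin_score(value, edges, reverse=False):
    """Map a raw value onto a 1..len(edges)+1 scale using bin edges."""
    score = sum(1 for edge in edges if value > edge) + 1
    return (len(edges) + 2 - score) if reverse else score


def rfm_score(last_gift, gift_count, total_given):
    days_since = (date.today() - last_gift).days
    r = bin_score(days_since, RECENCY_EDGES, reverse=True)  # recent gift = high score
    f = bin_score(gift_count, FREQUENCY_EDGES)
    m = bin_score(total_given, MONETARY_EDGES)
    return "%d%d%d" % (r, f, m)  # three digits, e.g. "431"


print(rfm_score(date(2021, 11, 1), 5, 250.0))
```

This matches the model in `generate_rfm_mapping.py` (next file), which treats an RFM score as a three-character combination.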

src/server/alembic/generate_rfm_mapping.py

Lines changed: 3 additions & 2 deletions
@@ -1,5 +1,6 @@
 import itertools
-
+import structlog
+logger = structlog.get_logger()
 
 def get_all_combinations(chars):
     yield from itertools.product(*([chars] * 3))
@@ -71,7 +72,7 @@ def start():
        f.write("%s\n" % item)
 
 
-print('done')
+logger.debug('Completed generate_rfm_mapping')
 
 
 start()
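
The `import structlog` / `logger = structlog.get_logger()` pair added here recurs in every file below; it is this commit's replacement for bare `print()` and `current_app.logger` calls. `get_logger()` returns a lazy proxy, so no configuration is required at import time, and with recent structlog defaults a sketch like this renders levelled, timestamped lines to stdout (defaults shown are structlog's own, not a project-specific config):

```python
import structlog

logger = structlog.get_logger()

# With structlog's out-of-the-box defaults these print to stdout with a
# timestamp and level -- no handler setup or configure() call required.
logger.debug("Completed generate_rfm_mapping")
logger.info("csv_stored", path="/shelterluv/people.csv", rows=1200)
```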

src/server/api/API_ingest/dropbox_handler.py

Lines changed: 4 additions & 2 deletions
@@ -1,18 +1,20 @@
 import dropbox
+import structlog
+logger = structlog.get_logger()
 
 try:
     from secrets_dict import DROPBOX_APP
 except ImportError:
     # Not running locally
-    print("Couldn't get DROPBOX_APP from file, trying environment **********")
+    logger.debug("Couldn't get DROPBOX_APP from file, trying environment **********")
     from os import environ
 
     try:
         DROPBOX_APP = environ['DROPBOX_APP']
     except KeyError:
         # Not in environment
         # You're SOL for now
-        print("Couldn't get DROPBOX_APP from file or environment")
+        logger.error("Couldn't get DROPBOX_APP from file or environment")
 
 
 class TransferData:
Lines changed: 4 additions & 2 deletions
@@ -1,7 +1,9 @@
 from api.API_ingest import shelterluv_api_handler
+import structlog
+logger = structlog.get_logger()
 
 def start(conn):
-    print("Start Fetching raw data from different API sources")
+    logger.debug("Start Fetching raw data from different API sources")
     #Run each source to store the output in dropbox and in the container as a CSV
     shelterluv_api_handler.store_shelterluv_people_all(conn)
-    print("Finish Fetching raw data from different API sources")
+    logger.debug("Finish Fetching raw data from different API sources")

src/server/api/API_ingest/shelterluv_api_handler.py

Lines changed: 11 additions & 9 deletions
@@ -7,20 +7,22 @@
 from api.API_ingest.dropbox_handler import upload_file_to_dropbox
 from constants import RAW_DATA_PATH
 from models import ShelterluvPeople
+import structlog
+logger = structlog.get_logger()
 
 try:
     from secrets_dict import SHELTERLUV_SECRET_TOKEN
 except ImportError:
     # Not running locally
-    print("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
+    logger.debug("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
     from os import environ
 
     try:
         SHELTERLUV_SECRET_TOKEN = environ['SHELTERLUV_SECRET_TOKEN']
     except KeyError:
         # Not in environment
         # You're SOL for now
-        print("Couldn't get SHELTERLUV_SECRET_TOKEN from file or environment")
+        logger.error("Couldn't get SHELTERLUV_SECRET_TOKEN from file or environment")
 
 
 def write_csv(json_data):
@@ -68,7 +70,7 @@ def store_shelterluv_people_all(conn):
     has_more = True
     shelterluv_people = []
 
-    print("Start getting shelterluv contacts from people table")
+    logger.debug("Start getting shelterluv contacts from people table")
 
     while has_more:
         r = requests.get("http://shelterluv.com/api/v1/people?limit={}&offset={}".format(LIMIT, offset),
@@ -78,9 +80,9 @@ def store_shelterluv_people_all(conn):
         has_more = response["has_more"]
         offset += 100
 
-    print("Finish getting shelterluv contacts from people table")
+    logger.debug("Finish getting shelterluv contacts from people table")
 
-    print("Start storing latest shelterluvpeople results to container")
+    logger.debug("Start storing latest shelterluvpeople results to container")
     if os.listdir(RAW_DATA_PATH):
         for file_name in os.listdir(RAW_DATA_PATH):
             file_path = os.path.join(RAW_DATA_PATH, file_name)
@@ -90,11 +92,11 @@ def store_shelterluv_people_all(conn):
             os.remove(file_path)
 
     file_path = write_csv(shelterluv_people)
-    print("Finish storing latest shelterluvpeople results to container")
+    logger.debug("Finish storing latest shelterluvpeople results to container")
 
-    print("Start storing " + '/shelterluv/' + "results to dropbox")
+    logger.debug("Start storing " + '/shelterluv/' + "results to dropbox")
     upload_file_to_dropbox(file_path, '/shelterluv/' + file_path.split('/')[-1])
-    print("Finish storing " + '/shelterluv/' + "results to dropbox")
+    logger.debug("Finish storing " + '/shelterluv/' + "results to dropbox")
 
-    print("Uploading shelterluvpeople csv to database")
+    logger.debug("Uploading shelterluvpeople csv to database")
     ShelterluvPeople.insert_from_df(pd.read_csv(file_path, dtype="string"), conn)
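
For orientation, the loop being edited above pages through Shelterluv's people endpoint until the API reports `has_more` is false. A condensed sketch of that pagination — the `x-api-key` header name, the `people` payload key, and the `LIMIT` value are assumptions; the diff only shows `offset += 100`:

```python
import requests

LIMIT = 100  # assumed page size


def fetch_all_people(token):
    """Page through the Shelterluv people endpoint, accumulating results."""
    people, offset, has_more = [], 0, True
    while has_more:
        r = requests.get(
            "http://shelterluv.com/api/v1/people?limit={}&offset={}".format(LIMIT, offset),
            headers={"x-api-key": token},  # assumed header name
        )
        response = r.json()
        people += response["people"]       # assumed payload key
        has_more = response["has_more"]
        offset += LIMIT
    return people
```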

src/server/api/admin_api.py

Lines changed: 22 additions & 18 deletions
@@ -17,6 +17,10 @@
 from config import RAW_DATA_PATH
 from api.API_ingest.salesforce_api_handler import ingest_data
 
+import structlog
+logger = structlog.get_logger()
+
+
 ALLOWED_EXTENSIONS = {"csv", "xlsx"}
 
 
@@ -33,7 +37,7 @@ def upload_csv():
     try:
         validate_and_arrange_upload(file)
     except Exception as e:
-        current_app.logger.exception(e)
+        logger.exception(e)
     finally:
         file.close()
 
@@ -45,7 +49,7 @@ def upload_csv():
 def list_current_files():
     result = None
 
-    current_app.logger.info("Start returning file list")
+    logger.info("Start returning file list")
     file_list_result = os.listdir(RAW_DATA_PATH)
 
     if len(file_list_result) > 0:
@@ -57,9 +61,9 @@ def list_current_files():
 @admin_api.route("/api/execute", methods=["POST"])
 @jwt_ops.admin_required
 def execute():
-    current_app.logger.info("Execute flow")
+    logger.info("Execute flow")
     job_outcome = flow_script.start_flow()  # 'busy', 'completed', or 'nothing to do'
-    current_app.logger.info("Job outcome: " + str(job_outcome))
+    logger.info("Job outcome: %s", str(job_outcome))
 
 
     # -------- Skip update if 'busy' or 'nothing to do' as nothing changed ? ------
@@ -88,8 +92,8 @@ def execute():
         try:
             connection.execute(upsert)
         except Exception as e:
-            current_app.logger.error("Insert/Update failed on Last Execution stats")
-            current_app.logger.exception(e)
+            logger.error("Insert/Update failed on Last Execution stats")
+            logger.error(e)
     # -------------------------------------------------------------------------------
 
     if job_outcome == 'busy':
@@ -128,7 +132,7 @@ def get_statistics():
 @jwt_ops.admin_required
 def list_statistics():
     """ Pull Last Execution stats from DB. """
-    current_app.logger.info("list_statistics() request")
+    logger.info("list_statistics() request")
     last_execution_details = '{}'  # Empty but valid JSON
 
     engine.dispose()  # we don't want other process's conn pool
@@ -144,7 +148,7 @@ def list_statistics():
             last_execution_details = result.fetchone()[0]
 
         except Exception as e:
-            current_app.logger.error("Failure reading Last Execution stats from DB - OK on first run")
+            logger.error("Failure reading Last Execution stats from DB - OK on first run")
             # Will happen on first run, shouldn't after
 
     return last_execution_details
@@ -221,10 +225,10 @@ def start_job():
 
     if running_job :
         # There was a running job already
-        current_app.logger.info("Request to start job, but job_id " + str(running_job) + " already executing")
+        logger.warn("Request to start job, but job_id " + str(running_job) + " already executing")
         return None
     else:
-        current_app.logger.info("Assigned job_id " + job_id )
+        logger.info("Assigned job_id %s", str(job_id))
         return job_id
 
 
@@ -270,7 +274,7 @@ def import_rfm_csv():
     with open('C:\\Projects\\paws-stuff\\score_tuples.csv', 'r') as csvfile:
         reader = csv.reader(csvfile, delimiter=',')
         hdr = next(reader)
-        print('Skipping header: ', hdr)
+        logger.debug('Skipping header: %s', hdr)
         for row in reader:
             score_list.append(row)
 
@@ -303,14 +307,14 @@ def write_rfm_edges(rfm_dict : dict) :
         try:
             connection.execute(upsert)
         except Exception as e:
-            current_app.logger.error("Insert/Update failed on rfm edge ")
-            current_app.logger.exception(e)
+            logger.error("Insert/Update failed on rfm edge ")
+            logger.error(e)
             return None
 
         return 0
 
     else :  # Malformed dict
-        current_app.logger.error("Received rfm_edge dictionary with " + str(len(rfm_dict)) + " entries - expected 3")
+        logger.error("Received rfm_edge dictionary with %s entries - expected 3", str(len(rfm_dict)))
        return None
 
 
@@ -322,14 +326,14 @@ def read_rfm_edges() :
     with engine.begin() as connection:  # BEGIN TRANSACTION
         q_result = connection.execute(q)
         if q_result.rowcount == 0:
-            current_app.logger.error("No rfm_edge entry found in DB")
+            logger.error("No rfm_edge entry found in DB")
             return None
         else:
             edge_string = q_result.fetchone()[0]
             try:
                 edge_dict = json.loads(edge_string)  # Convert stored string to dict
             except json.decoder.JSONDecodeError:
-                current_app.logger.error("rfm_edge entry found in DB was malformed")
+                logger.error("rfm_edge entry found in DB was malformed")
                 return None
 
         return edge_dict
@@ -381,9 +385,9 @@ def generate_dummy_rfm_scores():
 
     # return jsonify(sfd_list)  # enable if using endpoint, but it returns a lot of data
 
-    current_app.logger.debug("Inserting dummy scores...")
+    logger.debug("Inserting dummy scores...")
     count = insert_rfm_scores(dummy_scores)
-    current_app.logger.debug("Finished inserting")
+    logger.debug("Finished inserting")
 
 
     return count
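
Several rewritten calls above use printf-style positional arguments (e.g. `logger.info("Job outcome: %s", str(job_outcome))`). Plain structlog does not interpolate those on its own; that is a processor's job, so a configuration along these lines is presumably in place elsewhere in the app — a sketch under that assumption, not the project's confirmed config:

```python
import structlog

structlog.configure(
    processors=[
        # Interpolates positional args into the event string,
        # stdlib-logging style: logger.info("count: %s", n)
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.KeyValueRenderer(),
    ]
)

logger = structlog.get_logger()
logger.info("Job outcome: %s", "completed")  # -> event='Job outcome: completed'
```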

src/server/api/common_api.py

Lines changed: 8 additions & 5 deletions
@@ -6,19 +6,22 @@
 import time
 from datetime import datetime
 
+import structlog
+logger = structlog.get_logger()
+
+
 from api.fake_data import sl_mock_data
 
 try:
     from secrets_dict import SHELTERLUV_SECRET_TOKEN
 except ImportError:
     # Not running locally
-    print("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
+    logger.debug("Couldn't get SHELTERLUV_SECRET_TOKEN from file, trying environment **********")
     from os import getenv
 
     SHELTERLUV_SECRET_TOKEN = getenv('SHELTERLUV_SECRET_TOKEN')
     if not SHELTERLUV_SECRET_TOKEN:
-        print("Couldn't get secrets from file or environment",
-              "Defaulting to Fake Data")
+        logger.warn("Couldn't get secrets from file or environment - defaulting to Fake Data")
 
 from api import jwt_ops
 
@@ -262,7 +265,7 @@ def get_support_oview(matching_id):
         if row['source_id'].isalnum():
             id_list.append(row['source_id'])
         else:
-            current_app.logger.warn("salesforcecontacts source_id " + row['source_id'] + "has non-alphanumeric characters; will not be used")
+            logger.warn("salesforcecontacts source_id %s has non-alphanumeric characters; will not be used", str(row['source_id']))
 
     if len(id_list) == 0:  # No ids to query
         oview_fields['number_of_gifts'] = 0  # Marker for no support data
@@ -379,7 +382,7 @@ def get_support_oview(matching_id):
 
 
     else:  # len(rows) == 0
-        current_app.logger.debug('No SF contact IDs found for matching_id ' + str(matching_id))
+        logger.warn('No SF contact IDs found for matching_id %s', str(matching_id))
         oview_fields['number_of_gifts'] = 0  # Marker for no data
         return jsonify(oview_fields)