Begin work for caching anomalies and sending alerts #6

gitjeff05 · 2020-04-20T14:28:02Z

Just opening this early to get some feedback. This would lay some groundwork for when we want to alert or notify folks of anomalies that we see. For this PR, I focused only on us-daily data (see check_history). If you run this, you should see a warning plus instructions on what to do:

Found anomaly in DC on 2020-04-07 00:00:00 for type death
You should verify this with Data Entry or QA team,
then enter it into the list of allowed exceptions: "./resources/exceptions.csv"
DC,2020-04-07,death
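A warning like the one above could be produced by something along these lines. This is a sketch: the `Anomaly` dataclass and `format_anomaly_warning` are hypothetical stand-ins for whatever `send_anomaly_info` actually receives, and the date is printed as a plain ISO date rather than the full timestamp shown above. The last line of the message is the CSV row to paste into `./resources/exceptions.csv`.

```python
from dataclasses import dataclass
from datetime import date

EXCEPTIONS_PATH = "./resources/exceptions.csv"


@dataclass(frozen=True)
class Anomaly:
    """Hypothetical record for one detected anomaly."""
    state: str
    day: date
    field: str


def format_anomaly_warning(anomaly: Anomaly) -> str:
    """Build the warning text; the final line is the row for the exceptions CSV."""
    day = anomaly.day.isoformat()
    return (
        f"Found anomaly in {anomaly.state} on {day} for type {anomaly.field}\n"
        "You should verify this with Data Entry or QA team,\n"
        f'then enter it into the list of allowed exceptions: "{EXCEPTIONS_PATH}"\n'
        f"{anomaly.state},{day},{anomaly.field}"
    )
```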

I propose we keep a cached list of exceptions for testing monotonic data and then we can alert (via slackbot or whatever) when things do not look right.
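A minimal sketch of the filtering step, assuming both the detected anomalies and the cached exceptions are DataFrames with `state`, `date` and `field` columns (only `date` is confirmed by the `read_csv` call in this PR; the other column names are hypothetical). A left merge with `indicator=True` keeps only anomalies that have no matching cached exception, so only those would trigger an alert.

```python
import pandas as pd

KEYS = ["state", "date", "field"]  # hypothetical column names


def new_anomalies(found: pd.DataFrame, exceptions: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `found` that are not already listed in `exceptions`."""
    merged = found[KEYS].merge(
        exceptions[KEYS].drop_duplicates(), on=KEYS, how="left", indicator=True
    )
    # "left_only" marks anomalies with no matching cached exception.
    return merged.loc[merged["_merge"] == "left_only", KEYS].reset_index(drop=True)
```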

gitjeff05 · 2020-04-20T14:29:36Z

app/check_dataset.py

+
+def send_anomaly_info(anomalies):
+    """
+    Warn user on found anomalies with instructions on how


I think this is also where we could hook in some notifications. I'll have to dig into the Slack API; I don't want to hammer the room with notifications.

I think the place to start is a once-per-day notification, based on either the current or the history table, that runs at around 5 PM ET (after the 2nd shift). You should be able to render the log to a text string and attach it as a snippet to a Slack message, so it doesn't take up much screen space unless you expand it.

You should spawn a background thread in either the Flask app or the Pyro4 app that sends a message on startup and then sleeps until 5 PM ET.
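A minimal sketch of that scheduler using only the standard library (`threading`, `zoneinfo`, so Python 3.9+). The `send` callback is hypothetical and stands in for whatever posts the Slack message; `seconds_until` and `start_daily_notifier` are illustrative names, not part of the PR.

```python
import threading
import time
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable
from zoneinfo import ZoneInfo


def seconds_until(hour: int, now: datetime, tz: tzinfo) -> float:
    """Seconds from `now` (timezone-aware) until the next `hour`:00 wall-clock time in `tz`."""
    local = now.astimezone(tz)
    target = local.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= local:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    # Subtract in UTC so a DST change between now and the target is accounted for.
    return (target.astimezone(timezone.utc) - now.astimezone(timezone.utc)).total_seconds()


def start_daily_notifier(send: Callable[[], None], hour: int = 17,
                         tz_name: str = "America/New_York") -> threading.Thread:
    """Start a daemon thread that calls send() once now, then daily at `hour`:00 in tz_name."""
    tz = ZoneInfo(tz_name)

    def loop() -> None:
        send()  # startup message
        while True:
            time.sleep(seconds_until(hour, datetime.now(timezone.utc), tz))
            send()

    thread = threading.Thread(target=loop, name="daily-notifier", daemon=True)
    thread.start()
    return thread
```

Marking the thread as a daemon means it will not keep the Flask or Pyro4 process alive on shutdown.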

gitjeff05 · 2020-04-20T14:30:57Z

app/check_dataset.py


    log = ResultLog()

+    exceptions_cached = pd.read_csv('./resources/exceptions.csv', parse_dates=['date'])


Is this the right place for exceptions? Maybe we have an anomalies directory and then we track these by type (state, us, etc).

Nah, because it is under the source tree and we don't want it in revision control (I think).

Can you add a log dir to the .ini file, create the dir if it is missing, and default it to ~/logs/exceptions.csv?
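A sketch of that request with the standard `configparser` module. The `[logging]` section and `log_dir` key are hypothetical names, since the project's .ini layout is not shown here; when neither is present the directory falls back to `~/logs`, and it is created if missing.

```python
import configparser
from pathlib import Path


def exceptions_path(ini_file: str) -> Path:
    """Resolve the exceptions CSV path from the .ini file, creating its directory if needed."""
    config = configparser.ConfigParser()
    config.read(ini_file)  # a missing .ini file is silently ignored
    # Hypothetical section/key; fallback also covers a missing [logging] section.
    log_dir = config.get("logging", "log_dir", fallback="~/logs")
    directory = Path(log_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return directory / "exceptions.csv"
```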

I was thinking we would want it under source control. That way, when a new anomaly is detected, we can alert the channel. If an anomaly is valid, we simply add it to the cached file so the room never gets alerted on that anomaly again. These would be fairly small files: the current exceptions file has 20 entries and is under 1 KB in size.
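If the file stays under revision control, accepting an anomaly amounts to appending one row. A sketch with the standard `csv` module; the column names and the `add_exception` helper are hypothetical. Writing one line per accepted anomaly keeps each approval to a one-line git diff.

```python
import csv
from pathlib import Path

FIELDS = ["state", "date", "field"]  # hypothetical column names


def add_exception(path: Path, state: str, day: str, field: str) -> bool:
    """Append one confirmed anomaly to the exceptions CSV; return False if already listed."""
    row = {"state": state, "date": day, "field": field}
    existing = []
    if path.exists():
        with path.open(newline="") as f:
            existing = list(csv.DictReader(f))
    if row in existing:
        return False
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return True
```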

gitjeff05 · 2020-04-20T14:33:21Z

app/check_dataset.py

    log.consolidate()
    return log

+def cache_exceptions():


This may belong higher up. The idea is to have one place to go to build the anomaly cache. I don't think this would get run very often. It duplicates some code in check_history.

I think it belongs a level up, and it should call the current/history versions. Anything in those lists is (or should be) an exception.
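A sketch of such a higher-level builder, under the assumption (hypothetical, since the checks' real signatures are not shown) that each current/history check can be wrapped as a callable returning a DataFrame with `state`, `date` and `field` columns. Every anomaly reported is treated as an accepted exception, and the deduplicated, sorted union is written out. `build_exception_cache` is an illustrative name, distinct from the PR's `cache_exceptions`.

```python
from typing import Callable, Iterable

import pandas as pd

KEYS = ["state", "date", "field"]  # hypothetical column names


def build_exception_cache(checks: Iterable[Callable[[], pd.DataFrame]],
                          path: str) -> pd.DataFrame:
    """Run each check and write the union of its anomalies as the exception cache."""
    frames = [check()[KEYS] for check in checks]
    if frames:
        cache = pd.concat(frames, ignore_index=True).drop_duplicates()
    else:
        cache = pd.DataFrame(columns=KEYS)
    # Sorting keeps the written file stable, so re-running produces minimal diffs.
    cache = cache.sort_values(KEYS).reset_index(drop=True)
    cache.to_csv(path, index=False, date_format="%Y-%m-%d")
    return cache
```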

gitjeff05 · 2020-04-20T14:37:23Z

app/checks.py

-            error_dates_str = error_dates.astype(str).str.cat(sep=", ")
-
+        if (df[col].is_monotonic == False):
+            dates = df.loc[df[col].diff() < 0, 'date']


I refactored this because I initially thought there was a bug. There wasn't, but I did find a shorter way to do the check and comparison using the pandas is_monotonic property and the diff() method.
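A self-contained sketch of the same check. `Series.is_monotonic` was deprecated in pandas 1.5 and removed in 2.0; `is_monotonic_increasing` is the current name, and `not` reads more idiomatically than `== False`. The sketch assumes a `date` column and sorts by it first so that `diff()` compares each day with the previous one; `decreasing_dates` is a hypothetical helper name.

```python
import pandas as pd


def decreasing_dates(df: pd.DataFrame, col: str) -> pd.Series:
    """Return the dates on which `col` dropped relative to the previous day's value."""
    ordered = df.sort_values("date")
    if ordered[col].is_monotonic_increasing:
        return ordered["date"].iloc[0:0]  # no decreases: empty Series
    # diff() is negative exactly where a value is lower than the row before it.
    return ordered.loc[ordered[col].diff() < 0, "date"]
```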

Begin work for caching anomalies and sending alerts

0ed5e01
