|
320 | 320 | "- DayOfYear\n", |
321 | 321 | "\n", |
322 | 322 | "Timeseries Aware Calculated Fields.\n", |
323 | | - "- PercentileEventDetection\n", |
| 323 | + "- AbovePercentileEventDetection\n", |
324 | 324 | "\n", |
325 | 325 | "There will be more added over time. If there is one you are particularly interested in, please reach out and let us know." |
326 | 326 | ] |
|
541 | 541 | }, |
542 | 542 | { |
543 | 543 | "cell_type": "code", |
544 | | - "execution_count": 11, |
| 544 | + "execution_count": null, |
545 | 545 | "metadata": {}, |
546 | 546 | "outputs": [ |
547 | 547 | { |
|
554 | 554 | ], |
555 | 555 | "source": [ |
556 | 556 | "ev.joined_timeseries.add_calculated_fields([\n", |
557 | | - " tcf.PercentileEventDetection()\n", |
| 557 | + " tcf.AbovePercentileEventDetection()\n", |
558 | 558 | "]).write()" |
559 | 559 | ] |
560 | 560 | }, |
|
749 | 749 | }, |
750 | 750 | { |
751 | 751 | "cell_type": "code", |
752 | | - "execution_count": 14, |
| 752 | + "execution_count": null, |
753 | 753 | "metadata": {}, |
754 | 754 | "outputs": [], |
755 | 755 | "source": [ |
756 | 756 | "pdf = ev.joined_timeseries.filter([\n", |
757 | 757 | " \"primary_location_id = 'usgs-14138800'\",\n", |
758 | | - " \"event = true\",\n", |
| 758 | + " \"event_above = true\",\n", |
759 | 759 | "]).to_pandas()" |
760 | 760 | ] |
761 | 761 | }, |
762 | 762 | { |
763 | 763 | "cell_type": "code", |
764 | | - "execution_count": 15, |
| 764 | + "execution_count": null, |
765 | 765 | "metadata": {}, |
766 | 766 | "outputs": [], |
767 | 767 | "source": [ |
768 | | - "primary_plot = pdf.hvplot.points(x=\"value_time\", y=\"primary_value\", color=\"event_id\") #.opts(width=1200, height=400)" |
| 768 | + "primary_plot = pdf.hvplot.points(x=\"value_time\", y=\"primary_value\", color=\"event_above_id\") #.opts(width=1200, height=400)" |
769 | 769 | ] |
770 | 770 | }, |
771 | 771 | { |
|
947 | 947 | "(\n", |
948 | 948 | " ev.metrics\n", |
949 | 949 | " .query(\n", |
950 | | - " group_by=[\"configuration_name\", \"primary_location_id\", \"event_id\"],\n", |
| 950 | + " group_by=[\"configuration_name\", \"primary_location_id\", \"event_above_id\"],\n", |
951 | 951 | " filters=[\n", |
952 | 952 | " \"primary_location_id = 'usgs-14138800'\",\n", |
953 | | - " \"event = true\",\n", |
| 953 | + " \"event_above = true\",\n", |
954 | 954 | " ],\n", |
955 | 955 | " include_metrics=[\n", |
956 | 956 | " teehr.Signatures.Maximum(\n", |
|
1004 | 1004 | "(\n", |
1005 | 1005 | " ev.metrics\n", |
1006 | 1006 | " .query(\n", |
1007 | | - " group_by=[\"configuration_name\", \"primary_location_id\", \"event_id\"],\n", |
| 1007 | + " group_by=[\"configuration_name\", \"primary_location_id\", \"event_above_id\"],\n", |
1008 | 1008 | " filters=[\n", |
1009 | 1009 | " \"primary_location_id = 'usgs-14138800'\",\n", |
1010 | | - " \"event = true\",\n", |
| 1010 | + " \"event_above = true\",\n", |
1011 | 1011 | " ],\n", |
1012 | 1012 | " include_metrics=[\n", |
1013 | 1013 | " teehr.Signatures.Maximum(\n", |
|
1037 | 1037 | "cell_type": "markdown", |
1038 | 1038 | "metadata": {}, |
1039 | 1039 | "source": [ |
1040 | | - "One last thing to cover here. So far we have added the calculated fields on the `joined_timeseries` table, written them to disk, and then queried the `joined_timeseries` table to calculate metrics. This works well and allows the calculated fields to be calculated once and used in many subsequent metrics, plots, etc. However, you may wish to add temporary fields to the `joined_timeseries` table as part of the metrics calculation. This can be done too. Building on the pervious example, where we calculated the \"event_max_relative_bias\", lets now assume we want to calculate the same metric but for the 90th percentile instead of the default 85th percentile that we used when we added the added the `event` and `event_id` fields to the `joined_timeseries` table. We could add the new \"90th percentile event\" to the `joined_timeseries` table and save to disk and then proceed as we did before, or we can add new `event90` and `event90_id` fields to the data frame temporarily before calculating the maximum event values and ultimately the \"event_90th_max_relative_bias\"." |
| 1040 | + "One last thing to cover here. So far we have added the calculated fields on the `joined_timeseries` table, written them to disk, and then queried the `joined_timeseries` table to calculate metrics. This works well and allows the calculated fields to be calculated once and used in many subsequent metrics, plots, etc. However, you may wish to add temporary fields to the `joined_timeseries` table as part of the metrics calculation. This can be done too. Building on the previous example, where we calculated the \"event_max_relative_bias\", let's now assume we want to calculate the same metric but for the 90th percentile instead of the default 85th percentile that we used when we added the `event` and `event_id` fields to the `joined_timeseries` table. We could add the new \"90th percentile event\" to the `joined_timeseries` table, save it to disk, and then proceed as we did before, or we can add new `event90` and `event90_id` fields to the data frame temporarily before calculating the maximum event values and ultimately the \"event_90th_max_relative_bias\"."
1041 | 1041 | ] |
1042 | 1042 | }, |
1043 | 1043 | { |
|
1075 | 1075 | "source": [ |
1076 | 1076 | "(\n", |
1077 | 1077 | " ev.metrics\n", |
1078 | | - " # Add the PercentileEventDetection calculated field to identify events greater than the 90th percentile.\n", |
| 1078 | + " # Add the AbovePercentileEventDetection calculated field to identify events greater than the 90th percentile.\n", |
1079 | 1079 | " # Note the output_event_field_name and output_event_id_field_name are set to \"event90\" and \"event90_id\" respectively.\n", |
1080 | 1080 | " .add_calculated_fields([\n", |
1081 | | - " tcf.PercentileEventDetection(\n", |
| 1081 | + " tcf.AbovePercentileEventDetection(\n", |
1082 | 1082 | " quantile=0.90,\n", |
1083 | 1083 | " output_event_field_name=\"event90\",\n", |
1084 | 1084 | " output_event_id_field_name=\"event90_id\"\n", |
|
1118 | 1118 | ")" |
1119 | 1119 | ] |
1120 | 1120 | }, |
| 1121 | + { |
| 1122 | + "cell_type": "markdown", |
| 1123 | + "metadata": {}, |
| 1124 | + "source": [ |
| 1125 | + "Categorical Deterministic Metrics\n", |
| 1126 | + "---------------------------------\n", |
| 1127 | + "\n", |
| 1128 | + "Now that we understand routines for grouping/filtering data and adding calculated fields in TEEHR to obtain performance metrics, we can introduce the concept of categorical deterministic metrics.\n",
| 1129 | + "\n", |
| 1130 | + "Categorical deterministic metrics measure qualitative attributes and evaluate binary outcomes given some condition or categorical classification. Put simply, these methods compare model predictions with observed data to indicate where the model was right or wrong.\n",
| 1131 | + "\n", |
| 1132 | + "Currently, the following categorical deterministic methods are available in TEEHR:\n", |
| 1133 | + "\n", |
| 1134 | + "- [ConfusionMatrix](https://rtiinternational.github.io/teehr/api/generated/teehr.DeterministicMetrics.html#teehr.DeterministicMetrics.ConfusionMatrix)\n", |
| 1135 | + "- [FalseAlarmRatio](https://rtiinternational.github.io/teehr/api/generated/teehr.DeterministicMetrics.html#teehr.DeterministicMetrics.FalseAlarmRatio)\n", |
| 1136 | + "- [ProbabilityOfDetection](https://rtiinternational.github.io/teehr/api/generated/teehr.DeterministicMetrics.html#teehr.DeterministicMetrics.ProbabilityOfDetection)\n", |
| 1137 | + "- [ProbabilityOfFalseDetection](https://rtiinternational.github.io/teehr/api/generated/teehr.DeterministicMetrics.html#teehr.DeterministicMetrics.ProbabilityOfFalseDetection)\n", |
| 1138 | + "- [CriticalSuccessIndex](https://rtiinternational.github.io/teehr/api/generated/teehr.DeterministicMetrics.html#teehr.DeterministicMetrics.CriticalSuccessIndex)\n", |
| 1139 | + "\n", |
| 1140 | + "Unlike other metrics in TEEHR, which operate on the default fields present in the joined timeseries table, categorical deterministic metrics require an additional 'threshold field' column to categorize model predictive performance based on a user-defined flow threshold.\n",
| 1141 | + "\n", |
| 1142 | + "For example, let's use our categorical metrics to evaluate model performance when predicting streamflow that exceeds the 90th percentile flow (i.e., the flow exceeded 10% of the time). Let's start from the default `joined_timeseries` table and complete the chained operation in-memory, as opposed to writing to disk or using the fields we calculated in the previous steps."
| 1143 | + ] |
| 1144 | + }, |
| 1145 | + { |
| 1146 | + "cell_type": "code", |
| 1147 | + "execution_count": null, |
| 1148 | + "metadata": {}, |
| 1149 | + "outputs": [], |
| 1150 | + "source": [ |
| 1151 | + "# Start fresh from the default joined_timeseries table fields\n",
| 1152 | + "# Overwrite the existing joined_timeseries table to demonstrate the chained query\n",
| 1153 | + "ev.joined_timeseries.create(add_attrs=False, execute_scripts=False)"
| 1153 | + ] |
| 1154 | + }, |
| 1155 | + { |
| 1156 | + "cell_type": "code", |
| 1157 | + "execution_count": null, |
| 1158 | + "metadata": {}, |
| 1159 | + "outputs": [], |
| 1160 | + "source": [ |
| 1161 | + "metrics_df = ev.metrics.add_calculated_fields([\n", |
| 1162 | + " # adds 'event', 'event_id', and 'quantile_value' fields to the joined timeseries table\n", |
| 1163 | + " teehr.TimeseriesAwareCalculatedFields.AbovePercentileEventDetection(\n", |
| 1164 | + " quantile=0.90,\n", |
| 1165 | + " add_quantile_field=True\n", |
| 1166 | + " )\n", |
| 1167 | + "]).query(\n", |
| 1168 | + " # calculate all available categorical deterministic metrics using the 'quantile_value' field as the threshold\n", |
| 1169 | + " group_by=['primary_location_id', 'configuration_name'],\n", |
| 1170 | + " include_metrics=[\n", |
| 1171 | + " teehr.DeterministicMetrics.ConfusionMatrix(\n", |
| 1172 | + " threshold_field_name='quantile_value'\n", |
| 1173 | + " ),\n", |
| 1174 | + " teehr.DeterministicMetrics.FalseAlarmRatio(\n", |
| 1175 | + " threshold_field_name='quantile_value'\n", |
| 1176 | + " ),\n", |
| 1177 | + " teehr.DeterministicMetrics.ProbabilityOfDetection(\n", |
| 1178 | + " threshold_field_name='quantile_value'\n", |
| 1179 | + " ),\n", |
| 1180 | + " teehr.DeterministicMetrics.ProbabilityOfFalseDetection(\n", |
| 1181 | + " threshold_field_name='quantile_value'\n", |
| 1182 | + " ),\n", |
| 1183 | + " teehr.DeterministicMetrics.CriticalSuccessIndex(\n", |
| 1184 | + " threshold_field_name='quantile_value'\n", |
| 1185 | + " )\n", |
| 1186 | + " ]\n", |
| 1187 | + ").to_sdf().show()" |
| 1188 | + ] |
| 1189 | + }, |
| 1190 | + { |
| 1191 | + "cell_type": "markdown", |
| 1192 | + "metadata": {}, |
| 1193 | + "source": [ |
| 1194 | + "In the above example, we use a chained query to add a 'threshold field' to our unaltered `joined_timeseries` table and calculate all available categorical metrics in one operation.\n",
| 1195 | + "\n", |
| 1196 | + "The chained operation executes in two steps:\n", |
| 1197 | + "\n", |
| 1198 | + "- The first portion of the operation uses the `AbovePercentileEventDetection` method with `add_quantile_field` set to True to obtain the flow value that corresponds to the 90% quantile for each row in the joined timeseries table.\n", |
| 1199 | + "\n", |
| 1200 | + "- The second portion takes the resulting table (with the `AbovePercentileEventDetection` fields present) and executes a metrics query for our categorical deterministic metrics (with each `threshold_field_name` argument set to the default `output_quantile_field_name` specified in the `AbovePercentileEventDetection` method).\n",
| 1201 | + "\n", |
| 1202 | + "As you can see, we obtain one entry per metric for each unique combination of `primary_location_id` and `configuration_name` (as specified in the `group_by` argument in the `ev.metrics.query()` call).\n",
| 1203 | + "\n", |
| 1204 | + "<b>When employing categorical deterministic metrics in TEEHR, keep in mind that each aggregation must correspond to exactly one 'threshold value'</b>. \n", |
| 1205 | + "\n", |
| 1206 | + "In this example, that criterion is met inherently because the `quantile_value` was generated using a `TimeseriesAwareCalculatedField`, which considers each 'unique timeseries' (where 'unique timeseries' is defined by `AbovePercentileEventDetection.uniqueness_fields`, by default `['reference_time', 'primary_location_id', 'configuration_name', 'variable_name', 'unit_name']`).\n",
| 1207 | + "\n", |
| 1208 | + "By that same logic, each unique combination of `primary_location_id` and `configuration_name` defined in the `group_by` argument in this example has exactly one 'threshold value', because the `['reference_time', 'variable_name', 'unit_name']` fields are constant for each unique combination of `['primary_location_id', 'configuration_name']`."
| 1209 | + ] |
| 1210 | + }, |
1121 | 1211 | { |
1122 | 1212 | "cell_type": "code", |
1123 | 1213 | "execution_count": 20, |
|