Releases: Flowminder/FlowKit
1.1.0
Changed
- `Connection.available_dates` is now a property and returns results based on the `etl.etl_records` table. #1873
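For callers, the practical effect of a method becoming a property is that the call parentheses go away. The sketch below shows the pattern with a hypothetical stand-in class. `FakeConnection`, its constructor and the dictionary-by-CDR-type return shape are assumptions for illustration, not FlowMachine's implementation.

```python
from datetime import date


class FakeConnection:
    """Hypothetical stand-in for a connection object; not FlowMachine's class."""

    def __init__(self, etl_records):
        # etl_records: iterable of (cdr_type, cdr_date) pairs, standing in
        # for rows of an ETL bookkeeping table.
        self._etl_records = list(etl_records)

    @property
    def available_dates(self):
        # Group the recorded dates by CDR type and sort each group.
        dates = {}
        for cdr_type, cdr_date in self._etl_records:
            dates.setdefault(cdr_type, []).append(cdr_date)
        return {cdr_type: sorted(found) for cdr_type, found in dates.items()}


conn = FakeConnection([("calls", date(2016, 1, 2)), ("calls", date(2016, 1, 1))])
# Attribute access, not a call: conn.available_dates rather than conn.available_dates()
print(conn.available_dates)
```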
Fixed
- Fixed the run action blocking the FlowMachine server in some scenarios. #1256
Removed
- Removed `tables` and `columns` methods from the `Connection` class in FlowMachine
- Removed the `inspector` attribute from the `Connection` class in FlowMachine
1.0.0
Added
- FlowMachine now periodically prunes the cache to below the permitted cache size. #1307
  The frequency of this pruning is configurable using the `FLOWMACHINE_CACHE_PRUNING_FREQUENCY` environment variable to Flowmachine, and queries are excluded from being removed by the automatic shrinker based on the `cache_protected_period` config key within FlowDB.
- FlowDB now includes Paul Ramsey's OGR foreign data wrapper, for easy loading of GIS data. #1512
- FlowETL now allows all configuration options to be set using docker secrets. #1515
- Added a new component, AutoFlow, to automate running Jupyter notebooks when new data is added to FlowDB. #1570
- `FLOWETL_INTEGRATION_TESTS_SAVE_AIRFLOW_LOGS` environment variable added to allow copying the Airflow logs in FlowETL integration tests into the /mounts/logs directory for debugging. #1019
- Added new `IterativeMedianFilter` query to Flowmachine, which applies an iterative median filter to the output of another query. #1339
- FlowDB now includes the TDS foreign data wrapper. #1729
- Added contributing and support instructions. #1791
- New FlowETL module installable via pip to aid in ETL dag creation.
Changed
- FlowDB is now built on PostgreSQL 12 #1396 and PostGIS 3.
- FlowETL is now built on Airflow 1.10.6.
- FlowETL now defaults to disabling Airflow's REST API, and enables RBAC for the webui. #1516
- FlowETL now requires that the `FLOWETL_AIRFLOW_ADMIN_USERNAME` and `FLOWETL_AIRFLOW_ADMIN_PASSWORD` environment variables be set, which specify the default web ui account. #1516
- FlowAPI will no longer return a result for rows in spatial aggregate, joined spatial aggregate, flows, total events, meaningful locations aggregate, meaningful locations od, or unique subscriber count where the aggregate would contain fewer than 16 sims. #1026
- FlowETL now requires that `AIRFLOW__CORE__SQL_ALCHEMY_CONN` be provided as an environment variable or secret. #1702, #1703
- FlowAuth now records last used two-factor authentication codes in an expiring cache, which supports either a file-based or redis backend. #1173
- AutoFlow now uses Bundler to manage Ruby dependencies.
- The `end_date` parameter of `flowclient.modal_location_from_dates` now refers to the day after the final date included in the range, so is now consistent with other queries that have start/end date parameters. #819
- Date intervals in AutoFlow date stencils are now interpreted as half-open intervals (i.e. including start date, excluding end date), for consistency with date ranges elsewhere in FlowKit.
- The `flowmachine` user now has read access to ETL metadata tables in FlowDB
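The half-open date convention described in this list (start date included, end date excluded) can be illustrated with a small standard-library sketch. The helper name `dates_in_half_open_interval` is hypothetical and is not part of FlowKit.

```python
from datetime import date, timedelta


def dates_in_half_open_interval(start, end):
    """Return every date d with start <= d < end."""
    n_days = (end - start).days
    # max(..., 0) makes an empty or reversed interval yield no dates.
    return [start + timedelta(days=i) for i in range(max(n_days, 0))]


days = dates_in_half_open_interval(date(2016, 1, 1), date(2016, 1, 4))
print(days)  # 1st, 2nd and 3rd of January; the 4th is excluded
```

Under this convention, an `end_date` of 4 January covers data up to and including 3 January, and an interval whose start and end coincide is empty.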
Fixed
- Quickstart should no longer fail on systems which do not include the `netstat` tool. #1472
- Fixed an error that prevented FlowAuth admin users from resetting users' passwords using the FlowAuth UI. #1635
- The 'Cancel' button on the FlowAuth 'New User' form no longer submits the form. #1636
- FlowAuth backend now sends a meaningful 400 response when trying to create a user with an empty password. #1637
- Usernames of deleted users can now be re-used as usernames for new users. #1638
- RedactedJoinedSpatialAggregate now only redacts rows with too few subscribers. #1747
- FlowDB now uses a more conservative default setting for `tcp_keepalives_idle` of 10 minutes, to avoid connections being killed after 15 minutes when running in a docker swarm. #1771
- Aggregation units and api routes can now be added to servers. #1815
- Fixed several issues with FlowETL. #1529 #1499 #1498 #1497
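The redaction behaviour in this release (rows are withheld when an aggregate would cover fewer than 16 SIMs, and only those rows) can be sketched as a simple filter. The row layout and the `redact` helper are assumptions for illustration. FlowKit applies redaction inside its own query classes, not with this code.

```python
REDACTION_THRESHOLD = 16  # aggregates covering fewer SIMs than this are withheld


def redact(rows, threshold=REDACTION_THRESHOLD):
    """Keep only the rows whose subscriber count is at least `threshold`."""
    return [row for row in rows if row["subscribers"] >= threshold]


# Hypothetical aggregate output, one row per location.
rows = [
    {"location": "A", "subscribers": 15},
    {"location": "B", "subscribers": 16},
    {"location": "C", "subscribers": 120},
]
kept = redact(rows)
print([row["location"] for row in kept])
```

Only the row for location A falls below the threshold, so the other rows are returned unchanged.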
Removed
- Removed pg_cron.
0.9.1
Added
- Added new `DistanceSeries` query to Flowmachine, which produces per-subscriber time series of distance from a reference point. #1313
- Added new `ImputedDistanceSeries` query to Flowmachine, which produces contiguous per-subscriber time series of distance from a reference point by filling in gaps using the rolling median. #1337
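As a rough illustration of filling gaps with a rolling median, the sketch below imputes missing values in a plain Python list from the observed values in a centred window. The function name, the window handling and the choice to leave a gap unfilled when its window holds no observations are all assumptions. This is not the `ImputedDistanceSeries` implementation.

```python
from statistics import median


def impute_with_rolling_median(values, window=3):
    """Fill None gaps with the median of observed values in a centred window."""
    half = window // 2
    filled = []
    for i, value in enumerate(values):
        if value is not None:
            filled.append(value)
            continue
        # Observed neighbours within `half` positions either side of the gap.
        neighbours = [
            v for v in values[max(0, i - half): i + half + 1] if v is not None
        ]
        # Leave the gap in place if the window holds no observations.
        filled.append(median(neighbours) if neighbours else None)
    return filled


series = [1.0, None, 3.0, 10.0, None, None, None]
print(impute_with_rolling_median(series))
```

A wider window fills longer gaps at the cost of smoothing over more of the series.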
Fixed
- The FlowETL config file is now always validated, avoiding runtime errors if a config setting is wrong or missing. #1375
- FlowETL now only creates DAGs for CDR types which are present in the config, leading to a better user experience in the Airflow UI. #1376
- The `concurrency` settings in the FlowETL config are no longer ignored. #1378
- The FlowETL deployment example has been updated so that it no longer fails due to a missing foreign data wrapper for the available CDR dates. #1379
- Fixed error when editing a user in FlowAuth who did not have two factor enabled. #1374
- Fixed not being able to enable a newly added api route on existing servers in FlowAuth. #1373
Removed
- The `default_args` section in the FlowETL config file has been removed. #1377
0.9.0
Added
- FlowAuth now makes version information available at `/version` and displays it in the web ui. #835
- FlowETL now comes with a deployment example (in `flowetl/deployment_example/`). #1126
- FlowETL now allows running supplementary post-ETL queries. #989
- Random sampling is now exposed via the API, for all non-aggregated query kinds. #1007
- New aggregate added to FlowMachine - `HistogramAggregation`, which constructs histograms over the results of other queries. #1075
- New `IntereventInterval` query class - returns stats over the gap between events as a time interval.
- Added submodule `flowmachine.core.dependency_graph`, which contains functions related to creating or using query dependency graphs (previously these were in `utils.py`).
- New config option `sql_find_available_dates` in FlowETL to provide SQL code to determine the available dates. #1295
Changed
- FlowDB is now based on PostgreSQL 11.5 and PostGIS 2.5.3
- When running queries through FlowAPI, the query's dependencies will also be cached by default. This behaviour can be switched off by setting `FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true`. #1152
- `NewSubscribers` now takes a pair of `UniqueSubscribers` queries instead of the arguments to them
- Flowmachine's default random sampling method is now `random_ids` rather than the non-reproducible `system_rows`. #1263
- `IntereventPeriod` now returns stats over the gap between events in fractional time units, instead of time intervals. #1265
- Attempting to store a query that does not have a standard table name (e.g. `EventTableSubset` or unseeded random sample) will now raise an `UnstorableQueryError` instead of `ValueError`.
- In the FlowETL deployment example, the external ingestion database is now set up separately from the FlowKit components and connected to FlowDB via a docker overlay network. #1276
- The `md5` attribute of the `Query` class has been renamed to `query_id`. #1288
- `DistanceMatrix` no longer returns duplicate rows for the lon-lat spatial unit.
- Previously, `Displacement` defaulted to returning `NaN` for subscribers who have a location in the reference location but were not seen in the time period for the displacement query. These subscribers are no longer returned unless the `return_subscribers_not_seen` argument is set to `True`.
- `PopulationWeightedOpportunities` is now available under `flowmachine.features.location`, instead of `flowmachine.models`
- `PopulationWeightedOpportunities` no longer supports erroring with incomplete per-location departure rate vectors and will instead omit any locations not included from the results
- `PopulationWeightedOpportunities` no longer requires use of the `run()` method
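The distinction between reproducible and non-reproducible sampling in this list can be illustrated with a seeded generator: the same seed and the same population always give the same sample. This is a generic sketch using Python's standard library. The `seeded_sample` helper is hypothetical and says nothing about how `random_ids` is implemented in FlowMachine.

```python
import random


def seeded_sample(ids, size, seed):
    """Draw `size` ids; the same seed always yields the same sample."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    # Sort first so the result does not depend on the input ordering.
    return rng.sample(sorted(ids), size)


subscribers = ["sub_%02d" % i for i in range(20)]
first = seeded_sample(subscribers, 5, seed=42)
second = seeded_sample(reversed(subscribers), 5, seed=42)
print(first == second)  # True
```

An unseeded sample has no such guarantee, which is consistent with unseeded random samples being listed above as unstorable.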
Fixed
- Quickstart will no longer fail if it has been run previously with a different FlowDB data size and not explicitly shut down. #900
Removed
- Flowmachine's `subscriber_locations_cluster` function has been removed - use `HartiganCluster` or `MeaningfulLocations` directly.
- FlowAPI no longer supports the non-reproducible random sampling method `system_rows`. #1263
0.8.0
Added
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes event counts. #992
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up amount. #967
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes nocturnal events. #1025
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up balance. #968
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes displacement. #1010
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes pareto interactions. #1012
- FlowETL now supports ingesting from a postgres table in addition to CSV files. #1027
- `FLOWETL_RUNTIME_CONFIG` environment variable added to control which DAG definitions the FlowETL integration tests should use (valid values: "testing", "production").
- `FLOWETL_INTEGRATION_TESTS_DISABLE_PULLING_DOCKER_IMAGES` environment variable added to allow running the FlowETL integration tests against locally built docker images during development.
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes handset. #1011 and #1029
- `JoinedSpatialAggregate` now supports "distr" stats, which output the relative distribution of the passed metrics.
- Added `SubscriberHandsetCharacteristic` to FlowMachine
Changed
- The flowdb containers for test_data and synthetic_data were split into two separate containers and quick_start.sh downloads the docker-compose files to a new temporary directory on each run. #843
- Flowmachine now returns more informative error messages when query parameter validation fails. #1055
Removed
- `TESTING` environment variable was removed (previously used by the FlowETL integration tests).
- Removed `SubscriberPhoneType` from FlowMachine to avoid redundancy.
0.7.0
Added
- `PRIVATE_JWT_SIGNING_KEY` environment variable/secret added to FlowAuth, which should be a PEM encoded RSA private key, optionally base64 encoded if supplied as an environment variable.
- `PUBLIC_JWT_SIGNING_KEY` environment variable/secret added to FlowAPI, which should be a PEM encoded RSA public key, optionally base64 encoded if supplied as an environment variable.
- The dev provisioning Ansible playbook now automatically generates an SSH key pair for the `flowkit` user. #892
- Added new classes to represent spatial units in FlowMachine.
- Added a `Geography` query class, to get geography data for a spatial unit.
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes unique location counts. #949
- FlowAPI's 'joined_spatial_aggregate' endpoint now exposes subscriber degree. #969
- Flowdb now contains an auxiliary table to record outcomes of queries that can be run as part of the regular ETL process #988
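The signing key settings above accept a PEM encoded key that may optionally be base64 encoded. One way a consumer could normalise such a value is sketched below. The `normalise_pem` helper and its detection rule (checking for the `-----BEGIN` prefix) are assumptions for illustration, not the logic FlowAuth or FlowAPI use.

```python
import base64

PEM_PREFIX = "-----BEGIN"


def normalise_pem(value):
    """Return PEM text, decoding it from base64 first if necessary."""
    text = value.strip()
    if text.startswith(PEM_PREFIX):
        return text  # already plain PEM
    decoded = base64.b64decode(text).decode("utf-8").strip()
    if not decoded.startswith(PEM_PREFIX):
        raise ValueError("value is neither PEM nor base64-encoded PEM")
    return decoded


# Dummy body for illustration only; this is not a real key.
pem = "-----BEGIN PUBLIC KEY-----\nAAAA\n-----END PUBLIC KEY-----"
encoded = base64.b64encode(pem.encode("utf-8")).decode("ascii")
print(normalise_pem(encoded) == normalise_pem(pem))
```

Base64 encoding is convenient for environment variables because it keeps the multi-line PEM text on a single line.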
Changed
- The quick-start script now only pulls the docker images for the services that are actually started up. #898
- FlowAuth and FlowAPI are now linked using an RSA keypair, instead of per-server shared secrets. #89
- Location-related FlowMachine queries now take a `spatial_unit` parameter instead of `level`.
- The quick-start script now uses the environment variable `GIT_REVISION` to control the version to be deployed.
- Create token page permission and spatial aggregation checkboxes are now hidden by default. #834
- The flowetl mounted directories `archive`, `dump`, `ingest`, `quarantine` were replaced with a single `files` directory and files are no longer moved. #946
- FlowDB's postgresql has been updated to 11.4, which addresses several bugs and one major vulnerability.
Fixed
- When creating a new token in FlowAuth, the expiry now always shows the year, seconds till expiry, and timezone. #260
- Distances in `Displacement` are now calculated with longitude and latitude the correct way around. #913
- The quick-start script now works correctly with branches. #902
- Fixed `location_event_counts` failing to work when specifying a subset of event types. #1015
- FlowAPI will now show the correct version in the API spec; flowmachine and flowclient will show the correct versions in the worked examples. #818
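To show why coordinate order matters for the `Displacement` fix in this list, here is a generic haversine sketch in which swapping longitude and latitude changes the result. The function, the mean Earth radius constant and the sample coordinates are illustrative assumptions and are unrelated to FlowMachine's own distance calculation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Two arbitrary points given as (lon, lat).
a = (85.3, 27.7)
b = (83.98, 28.2)
correct = haversine_km(a[0], a[1], b[0], b[1])
swapped = haversine_km(a[1], a[0], b[1], b[0])  # lon and lat the wrong way around
print(round(correct, 1), round(swapped, 1))
```

The two distances differ because the swapped call treats each longitude as a latitude, which changes both the north-south separation and the cosine scaling of the east-west separation.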
Removed
- Removed `cell_mappings.py`, `get_columns_for_level` and `BadLevelError`.
- `JWT_SECRET_KEY` has been removed in favour of RSA keys.
- The FlowDB tables `infrastructure.countries` and `infrastructure.operators` have been removed. #958
0.6.4
Added
- Buttons to copy token to clipboard and download token as file added to token list page. #704
- Two new worked examples: "Cell Towers Per Region" and "Unique Subscriber Counts". #633, #634
Changed
- The `FLOWDB_DEBUG` environment variable has been renamed to `FLOWDB_ENABLE_POSTGRES_DEBUG_MODE`.
- FlowAuth will now automatically set up the database when started without needing to trigger via the cli.
- FlowAuth now requires that at least one administrator account is created by providing env vars or secrets for:
  - `FLOWAUTH_ADMIN_PASSWORD`
  - `FLOWAUTH_ADMIN_USERNAME`
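A setting that may arrive either as an environment variable or as a docker secret can be read with a small helper like the one below. Docker mounts secrets as files under `/run/secrets/`. The `get_setting` helper, the assumption that the secret file is named after the variable, and the precedence of secret over environment variable are illustrative choices, not FlowAuth's actual behaviour.

```python
import os
from pathlib import Path


def get_setting(name, secrets_dir="/run/secrets"):
    """Return a setting from a secret file if present, else from the environment."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(
            f"{name} must be set as an environment variable or secret"
        ) from None


os.environ["FLOWAUTH_ADMIN_USERNAME"] = "admin"  # example value only
print(get_setting("FLOWAUTH_ADMIN_USERNAME", secrets_dir="/nonexistent"))
```

Failing loudly when neither source is present mirrors the requirement that the administrator account settings be provided.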
Fixed
- The `FLOWDB_DEBUG` environment variable used to have no effect. This has been fixed. #811
- Previously, queries could be stuck in an executing state if writing their cache metadata failed; they will now correctly show as having errored. #833
- Fixed an issue where `Table` objects could be in an inconsistent cache state after resetting cache #832
- FlowAuth's docker container can now be used with a Postgres backing database. #825
- FlowAPI now starts up successfully when following the "Secrets Quickstart" instructions in the docs. #836
- The command to generate an SSL certificate in the "Secrets Quickstart" section in the docs has been fixed and made more robust #837
- FlowAuth will no longer try to initialise the database or create demo data multiple times when running under uwsgi with multiple workers #844
- Fixed an issue where multiple tokens did not line up on the FlowAuth "Tokens" page #849
Removed
- The `FLOWDB_SERVICES` environment variable has been removed from the toplevel Makefile, so that now `DOCKER_SERVICES` is the only environment variable that controls which services are spun up when running `make up`. #827
0.6.3
Added
- FlowKit's worked examples are now Dockerized, and available as part of the quick setup script #614
- Skeleton for Airflow based ETL system added with basic ETL DAG specification and tests.
- The docs now contain information about required versions of installation prerequisites #703
- FlowAPI now requires the `FLOWAPI_IDENTIFIER` environment variable to be set, which contains the name used to identify this FlowAPI server when generating tokens in FlowAuth #727
- `flowmachine.utils.calculate_dependency_graph` now includes the `Query` objects in the `query_object` field of the graph's nodes dictionary #767
- Architectural Decision Records (ADR) have been added and are included in the auto-generated docs #780
- Added FlowDB environment variables `SHARED_BUFFERS_SIZE` and `EFFECTIVE_CACHE_SIZE`, to allow manually setting the Postgres configuration parameters `shared_buffers` and `effective_cache_size`.
Changed
- Parameter names in `flowmachine.connect()` have been renamed as follows to be consistent with the associated environment variables #728:
  - `db_port` -> `flowdb_port`
  - `db_user` -> `flowdb_user`
  - `db_pass` -> `flowdb_password`
  - `db_host` -> `flowdb_host`
  - `db_connection_pool_size` -> `flowdb_connection_pool_size`
  - `db_connection_pool_overflow` -> `flowdb_connection_pool_overflow`
- FlowAPI and FlowAuth now expect an audience key to be present in tokens #727
- Dependent queries are now only included once in the md5 calculation of a given query (in particular, it changes the query ids compared to previous FlowKit versions).
- Error is displayed in the add user form of FlowAuth if the username already exists. #690
- Error is displayed in the add group form of Flowauth if group name already exists. #709
- FlowAuth's add new server page now shows helper text for bad inputs. #749
- The class `SubscriberSubsetterBase` in FlowMachine no longer inherits from `Query` #740 (this changes the query ids compared to previous FlowKit versions).
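For code written against the old `flowmachine.connect()` parameter names, the renames listed in this section can be applied mechanically. The mapping below is taken from that list. The `translate_connect_kwargs` helper and the `some_other_option` key are hypothetical and not part of FlowMachine.

```python
# Old name -> new name, as listed in the 0.6.3 changes.
RENAMED_CONNECT_PARAMS = {
    "db_port": "flowdb_port",
    "db_user": "flowdb_user",
    "db_pass": "flowdb_password",
    "db_host": "flowdb_host",
    "db_connection_pool_size": "flowdb_connection_pool_size",
    "db_connection_pool_overflow": "flowdb_connection_pool_overflow",
}


def translate_connect_kwargs(kwargs):
    """Return a copy of kwargs with pre-rename names replaced by the new ones."""
    return {RENAMED_CONNECT_PARAMS.get(key, key): value for key, value in kwargs.items()}


# "some_other_option" is a made-up key showing that unknown names pass through.
old_style = {"db_host": "localhost", "db_port": 9000, "some_other_option": True}
new_style = translate_connect_kwargs(old_style)
print(new_style)
```

The translated dictionary could then be passed on as keyword arguments, although updating the call sites directly is the cleaner long-term fix.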
Fixed
- FlowClient docs rendered to website now show the options available for arguments that require a string from some set of possibilities #695.
- The Flowmachine loggers are now initialised only once when flowmachine is imported, with a call to `connect()` only changing the log level #691
- The FERNET_KEY environment variable for FlowAuth is now named FLOWAUTH_FERNET_KEY
- The quick-start script now correctly aborts if one of the FlowKit services doesn't fully start up #745
- The maps in the worked examples docs pages now appear in any browser
- Example invocations of `generate-jwt` are no longer uncopyable due to line wrapping #778
- API parameter `interval` for `location_event_counts` queries is now correctly passed to the underlying FlowMachine query object #807.
0.6.2
Added
- A new Ansible playbook was added in `deployment/provision-dev.yml`. In addition to the standard provisioning this installs pyenv, Python 3.7, pipenv and clones the FlowKit repository, which is useful for development purposes.
- Added a 'quick start' setup script for trying out a complete FlowKit system #688.
Changed
- FlowAPI's `available_dates` endpoint now always returns available dates for all event types and does not accept JSON
- Hints are now displayed in the add user form of FlowAuth if the form is not completed #679
- The Ansible playbooks in `deployment/` now allow configuring the username and password for the FlowKit user account.
- Default compose file no longer includes build blocks; these have been moved to `docker-compose-build.yml`.
Fixed
- FlowDB synthetic data container no longer silently fails to generate data if data generator is not set #654