Releases: Flowminder/FlowKit

1.1.0

17 Feb 09:30

Changed

  • Connection.available_dates is now a property and returns results based on the etl.etl_records table. #1873

Fixed

  • Fixed the run action blocking the FlowMachine server in some scenarios. #1256

Removed

  • Removed tables and columns methods from the Connection class in FlowMachine
  • Removed the inspector attribute from the Connection class in FlowMachine

1.0.0

27 Jan 19:53

Added

  • FlowMachine now periodically prunes the cache to below the permitted cache size. #1307
    The frequency of this pruning is configurable using FlowMachine's FLOWMACHINE_CACHE_PRUNING_FREQUENCY environment variable. Queries are protected from removal by the automatic shrinker according to the cache_protected_period config key within FlowDB.
  • FlowDB now includes Paul Ramsey's OGR foreign data wrapper, for easy loading of GIS data. #1512
  • FlowETL now allows all configuration options to be set using docker secrets. #1515
  • Added a new component, AutoFlow, to automate running Jupyter notebooks when new data is added to FlowDB. #1570
  • FLOWETL_INTEGRATION_TESTS_SAVE_AIRFLOW_LOGS environment variable added to allow copying the Airflow logs in FlowETL integration tests into the /mounts/logs directory for debugging. #1019
  • Added new IterativeMedianFilter query to Flowmachine, which applies an iterative median filter to the output of another query. #1339
  • FlowDB now includes the TDS foreign data wrapper. #1729
  • Added contributing and support instructions. #1791
  • New FlowETL module installable via pip to aid in ETL dag creation.

Changed

  • FlowDB is now built on PostgreSQL 12 #1396 and PostGIS 3.
  • FlowETL is now built on Airflow 1.10.6.
  • FlowETL now defaults to disabling Airflow's REST API, and enables RBAC for the webui. #1516
  • FlowETL now requires that the FLOWETL_AIRFLOW_ADMIN_USERNAME and FLOWETL_AIRFLOW_ADMIN_PASSWORD environment variables be set, which specify the default web ui account. #1516
  • FlowAPI will no longer return a result for rows in spatial aggregate, joined spatial aggregate, flows, total events, meaningful locations aggregate, meaningful locations od, or unique subscriber count where the aggregate would contain fewer than 16 SIMs. #1026
  • FlowETL now requires that AIRFLOW__CORE__SQL_ALCHEMY_CONN be provided as an environment variable or secret. #1702, #1703
  • FlowAuth now records last used two-factor authentication codes in an expiring cache, which supports either a file-based, or redis backend. #1173
  • AutoFlow now uses Bundler to manage Ruby dependencies.
  • The end_date parameter of flowclient.modal_location_from_dates now refers to the day after the final date included in the range, so is now consistent with other queries that have start/end date parameters. #819
  • Date intervals in AutoFlow date stencils are now interpreted as half-open intervals (i.e. including start date, excluding end date), for consistency with date ranges elsewhere in FlowKit.
  • flowmachine user now has read access to ETL metadata tables in FlowDB
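
The half-open date convention described in this section (start date included, end date excluded) can be illustrated with a minimal sketch. The helper name dates_in_half_open_interval is hypothetical and is not part of flowclient or AutoFlow; it only shows which days such a range covers.

```python
from datetime import date, timedelta
from typing import List


def dates_in_half_open_interval(start_date: str, end_date: str) -> List[str]:
    """Return the ISO dates covered by [start_date, end_date).

    Hypothetical helper for illustration only: the start date is
    included and the end date is excluded.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    n_days = max((end - start).days, 0)  # an empty or reversed range covers no days
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


# To cover 1-3 January inclusive, the end date is the day after the final date.
covered = dates_in_half_open_interval("2016-01-01", "2016-01-04")
print(covered)
```

Under this convention, consecutive ranges such as ("2016-01-01", "2016-01-04") and ("2016-01-04", "2016-01-07") neither overlap nor leave a gap.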

Fixed

  • Quickstart should no longer fail on systems which do not include the netstat tool. #1472
  • Fixed an error that prevented FlowAuth admin users from resetting users' passwords using the FlowAuth UI. #1635
  • The 'Cancel' button on the FlowAuth 'New User' form no longer submits the form. #1636
  • FlowAuth backend now sends a meaningful 400 response when trying to create a user with an empty password. #1637
  • Usernames of deleted users can now be re-used as usernames for new users. #1638
  • RedactedJoinedSpatialAggregate now only redacts rows with too few subscribers. #1747
  • FlowDB now uses a more conservative default setting for tcp_keepalives_idle of 10 minutes, to avoid connections being killed after 15 minutes when running in a docker swarm. #1771
  • Aggregation units and api routes can now be added to servers. #1815
  • Fixed several issues with FlowETL. #1529 #1499 #1498 #1497
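
As an illustration of the redaction rule in this release (aggregate rows built from fewer than 16 subscribers are withheld), here is a minimal sketch. The function name, the row layout and the "subscribers" key are assumptions made for the example; this is not FlowMachine's implementation.

```python
from typing import Dict, List

# Threshold quoted in the 1.0.0 notes: rows with fewer than 16 are redacted.
MIN_SUBSCRIBERS = 16


def redact_small_rows(rows: List[Dict], min_subscribers: int = MIN_SUBSCRIBERS) -> List[Dict]:
    """Keep only rows whose subscriber count meets the minimum.

    Hypothetical helper: each row is assumed to be a dict with a
    "subscribers" count. Rows below the minimum are dropped entirely,
    and all other rows are returned unchanged.
    """
    return [row for row in rows if row["subscribers"] >= min_subscribers]


rows = [
    {"location": "A", "subscribers": 15},
    {"location": "B", "subscribers": 16},
    {"location": "C", "subscribers": 240},
]
kept = redact_small_rows(rows)
print([row["location"] for row in kept])
```

Only the row below the threshold is removed; rows at or above it pass through untouched, which is the behaviour the RedactedJoinedSpatialAggregate fix describes.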

Removed

  • Removed pg_cron.

0.9.1

10 Oct 11:01

Added

  • Added new DistanceSeries query to Flowmachine, which produces per-subscriber time series of distance from a reference point. #1313
  • Added new ImputedDistanceSeries query to Flowmachine, which produces contiguous per-subscriber time series of distance from a reference point by filling in gaps using the rolling median. #1337
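
The idea of filling gaps in a per-subscriber series using a rolling median can be sketched in plain Python. This is an assumption-laden illustration, not the SQL that ImputedDistanceSeries runs: the function name, the centred window and the use of None for missing observations are all choices made for the example.

```python
from statistics import median
from typing import List, Optional


def fill_gaps_with_rolling_median(
    values: List[Optional[float]], window: int = 3
) -> List[Optional[float]]:
    """Fill missing entries (None) with the median of nearby observed values.

    Hypothetical helper: the window is centred on the gap and only
    original, observed values inside it are used. A gap with no
    observed neighbours in the window stays None.
    """
    half = window // 2
    filled: List[Optional[float]] = []
    for i, value in enumerate(values):
        if value is not None:
            filled.append(value)  # observed values are never altered
            continue
        neighbours = [
            v for v in values[max(0, i - half): i + half + 1] if v is not None
        ]
        filled.append(median(neighbours) if neighbours else None)
    return filled


distances = [1.0, None, 3.0, 10.0, None, None, 4.0]
imputed = fill_gaps_with_rolling_median(distances, window=3)
print(imputed)
```

A wider window smooths over longer gaps at the cost of pulling in more distant observations.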

Fixed

  • The FlowETL config file is now always validated, avoiding runtime errors if a config setting is wrong or missing. #1375
  • FlowETL now only creates DAGs for CDR types which are present in the config, leading to a better user experience in the Airflow UI. #1376
  • The concurrency settings in the FlowETL config are no longer ignored. #1378
  • The FlowETL deployment example has been updated so that it no longer fails due to a missing foreign data wrapper for the available CDR dates. #1379
  • Fixed error when editing a user in FlowAuth who did not have two factor enabled. #1374
  • Fixed not being able to enable a newly added api route on existing servers in FlowAuth. #1373

Removed

  • The default_args section in the FlowETL config file has been removed. #1377

0.9.0

01 Oct 10:24

Added

  • FlowAuth now makes version information available at /version and displays it in the web ui. #835
  • FlowETL now comes with a deployment example (in flowetl/deployment_example/). #1126
  • FlowETL now allows supplementary post-ETL queries to be run. #989
  • Random sampling is now exposed via the API, for all non-aggregated query kinds. #1007
  • New aggregate added to FlowMachine - HistogramAggregation, which constructs histograms over the results of other queries. #1075
  • New IntereventInterval query class - returns stats over the gap between events as a time interval.
  • Added submodule flowmachine.core.dependency_graph, which contains functions related to creating or using query dependency graphs (previously these were in utils.py).
  • New config option sql_find_available_dates in FlowETL to provide SQL code to determine the available dates. #1295

Changed

  • FlowDB is now based on PostgreSQL 11.5 and PostGIS 2.5.3
  • When running queries through FlowAPI, the query's dependencies will also be cached by default. This behaviour can be switched off by setting FLOWMACHINE_SERVER_DISABLE_DEPENDENCY_CACHING=true. #1152
  • NewSubscribers now takes a pair of UniqueSubscribers queries instead of the arguments to them
  • Flowmachine's default random sampling method is now random_ids rather than the non-reproducible system_rows. #1263
  • IntereventPeriod now returns stats over the gap between events in fractional time units, instead of time intervals. #1265
  • Attempting to store a query that does not have a standard table name (e.g. EventTableSubset or unseeded random sample) will now raise an UnstorableQueryError instead of ValueError.
  • In the FlowETL deployment example, the external ingestion database is now set up separately from the FlowKit components and connected to FlowDB via a docker overlay network. #1276
  • The md5 attribute of the Query class has been renamed to query_id. #1288
  • DistanceMatrix no longer returns duplicate rows for the lon-lat spatial unit.
  • Previously, Displacement defaulted to returning NaN for subscribers who have a location in the reference location but were not seen in the time period for the displacement query. These subscribers are no longer returned unless the return_subscribers_not_seen argument is set to True.
  • PopulationWeightedOpportunities is now available under flowmachine.features.location, instead of flowmachine.models
  • PopulationWeightedOpportunities no longer supports raising an error for incomplete per-location departure rate vectors, and will instead omit from the results any locations not included in them
  • PopulationWeightedOpportunities no longer requires use of the run() method

Fixed

  • Quickstart will no longer fail if it has been run previously with a different FlowDB data size and not explicitly shut down. #900

Removed

  • Flowmachine's subscriber_locations_cluster function has been removed - use HartiganCluster or MeaningfulLocations directly.
  • FlowAPI no longer supports the non-reproducible random sampling method system_rows. #1263

0.8.0

06 Aug 08:43

Added

  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes event counts. #992
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up amount. #967
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes nocturnal events. #1025
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes top-up balance. #968
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes displacement. #1010
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes pareto interactions. #1012
  • FlowETL now supports ingesting from a postgres table in addition to CSV files. #1027
  • FLOWETL_RUNTIME_CONFIG environment variable added to control which DAG definitions the FlowETL integration tests should use (valid values: "testing", "production").
  • FLOWETL_INTEGRATION_TESTS_DISABLE_PULLING_DOCKER_IMAGES environment variable added to allow running the FlowETL integration tests against locally built docker images during development.
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes handset. #1011 and #1029
  • JoinedSpatialAggregate now supports "distr" stats, which output the relative distribution of the passed metrics.
  • Added SubscriberHandsetCharacteristic to FlowMachine

Changed

  • The flowdb containers for test_data and synthetic_data were split into two separate containers and quick_start.sh downloads the docker-compose files to a new temporary directory on each run. #843
  • Flowmachine now returns more informative error messages when query parameter validation fails. #1055

Removed

  • TESTING environment variable was removed (previously used by the FlowETL integration tests).
  • Removed SubscriberPhoneType from FlowMachine to avoid redundancy.

0.7.0

01 Jul 16:49

Added

  • PRIVATE_JWT_SIGNING_KEY environment variable/secret added to FlowAuth, which should be a PEM encoded RSA private key, optionally base64 encoded if supplied as an environment variable.
  • PUBLIC_JWT_SIGNING_KEY environment variable/secret added to FlowAPI, which should be a PEM encoded RSA public key, optionally base64 encoded if supplied as an environment variable.
  • The dev provisioning Ansible playbook now automatically generates an SSH key pair for the flowkit user. #892
  • Added new classes to represent spatial units in FlowMachine.
  • Added a Geography query class, to get geography data for a spatial unit.
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes unique location counts. #949
  • FlowAPI's 'joined_spatial_aggregate' endpoint now exposes subscriber degree. #969
  • Flowdb now contains an auxiliary table to record outcomes of queries that can be run as part of the regular ETL process #988
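
Because the signing keys may be supplied either as plain PEM text or base64 encoded, a consumer has to tell the two apart. The sketch below shows one way to do that. The function name and the detection rule (checking for the PEM header) are assumptions for illustration; the notes do not say how FlowAuth or FlowAPI actually detect the encoding.

```python
import base64

PEM_HEADER = "-----BEGIN"


def load_pem_from_env_value(raw: str) -> str:
    """Return PEM text from a value that may be plain or base64-encoded PEM.

    Hypothetical helper: a value that already starts with the PEM header
    is returned as-is; anything else is treated as base64 and decoded.
    Standard base64 never contains "-", so the two cases cannot collide.
    """
    stripped = raw.strip()
    if stripped.startswith(PEM_HEADER):
        return stripped
    return base64.b64decode(stripped).decode("utf-8").strip()


# Placeholder text only; this is not a real key.
example_pem = (
    "-----BEGIN PUBLIC KEY-----\n"
    "placeholder-not-a-real-key\n"
    "-----END PUBLIC KEY-----"
)
encoded = base64.b64encode(example_pem.encode("utf-8")).decode("ascii")

print(load_pem_from_env_value(encoded) == example_pem)
print(load_pem_from_env_value(example_pem) == example_pem)
```

Base64 encoding is convenient for environment variables because it keeps the multi-line PEM on a single line.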

Changed

  • The quick-start script now only pulls the docker images for the services that are actually started up. #898
  • FlowAuth and FlowAPI are now linked using an RSA keypair, instead of per-server shared secrets. #89
  • Location-related FlowMachine queries now take a spatial_unit parameter instead of level.
  • The quick-start script now uses the environment variable GIT_REVISION to control the version to be deployed.
  • Create token page permission and spatial aggregation checkboxes are now hidden by default. #834
  • The flowetl mounted directories archive, dump, ingest, quarantine were replaced with a single files directory and files are no longer moved. #946
  • FlowDB's postgresql has been updated to 11.4, which addresses several bugs and one major vulnerability.

Fixed

  • When creating a new token in FlowAuth, the expiry now always shows the year, seconds till expiry, and timezone. #260
  • Distances in Displacement are now calculated with longitude and latitude the correct way around. #913
  • The quick-start script now works correctly with branches. #902
  • Fixed location_event_counts failing to work when specifying a subset of event types #1015
  • FlowAPI will now show the correct version in the API spec, flowmachine and flowclient will show the correct versions in the worked examples. #818

Removed

  • Removed cell_mappings.py, get_columns_for_level and BadLevelError.

  • JWT_SECRET_KEY has been removed in favour of RSA keys.

  • The FlowDB tables infrastructure.countries and infrastructure.operators have been removed. #958

0.6.4

04 Jun 11:03

Added

  • Buttons to copy token to clipboard and download token as file added to token list page. #704
  • Two new worked examples: "Cell Towers Per Region" and "Unique Subscriber Counts". #633, #634

Changed

  • The FLOWDB_DEBUG environment variable has been renamed to FLOWDB_ENABLE_POSTGRES_DEBUG_MODE.
  • FlowAuth will now automatically set up the database when started without needing to trigger via the cli.
  • FlowAuth now requires that at least one administrator account is created by providing env vars or secrets for:
    • FLOWAUTH_ADMIN_PASSWORD
    • FLOWAUTH_ADMIN_USERNAME

Fixed

  • The FLOWDB_DEBUG environment variable used to have no effect. This has been fixed. #811
  • Previously, queries could be stuck in an executing state if writing their cache metadata failed. They will now correctly show as having errored. #833
  • Fixed an issue where Table objects could be in an inconsistent cache state after resetting cache #832
  • FlowAuth's docker container can now be used with a Postgres backing database. #825
  • FlowAPI now starts up successfully when following the "Secrets Quickstart" instructions in the docs. #836
  • The command to generate an SSL certificate in the "Secrets Quickstart" section in the docs has been fixed and made more robust #837
  • FlowAuth will no longer try to initialise the database or create demo data multiple times when running under uwsgi with multiple workers #844
  • Fixed multiple tokens not lining up on the FlowAuth "Tokens" page. #849

Removed

  • The FLOWDB_SERVICES environment variable has been removed from the top-level Makefile, so that DOCKER_SERVICES is now the only environment variable that controls which services are spun up when running make up. #827

0.6.3

17 May 18:29
9b25aaa

Added

  • FlowKit's worked examples are now Dockerized, and available as part of the quick setup script #614
  • Skeleton for Airflow based ETL system added with basic ETL DAG specification and tests.
  • The docs now contain information about required versions of installation prerequisites #703
  • FlowAPI now requires the FLOWAPI_IDENTIFIER environment variable to be set, which contains the name used to identify this FlowAPI server when generating tokens in FlowAuth #727
  • flowmachine.utils.calculate_dependency_graph now includes the Query objects in the query_object field of the graph's nodes dictionary #767
  • Architectural Decision Records (ADR) have been added and are included in the auto-generated docs #780
  • Added FlowDB environment variables SHARED_BUFFERS_SIZE and EFFECTIVE_CACHE_SIZE, to allow manually setting the Postgres configuration parameters shared_buffers and effective_cache_size.

Changed

  • Parameter names in flowmachine.connect() have been renamed as follows to be consistent with the associated environment variables #728:
    • db_port -> flowdb_port
    • db_user -> flowdb_user
    • db_pass -> flowdb_password
    • db_host -> flowdb_host
    • db_connection_pool_size -> flowdb_connection_pool_size
    • db_connection_pool_overflow -> flowdb_connection_pool_overflow
  • FlowAPI and FlowAuth now expect an audience key to be present in tokens #727
  • Dependent queries are now only included once in the md5 calculation of a given query (in particular, it changes the query ids compared to previous FlowKit versions).
  • An error is now displayed in the add user form of FlowAuth if the username already exists. #690
  • An error is now displayed in the add group form of FlowAuth if the group name already exists. #709
  • FlowAuth's add new server page now shows helper text for bad inputs. #749
  • The class SubscriberSubsetterBase in FlowMachine no longer inherits from Query #740 (this changes the query ids compared to previous FlowKit versions).
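
For code written against earlier versions, the flowmachine.connect() parameter renames listed in this section can be applied mechanically. The sketch below translates old keyword arguments to the new names. The helper translate_connect_kwargs is hypothetical and not part of FlowMachine; only the name mapping itself comes from these notes.

```python
from typing import Any, Dict

# Old -> new parameter names, as listed in the 0.6.3 notes.
RENAMED_CONNECT_PARAMS = {
    "db_port": "flowdb_port",
    "db_user": "flowdb_user",
    "db_pass": "flowdb_password",
    "db_host": "flowdb_host",
    "db_connection_pool_size": "flowdb_connection_pool_size",
    "db_connection_pool_overflow": "flowdb_connection_pool_overflow",
}


def translate_connect_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename pre-0.6.3 keyword arguments; leave all other keys untouched.

    Hypothetical helper for migrating call sites, not a FlowMachine API.
    """
    return {RENAMED_CONNECT_PARAMS.get(key, key): value for key, value in kwargs.items()}


old_style = {"db_host": "localhost", "db_port": 9000, "db_pass": "example"}
new_style = translate_connect_kwargs(old_style)
print(new_style)
```

The translated dictionary could then be passed on as keyword arguments, for example flowmachine.connect(**new_style), assuming the remaining arguments are otherwise valid for the installed version.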

Fixed

  • FlowClient docs rendered to website now show the options available for arguments that require a string from some set of possibilities #695.
  • The Flowmachine loggers are now initialised only once when flowmachine is imported, with a call to connect() only changing the log level #691
  • The FERNET_KEY environment variable for FlowAuth is now named FLOWAUTH_FERNET_KEY
  • The quick-start script now correctly aborts if one of the FlowKit services doesn't fully start up #745
  • The maps in the worked examples docs pages now appear in any browser
  • Example invocations of generate-jwt are no longer uncopyable due to line wrapping #778
  • API parameter interval for location_event_counts queries is now correctly passed to the underlying FlowMachine query object #807.

0.6.2

01 May 12:43

Added

  • A new Ansible playbook was added in deployment/provision-dev.yml. In addition to the standard provisioning,
    this installs pyenv, Python 3.7 and pipenv, and clones the FlowKit repository, which is useful for development purposes.
  • Added a 'quick start' setup script for trying out a complete FlowKit system #688.

Changed

  • FlowAPI's available_dates endpoint now always returns available dates for all event types and does not accept JSON
  • Hints are now displayed in the add user form of FlowAuth if the form is not completed #679
  • The Ansible playbooks in deployment/ now allow configuring the username and password for the FlowKit user account.
  • The default compose file no longer includes build blocks; these have been moved to docker-compose-build.yml.

Fixed

  • FlowDB synthetic data container no longer silently fails to generate data if data generator is not set #654

0.6.1

24 Apr 11:47

Fixed

  • Fixed TotalNetworkObjects raising an error when run with a lat-long level #108
  • Radius of gyration no longer incorrectly appears as a top level api query