@@ -9,6 +9,10 @@ After a database is started, it is required to run migration script.
 For an empty database, it creates all the required tables and indexes.
 For a non-empty database, it performs a database structure upgrade using `Alembic <https://alembic.sqlalchemy.org/>`_.

+.. warning::
+
+    Other containers (consumer, server) should be stopped while running migrations, to prevent interference.
+
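+With Docker Compose, the whole sequence could look like the following. This is a minimal sketch:
+the ``db-migration`` service name is an assumption, check your ``docker-compose.yml`` for the actual one.
+
+.. code:: console
+
+    # Stop containers which access the database (service names as mentioned above)
+    $ docker compose stop consumer server
+    # Run migrations as a one-off container (service name is an assumption)
+    $ docker compose run --rm db-migration
+    # Start the other containers again
+    $ docker compose start consumer server
+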
 After migrations are performed, it is required to run a script which creates partitions for some tables in the database.
 By default, it creates monthly partitions for the current and the next month. This can be changed by overriding command args.
 This script should run on a schedule, for example by adding a dedicated entry to `crontab <https://help.ubuntu.com/community/CronHowto>`_, as shown below.
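+
+A hypothetical crontab entry is shown below. This is a sketch only: the actual command depends on
+how the script is deployed, and both the service name and the compose file path are assumptions.
+
+.. code:: text
+
+    # Run the partitions script at 00:00 on the 1st day of every month
+    # (it creates partitions for both the current and the next month, so they exist in advance).
+    # Service name and compose file path are assumptions:
+    0 0 1 * * docker compose -f /opt/app/docker-compose.yml run --rm create-partitions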
@@ -17,18 +21,16 @@ Along with migrations analytics views are created. By default these materialized
 In order to fill these tables with data, you need to run the refresh script. The command for this is shown below.
 Views are based on data in the ``output`` and ``input`` tables, and have the following structure:

-.. code:: text
-
-    dataset_name - Name of dataset.
-    dataset_location - Name of dataset location (e.g. clusster name).
-    dataset_location_type - Type of dataset location (e.g. hive, hdfs, postgres).
-    user_id - Internal user id.
-    user_name - Internal user name (e.g. name of user which run spark job).
-    last_interaction_dt - Time when user lat time interact with dataset. Read or write depens on base table.
-    num_of_interactions - Number of interactions in given interval.
-    sum_bytes - Sum of bytes in given interval. ``num_bytes`` - column.
-    sum_rows - Sum of rows in given interval. ``num_rows`` - column.
-    sum_files - Sum of files in given interval. ``num_files`` - column.
+* ``dataset_name`` - Name of the dataset.
+* ``dataset_location`` - Name of the dataset location (e.g. cluster name).
+* ``dataset_location_type`` - Type of the dataset location (e.g. hive, hdfs, postgres).
+* ``user_id`` - Internal user id.
+* ``user_name`` - Internal user name (e.g. name of the user who ran the Spark job).
+* ``last_interaction_dt`` - Time when the user last interacted with the dataset. Read or write, depending on the base table.
+* ``num_of_interactions`` - Number of interactions in the given interval.
+* ``sum_bytes`` - Sum of bytes in the given interval.
+* ``sum_rows`` - Sum of rows in the given interval.
+* ``sum_files`` - Sum of files in the given interval.

 We provide three types of views: ``day``, ``week`` and ``month``, based on the time period over which the aggregation occurs.
 Views are created for all intervals, and by default the refresh script refreshes all of them.
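+
+Overriding this could look like the following. This is a sketch only: the ``refresh-views``
+service name and the argument format are assumptions, check the actual command help.
+
+.. code:: console
+
+    # Refresh only the daily and weekly views (service and argument names are assumptions)
+    $ docker compose run --rm refresh-views day week
+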
@@ -64,7 +66,7 @@ With Docker
 .. dropdown:: ``docker-compose.yml``

     .. literalinclude:: ../../../docker-compose.yml
-        :emphasize-lines: 1-15,108-110
+        :emphasize-lines: 1-33,123

 .. dropdown:: ``.env.docker``
