-
Notifications
You must be signed in to change notification settings - Fork 9
Description
When copying a table with DbToDbOperator, the destination table is created using the data types automatically determined by Pandas based on the data the dataframe contains. As a result, the information on which columns are primary keys and which columns are NULLable or NOT NULL is not copied over.
In many cases this won't be a problem. However, when you are going to further transform data with SQL, this can cause a number of problems:
- the lack of a primary key means that no indices exist, making JOINs very slow and resource consuming, depending on table size;
- the lack of a NOT NULL definition as in the source may also cause problems when JOINing tables, especially when the destination after the transformation (e.g. a datamart that will receive data transformed from stage) has this type on the column.
Without this feature, the problem must be solved by manually creating the DDL for all destination tables. For a small database this may be done, but if the number of tables is large, this may be impractical.
DbToDbOperator should have one or two parameters for adding this information to the destination table, when creating its schema.
Related to #187, as this should happen when generating a destination table.