Conversation
It seems the bq library you use is missing some features, such as overriding project_id.
Does that block you from using it? Do you see any solution? (I haven't checked myself.)
Also, there is some hardcoded Postgres-specific SQL.
Only the timestamp, or are there more?
Also, the number of days for the time filter is hardcoded to 30.
We kept it that way only for backward compatibility. You can use a List (with 1 object) to adjust the number of days: https://contessa.readthedocs.io/en/latest/features.html#time-filter (sorry, there probably should have been a comment about it).

    def persist(self, items: List[DQBase]) -> None:
        for item in items:
            if isinstance(item, QualityCheck):

Can you add a comment saying that ConsistencyCheck is not supported, or else also write the ConsistencyCheck sender?
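
To illustrate the first option, here is a minimal sketch of a `persist` that states the limitation in a comment and skips unsupported items. The classes below are hypothetical stand-ins, not contessa's real models, and `SketchDestination` and its `sent` list exist only for this example.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


class DQBase:
    """Hypothetical stand-in for the base result class."""


class QualityCheck(DQBase):
    """Hypothetical stand-in for a quality check result."""


class ConsistencyCheck(DQBase):
    """Hypothetical stand-in for a consistency check result."""


class SketchDestination:
    """Illustrative destination that only knows how to send QualityCheck items."""

    def __init__(self) -> None:
        self.sent: List[QualityCheck] = []

    def persist(self, items: List[DQBase]) -> None:
        for item in items:
            if isinstance(item, QualityCheck):
                self.persist_quality_check(item)
            else:
                # ConsistencyCheck (and any other type) is not supported by
                # this destination yet, so it is skipped with a warning.
                logger.warning(
                    "Skipping unsupported result type: %s", type(item).__name__
                )

    def persist_quality_check(self, item: QualityCheck) -> None:
        # A real sender would push the result to the external service here.
        self.sent.append(item)
```

Whether to warn or raise for unsupported types is a design choice; a warning keeps the run going, while raising makes the gap harder to miss.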

    def persist_quality_check(self, item: QualityCheck) -> None:
        tags = [
            f"rule_name:{item.rule_name}",

This is not enough for uniqueness. You also need to include table_name, schema_name (or the two combined), and db_name/db_host, since contessa can be used with the same DD for multiple DBs, tables, and schemas.
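
Something along these lines might work as a starting point. This is only a sketch: `build_tags` is a hypothetical helper, and the tag keys other than `rule_name` are assumptions, as is the way the values would be obtained from the item or the connection.

```python
from typing import List


def build_tags(
    rule_name: str,
    table_name: str,
    schema_name: str,
    db_name: str,
    db_host: str,
) -> List[str]:
    """Build tags that identify a check result across DBs, schemas, and tables."""
    return [
        f"rule_name:{rule_name}",
        # Schema and table are combined so the same table name in two schemas differs.
        f"table:{schema_name}.{table_name}",
        f"db_name:{db_name}",
        f"db_host:{db_host}",
    ]
```

With this, the same rule on `public.bookings` and `staging.bookings` ends up with different tag sets, so the two results no longer collide.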

    objs = self.do_quality_checks(quality_check_class, rules, context)

    self.conn.upsert(objs)
    _ = [destination.persist(objs) for destination in destinations]

Is there a reason to use a list comprehension here? Can we just write a normal for loop?
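
For reference, the plain-loop version could look roughly like this. `persist_to_destinations` and `RecordingDestination` are hypothetical names used only to keep the sketch self-contained.

```python
from typing import Any, List


class RecordingDestination:
    """Hypothetical destination that just records what it was asked to persist."""

    def __init__(self) -> None:
        self.received: List[List[Any]] = []

    def persist(self, objs: List[Any]) -> None:
        self.received.append(list(objs))


def persist_to_destinations(objs: List[Any], destinations: List[Any]) -> None:
    # A plain loop makes it clear that persist() is called for its side
    # effect and avoids building a throwaway list of None values.
    for destination in destinations:
        destination.persist(objs)
```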

    """
    Construct context to pass to executors. User context overrides defaults.
    """
    if context is None:

    check_table: Dict,
    result_table: Dict,  # todo - docs for quality name, maybe defaults..
    context: Optional[Dict] = None,
    destinations: List[Destination] = None,

I'm not sure about this. If we allow multiple destinations, we should ensure consistency: if the DD send fails, we should roll back the DB transaction, and vice versa, which requires more code to handle.
Thinking about it, I would allow only 1 destination for now: either DD or DB. We (Kiwi) will mostly use the DB, and if needed we can just run the same code with the DD destination, or have a subscriber at the DB level that sends the results to DD (which is IMHO the better idea when you are using the DB as the destination).
What do you think?
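
To make the extra handling concrete, here is a rough sketch of one direction only (roll back the DB write when the send fails). It uses the standard-library `sqlite3` module purely for illustration; the `results` table, `persist_consistently`, and `FailingSender` are all hypothetical and not part of contessa.

```python
import sqlite3
from typing import List, Tuple


class FailingSender:
    """Hypothetical stand-in for a DD sender whose send always fails."""

    def persist(self, rows: List[Tuple[str, int]]) -> None:
        raise RuntimeError("send failed")


def persist_consistently(
    conn: sqlite3.Connection, rows: List[Tuple[str, int]], sender
) -> None:
    """Write rows to the DB and forward them; undo the DB write if the send fails."""
    try:
        # The INSERT opens a transaction that stays uncommitted for now.
        conn.executemany(
            "INSERT INTO results (rule_name, passed) VALUES (?, ?)", rows
        )
        # Send before committing, so a failed send can still be rolled back.
        sender.persist(rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

Note that the reverse case is not covered: if the commit fails after the send has succeeded, the already-sent data cannot be rolled back by the DB transaction, which is part of why a single destination is simpler.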