Skip to content

Add destinations (statsd)#71

Open
mindartur wants to merge 1 commit intokiwicom:masterfrom
mindartur:mindartur/add_destinations
Open

Add destinations (statsd)#71
mindartur wants to merge 1 commit intokiwicom:masterfrom
mindartur:mindartur/add_destinations

Conversation

@mindartur
Copy link

@mindartur mindartur commented Feb 11, 2020

seems like that this bq library you use doesn’t have some features like overriding project_id
also there is some hard coded postgres specific SQL code
also # of days is hardcoded for time filter - 30 days

Copy link
Contributor

@maroshmka maroshmka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like that this bq library you use doesn’t have some features like overriding project_id

does it block you for using it? is there any solution you see (havent checked myself)

also there is some hard coded postgres specific SQL code

only the timestamp or are there more ?

also # of days is hardcoded for time filter - 30 days

we just kept it there to be backward compatible. you can use List (with 1 object) to adjust number of days - https://contessa.readthedocs.io/en/latest/features.html#time-filter (sry there could be some comment probably..)


def persist(self, items: List[DQBase]) -> None:
for item in items:
if isinstance(item, QualityCheck):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write a comment that we do not support ConsistencyCheck or write also the ConsistencyCheck sender?


def persist_quality_check(self, item: QualityCheck) -> None:
tags = [
f"rule_name:{item.rule_name}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not enough for uniqueness. you need to include also table_name, schema_name (or combined) and db_name/db_host.

as you can use contessa with same DD for multiple DBs, tables and schemas.

objs = self.do_quality_checks(quality_check_class, rules, context)

self.conn.upsert(objs)
_ = [destination.persist(objs) for destination in destinations]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to have list comprehension? can we just write normal for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Construct context to pass to executors. User context overrides defaults.
"""
if context is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx 👍

check_table: Dict,
result_table: Dict, # todo - docs for quality name, maybe defaults..
context: Optional[Dict] = None,
destinations: List[Destination] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure about this. If we allow multiple destination, we should ensure consistency. Meaning, if DD send fail, we should rollback DB transaction and vice versa. Which requires more code to handle it.

As I think about this, I would for now allow only 1 destination. Just DD or DB. We(kiwi) will mostly use DB and if we would need, we can just run the same code with DD destination or to have a subscriber on DB level that will send it to DD (which is imho better idea in case you are using DB as destination)

What do you think?

@Hacklad
Copy link

Hacklad commented Jun 11, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants