Replies: 10 comments 3 replies
-
Can you please add some details about which Airflow version you are actually using? Otherwise we are shooting in the dark as to what the problem might be. Looking at the stack trace, it seems that you run some job or trigger a DAG via the CLI. To classify or understand the bug, it would be very important to know which CLI parameters you are using.
-
@jens-scheffler-bosch 2.5.0
-
@jens-scheffler-bosch More details on the setup:
DAGs are executed through Airflow API calls made by external apps.
-
I've also just tried to run
-
@jens-scheffler-bosch forgive me, those are not the scheduler logs, they are the triggerer's. The scheduler errors like this:
-
Triggerer exception as soon as it starts:
-
Same with the scheduler: I do not understand how this is possible after the DB has been completely cleared.
-
It seems these are DAGs executed by the scheduler.
-
The way it looks, this is a problem with your database. You are likely using some active-active DB (not standard Postgres, but something built on top of Postgres), probably a non-standard active-active setup with load balancing that routes your DB connections between the nodes, i.e. multiple DB instances, each serving traffic and each trying to insert its own keys.

Airflow (and generally any application that is not built to run in such a setup, specifically any application built with SQLAlchemy that does not explicitly account for active-active) will fail, because different replicas of the DB will attempt to insert the same unique keys when creating a record in a table with an automatically generated unique key. Airflow will only work with an active-passive high-availability DB, i.e. one where only one node handles traffic at a time and the second is a replica that can be used for failover.

The solution to your problem is to use a database that does not have an active-active setup.

BTW, using Airflow 2.5.0 is a bad idea. There were probably a few dozen bugfixes in the 2.5 line alone (2.5.3 was released quite a few months ago), so staying on 2.5.0 makes very little sense: you do not give yourself a chance to benefit from the bugfixes that were implemented (even if you do not want to switch to the latest 2.6.*, which I would heartily recommend).

Converting this into a discussion to continue there, because it is unlikely to be an Airflow issue.

@ginwakeup -> I am curious to see later what kind of database solution you use; this is almost certainly the cause of your problem.
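To make the failure mode described above concrete, here is a toy simulation. It is only a sketch under loud assumptions: it does not model Postgres or any real active-active product, it uses an in-memory `sqlite3` table as a stand-in for the replicated data, and the `job` table and `count_key_collisions` function are hypothetical names invented for illustration. Each "node" hands out IDs from its own local counter, and inserts are routed round-robin between the nodes, roughly as a load balancer might do.

```python
import sqlite3
from itertools import count


def count_key_collisions(active_nodes, inserts=4):
    """Route inserts round-robin across nodes and count unique-key failures.

    Every node generates IDs from its own independent counter (an assumption
    made for this toy model), while all rows end up in one shared table.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; stands in for any table with a generated unique key.
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, payload TEXT)")

    # One independent ID counter per node, each starting at 1.
    node_counters = [count(1) for _ in range(active_nodes)]

    collisions = 0
    for i in range(inserts):
        # Pick the node that the simulated load balancer routes this insert to.
        next_id = next(node_counters[i % active_nodes])
        try:
            conn.execute(
                "INSERT INTO job (id, payload) VALUES (?, ?)",
                (next_id, f"row {i}"),
            )
        except sqlite3.IntegrityError:
            # Another node already used this ID: the duplicate-key failure.
            collisions += 1
    conn.close()
    return collisions


if __name__ == "__main__":
    print("one active node: ", count_key_collisions(active_nodes=1))
    print("two active nodes:", count_key_collisions(active_nodes=2))
```

With a single node handling all writes (the active-passive case), every generated ID is new and nothing collides. With two nodes generating keys independently, the second node repeats IDs the first one already inserted, which is the kind of duplicate-key error this comment attributes to an active-active setup.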
-
@potiuk thank you for the detailed reply. I was indeed wondering whether this was caused by the Postgres setup.
-
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
Airflow keeps polluting the Postgres DB. I cannot understand what causes this, but it happens in our cluster every 3 to 4 weeks.
At some point the triggerer stops working and throws errors such as these:
I cannot understand what is causing it, and it forces me to erase the DB every time to make it work again.
Nothing changed in the DAGs at all; this was all working as of 3 days ago, and it just started failing on DB data. Any idea?
What you think should happen instead
No response
How to reproduce
I don't know.
Operating System
Ubuntu 20.04
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
Deployed on Kubernetes, version 2.5
6 replicas for each component
db backend: postgres 15
Anything else
No response
Are you willing to submit PR?
Code of Conduct