Replies: 7 comments 2 replies
-
Forgot to show the results:

PostgreSQL connection pool created successfully
-
Hey @djouallah, thanks for reporting. First things first, I think you raise a valid point regarding the performance of inlining. Now a couple of comments:

**Regarding your script**

I get different timings. This could be related to OS and platform, but I sit around 60 transactions/s with 1 worker and 80 with two. Another thing I notice is that you do the timing in the main program rather than timing each thread. If you just want to measure the throughput of one connection, it is better to time inside each thread.

**Regarding DuckLake's performance**

An insertion via Data Inlining in DuckLake will never match a plain Postgres table. For every insertion you do in DuckLake, you are creating a new snapshot, which requires a bunch of queries and subsequent updates/inserts to tables.

**A few other notes**

If we can get closer to pure Postgres performance by reducing some of the overhead (let's say we get 3x slower or even less), we would be pretty happy. I think inlining will not only improve insertion speed to lakehouses, but most importantly it solves the small-files problem and large compaction jobs. Also, we may consider testing other catalogs in the future (maybe variations of Postgres) that can scale better and provide a better baseline. Thanks!
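To illustrate the point about timing inside each thread rather than in the main program, here is a minimal sketch. `do_insert` is a hypothetical stand-in for the real per-transaction DuckLake/Postgres insert, not actual DuckLake API:

```python
import threading
import time

def run_worker(n_ops, do_insert, results, idx):
    # Time the loop inside the worker itself, so each thread
    # reports the throughput of its own connection.
    start = time.perf_counter()
    for _ in range(n_ops):
        do_insert()
    elapsed = time.perf_counter() - start
    results[idx] = n_ops / elapsed  # transactions/s for this thread

def benchmark(n_workers, n_ops, do_insert):
    results = [0.0] * n_workers
    threads = [
        threading.Thread(target=run_worker, args=(n_ops, do_insert, results, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    # Dummy insert simulating ~1 ms of work per transaction.
    dummy_insert = lambda: time.sleep(0.001)
    for i, tps in enumerate(benchmark(2, 50, dummy_insert)):
        print(f"worker {i}: {tps:.0f} tx/s")
```

Timing in the main thread folds in thread startup/join overhead and measures aggregate wall time; timing per thread isolates the throughput of each connection.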
-
Moving this to a discussion since it is not a bug.
-
Now I have a bigger problem: conflicts and data loss under heavy write use cases.
-
I get it, data inlining works only with INSERT, not UPDATE and DELETE!
-
Actually, transaction conflicts are generally a problem that DuckLake has because of the way we handle snapshotting. We are looking into this. But this should already be a bit better in 1.4.4, since we've reduced the number of roundtrips to Postgres (less time for writers to grab conflicting snapshot ids). Data loss is another thing, though. I wouldn't expect any data loss; I'm not sure what you mean there.
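Since a conflicting snapshot commit surfaces as a failed transaction rather than silent data loss, one common application-side mitigation is to retry the whole transaction with backoff. A minimal sketch, where `TransactionConflict` and the `commit` callable are hypothetical placeholders (not DuckLake API):

```python
import random
import time

class TransactionConflict(Exception):
    """Placeholder for the conflict error a snapshot commit can raise."""

def commit_with_retry(commit, max_retries=5, base_delay=0.05):
    # Retry a commit on snapshot conflicts with exponential backoff plus
    # jitter. Re-raises after the last attempt, so the caller sees the
    # failure instead of silently losing the write.
    for attempt in range(max_retries + 1):
        try:
            return commit()
        except TransactionConflict:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

The jitter spreads retries of concurrent writers apart, which matters precisely in the heavy-write scenario described above.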
-
@guillesd Sorry, I misspoke. What I meant is that if DuckLake is used as a system of record, some transactions under heavy write load may fail to be committed at all. |

-
What happens?
Testing data inlining in a local Postgres: a plain heap table is around 10x faster compared to DuckLake.
To Reproduce
OS:
windows
DuckDB Version:
1.4.3
DuckLake Version:
0.2
DuckDB Client:
python
Hardware:
No response
Full Name:
mim
Affiliation:
personal
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?