Commit 12fb378

Merge remote-tracking branch 'delta-io/master' into spark-4.0-upgrade-merge
2 parents 3bfc70e + 4619af7 commit 12fb378

15 files changed: +555 -392 lines changed

kernel/kernel-defaults/src/test/java/io/delta/kernel/defaults/client/TestDefaultFileSystemClient.java

Lines changed: 0 additions & 71 deletions
This file was deleted.

kernel/kernel-defaults/src/test/java/io/delta/kernel/defaults/client/TestDefaultJsonHandler.java

Lines changed: 0 additions & 215 deletions
This file was deleted.

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeletionVectorSuite.scala

Lines changed: 0 additions & 1 deletion
@@ -18,7 +18,6 @@ package io.delta.kernel.defaults
 import io.delta.golden.GoldenTableUtils.goldenTablePath
 
 import io.delta.kernel.defaults.utils.{TestRow, TestUtils}
-import io.delta.kernel.defaults.utils.DefaultKernelTestUtils.getTestResourceFilePath
 import org.apache.hadoop.conf.Configuration
 import org.scalatest.funsuite.AnyFunSuite

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/PartitionPruningSuite.scala

Lines changed: 2 additions & 1 deletion
@@ -16,13 +16,14 @@
 package io.delta.kernel.defaults
 
 import java.math.{BigDecimal => BigDecimalJ}
+
 import scala.collection.JavaConverters._
+
 import io.delta.golden.GoldenTableUtils.goldenTablePath
 import io.delta.kernel.defaults.utils.{TestRow, TestUtils}
 import io.delta.kernel.expressions.{Column, Expression, Predicate}
 import io.delta.kernel.expressions.Literal._
 import io.delta.kernel.types._
-import org.apache.spark.sql.catalyst.plans.SQLHelper
 import org.scalatest.funsuite.AnyFunSuite
 
 class PartitionPruningSuite extends AnyFunSuite with TestUtils {

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/internal/expressions/ExpressionSuiteBase.scala

Lines changed: 0 additions & 2 deletions
@@ -15,8 +15,6 @@
  */
 package io.delta.kernel.defaults.internal.expressions
 
-import java.util
-
 import io.delta.kernel.data.{ColumnarBatch, ColumnVector}
 import io.delta.kernel.defaults.internal.data.DefaultColumnarBatch
 import io.delta.kernel.defaults.utils.{TestUtils, VectorTestUtils}

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ import io.delta.kernel.internal.data.ScanStateRow
 import io.delta.kernel.internal.util.ColumnMapping
 import io.delta.kernel.internal.util.Utils.singletonCloseableIterator
 import io.delta.kernel.types._
-import io.delta.kernel.utils.{CloseableIterator, DataFileStatus}
+import io.delta.kernel.utils.CloseableIterator
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.shaded.org.apache.commons.io.FileUtils
 import org.apache.spark.sql.SparkSession

protocol_rfcs/managed-commits.md

Lines changed: 16 additions & 6 deletions
@@ -143,7 +143,7 @@ are responsible to define the commit atomicity and backfill protocols which the
 
 At a high level, the `commit-owner` needs to provide:
 - API to atomically commit a version `x` with given set of `actions`. This is explained in detail in the [commit protocol](#commit-protocol) section.
-- API to retrieve information about the recent commits on the table. This is explained in detail in the [getting un-backfilled commits from commit-owner](#getting-un-backfilled-commits-from-commit-owner) section.
+- API to retrieve information about the recent commits and the latest ratified version on the table. This is explained in detail in the [getting un-backfilled commits from commit-owner](#getting-un-backfilled-commits-from-commit-owner) section.
 
 ### Commit Protocol
 
@@ -161,7 +161,14 @@ Even after a commit succeeds, Delta clients can only discover the commit through
 have no way to determine which file in `_delta_log/_commits` directory corresponds to the actual commit `v`.
 
 The commit-owner is responsible to implement an API (defined by the Delta client) that Delta clients can use to retrieve information about un-backfilled commits maintained
-by the commit-owner. Delta clients who are unaware of the commit-owner (or unwilling to talk to it), may not see recent un-backfilled commits and thus may encounter stale reads.
+by the commit-owner. The API must also return the latest version of the table ratified by the commit-owner (if any).
+Providing the latest ratified table version helps address potential race conditions between listing commits and contacting the commit-owner.
+For example, if a client performs a listing before a recently ratified commit is backfilled, and then contacts the commit-owner after the backfill completes,
+the commit-owner may return an empty list of un-backfilled commits. Without knowing the latest ratified version, the client might incorrectly assume their listing was complete
+and read a stale snapshot.
+
+Delta clients who are unaware of the commit-owner (or unwilling to talk to it), may not see recent un-backfilled commits and thus may encounter stale reads.
+
 
 ## Sample Commit Owner API
 
@@ -176,7 +183,7 @@ interface CommitStore {
    * @param version The version we want to commit.
    * @param actions Actions that need to be committed.
    *
-   * returns CommitResponse which has details around the new committed delta file.
+   * @return CommitResponse which has details around the new committed delta file.
    */
   def commit(
     version: Long,
@@ -191,13 +198,16 @@ interface CommitStore {
    * Note that the first version returned by this API may not be equal to the `startVersion`. This
    * happens when few versions starting from `startVersion` are already backfilled and so
    * CommitStore may have stopped tracking them.
+   * The returned latestTableVersion is the maximum commit version ratified by the Commit-Owner.
+   * Note that returning latestTableVersion as -1 is acceptable only if the commit-owner never
+   * ratified any version i.e. it never accepted any un-backfilled commit.
    *
-   * @return a list of `Commit` which are tracked by commit-owner.
-   *
+   * @return GetCommitsResponse which contains a list of `Commit`s and the latestTableVersion
+   *         tracked by the commit-owner.
    */
   def getCommits(
     startVersion: Long,
-    endVersion: Long): Seq[Commit]
+    endVersion: Long): GetCommitsResponse
 
   /**
    * API to ask the commit-owner to backfill all commits <= given `version`.
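
The race described in the `managed-commits.md` change above (a listing taken before a backfill, followed by a `getCommits` call after it) can be illustrated with a small client-side check. The following is only a sketch under stated assumptions: the classes `Commit` and `GetCommitsResponse` borrow their names from the RFC text but are hypothetical stand-ins defined here, and `resolveVersion` is an illustrative helper, not part of Delta or Delta Kernel. It assumes the client already knows the highest backfilled version it found by listing `_delta_log`.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch: these classes only mirror names used in the RFC text
// (Commit, GetCommitsResponse, latestTableVersion). They are not Delta APIs.
final class RatifiedVersionCheck {

    /** Illustrative stand-in for an un-backfilled commit; only the version is modeled. */
    static final class Commit {
        final long version;

        Commit(long version) {
            this.version = version;
        }
    }

    /** Illustrative stand-in for the response returned by getCommits. */
    static final class GetCommitsResponse {
        final List<Commit> commits;    // un-backfilled commits still tracked by the commit-owner
        final long latestTableVersion; // -1 if the commit-owner never ratified any version

        GetCommitsResponse(List<Commit> commits, long latestTableVersion) {
            this.commits = commits;
            this.latestTableVersion = latestTableVersion;
        }
    }

    /**
     * Returns the version the client may read, or an empty result when the
     * listing is known to be stale and should be repeated.
     *
     * @param latestListedVersion highest backfilled version found by listing (-1 if none)
     * @param response            what the commit-owner returned afterwards
     */
    static OptionalLong resolveVersion(long latestListedVersion, GetCommitsResponse response) {
        // Highest version the client knows about from either source.
        long latestSeen = latestListedVersion;
        for (Commit c : response.commits) {
            latestSeen = Math.max(latestSeen, c.version);
        }
        // The commit-owner ratified something newer than anything the client saw:
        // a commit was backfilled between the listing and the getCommits call.
        if (response.latestTableVersion > latestSeen) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(latestSeen);
    }
}
```

With a listing that ended at version 5, an empty commit list, and `latestTableVersion` of 6, the helper returns an empty result, which signals the client to list again instead of reading a stale snapshot at version 5. The sketch deliberately ignores other checks a real client would likely need, such as verifying that the versions it found are contiguous.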
