
Commit 123ad79

Version v0.2.0 (#13)

* Fix licensing
* Fix generate_proto
* Fix logging
* Update changelogs
* Bump to version v0.2.0

Signed-off-by: teodordelibasic-db <[email protected]>
1 parent 8c5dfb9 commit 123ad79

7 files changed: +120 −42 lines

CHANGELOG.md

Lines changed: 52 additions & 0 deletions

@@ -1,5 +1,57 @@
 # Version changelog
 
+## Release v0.2.0
+
+### New Features and Improvements
+
+- Loosened protobuf dependency constraint to support versions >= 4.25.0 and < 7.0
+- **JSON Serialization Support**: Added support for JSON record serialization alongside Protocol Buffers (default)
+  - New `RecordType.JSON` mode for ingesting JSON-encoded strings
+  - No protobuf schema compilation required
+- Added `HeadersProvider` abstraction for flexible authentication strategies
+  - Implemented `OAuthHeadersProvider` for OAuth 2.0 Client Credentials flow (default authentication method used by `create_stream()`)
+
+### Bug Fixes
+
+- **generate_proto tool**: Fixed uppercase field names bug for nested fields
+- **generate_proto tool**: Added validation for unsupported nested type combinations
+  - Now properly rejects: `array<array<...>>`, `array<map<...>>`, `map<map<...>, ...>`, `map<array<...>, ...>`, `map<..., map<...>>`, `map<..., array<...>>`
+- **Logging**: Fixed false alarm "Retriable gRPC error" logs when calling `stream.close()`
+  - CANCELLED errors during intentional stream closure are no longer logged as errors
+- **Logging**: Unified log messages between sync and async SDK implementations
+  - Both SDKs now produce consistent logging output with same verbosity and format
+- **Error handling**: Improved error messages to distinguish between recoverable and non-recoverable errors
+  - "Stream closed due to a non-recoverable error" vs "Stream failed permanently after failed recovery attempt"
+
+### Documentation
+
+- Added JSON and protobuf serialization examples for both sync and async APIs
+- Restructured Quick Start guide to present JSON first as the simpler option
+- Enhanced API Reference with JSON mode documentation
+- Added Azure workspace and endpoint URL examples
+
+### Internal Changes
+
+- **Build system**: Loosened setuptools requirement from `>=77` to `>=61`
+- **License format**: Changed license specification to PEP 621 table format for setuptools <77 compatibility
+  - Changed from `license = "LicenseRef-Proprietary"` to `license = {text = "LicenseRef-Proprietary"}`
+- **generate_proto tool**: Added support for TINYINT and BYTE data types (both map to int32)
+- **Logging**: Added detailed initialization logging to async SDK to match sync SDK
+  - "Starting initializing stream", "Attempting retry X out of Y", "Sending CreateIngestStreamRequest", etc.
+
+### API Changes
+
+- **StreamConfigurationOptions**: Added `record_type` parameter to specify serialization format
+  - `RecordType.PROTO` (default): For protobuf serialization
+  - `RecordType.JSON`: For JSON serialization
+  - Example: `StreamConfigurationOptions(record_type=RecordType.JSON)`
+- **ZerobusStream.ingest_record**: Now accepts JSON strings (when using `RecordType.JSON`) in addition to protobuf messages and bytes
+- Added `RecordType` enum with `PROTO` and `JSON` values
+- Added `HeadersProvider` abstract base class for custom header strategies
+- Added `OAuthHeadersProvider` class for OAuth 2.0 authentication with Databricks OIDC endpoint
+- Added `create_stream_with_headers_provider` method to `ZerobusSdk` and `aio.ZerobusSdk` for custom authentication header providers
+  - **Note**: Custom headers providers must include both `authorization` and `x-databricks-zerobus-table-name` headers
+
 ## Release v0.1.0
 
 Initial release of the Databricks Zerobus Ingest SDK for Python.
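For orientation, here is a minimal sketch of the JSON ingestion path the changelog describes. The names `ZerobusSdk`, `StreamConfigurationOptions`, `RecordType`, and `ingest_record` come from the entries above; the import paths, constructor arguments, `TableProperties` helper, and table/endpoint values are illustrative assumptions, not confirmed signatures.

```python
# Sketch only: import paths, constructor arguments, and the TableProperties
# helper are assumptions; RecordType, StreamConfigurationOptions, and
# ingest_record are the names documented in the changelog above.
import json

from zerobus.sdk.sync import ZerobusSdk                      # assumed path
from zerobus.sdk.shared import (                             # assumed path
    RecordType,
    StreamConfigurationOptions,
    TableProperties,
)

sdk = ZerobusSdk("<grpc-endpoint>", "<workspace-url>")       # hypothetical args
options = StreamConfigurationOptions(record_type=RecordType.JSON)

# With RecordType.JSON, no protobuf schema compilation is required:
# ingest_record accepts a JSON-encoded string directly.
stream = sdk.create_stream(TableProperties("catalog.schema.events"), options)
stream.ingest_record(json.dumps({"id": 1, "event": "click"}))
stream.close()
```

For custom authentication, a `HeadersProvider` passed to `create_stream_with_headers_provider` must supply both the `authorization` and `x-databricks-zerobus-table-name` headers, per the note above.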

NEXT_CHANGELOG.md

Lines changed: 1 addition & 24 deletions

@@ -1,36 +1,13 @@
 # NEXT CHANGELOG
 
-## Release v0.2.0
+## Release v0.3.0
 
 ### New Features and Improvements
 
-- Loosened protobuf dependency constraint to support versions >= 4.25.0 and < 7.0
-- **JSON Serialization Support**: Added support for JSON record serialization alongside Protocol Buffers (default)
-  - New `RecordType.JSON` mode for ingesting JSON-encoded strings
-  - No protobuf schema compilation required
-- Added `HeadersProvider` abstraction for flexible authentication strategies
-  - Implemented `OAuthHeadersProvider` for OAuth 2.0 Client Credentials flow (default authentication method used by `create_stream()`)
-
 ### Bug Fixes
 
 ### Documentation
 
-- Added JSON and protobuf serialization examples for both sync and async APIs
-- Restructured Quick Start guide to present JSON first as the simpler option
-- Enhanced API Reference with JSON mode documentation
-- Added Azure workspace and endpoint URL examples
-
 ### Internal Changes
 
 ### API Changes
-
-- **StreamConfigurationOptions**: Added `record_type` parameter to specify serialization format
-  - `RecordType.PROTO` (default): For protobuf serialization
-  - `RecordType.JSON`: For JSON serialization
-  - Example: `StreamConfigurationOptions(record_type=RecordType.JSON)`
-- **ZerobusStream.ingest_record**: Now accepts JSON strings (when using `RecordType.JSON`) in addition to protobuf messages and bytes
-- Added `RecordType` enum with `PROTO` and `JSON` values
-- Added `HeadersProvider` abstract base class for custom header strategies
-- Added `OAuthHeadersProvider` class for OAuth 2.0 authentication with Databricks OIDC endpoint
-- Added `create_stream_with_headers_provider` method to `ZerobusSdk` and `aio.ZerobusSdk` for custom authentication header providers
-- **Note**: Custom headers providers must include both `authorization` and `x-databricks-zerobus-table-name` headers

pyproject.toml

Lines changed: 2 additions & 3 deletions

@@ -1,5 +1,5 @@
 [build-system]
-requires = ["setuptools>=77", "wheel"]
+requires = ["setuptools>=61", "wheel"]
 build-backend = "setuptools.build_meta"
 
 [project]
@@ -9,8 +9,7 @@ description = "Databricks Zerobus Ingest SDK for Python"
 readme = "README.md"
 requires-python = ">=3.9"
 keywords = ["zerobus", "databricks", "sdk"]
-license = "LicenseRef-Proprietary"
-license-files = ["LICENSE"]
+license = {text = "LicenseRef-Proprietary"}
 classifiers = [
     "Development Status :: 5 - Production/Stable",
     "Intended Audience :: Developers",

zerobus/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-__version__ = "0.1.0"
+__version__ = "0.2.0"

zerobus/sdk/aio/zerobus_sdk.py

Lines changed: 28 additions & 7 deletions

@@ -75,6 +75,7 @@ async def __with_retries(self, func: Callable, max_attempts: int):
         backoff_seconds = (self._options.recovery_backoff_ms / 1000) if self._options.recovery else 0
 
         for attempt in range(max_attempts):
+            logger.info(f"Attempting retry {attempt} out of {max_attempts}")
             try:
                 try:
                     await asyncio.wait_for(func(), timeout=timeout_seconds)
@@ -149,6 +150,7 @@ async def __create_stream(self):
     async def _initialize(self):
         # Asynchronous initialization method for ZerobusStream. Must be called before using the stream.
         try:
+            logger.info("Starting initializing stream")
             max_attempts = self._options.recovery_retries if self._options.recovery else 1
             await self.__with_retries(self.__create_stream, max_attempts)
             await self.__set_state(StreamState.OPENED)
@@ -271,22 +273,29 @@ async def __handle_stream_failed(
             err_msg = str(exception) if exception is not None else "Stream closed unexpectedly!"
             self.__stream_failure_info.log_failure(failure_type)
 
-            if (self.__state == StreamState.OPENED or self.__state == StreamState.FLUSHING) and not isinstance(
-                exception, NonRetriableException
-            ):
+            should_recover = (
+                (self.__state == StreamState.OPENED or self.__state == StreamState.FLUSHING)
+                and not isinstance(exception, NonRetriableException)
+                and self._options.recovery
+            )
+
+            if should_recover:
                 # Set the state to recovering
                 # This is to prevent the stream from being closed multiple times
                 self.__state = StreamState.RECOVERING
                 recovered = await self.__recover_stream()
                 if recovered:
                     # Stream recovered successfully
                     return
+                # Recovery failed
+                logger.error(f"Stream failed permanently after failed recovery attempt: {err_msg}")
+            else:
+                # Non-recoverable error
+                logger.error(f"Stream closed due to a non-recoverable error: {err_msg}")
 
             # Close the stream for new events
             await self.__set_state(StreamState.FAILED)
             await self.__close(hard_failure=True, err_msg=err_msg)
-
-            logger.error(f"Stream closed due to an error: {err_msg}")
         finally:
             self.__error_handling_in_progress = False
 
@@ -312,6 +321,7 @@ async def __sender(self):
 
         try:
             # 1. CREATE STREAM
+            logger.info("Sending CreateIngestStreamRequest to gRPC stream")
             create_stream_request = zerobus_service_pb2.CreateIngestStreamRequest(
                 table_name=self._table_properties.table_name.encode("utf-8"),
                 record_type=self._options.record_type.value,
@@ -324,6 +334,7 @@ async def __sender(self):
             )
 
             yield zerobus_service_pb2.EphemeralStreamRequest(create_stream=create_stream_request)
+            logger.info("Waiting for CreateIngestStreamResponse")
             await self.__wait_for_stream_to_finish_initialization()
             stream_id = self.stream_id
 
@@ -414,7 +425,12 @@ async def __sender(self):
         except asyncio.CancelledError as e:
             exception = e
         except grpc.RpcError as e:
-            exception = log_and_get_exception(e)
+            # Check if this is a CANCELLED error due to intentional stream closure
+            if self.__state == StreamState.CLOSED and e.code() == grpc.StatusCode.CANCELLED:
+                # Stream was cancelled during close() - don't log as error
+                exception = ZerobusException(f"Error happened in sending records: {e}")
+            else:
+                exception = log_and_get_exception(e)
         except Exception as e:
             logger.error(f"Error happened in sending records: {str(e)}")
             exception = ZerobusException(f"Error happened in sending records: {str(e)}")
@@ -482,7 +498,12 @@ async def __receiver(self):
         except asyncio.CancelledError as e:
             exception = e
         except grpc.RpcError as e:
-            exception = log_and_get_exception(e)
+            # Check if this is a CANCELLED error due to intentional stream closure
+            if self.__state == StreamState.CLOSED and e.code() == grpc.StatusCode.CANCELLED:
+                # Stream was cancelled during close() - don't log as error
+                exception = ZerobusException(f"Error happened in receiving records: {e}")
+            else:
+                exception = log_and_get_exception(e)
         except Exception as e:
             logger.error(f"Error happened in receiving records: {str(e)}")
             exception = ZerobusException(f"Error happened in receiving records: {str(e)}")
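The `__sender` and `__receiver` changes above reduce to the same guard; here it is as a self-contained sketch (the helper name is ours, not part of the SDK):

```python
import grpc

def is_intentional_cancel(err: grpc.RpcError, stream_closed: bool) -> bool:
    # Mirrors the guard added in __sender/__receiver: a CANCELLED status that
    # arrives while the stream is already CLOSED is the expected side effect
    # of stream.close() tearing down the gRPC call, so it is wrapped in a
    # plain ZerobusException instead of going through log_and_get_exception().
    return stream_closed and err.code() == grpc.StatusCode.CANCELLED
```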

zerobus/sdk/sync/zerobus_sdk.py

Lines changed: 12 additions & 2 deletions

@@ -514,7 +514,12 @@ def __sender(self) -> Iterator[zerobus_service_pb2.EphemeralStreamRequest]:
                 offset_id += 1
 
         except grpc.RpcError as e:
-            exception = log_and_get_exception(e)
+            # Only log if the stream is not being intentionally stopped
+            if self.__stop_event.is_set() and e.code() == grpc.StatusCode.CANCELLED:
+                # Stream was cancelled during close() - don't log as error
+                exception = ZerobusException(f"Error happened in sending records: {e}")
+            else:
+                exception = log_and_get_exception(e)
         except Exception as e:
             if not self.__stop_event.is_set():
                 logger.error(f"Error in sender: {str(e)}")
@@ -609,7 +614,12 @@ def __receiver(self):
                 if not created_stream_success:
                     exception = e
                     return
-                exception = log_and_get_exception(e)
+                # Only log if the stream is not being intentionally stopped
+                if self.__stop_event.is_set() and e.code() == grpc.StatusCode.CANCELLED:
+                    # Stream was cancelled during close() - don't log as error
+                    exception = ZerobusException(f"Error happened in receiving records: {e}")
+                else:
+                    exception = log_and_get_exception(e)
         except Exception as e:
             logger.error(f"Error happened in receiving records: {str(e)}")
             exception = ZerobusException(f"Error happened in receiving records: {str(e)}")

zerobus/tools/generate_proto.py

Lines changed: 24 additions & 5 deletions

@@ -32,11 +32,12 @@ def parse_args() -> argparse.Namespace:
 
     Type mappings:
         Delta -> Proto2
+        TINYINT/BYTE -> int32
+        SMALLINT/SHORT -> int32
         INT -> int32
+        BIGINT/LONG -> int64
         STRING -> string
         FLOAT -> float
-        LONG -> int64
-        SHORT -> int32
         DOUBLE -> double
         BOOLEAN -> bool
         BINARY -> bytes
@@ -216,7 +217,7 @@ def parse_array_type(column_type: str) -> Optional[str]:
     """
     match = re.match(r"^ARRAY<(.+)>$", column_type.upper())
     if match:
-        return match.group(1).strip()
+        return column_type[6:-1].strip()
     return None
 
 
@@ -398,6 +399,8 @@ def get_proto_field_info(
 
     # Base scalar types
     type_mapping = {
+        "TINYINT": "int32",
+        "BYTE": "int32",
         "SMALLINT": "int32",
         "SHORT": "int32",
         "INT": "int32",
@@ -424,7 +427,11 @@ def get_proto_field_info(
     if element_type is not None:
         # Check for nested arrays (not supported)
         if parse_array_type(element_type) is not None:
-            raise ValueError(f"Direct nested arrays are not supported: {element_type}")
+            raise ValueError("Nested arrays are not supported: array<array<...>>")
+
+        # Check for array of maps (not supported)
+        if parse_map_type(element_type) is not None:
+            raise ValueError("Arrays of maps are not supported: array<map<...>>")
 
         modifier, elem_proto_type, nested_def = get_proto_field_info(
             field_name, element_type, False, struct_counter, level + 1
@@ -436,6 +443,14 @@ def get_proto_field_info(
     if map_types is not None:
         key_type, value_type = map_types
 
+        # Protobuf map keys cannot be maps
+        if parse_map_type(key_type) is not None:
+            raise ValueError("Maps with map keys are not supported: map<map<...>, ...>")
+
+        # Protobuf map keys cannot be arrays
+        if parse_array_type(key_type) is not None:
+            raise ValueError("Maps with array keys are not supported: map<array<...>, ...>")
+
         # Protobuf map keys must be integral or string types
         _, key_proto_type, key_nested_def = get_proto_field_info(field_name, key_type, False, struct_counter, level + 1)
 
@@ -459,7 +474,11 @@ def get_proto_field_info(
 
         # Protobuf map values cannot be other maps
         if parse_map_type(value_type) is not None:
-            raise ValueError(f"Protobuf does not support nested maps. Found in: {column_type}")
+            raise ValueError("Maps with map values are not supported: map<..., map<...>>")
+
+        # Protobuf map values cannot be arrays
+        if parse_array_type(value_type) is not None:
+            raise ValueError("Maps with array values are not supported: map<..., array<...>>")
 
         _, value_proto_type, value_nested_def = get_proto_field_info(
             field_name, value_type, False, struct_counter, level + 1
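The one-line change in `parse_array_type` is what fixes the uppercase field names bug: the regex matches against `column_type.upper()`, so `match.group(1)` returns the element type with its original casing destroyed, while slicing the original string (`len("ARRAY<") == 6`) preserves it. A standalone sketch of the before/after behavior:

```python
import re

def parse_array_type_before(column_type: str):
    # Old behavior: group(1) comes from the upper-cased copy, so nested
    # field names lose their original casing.
    match = re.match(r"^ARRAY<(.+)>$", column_type.upper())
    return match.group(1).strip() if match else None

def parse_array_type_after(column_type: str):
    # New behavior: still match case-insensitively against the upper-cased
    # copy, but slice the original string so field names keep their casing.
    match = re.match(r"^ARRAY<(.+)>$", column_type.upper())
    return column_type[6:-1].strip() if match else None

print(parse_array_type_before("array<struct<userId: int>>"))  # STRUCT<USERID: INT>
print(parse_array_type_after("array<struct<userId: int>>"))   # struct<userId: int>
```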
