@wendigo wendigo commented Jan 31, 2025

This makes the usage of the segment cursor more natural:

```
cur = conn.cursor('segment')
cur.execute(sql)
spooled_result = cur.fetchall()
for spooled_data in spooled_result:
  row_mapper = RowMapperFactory().create(columns=cur._query.columns, legacy_primitive_types=False)
  rows = list(SegmentIterator(spooled_data, row_mapper))
```

This allows usage that is intuitive to the user:

```
import trino.dbapi
from trino.client import SegmentIterator  # import location assumed
from trino.mapper import RowMapperFactory

conn = trino.dbapi.Connection(
  host='localhost',
  port=8080,
  user='admin'
)
cur = conn.cursor('segment')
cur.execute(sql)  # sql holds the query text
spooled_result = cur.fetchall()
print(spooled_result)
total_row_count = 0
for spooled_data in spooled_result:
  # Each SpooledData exposes the segments it wraps
  for spooled_segment in spooled_data.segments:
    print(spooled_segment.uri)
  # Decode the spooled rows using the query's column metadata
  row_mapper = RowMapperFactory().create(columns=cur._query.columns, legacy_primitive_types=False)
  rows = list(SegmentIterator(spooled_data, row_mapper))
  print(f'{len(rows)} rows in this segment!')
  total_row_count += len(rows)
  print(total_row_count)
```

This will produce:

[SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 450625, 'uncompressedSize': 8331286, 'rowsCount': 59805, 'rowOffset': 0})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 450418, 'uncompressedSize': 8271666, 'rowsCount': 59785, 'rowOffset': 59805})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 453987, 'uncompressedSize': 7663692, 'rowsCount': 54969, 'rowOffset': 119590})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 458888, 'uncompressedSize': 7745643, 'rowsCount': 55625, 'rowOffset': 174559})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 454021, 'uncompressedSize': 7673196, 'rowsCount': 55020, 'rowOffset': 230184})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 195314, 'uncompressedSize': 720690, 'rowsCount': 5166, 'rowOffset': 285204})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 200530, 'uncompressedSize': 741609, 'rowsCount': 5320, 'rowOffset': 290370})]), SpooledData(encoding=json+zstd, segments=[SpooledSegment(metadata={'segmentSize': 195359, 'uncompressedSize': 723296, 'rowsCount': 5185, 'rowOffset': 295690})])]
http://minio:9080/spooling/01JJY37MM347XP4HPZFG7N4X60.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=6467169b75fb1f8a2e0b037ed94cfbc234a1209f983f3f0aa31c8ea37137c0b6
59805 rows in this segment!
59805
http://minio:9080/spooling/01JJY37MM6KGFWA885RYFMGD42.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=d134a69a34af526cbceeb11ff7e66b028b731bc1b98d753c18131eaf2b7abadb
59785 rows in this segment!
119590
http://minio:9080/spooling/01JJY37MM4TWS9PWT6WWM6MMJH.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=a538b74b9ba2f6c4767f7518ad2b5de71d75c7a3196d87a9a9902896e4e50971
54969 rows in this segment!
174559
http://minio:9080/spooling/01JJY37MMSPED9G35H4WHA58P7.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=0ccc5a51275181549a5a7af4623b665d9f4ff5b308a81176d604cbc3f31f8e5b
55625 rows in this segment!
230184
http://minio:9080/spooling/01JJY37MMJ3YJW15860Y6AHNB2.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=a79eb1431dcb109ad7fa2be1057d0de50b9eef3389ab429159191003444e47be
55020 rows in this segment!
285204
http://minio:9080/spooling/01JJY37MQFS4X6GT8A6GXMTE20.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=eee74ba6a2967b39e89047f92f2e53f6c0ce6a974fa80a540cbd825a793293d4
5166 rows in this segment!
290370
http://minio:9080/spooling/01JJY37MQFACMW980W48YWK7HW.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=fcf5ef4375faee9f587a12a04e323d96d5fedf56ee1356c0bf7b8ab680d2268b
5320 rows in this segment!
295690
http://minio:9080/spooling/01JJY37MQGF7ADNEB8SWMA4Z4B.json%2Bzstd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250131T114640Z&X-Amz-SignedHeaders=host&X-Amz-Credential=minio-access-key%2F20250131%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=299&X-Amz-Signature=562be1d360868e77c4d6a067f68ed2c32e64a7bbfcade39bce01cb89ab628028
5185 rows in this segment!
300875
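
As a sanity check on the output above, the `rowOffset` of each segment should equal the running total of `rowsCount` over the segments before it, and the last running total should match the final number printed. A minimal sketch, with the values copied by hand from the printed metadata (the variable names are illustrative and not part of the client):

```python
from itertools import accumulate

# (rowsCount, rowOffset) pairs copied from the segment metadata printed above.
segments = [
    (59805, 0),
    (59785, 59805),
    (54969, 119590),
    (55625, 174559),
    (55020, 230184),
    (5166, 285204),
    (5320, 290370),
    (5185, 295690),
]

row_counts = [count for count, _ in segments]
# Running totals after each segment: the cumulative values the loop prints.
running_totals = list(accumulate(row_counts))
# Each offset should equal the running total *before* that segment.
expected_offsets = [0] + running_totals[:-1]

offsets_consistent = expected_offsets == [offset for _, offset in segments]
total_rows = running_totals[-1]
```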

@cla-bot cla-bot bot added the cla-signed label Jan 31, 2025
@wendigo wendigo force-pushed the serafin/cursor-segment branch 3 times, most recently from 36adba5 to a1732ea Compare January 31, 2025 11:55
@wendigo wendigo requested review from hashhar and mdesmet January 31, 2025 11:55
@wendigo wendigo force-pushed the serafin/cursor-segment branch 4 times, most recently from d0170d0 to 4cd5320 Compare January 31, 2025 12:15
@wendigo wendigo force-pushed the serafin/cursor-segment branch from 4cd5320 to c41a3d0 Compare January 31, 2025 12:21
@wendigo wendigo requested a review from damian3031 January 31, 2025 12:31
```
     legacy_primitive_types=self._legacy_primitive_types,
     fetch_mode="segments")
-self._iterator = iter(self._query.execute())
+self._iterator = map(lambda tuple: tuple[0], iter(self._query.execute()))
```
Contributor

Note that this is a breaking change, but I agree it is a nicer API.

However, how can the client retrieve the encoding?

Contributor Author

See the usage in the integration test.
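
To illustrate the point without a running Trino server: going by the `__next__` change in this PR, each item the cursor yields is a `SpooledData` wrapping exactly one segment, so the encoding travels with it. The classes below are simplified stand-ins written for this sketch, not the real client classes; only the `encoding` and `segments` names are taken from the output in the PR description.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Segment:  # stand-in for a spooled segment
    uri: str


@dataclass
class SpooledData:  # stand-in mirroring SpooledData(encoding, segments)
    encoding: str
    segments: List[Segment]


def query_results() -> Iterator[Tuple[SpooledData, Segment]]:
    """Mimic the (SpooledData, Segment) tuples produced by __next__."""
    for uri in ("http://example.invalid/a", "http://example.invalid/b"):
        segment = Segment(uri)
        yield SpooledData("json+zstd", [segment]), segment


# Same shape as the changed cursor line: keep only the first tuple element.
fetched = list(map(lambda pair: pair[0], query_results()))

# The encoding is still reachable from every fetched item.
encodings = [spooled_data.encoding for spooled_data in fetched]
uris = [s.uri for spooled_data in fetched for s in spooled_data.segments]
```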

for spooled_data, spooled_segment in rows:
segments = cur.fetchall()
assert len(segments) > 0
row_mapper = trino.mapper.RowMapperFactory().create(columns=cur._query.columns, legacy_primitive_types=False)
Contributor

Change the import to `from trino.mapper import RowMapperFactory`.

```
 def __next__(self) -> Tuple["SpooledData", "Segment"]:
-    return self, next(self._segments_iterator)
+    segment = next(self._segments_iterator)
+    return SpooledData(self._encoding, [segment]), segment
```
Member

Why not also change SpooledData to contain a single segment, rather than a list?

Contributor

Because SpooledData can contain multiple segments. I think we should not reuse this object as the iterable object but return another class that has all the required fields. Let's get rid of Tuple["SpooledData", "Segment"].

Member

Right, each response can contain multiple segment URLs, so we need the transfer object to reflect that, but the abstraction we expose to the client doesn't need to mirror that.
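
One possible shape for that, as a sketch only: keep a list-bearing transfer object for the response and flatten it into a per-segment object that carries everything needed for decoding. `SpooledResponse`, `EncodedSegment` and `flatten` are hypothetical names invented for this illustration; they are not existing or proposed client API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class SpooledResponse:
    """Hypothetical transfer object: one response, possibly many segments."""
    encoding: str
    segment_uris: List[str]


@dataclass(frozen=True)
class EncodedSegment:
    """Hypothetical client-facing object: one segment plus its encoding."""
    encoding: str
    uri: str


def flatten(responses: Iterable[SpooledResponse]) -> Iterator[EncodedSegment]:
    """Yield one EncodedSegment per segment URI, in response order."""
    for response in responses:
        for uri in response.segment_uris:
            yield EncodedSegment(response.encoding, uri)


responses = [
    SpooledResponse("json+zstd", ["http://example.invalid/1", "http://example.invalid/2"]),
    SpooledResponse("json+zstd", ["http://example.invalid/3"]),
]
flat = list(flatten(responses))
```

With this split the client iterates over single segments and never sees a tuple, while the transfer object can still hold several segment URLs per response.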

@wendigo wendigo closed this Jan 31, 2025