
Commit c1b8c19

Updated doc.go for exposing Arrow record batches
Signed-off-by: Raymond Cypher <[email protected]>
1 parent 5584dd5 commit c1b8c19


1 file changed: +73 -0 lines changed


doc.go

Lines changed: 73 additions & 0 deletions
@@ -233,6 +233,79 @@ Example usage:
See the documentation for dbsql/errors for more information.

# Retrieving Arrow Batches

The driver supports retrieving query results as Apache Arrow record batches.
To work with record batches, use sql.Conn.Raw() to access the underlying driver connection, then run the query on that connection to obtain a driver.Rows instance.
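The conn.Raw() pattern can be sketched with the standard library alone. In the sketch below, fakeConnector, fakeDriver, fakeConn, fakeRows, rawQuery and demo are hypothetical stand-ins so that it runs without a Databricks workspace; with the real driver, sql.OpenDB would be given the Databricks connector instead. It uses a checked (comma-ok) type assertion, so a connection that does not implement driver.QueryerContext yields an error rather than a panic.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"io"
)

// Hypothetical in-memory driver types standing in for the Databricks driver.
type fakeConnector struct{}

func (fakeConnector) Connect(context.Context) (driver.Conn, error) { return fakeConn{}, nil }
func (fakeConnector) Driver() driver.Driver                         { return fakeDriver{} }

type fakeDriver struct{}

func (fakeDriver) Open(string) (driver.Conn, error) { return fakeConn{}, nil }

type fakeConn struct{}

func (fakeConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("not supported") }
func (fakeConn) Close() error                        { return nil }
func (fakeConn) Begin() (driver.Tx, error)           { return nil, errors.New("not supported") }

// QueryContext makes fakeConn satisfy driver.QueryerContext.
func (fakeConn) QueryContext(ctx context.Context, query string, args []driver.NamedValue) (driver.Rows, error) {
	return fakeRows{}, nil
}

type fakeRows struct{}

func (fakeRows) Columns() []string              { return []string{"n"} }
func (fakeRows) Close() error                   { return nil }
func (fakeRows) Next(dest []driver.Value) error { return io.EOF }

// rawQuery runs query on the driver connection held by conn and returns the
// raw driver.Rows.
func rawQuery(ctx context.Context, conn *sql.Conn, query string) (driver.Rows, error) {
	var rows driver.Rows
	err := conn.Raw(func(driverConn any) error {
		q, ok := driverConn.(driver.QueryerContext)
		if !ok {
			return fmt.Errorf("connection type %T does not implement driver.QueryerContext", driverConn)
		}
		var err error
		rows, err = q.QueryContext(ctx, query, nil)
		return err
	})
	return rows, err
}

// demo reserves a connection, runs a query through rawQuery and returns the
// column names reported by the driver.Rows.
func demo() ([]string, error) {
	db := sql.OpenDB(fakeConnector{})
	defer db.Close()

	ctx := context.Background()
	conn, err := db.Conn(ctx)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	rows, err := rawQuery(ctx, conn, "select 1 as n")
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	return rows.Columns(), nil
}

func main() {
	cols, err := demo()
	fmt.Println(cols, err) // [n] <nil>
}
```

Wrapping the assertion in a helper keeps the error handling in one place; the deferred Close calls run in reverse order, so the rows are closed before the connection is returned to the pool.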
The driver exposes two public interfaces for working with record batches:

	type DBSQLRows interface {
		GetArrowBatches(context.Context) (DBSQLArrowBatchIterator, error)
	}

	type DBSQLArrowBatchIterator interface {
		// Retrieve the next arrow.Record.
		// Will return io.EOF if there are no more records
		Next() (arrow.Record, error)

		// Return true if the iterator contains more batches, false otherwise.
		HasNext() bool

		// Release any resources in use by the iterator.
		Close()
	}

The driver.Rows instance obtained through Conn.Raw() can be converted to DBSQLRows with a type assertion; call GetArrowBatches() on the result to retrieve a batch iterator.
Always close the DBSQLArrowBatchIterator when you are finished with it: an iterator that is not closed leaks resources, such as the underlying connection.
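The single-value assertion rows.(dbsqlrows.DBSQLRows) in the example below panics if the rows do not implement DBSQLRows. A minimal sketch of a defensive alternative uses the comma-ok form and a deferred Close. To keep it runnable with the standard library alone, arrowRows and iterator are hypothetical stand-ins for DBSQLRows and DBSQLArrowBatchIterator, and plainRows, batchRows, closeFlag, withBatches and demo are illustrative names.

```go
package main

import (
	"context"
	"database/sql/driver"
	"fmt"
	"io"
)

// iterator and arrowRows are hypothetical stand-ins for
// DBSQLArrowBatchIterator and DBSQLRows.
type iterator interface{ Close() }

type arrowRows interface {
	GetArrowBatches(context.Context) (iterator, error)
}

// withBatches converts rows with a checked type assertion, obtains the batch
// iterator, passes it to fn and always closes it afterwards.
func withBatches(ctx context.Context, rows driver.Rows, fn func(iterator) error) error {
	ar, ok := rows.(arrowRows)
	if !ok {
		return fmt.Errorf("rows of type %T do not support Arrow batches", rows)
	}
	it, err := ar.GetArrowBatches(ctx)
	if err != nil {
		return err
	}
	defer it.Close() // an unclosed iterator leaks resources
	return fn(it)
}

// plainRows implements only driver.Rows.
type plainRows struct{}

func (plainRows) Columns() []string         { return nil }
func (plainRows) Close() error              { return nil }
func (plainRows) Next([]driver.Value) error { return io.EOF }

// batchRows also implements arrowRows and records whether its iterator was closed.
type batchRows struct {
	plainRows
	closed bool
}

type closeFlag struct{ closed *bool }

func (c closeFlag) Close() { *c.closed = true }

func (r *batchRows) GetArrowBatches(context.Context) (iterator, error) {
	return closeFlag{closed: &r.closed}, nil
}

// demo exercises withBatches with rows that lack and rows that have Arrow support.
func demo() (closed bool, rejected, err error) {
	ctx := context.Background()
	noop := func(iterator) error { return nil }

	// plainRows does not implement arrowRows, so this returns an error
	// instead of panicking.
	rejected = withBatches(ctx, plainRows{}, noop)

	r := &batchRows{}
	err = withBatches(ctx, r, noop)
	return r.closed, rejected, err
}

func main() {
	closed, rejected, err := demo()
	fmt.Println(rejected)
	fmt.Println(err, closed) // <nil> true
}
```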

Example usage:

	import (
		...
		dbsqlrows "github.com/databricks/databricks-sql-go/rows"
	)

	func main() {
		...
		db := sql.OpenDB(connector)
		defer db.Close()

		ctx := context.Background()

		conn, err := db.Conn(ctx)
		if err != nil {
			log.Fatalf("unable to get a connection. err: %v", err)
		}
		defer conn.Close()

		query := `select * from hive_metastore.main.taxi_trip_data`

		// Run the query on the underlying driver connection to get a driver.Rows.
		var rows driver.Rows
		err = conn.Raw(func(d interface{}) error {
			rows, err = d.(driver.QueryerContext).QueryContext(ctx, query, nil)
			return err
		})
		if err != nil {
			log.Fatalf("unable to run the query. err: %v", err)
		}
		defer rows.Close()

		batches, err := rows.(dbsqlrows.DBSQLRows).GetArrowBatches(ctx)
		if err != nil {
			log.Fatalf("unable to get arrow batches. err: %v", err)
		}
		// Close the iterator to release its resources, including the connection.
		defer batches.Close()

		var iBatch, nRows int
		for batches.HasNext() {
			b, err := batches.Next()
			if err != nil {
				log.Fatalf("Failure retrieving batch. err: %v", err)
			}

			log.Printf("batch %v: nRecords=%v\n", iBatch, b.NumRows())
			iBatch++
			nRows += int(b.NumRows())
		}
		log.Printf("NRows: %v\n", nRows)
	}

# Supported Data Types