Skip to content

Commit ae1d7c7

Browse files
feat: Add cancel and status queries to server-side async execution (#192)
* Added async_execution to async_db/cursor and db/cursor. Added an error, . Added a slot and getter for query_id to BaseCursor. (Temporarily?) edited setup.cfg to ignore flake8 C901: function is too complex. * Removed ignore C901 from flake8 settings in setup.cfg. * Fixed a couple of missing arguments in async_db/cursor.py on execute and execute_many calls. * mypy and black cleanup. * Removed set_parameters argument from all(?) functions. Started adding logic for SET async_execution. * Added a bunch of callbacks to cursor tests. * Pulled out a couple more set_parameters variables from function signatures. Edited callbacks and various tiny things in async test_cursor.py. * Added, and commented out, server_side_async_url to unit/conftests.py. * Removed test_set_parameters() from tests/async/cursor.py. * Removed test_set_parameters() from tests/async/cursor.py. * Added ability to see which exectute is failing (execute or executemany) on a unit test run. * Added more explicit error messages to async and sync test_cursor.py modules. * Added some periods. * Replaced cursor reset call in async _do_execute(). * Updated query/message tuple decomposition in test_cursor to be more human-readable. * Fixed a typo and function signature for db/test_cursor_server_side_async_execute() to remove async. * Removed second hard-coding of query_id in server-side async id callback. * Needed to add an await to an _api_request() call. * Used InternalError to error out on no response to async server-side query. Changed AsyncExecutionUnavailableError to generically accept messages. * Added additional checks on rowcount and description in test_cursor_server_side_async_execute(). * Added QueryResponse class. * Minor changes requested on PR. * Added an OperationalError is asynchronous query response is missing query_id. * Had a typo. * Added a warning if asyc_execution is set via a SET parameter rather than being sent in as an argument to execute(). Moved set parameter validation to its own function to deal with flake8 complaints re _do_exectute() being too complex. * Started adding test_cursor_async_execute_error(). * Updated query_url argument in test_cursor_async_execute_error(). * Added AsyncExecutionUnavailableError on server-side async query execution for multi-statement queries. * Seem to have dealt with auth issues in test_cursor_async_execute_error, but getting invalid set parameter on use_standard_sql. Also added # noqa: C901 to parse_type() definition in _types.py because flake8 was suddenly freaking out about it. * Added all necessary set params to url string in test_cursor_async_execute_error(). * Cleaned up string input in test_cursor_async_execute_error(). * Now no token error. * Multi-statement queries now error out correctly. * Reworked a string to try to get commit/push to work. * Had to add an extra auth callback to get all cursor.execute() calls to work. All error tests for async_execution should now be tested correctly. * Removed some parameters from various fns in unit/async_db/test_cursor that I noticed were extraneous. Too bad mypy or flake8 isn't catching these :-/. * Added error check for missing query_id on async_execution. A little bit of cleanup, also Black seems to have made a change or two. * Fixed error for empty response.json on asynch execution. Also changed the use of SET async_execution to an error and included test for that error. * Fixed error for empty response.json on asynch execution. Also changed the use of SET async_execution to an error and included test for that error. * Fixed error for empty response.json on asynch execution. Also changed the use of SET async_execution to an error and included test for that error. * Added a test to check that an server-side asynchronous execution returns a string, as a non-sync-execution query would return rowcount as an int. * Added an integration test to check that an server-side asynchronous execution returns a string, as a non-sync-execution query would return rowcount as an int.~ * Added cancel() to async/cursor.py. Also fixed an error where I was getting empty query ids back from server-side async exectutions, and added an error for that case. * Forgot that I'd commented out most of test_cursor.py. * Trying to get rid of coroutine 'BaseCursor.execute' was never awaited warning. Not yet successful. * Added unit tests for cancel and cancel errors. * Fixed a mistake that would have failed the cancel() integration test. * Fixed several imports that had disappeared (maybe during a merge?). Also fixed an error in test_ss_async_execution_cancel(). * get_status() and two unit tests are added. Integration test is failing with json that has correct field names but empty fields. * Added a new QueryStatus, NOT_AVAILABLE, because checking status will return empty result the first few times. Fixed some issues with the unit tests and updated the integration tests for get_status(). * Added a comment. * Updated a comment. * Added stub fn for async execution fetch. * Keep forgetting to uncomment test code and the pre-commit checks are removing imports. Left in some extra calls to time() and sleep() for now. * Removed some extraneous testing code. * Updated test_ss_async_execution_get_status() after Yoni pointed out that DDL operations will always return empty JSON. Now using an INSERT instead. * Had to comment out test_ss_async_execution_get_status(), as it basically entered an infinite loop. * Added ability to specify output_format in _api_request(), as status requests will fail if it is set. * First set of requested changes on PR. * Removed noqa on _do_execute(). * Renamed _find_async_problems() to _validate_ss_async_settings(). Removed test_cursor_server_side_async_cancel_error from integration tests. * Moved call to _validate_ss_async_settings() into try. * Added asyncio_mode=auto to pytest config in config.cfg, because I was tired of the continual warnings from pytest. * Changed long query in test_queries_async integration tests. Paused execution of after cancel() to ensure I pick up the correct status message. Now sending output_format= on some calls to _api_request(). * Updated all unit tests that test SET parameters to not have output_format in the url. * Changed query_loop() in integration tests/async/test_queries to check for more than one status before exiting the loop. * Noticed that test_anyio_backend_import_issue() was commented out in sync/test_queries.py. That was done bc it won't run on my laptop, but it shouldn't have been committed that way. * Added query tests to integration/dbapi/sync/test_queries.py. Changed async_exectution test to not count SET statements as queries when determining whether a query is multi-statment. Trying to get sleepEachRow() to work for long aync execution queries. * Added query tests to integration/dbapi/sync/test_queries.py. Changed async_execution test to not count SET statements as queries when determining whether a query is multi-statment. Trying to get sleepEachRow() to work for long aync execution queries. * Now errors out when use_standard_sql=0 rather than when it equals 1. This is because if it's off no log entries are written to query_history. Still using a long insert for integration tests on server-side async queries. Added and edited to unit tests for use_standard_sql correctness. * Changed order of synchronous unit tests to move all server-side async tests to end. * Changed order of asynchronous cursor unit tests to move all server-side async tests to end. * Reordered integration and unit test modules to move all server-side async tests to end of modules to facilitate merging main. * Moving JSON_OUTPUT_FORMAT outside of _api_request (#196) * Updated docs to include information on server-side async query execution. * Updated external table mention in comments and removed sentence in docs. Updated dictionary update in _api_request() to make mypy happy. * Made a change to server-side execution explanation for clarity and to explain usefullness of that functionality. * Renamed a function and moved table create and drop out of test_queries.py and into conftest.py. Currently name not defined error. Maybe it's in the wrong conftest file? Co-authored-by: Petro Tiurin <[email protected]>
1 parent 77471c2 commit ae1d7c7

File tree

10 files changed

+898
-237
lines changed

10 files changed

+898
-237
lines changed

docsrc/Connecting_and_queries.rst

Lines changed: 123 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -3,23 +3,23 @@
33
Connecting and running queries
44
###############################
55

6-
This topic provides a walkthrough and examples for how to use the Firebolt Python SDK to connect to Firebolt resources to run commands and query data.
6+
This topic provides a walkthrough and examples for how to use the Firebolt Python SDK to connect to Firebolt resources to run commands and query data.
77

88

99
Setting up a connection
1010
=========================
1111

12-
To connect to a Firebolt database to run queries or command, you must provide your account credentials through a connection request.
12+
To connect to a Firebolt database to run queries or command, you must provide your account credentials through a connection request.
1313

14-
To get started, follow the steps below:
14+
To get started, follow the steps below:
1515

1616
**1. Import modules**
1717

18-
The Firebolt Python SDK requires you to import the following modules before making any command or query requests to your Firebolt database.
18+
The Firebolt Python SDK requires you to import the following modules before making any command or query requests to your Firebolt database.
1919

2020
.. _required_connection_imports:
2121

22-
::
22+
::
2323

2424
from firebolt.db import connect
2525
from firebolt.client import DEFAULT_API_URL
@@ -30,9 +30,9 @@ To get started, follow the steps below:
3030
**2. Connect to your database and engine**
3131

3232

33-
Your account information can be provided as parameters in a ``connection()`` function.
33+
Your account information can be provided as parameters in a ``connection()`` function.
3434

35-
A connection requires the following parameters:
35+
A connection requires the following parameters:
3636

3737
+------------------------------------+-------------------------------------------------------------------+
3838
| ``username`` | The email address associated with your Firebolt user. |
@@ -50,9 +50,9 @@ To get started, follow the steps below:
5050

5151
* **Set credentials manually**
5252

53-
You can manually include your account information in a connection object in your code for any queries you want to request.
53+
You can manually include your account information in a connection object in your code for any queries you want to request.
5454

55-
Replace the values in the example code below with your Firebolt account credentials as appropriate.
55+
Replace the values in the example code below with your Firebolt account credentials as appropriate.
5656

5757
::
5858

@@ -61,19 +61,19 @@ To get started, follow the steps below:
6161
engine_name = "your_engine"
6262
database_name = "your_database"
6363

64-
connection = connect(
64+
connection = connect(
6565
engine_name=engine_name,
6666
database=database_name,
6767
username=username,
6868
password=password,
6969
)
70-
70+
7171
cursor = connection.cursor()
7272

7373

7474
* **Use an .env file**
7575

76-
Consolidating all of your Firebolt credentials into a ``.env`` file can help protect sensitive information from exposure. Create an ``.env`` file with the following key-value pairs, and replace the values with your information.
76+
Consolidating all of your Firebolt credentials into a ``.env`` file can help protect sensitive information from exposure. Create an ``.env`` file with the following key-value pairs, and replace the values with your information.
7777

7878
::
7979

@@ -82,9 +82,9 @@ To get started, follow the steps below:
8282
FIREBOLT_ENGINE="your_engine"
8383
FIREBOLT_DB="your_database"
8484

85-
Be sure to place this ``.env`` file into your root directory.
85+
Be sure to place this ``.env`` file into your root directory.
8686

87-
Your connection script can load these environmental variables from the ``.env`` file by using the `python-dotenv <https://pypi.org/project/python-dotenv/>`_ package. Note that the example below imports the ``os`` and ``dotenv`` modules in order to load the environmental variables.
87+
Your connection script can load these environmental variables from the ``.env`` file by using the `python-dotenv <https://pypi.org/project/python-dotenv/>`_ package. Note that the example below imports the ``os`` and ``dotenv`` modules in order to load the environmental variables.
8888

8989
::
9090

@@ -105,35 +105,35 @@ To get started, follow the steps below:
105105

106106
**3. Execute commands using the cursor**
107107

108-
The ``cursor`` object can be used to send queries and commands to your Firebolt database and engine. See below for examples of functions using the ``cursor`` object.
108+
The ``cursor`` object can be used to send queries and commands to your Firebolt database and engine. See below for examples of functions using the ``cursor`` object.
109109

110-
Command and query examples
110+
Server-side synchronous command and query examples
111111
============================
112112

113-
This section includes Python examples of various SQL commands and queries.
113+
This section includes Python examples of various SQL commands and queries.
114114

115115

116116
Inserting and selecting data
117117
-----------------------------
118118

119119
.. _basic_execute_example:
120120

121-
The example below uses ``cursor`` to create a new table called ``test_table``, insert rows into it, and then select the table's contents.
121+
The example below uses ``cursor`` to create a new table called ``test_table``, insert rows into it, and then select the table's contents.
122122

123-
The engine attached to your specified database must be started before executing any queries. For help, see :ref:`starting an engine`.
123+
The engine attached to your specified database must be started before executing any queries. For help, see :ref:`starting an engine`.
124124

125125
::
126126

127127
cursor.execute(
128128
'''CREATE FACT TABLE IF NOT EXISTS test_table (
129-
id INT,
130-
name TEXT
131-
)
129+
id INT,
130+
name TEXT
131+
)
132132
PRIMARY INDEX id;'''
133133
)
134-
134+
135135
cursor.execute(
136-
'''INSERT INTO test_table VALUES
136+
'''INSERT INTO test_table VALUES
137137
(1, 'hello'),
138138
(2, 'world'),
139139
(3, '!');'''
@@ -145,23 +145,23 @@ The engine attached to your specified database must be started before executing
145145

146146
cursor.close()
147147

148-
.. note::
148+
.. note::
149149

150-
For reference documentation on ``cursor`` functions, see :ref:`Db.cursor`
150+
For reference documentation on ``cursor`` functions, see :ref:`Db.cursor`
151151

152152

153153
Fetching query results
154154
-----------------------
155155

156-
After running a query, you can fetch the results using a ``cursor`` object. The examples below use the data queried from ``test_table`` created in the :ref:`Inserting and selecting data`.
156+
After running a query, you can fetch the results using a ``cursor`` object. The examples below use the data queried from ``test_table`` created in the :ref:`Inserting and selecting data`.
157157

158158
.. _fetch_example:
159159

160160
::
161161

162162
print(cursor.fetchone())
163163

164-
**Returns**: ``[2, 'world']``
164+
**Returns**: ``[2, 'world']``
165165

166166
::
167167

@@ -181,25 +181,25 @@ Executing parameterized queries
181181

182182
.. _parameterized_query_execute_example:
183183

184-
Parameterized queries (also known as “prepared statements”) format a SQL query with placeholders and then pass values into those placeholders when the query is run. This protects against SQL injection attacks and also helps manage dynamic queries that are likely to change, such as filter UIs or access control.
184+
Parameterized queries (also known as “prepared statements”) format a SQL query with placeholders and then pass values into those placeholders when the query is run. This protects against SQL injection attacks and also helps manage dynamic queries that are likely to change, such as filter UIs or access control.
185185

186186
To run a parameterized query, use the ``execute()`` cursor method. Add placeholders to your statement using question marks ``?``, and in the second argument pass a tuple of parameters equal in length to the number of ``?`` in the statement.
187187

188188

189-
::
189+
::
190190

191191
cursor.execute(
192192
'''CREATE FACT TABLE IF NOT EXISTS test_table2 (
193193
id INT,
194-
name TEXT,
194+
name TEXT,
195195
date_value DATE
196196
)
197197
PRIMARY INDEX id;'''
198198
)
199199

200200

201201
::
202-
202+
203203
cursor.execute(
204204
"INSERT INTO test_table2 VALUES (?, ?, ?)",
205205
(1, "apple", "2018-01-01"),
@@ -216,8 +216,8 @@ If you need to run the same statement multiple times with different parameter in
216216
cursor.executemany(
217217
"INSERT INTO test_table2 VALUES (?, ?, ?)",
218218
(
219-
(2, "banana", "2019-01-01"),
220-
(3, "carrot", "2020-01-01"),
219+
(2, "banana", "2019-01-01"),
220+
(3, "carrot", "2020-01-01"),
221221
(4, "donut", "2021-01-01")
222222
)
223223
)
@@ -231,7 +231,7 @@ Executing multiple-statement queries
231231

232232
Multiple-statement queries allow you to run a series of SQL statements sequentially with just one method call. Statements are separated using a semicolon ``;``, similar to making SQL statements in the Firebolt UI.
233233

234-
::
234+
::
235235

236236
cursor.execute(
237237
"""
@@ -246,32 +246,110 @@ Multiple-statement queries allow you to run a series of SQL statements sequentia
246246

247247
cursor.close()
248248

249-
**Returns**:
249+
**Returns**:
250250

251-
::
251+
::
252252

253253
First query: [[2, 'banana', datetime.date(2019, 1, 1)], [3, 'carrot', datetime.date(2020, 1, 1)], [1, 'apple', datetime.date(2018, 1, 1)]]
254254
Second query: [[3, 'carrot', datetime.date(2020, 1, 1)], [4, 'donut', datetime.date(2021, 1, 1)]]
255255

256-
.. note::
256+
.. note::
257+
258+
Multiple statement queries are not able to use placeholder values for parameterized queries.
259+
260+
261+
262+
Server-side asynchronous query execution
263+
==========================================
264+
265+
Server-side asynchronous query execution allows you to run a long query in the background while executing other asynchronous or synchronous queries. An additional benefit of server-side async execution that can free up a connection, close a connection while running a query, or potentially even spin down an entire service (such as AWS Lambda) while a long-running database job is still underway. Note that it is not possible to retrieve the results of a server-side asynchronous query, so these queries are best used for running DMLs and DDLs. SELECTs should be used only for warming the cache.
266+
267+
Running DDL commands
268+
-----------------------------
269+
270+
.. _basic_execute_example:
271+
272+
Running queries server-side asynchronously is similar to running server-side asynchronous queries, but the ``execute()`` command receives an extra parameter, ``async_execution=True``. The example below uses ``cursor`` to create a new table called ``test_table``. ``execute(query, async_execution=True)`` will return a query ID, which can subsequently be used to check the query status.
273+
274+
::
275+
276+
query_id = cursor.execute(
277+
'''CREATE FACT TABLE IF NOT EXISTS test_table (
278+
id INT,
279+
name TEXT
280+
)
281+
PRIMARY INDEX id;''',
282+
async_execution=True
283+
)
284+
285+
286+
To check the status of a query, send the query ID to ```get_status()``` to receive a QueryStatus enumeration object. Possible statuses are:
287+
288+
289+
* ``RUNNING``
290+
* ``ENDED_SUCCESSFULLY``
291+
* ``ENDED_UNSUCCESSFULLY``
292+
* ``NOT_READY``
293+
* ``STARTED_EXECUTION``
294+
* ``PARSE_ERROR``
295+
* ``CANCELED_EXECUTION``
296+
* ``EXECUTION_ERROR``
297+
298+
299+
Once the status of the table creation is ``ENDED_SUCCESSFULLY`` created, data can be inserted into it:
300+
301+
::
302+
303+
from firebolt.async_db.cursor import QueryStatus
304+
305+
query_status = cursor.get_status(query_id)
306+
307+
if query_status == QueryStatus.ENDED_SUCCESSFULLY:
308+
cursor.execute(
309+
'''INSERT INTO test_table VALUES
310+
(1, 'hello'),
311+
(2, 'world'),
312+
(3, '!');'''
313+
)
314+
315+
316+
In addition, server-side asynchronous queries can be cancelled calling ``cancel()``.
317+
318+
::
319+
320+
query_id = cursor.execute(
321+
'''CREATE FACT TABLE IF NOT EXISTS test_table (
322+
id INT,
323+
name TEXT
324+
)
325+
PRIMARY INDEX id;''',
326+
async_execution=True
327+
)
328+
329+
cursor.cancel(query_id)
330+
331+
query_status = cursor.get_status(query_id)
332+
333+
print(query_status)
334+
335+
**Returns**: ``CANCELED_EXECUTION``
257336

258-
Multiple statement queries are not able to use placeholder values for parameterized queries.
259337

260338

261339
Using DATE and DATETIME values
262-
---------------------------------
340+
==============================
263341

264-
DATE, DATETIME and TIMESTAMP values used in SQL insertion statements must be provided in a specific format; otherwise they could be read incorrectly.
342+
DATE, DATETIME and TIMESTAMP values used in SQL insertion statements must be provided in a specific format; otherwise they could be read incorrectly.
265343

266-
* DATE values should be formatted as **YYYY-MM-DD**
344+
* DATE values should be formatted as **YYYY-MM-DD**
267345

268346
* DATETIME and TIMESTAMP values should be formatted as **YYYY-MM-DD HH:MM:SS.SSSSSS**
269347

270-
The `datetime <https://docs.python.org/3/library/datetime.html>`_ module from the Python standard library contains various classes and methods to format DATE, TIMESTAMP and DATETIME data types.
348+
The `datetime <https://docs.python.org/3/library/datetime.html>`_ module from the Python standard library contains various classes and methods to format DATE, TIMESTAMP and DATETIME data types.
271349

272-
You can import this module as follows:
350+
You can import this module as follows:
273351

274-
::
352+
::
275353

276354
from datetime import datetime
277355

docsrc/firebolt.async_db.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
Async DB
33
==========================
44

5-
The Async DB package enables connecting to a Firebolt database for asynchronous queries.
5+
The Async DB package enables connecting to a Firebolt database for `client-side` asynchronous queries. For running queries in `server-side` asynchronous mode see :ref:`server-side asynchronous query execution`.
66

77
Connect
88
------------------------------------

0 commit comments

Comments
 (0)