Commit 992a3d8

betodealmeida authored and mistercrunch committed
Merge druiddb into pydruid (#110)
* Import druiddb
* Update requirements and entry points
* Remove fstrings
* Fix unit test
* Move unit test to sub directory
* Update docs
* Change history filename
* Fix name
1 parent 060ae0c commit 992a3d8

8 files changed: +996 -11 lines changed

README.md

Lines changed: 59 additions & 9 deletions
@@ -1,6 +1,8 @@
-#pydruid
-pydruid exposes a simple API to create, execute, and analyze [Druid](http://druid.io/) queries. pydruid can parse query results into [Pandas](http://pandas.pydata.org/) DataFrame objects for subsequent data analysis -- this offers a tight integration between [Druid](http://druid.io/), the [SciPy](http://www.scipy.org/stackspec.html) stack (for scientific computing) and [scikit-learn](http://scikit-learn.org/stable/) (for machine learning). Additionally, pydruid can export query results into TSV or JSON for further processing with your favorite tool, e.g., R, Julia, Matlab, Excel.
-It provides both synchronous and asynchronous clients.
+# pydruid
+
+pydruid exposes a simple API to create, execute, and analyze [Druid](http://druid.io/) queries. pydruid can parse query results into [Pandas](http://pandas.pydata.org/) DataFrame objects for subsequent data analysis -- this offers a tight integration between [Druid](http://druid.io/), the [SciPy](http://www.scipy.org/stackspec.html) stack (for scientific computing) and [scikit-learn](http://scikit-learn.org/stable/) (for machine learning). pydruid can export query results into TSV or JSON for further processing with your favorite tool, e.g., R, Julia, Matlab, Excel. It provides both synchronous and asynchronous clients.
+
+Additionally, pydruid implements the [Python DB API 2.0](https://www.python.org/dev/peps/pep-0249/), a [SQLAlchemy dialect](http://docs.sqlalchemy.org/en/latest/dialects/), and a provides a command line interface to interact with Druid.
 
 To install:
 ```python
@@ -11,10 +13,15 @@ pip install pydruid[async]
 pip install pydruid[pandas]
 # or, if you intend to do both
 pip install pydruid[async, pandas]
+# or, if you want to use the SQLAlchemy engine
+pip install pydruid[sqlalchemy]
+# or, if you want to use the CLI
+pip install pydruid[cli]
 ```
 Documentation: https://pythonhosted.org/pydruid/.
 
-#examples
+# examples
+
 The following exampes show how to execute and analyze the results of three types of queries: timeseries, topN, and groupby. We will use these queries to ask simple questions about twitter's public data set.
 
 ## timeseries
@@ -118,13 +125,13 @@ plot(g, "tweets.png", layout=layout, vertex_size=2, bbox=(400, 400), margin=25,
 
 ![alt text](https://github.com/metamx/pydruid/raw/master/docs/figures/twitter_graph.png "Social Network")
 
-#asynchronous client
+# asynchronous client
 ```pydruid.async_client.AsyncPyDruid``` implements an asynchronous client. To achieve that, it utilizes an asynchronous
 HTTP client from ```Tornado``` framework. The asynchronous client is suitable for use with async frameworks such as Tornado
 and provides much better performance at scale. It lets you serve multiple requests at the same time, without blocking on
 Druid executing your queries.
 
-##example
+## example
 ```python
 from tornado import gen
 from pydruid.async_client import AsyncPyDruid
@@ -153,7 +160,7 @@ def your_asynchronous_method_serving_top10_mentions_for_day(day
 ```
 
 
-#thetaSketches
+# thetaSketches
 Theta sketch Post aggregators are built slightly differently to normal Post Aggregators, as they have different operators.
 Note: you must have the ```druid-datasketches``` extension loaded into your Druid cluster in order to use these.
 See the [Druid datasketches](http://druid.io/docs/latest/development/extensions-core/datasketches-aggregators.html) documentation for details.
@@ -185,5 +192,48 @@ ts = query.groupby(
                 postaggregator.ThetaSketch('product_A_users') & postaggregator.ThetaSketch('product_B_users')
             )
     }
-)
-```
+)
+```
+
+# DB API
+
+```python
+from pydruid.db import connect
+
+conn = connect(host='localhost', port=8082, path='/druid/v2/sql/', scheme='http')
+curs = conn.cursor()
+curs.execute("""
+    SELECT place,
+           CAST(REGEXP_EXTRACT(place, '(.*),', 1) AS FLOAT) AS lat,
+           CAST(REGEXP_EXTRACT(place, ',(.*)', 1) AS FLOAT) AS lon
+      FROM places
+     LIMIT 10
+""")
+for row in curs:
+    print(row)
+```
+
+# SQLAlchemy
+
+```python
+from sqlalchemy import *
+from sqlalchemy.engine import create_engine
+from sqlalchemy.schema import *
+
+engine = create_engine('druid://localhost:8082/druid/v2/sql/') # uses HTTP by default :(
+# engine = create_engine('druid+http://localhost:8082/druid/v2/sql/')
+# engine = create_engine('druid+https://localhost:8082/druid/v2/sql/')
+
+places = Table('places', MetaData(bind=engine), autoload=True)
+print(select([func.count('*')], from_obj=places).scalar())
+```
+
+# Command line
+
+```bash
+$ pydruid http://localhost:8082/druid/v2/sql/
+> SELECT COUNT(*) AS cnt FROM places
+  cnt
+-----
+12345
+```
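
Reviewer note on the DB API example: the query splits a `place` value at a comma to get a latitude and a longitude. The sketch below mimics those two `REGEXP_EXTRACT` calls in plain Python to show what each pattern captures. It assumes `place` holds a `"lat,lon"` string, which the commit does not confirm, and `regexp_extract` is a hypothetical stand-in built on Python's `re` module rather than Druid's own regex engine.

```python
import re


def regexp_extract(value, pattern, index):
    """Hypothetical stand-in for SQL REGEXP_EXTRACT: group `index` of the first match."""
    match = re.search(pattern, value)
    return match.group(index) if match else None


# Assumed sample value; the real contents of the `places` table are not shown.
place = '37.7749,-122.4194'

lat = float(regexp_extract(place, '(.*),', 1))  # everything before the comma
lon = float(regexp_extract(place, ',(.*)', 1))  # everything after the comma
print(lat, lon)
```

Because `(.*)` is greedy, a value with more than one comma would split at the last comma for `lat` and at the first comma for `lon` in this Python version.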

pydruid/console.py

Lines changed: 181 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,181 @@
1+
from __future__ import unicode_literals
2+
3+
import os
4+
import sys
5+
6+
from prompt_toolkit import prompt, AbortAction
7+
from prompt_toolkit.history import FileHistory
8+
from prompt_toolkit.contrib.completers import WordCompleter
9+
from pygments.lexers import SqlLexer
10+
from pygments.style import Style
11+
from pygments.token import Token
12+
from pygments.styles.default import DefaultStyle
13+
from six.moves.urllib import parse
14+
from tabulate import tabulate
15+
16+
from pydruid.db.api import connect
17+
18+
+keywords = [
+    'EXPLAIN PLAN FOR',
+    'WITH',
+    'SELECT',
+    'ALL',
+    'DISTINCT',
+    'FROM',
+    'WHERE',
+    'GROUP BY',
+    'HAVING',
+    'ORDER BY',
+    'ASC',
+    'DESC',
+    'LIMIT',
+]
+
+aggregate_functions = [
+    'COUNT',
+    'SUM',
+    'MIN',
+    'MAX',
+    'AVG',
+    'APPROX_COUNT_DISTINCT',
+    'APPROX_QUANTILE',
+]
+
+numeric_functions = [
+    'ABS',
+    'CEIL',
+    'EXP',
+    'FLOOR',
+    'LN',
+    'LOG10',
+    'POW',
+    'SQRT',
+]
+
+string_functions = [
+    'CHARACTER_LENGTH',
+    'LOOKUP',
+    'LOWER',
+    'REGEXP_EXTRACT',
+    'REPLACE',
+    'SUBSTRING',
+    'TRIM',
+    'BTRIM',
+    'RTRIM',
+    'LTRIM',
+    'UPPER',
+]
+
+time_functions = [
+    'CURRENT_TIMESTAMP',
+    'CURRENT_DATE',
+    'TIME_FLOOR',
+    'TIME_SHIFT',
+    'TIME_EXTRACT',
+    'TIME_PARSE',
+    'TIME_FORMAT',
+    'MILLIS_TO_TIMESTAMP',
+    'TIMESTAMP_TO_MILLIS',
+    'EXTRACT',
+    'FLOOR',
+    'CEIL',
+]
+
+other_functions = [
+    'CAST',
+    'CASE',
+    'WHEN',
+    'THEN',
+    'END',
+    'NULLIF',
+    'COALESCE',
+]
+
+
+class DocumentStyle(Style):
+    styles = {
+        Token.Menu.Completions.Completion.Current: 'bg:#00aaaa #000000',
+        Token.Menu.Completions.Completion: 'bg:#008888 #ffffff',
+        Token.Menu.Completions.ProgressButton: 'bg:#003333',
+        Token.Menu.Completions.ProgressBar: 'bg:#00aaaa',
+    }
+    styles.update(DefaultStyle.styles)
+
+
+def get_connection_kwargs(url):
+    parts = parse.urlparse(url)
+    if ':' in parts.netloc:
+        host, port = parts.netloc.split(':', 1)
+        port = int(port)
+    else:
+        host = parts.netloc
+        port = 8082
+
+    return {
+        'host': host,
+        'port': port,
+        'path': parts.path,
+        'scheme': parts.scheme,
+    }
+
+
+def get_tables(connection):
+    cursor = connection.cursor()
+    return [
+        row.TABLE_NAME for row in
+        cursor.execute('SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES')
+    ]
+
+
+def get_autocomplete(connection):
+    return (
+        keywords +
+        aggregate_functions +
+        numeric_functions +
+        string_functions +
+        time_functions +
+        other_functions +
+        get_tables(connection)
+    )
+
+
+def main():
+    history = FileHistory(os.path.expanduser('~/.pydruid_history'))
+
+    try:
+        url = sys.argv[1]
+    except IndexError:
+        url = 'http://localhost:8082/druid/v2/sql/'
+    kwargs = get_connection_kwargs(url)
+    connection = connect(**kwargs)
+    cursor = connection.cursor()
+
+    words = get_autocomplete(connection)
+    sql_completer = WordCompleter(words, ignore_case=True)
+
+    while True:
+        try:
+            query = prompt(
+                '> ', lexer=SqlLexer, completer=sql_completer,
+                style=DocumentStyle, history=history,
+                on_abort=AbortAction.RETRY)
+        except EOFError:
+            break # Control-D pressed.
+
+        # run query
+        if query.strip():
+            try:
+                result = cursor.execute(query.rstrip(';'))
+            except Exception as e:
+                print(e)
+                continue
+
+            headers = [t[0] for t in cursor.description]
+            print(tabulate(result, headers=headers))
+
+    print('GoodBye!')
+
+
+if __name__ == '__main__':
+    main()
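
Reviewer note on `get_connection_kwargs`: splitting `netloc` on the first colon works for plain `host:port` URLs, but a URL carrying credentials (`user:pw@host:9000`) would end up calling `int('pw@host:9000')` and raise `ValueError`. A minimal sketch of one alternative follows. It assumes Python 3's standard `urllib.parse` instead of `six.moves`, and `connection_kwargs_from_url` and `DEFAULT_PORT` are hypothetical names that are not part of this commit.

```python
from urllib.parse import urlparse

DEFAULT_PORT = 8082  # same fallback port the commit hard-codes


def connection_kwargs_from_url(url):
    """Hypothetical variant of get_connection_kwargs built on hostname/port."""
    parts = urlparse(url)
    return {
        'host': parts.hostname,              # netloc minus credentials and port
        'port': parts.port or DEFAULT_PORT,  # parts.port is None when absent
        'path': parts.path,
        'scheme': parts.scheme,
    }


print(connection_kwargs_from_url('http://localhost:8082/druid/v2/sql/'))
print(connection_kwargs_from_url('https://druid.example.com/druid/v2/sql/'))
```

One behavioural difference to keep in mind: `hostname` is returned lower-cased, whereas the code in the commit passes the host through unchanged.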

pydruid/db/__init__.py

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
from pydruid.db.api import connect
2+
from pydruid.db.exceptions import (
3+
DataError,
4+
DatabaseError,
5+
Error,
6+
IntegrityError,
7+
InterfaceError,
8+
InternalError,
9+
NotSupportedError,
10+
OperationalError,
11+
ProgrammingError,
12+
Warning,
13+
)
14+
15+
16+
__all__ = [
17+
'connect',
18+
'apilevel',
19+
'threadsafety',
20+
'paramstyle',
21+
'DataError',
22+
'DatabaseError',
23+
'Error',
24+
'IntegrityError',
25+
'InterfaceError',
26+
'InternalError',
27+
'NotSupportedError',
28+
'OperationalError',
29+
'ProgrammingError',
30+
'Warning',
31+
]
32+
33+
34+
apilevel = '2.0'
35+
# Threads may share the module and connections
36+
threadsafety = 2
37+
paramstyle = 'pyformat'
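
Reviewer note on `paramstyle = 'pyformat'`: in PEP 249 this value advertises named `%(name)s` placeholders, with parameters passed as a mapping. The sketch below illustrates the convention only. `escape` and `apply_parameters` are hypothetical helpers written for this note, and the lines above do not show how pydruid's cursor actually substitutes parameters.

```python
def escape(value):
    """Render a Python value as a SQL literal (illustrative, not exhaustive)."""
    if value is None:
        return 'NULL'
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return 'TRUE' if value else 'FALSE'
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'{}'".format(value.replace("'", "''"))  # double single quotes
    raise TypeError('unsupported parameter type: {!r}'.format(type(value)))


def apply_parameters(operation, parameters):
    """Fill %(name)s placeholders with escaped literals."""
    escaped = {key: escape(value) for key, value in parameters.items()}
    return operation % escaped


query = apply_parameters(
    'SELECT COUNT(*) AS cnt FROM places WHERE place = %(place)s LIMIT %(n)s',
    {'place': "O'Hare", 'n': 10},
)
print(query)
```

With `%`-based substitution like this, a literal percent sign in the SQL (for example in a `LIKE` pattern) has to be written as `%%`.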
