Commit 5fe529e (2 parents: c652976 + 3277e3c)

Release 0.4.0

See release notes

23 files changed: +190, -431 lines

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # Changelog

+## 0.4.0 - 2022-09-14
+- Upgraded `dbz-python` to `0.1.5`
+- Added `map_symbols` option for `.to_df()` (experimental)

 ## 0.3.0 - 2022-08-30
 - Initial release
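
For context on the new `map_symbols` option, a minimal sketch of how it might be used; the dataset, symbols, schema, and time range are illustrative (borrowed from the updated example scripts later in this diff), not part of this commit:

```python
import databento as db

client = db.Historical("YOUR_API_KEY")
data = client.timeseries.stream(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],  # illustrative symbol, as in the updated examples
    schema="mbo",
    start="2022-06-10T14:30",
    end="2022-06-10T14:40",
)

# Experimental in 0.4.0: map_symbols adds a 'symbol' column, mapping each
# record's product_id back to its native symbol via the DBZ metadata.
df = data.to_df(pretty_ts=True, pretty_px=True, map_symbols=True)
print(df[["product_id", "symbol"]].head())
```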

README.md

Lines changed: 16 additions & 15 deletions
@@ -9,29 +9,29 @@
 The official Python client library for [Databento](https://databento.com).

 Key features include:
-- Fast, lightweight access to both live and historical data from [multiple markets]().
-- [Multiple schemas]() such as MBO, MBP, top of book, OHLCV, last sale, and more.
-- [Fully normalized](), i.e. identical message schemas for both live and historical data, across multiple asset classes.
-- Provides mappings between different symbology systems, including [smart symbology]() for futures rollovers.
+- Fast, lightweight access to both live and historical data from [multiple markets](https://docs0.databento.com/knowledge-base/new-users/venues-and-publishers?historical=python&live=python).
+- [Multiple schemas](https://docs0.databento.com/knowledge-base/new-users/list-of-supported-market-data-schemas?historical=python&live=python) such as MBO, MBP, top of book, OHLCV, last sale, and more.
+- [Fully normalized](https://docs0.databento.com/knowledge-base/new-users/normalization?historical=python&live=python), i.e. identical message schemas for both live and historical data, across multiple asset classes.
+- Provides mappings between different symbology systems, including [smart symbology](https://docs0.databento.com/reference-historical/basics/symbology?historical=python&live=python) for futures rollovers.
 - [Point-in-time]() instrument definitions, free of look-ahead bias and retroactive adjustments.
-- Reads and stores market data in an extremely efficient file format using [Databento Binary Encoding]().
-- Event-driven [market replay](), including at high-frequency order book granularity.
-- Support for [batch download]() of flat files.
-- Support for [pandas](), CSV, and JSON.
+- Reads and stores market data in an extremely efficient file format using [Databento Binary Encoding](https://docs0.databento.com/knowledge-base/new-users/dbz-format?historical=python&live=python).
+- Event-driven [market replay](https://docs0.databento.com/reference-historical/helpers/bento-replay?historical=python&live=python), including at high-frequency order book granularity.
+- Support for [batch download](https://docs0.databento.com/knowledge-base/new-users/historical-data-streaming-vs-batch-download?historical=python&live=python) of flat files.
+- Support for [pandas](https://pandas.pydata.org/docs/), CSV, and JSON.

 ## Documentation
 The best place to begin is with our [Getting started](https://docs.databento.com/getting-started?historical=python&live=python) guide.

 You can find our full client API reference on the [Historical Reference](https://docs.databento.com/reference-historical?historical=python&live=python) and
 [Live Reference](https://docs.databento.com/reference-live?historical=python&live=python) sections of our documentation. See also the
-[Examples]() section for various tutorials and code samples.
+[Examples](https://docs0.databento.com/examples?historical=python&live=python) section for various tutorials and code samples.

 ## Requirements
 The library is fully compatible with the latest distribution of Anaconda 3.7 and above.
 The minimum dependencies as found in the `requirements.txt` are also listed below:
 - Python (>=3.7)
 - aiohttp (>=3.7.2)
-- dbz-lib (>=0.1.1)
+- dbz-python (>=0.1.5)
 - numpy (>=1.17.0)
 - pandas (>=1.1.3)
 - requests (>=2.24.0)
@@ -56,10 +56,11 @@ import databento as db
 client = db.Historical('YOUR_API_KEY')
 data = client.timeseries.stream(
     dataset='GLBX.MDP3',
-    start='2020-11-02T14:30',
-    end='2020-11-02T14:40')
+    start='2022-06-10T14:30',
+    end='2022-06-10T14:40',
+)

-data.replay(callback=print) # market replay, with `print` as event handler
+data.replay(callback=print)  # market replay, with `print` as event handler
 ```

 Replace `YOUR_API_KEY` with an actual API key, then run this program.
@@ -70,7 +71,7 @@ and dispatch each data event to an event handler. You can also use

 ```python
 df = data.to_df(pretty_ts=True, pretty_px=True) # to DataFrame, with pretty formatting
-array = data.to_ndarray() # to ndarray
+array = data.to_ndarray()  # to ndarray
 ```

 Note that the API key was also passed as a parameter, which is
@@ -81,7 +82,7 @@ Instead, you can leave out this parameter to pass your API key via the `DATABENT
 import databento as db

 client = db.Historical('YOUR_API_KEY') # pass as parameter
-client = db.Historical() # pass as `DATABENTO_API_KEY` environment variable
+client = db.Historical()  # pass as `DATABENTO_API_KEY` environment variable
 ```

 ## License

databento/common/bento.py

Lines changed: 65 additions & 19 deletions
@@ -1,3 +1,4 @@
+import datetime as dt
 import io
 import os.path
 from typing import Any, BinaryIO, Callable, Dict, List, Optional, Tuple
@@ -9,6 +10,7 @@
 from databento.common.enums import Compression, Encoding, Schema, SType
 from databento.common.logging import log_debug
 from databento.common.metadata import MetadataDecoder
+from databento.common.symbology import ProductIdMappingInterval


 class Bento:
@@ -17,6 +19,7 @@ class Bento:
     def __init__(self):
         self._metadata: Dict[str, Any] = {}
         self._dtype: Optional[np.dtype] = None
+        self._product_id_index: Dict[dt.date, Dict[int, str]] = {}

         self._dataset: Optional[str] = None
         self._schema: Optional[Schema] = None
@@ -353,13 +356,13 @@ def shape(self) -> Tuple:
         return self._shape

     @property
-    def mappings(self) -> List[Dict[str, List[Dict[str, str]]]]:
+    def mappings(self) -> Dict[str, List[Dict[str, Any]]]:
         """
         Return the symbology mappings for the data.

         Returns
         -------
-        List[Dict[str, List[Dict[str, str]]]]
+        Dict[str, List[Dict[str, Any]]]

         """
         self._check_metadata()
@@ -369,7 +372,7 @@ def mappings(self) -> List[Dict[str, List[Dict[str, str]]]]:
     @property
     def symbology(self) -> Dict[str, Any]:
         """
-        Return the symbology resolution information for the query.
+        Return the symbology resolution mappings for the data.

         Returns
         -------
@@ -378,30 +381,18 @@ def symbology(self) -> Dict[str, Any]:
         """
         self._check_metadata()

-        status = 0
-        if self._metadata["partial"]:
-            status = 1
-            message = "Partially resolved"
-        elif self._metadata["not_found"]:
-            status = 2
-            message = "Not found"
-        else:
-            message = "OK"
-
-        response: Dict[str, Any] = {
-            "result": self.mappings,
+        symbology: Dict[str, Any] = {
             "symbols": self.symbols,
             "stype_in": self.stype_in.value,
             "stype_out": self.stype_out.value,
             "start_date": str(self.start.date()),
             "end_date": str(self.end.date()),
             "partial": self._metadata["partial"],
             "not_found": self._metadata["not_found"],
-            "message": message,
-            "status": status,
+            "mappings": self.mappings,
         }

-        return response
+        return symbology

     def to_ndarray(self) -> np.ndarray:
         """
@@ -415,7 +406,12 @@ def to_ndarray(self) -> np.ndarray:
         data: bytes = self.reader(decompress=True).read()
         return np.frombuffer(data, dtype=DBZ_STRUCT_MAP[self.schema])

-    def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFrame:
+    def to_df(
+        self,
+        pretty_ts: bool = False,
+        pretty_px: bool = False,
+        map_symbols: bool = False,
+    ) -> pd.DataFrame:
         """
         Return the data as a `pd.DataFrame`.

@@ -427,6 +423,10 @@ def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFram
         pretty_px : bool, default False
             If all price columns should be converted from `int` to `float` at
             the correct scale (using the fixed precision scalar 1e-9).
+        map_symbols : bool, default False
+            If symbology mappings from the metadata should be used to create
+            a 'symbol' column, mapping the product ID to its native symbol for
+            every record.

         Returns
         -------
@@ -467,6 +467,20 @@ def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFram
         ):
             df[column] = df[column] * 1e-9

+        if map_symbols:
+            # Build product ID index
+            if not self._product_id_index:
+                self._product_id_index = self._build_product_id_index()
+
+            # Map product IDs to native symbols
+            if self._product_id_index:
+                df_index = df.index if pretty_ts else pd.to_datetime(df.index, utc=True)
+                dates = [ts.date() for ts in df_index]
+                df["symbol"] = [
+                    self._product_id_index[dates[i]][p]
+                    for i, p in enumerate(df["product_id"])
+                ]
+
         return df

     def replay(self, callback: Callable[[Any], None]) -> None:
@@ -643,6 +657,38 @@ def request_full_definitions(
             path=path,
         )

+    def _build_product_id_index(self) -> Dict[dt.date, Dict[int, str]]:
+        intervals: List[ProductIdMappingInterval] = []
+        for native, i in self.mappings.items():
+            for row in i:
+                symbol = row["symbol"]
+                if symbol == "":
+                    continue
+                intervals.append(
+                    ProductIdMappingInterval(
+                        start_date=row["start_date"],
+                        end_date=row["end_date"],
+                        native=native,
+                        product_id=int(row["symbol"]),
+                    )
+                )
+
+        product_id_index: Dict[dt.date, Dict[int, str]] = {}
+        for interval in intervals:
+            for ts in pd.date_range(
+                start=interval.start_date,
+                end=interval.end_date,
+                # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
+                **{"inclusive" if pd.__version__ >= "1.4.0" else "closed": "left"},
+            ):
+                d: dt.date = ts.date()
+                date_map: Dict[int, str] = product_id_index.get(d, {})
+                if not date_map:
+                    product_id_index[d] = date_map
+                date_map[interval.product_id] = interval.native
+
+        return product_id_index
+

 class MemoryBento(Bento):
     """

databento/common/metadata.py

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,7 @@
 from typing import Any, Dict

 from databento.common.parsing import int_to_compression, int_to_schema, int_to_stype
-from dbz_lib import decode_metadata
+from dbz_python import decode_metadata


 class MetadataDecoder:
@@ -37,6 +37,8 @@ def enum_value(fn):
             "stype_in": enum_value(int_to_stype),
             "stype_out": enum_value(int_to_stype),
         }
+
         for key, conv_fn in conversion_mapping.items():
             metadata[key] = conv_fn(metadata[key])
+
         return metadata
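
A rough, hypothetical illustration of the conversion pattern in the surrounding context lines: each metadata field that arrives as a raw integer code is converted in place by its mapped function. The enum and values below are stand-ins, not the library's real converters:

```python
from enum import Enum
from typing import Any, Callable, Dict


class Compression(Enum):  # hypothetical stand-in for the real enum
    NONE = 0
    ZSTD = 1


metadata: Dict[str, Any] = {"compression": 1, "dataset": "GLBX.MDP3"}
conversion_mapping: Dict[str, Callable[[Any], Any]] = {"compression": Compression}

# Apply each field's converter in place, as in MetadataDecoder above.
for key, conv_fn in conversion_mapping.items():
    metadata[key] = conv_fn(metadata[key])

assert metadata["compression"] is Compression.ZSTD
```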

databento/common/symbology.py

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+import datetime as dt
+
+
+class ProductIdMappingInterval:
+    """
+    Represents a product ID to native symbol mapping over a start and end date
+    range interval.
+
+    Parameters
+    ----------
+    start_date : dt.date
+        The start of the mapping period.
+    end_date : dt.date
+        The end of the mapping period.
+    native : str
+        The native symbol value.
+    product_id : int
+        The product ID value.
+    """
+
+    def __init__(
+        self,
+        start_date: dt.date,
+        end_date: dt.date,
+        native: str,
+        product_id: int,
+    ):
+        self.start_date = start_date
+        self.end_date = end_date
+        self.native = native
+        self.product_id = product_id
+
+    def __repr__(self):
+        return (
+            f"{type(self).__name__}("
+            f"start_date={self.start_date}, "
+            f"end_date={self.end_date}, "
+            f"native='{self.native}', "
+            f"product_id={self.product_id})"
+        )
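
A small usage sketch of the new class; the dates, native symbol, and product ID below are made up for illustration:

```python
import datetime as dt

from databento.common.symbology import ProductIdMappingInterval

# Hypothetical values: one native symbol mapped to a product ID for four days.
interval = ProductIdMappingInterval(
    start_date=dt.date(2022, 6, 10),
    end_date=dt.date(2022, 6, 14),
    native="ESM2",
    product_id=3403,
)

print(interval)
# ProductIdMappingInterval(start_date=2022-06-10, end_date=2022-06-14, native='ESM2', product_id=3403)
```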

databento/historical/api/symbology.py

Lines changed: 0 additions & 1 deletion
@@ -39,7 +39,6 @@ def resolve(
         The dataset code (string identifier) for the request.
     symbols : List[Union[str, int]] or str, optional
         The symbols to resolve. Takes up to 2,000 symbols per request.
-        If `*` or ``None`` then will be for **all** symbols.
     stype_in : SType or str, default 'native'
         The input symbology type to resolve from.
     stype_out : SType or str, default 'product_id'

databento/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.3.0"
+__version__ = "0.4.0"

examples/historical_batch.py

Lines changed: 4 additions & 3 deletions
@@ -11,10 +11,11 @@

 response = client.batch.submit_job(
     dataset="GLBX.MDP3",
-    symbols=["ESH1"],
+    symbols=["ESM2"],
     schema="mbo",
-    start="2020-12-27T12:00",
-    end="2020-12-29",
+    start="2022-06-10T12:00",
+    end="2022-06-10T14:00",
+    limit=1000,  # <-- limiting batch request to 1000 records only
     encoding="csv",
     compression="zstd",
     delivery="download",

examples/historical_metadata_get_billable_size.py

Lines changed: 3 additions & 3 deletions
@@ -9,10 +9,10 @@

 size: int = client.metadata.get_billable_size(
     dataset="GLBX.MDP3",
-    symbols=["ESH1"],
+    symbols=["ESM2"],
     schema="mbo",
-    start="2020-12-28T12:00",
-    end="2020-12-29",
+    start="2022-06-10T12:00",
+    end="2022-06-10T14:00",
 )

 print(size)

examples/historical_metadata_get_cost.py

Lines changed: 5 additions & 15 deletions
@@ -7,22 +7,12 @@
 key = "YOUR_API_KEY"
 client = db.Historical(key=key)

-cost1: float = client.metadata.get_cost(
+cost: float = client.metadata.get_cost(
     dataset="GLBX.MDP3",
-    symbols="*",
+    symbols="ESM2",
     schema="mbo",
-    start="2020-12-27T12:00",
-    end="2020-12-29",
+    start="2022-06-10",
+    end="2022-06-15",
 )

-print(cost1)
-
-cost2: float = client.metadata.get_cost(
-    dataset="XNAS.ITCH",
-    symbols=["MSFT"],
-    schema="trades",
-    start="2015-04-22",
-    end="2015-04-22T12:10",
-)
-
-print(cost2)
+print(cost)
