
Commit 4f2ee5c

VER: Release 0.13.0

See release notes.

2 parents: c42604a + eb20cd0

19 files changed: +536 −756 lines

CHANGELOG.md

Lines changed: 13 additions & 0 deletions

@@ -1,5 +1,16 @@
 # Changelog
 
+## 0.13.0 - 2023-06-02
+- Added support for `statistics` schema
+- Added batch download support for data files (`condition.json` and `symbology.json`)
+- Upgraded `databento-dbn` to 0.6.1
+- Renamed `booklevel` MBP field to `levels` for brevity and consistent naming
+- Changed `flags` field to an unsigned int
+- Changed default of `ts_out` to `False` for `Live` client
+- Changed `instrument_class` DataFrame representation to be consistent with other `char` types
+- Removed `open_interest_qty` and `cleared_volume` fields, which were always unset, from definition schema
+- Removed sunset `timeseries.stream` method
+
 ## 0.12.0 - 2023-05-01
 - Added `Live` client for connecting to Databento's live service
 - Upgraded `databento-dbn` to 0.5.0
@@ -11,6 +22,8 @@
 - Removed `bad` condition variant from `batch.get_dataset_condition`
 - Added `degraded`, `pending` and `missing` condition variants for `batch.get_dataset_condition`
 - Added `last_modified_date` field to `batch.get_dataset_condition` response
+- Renamed `product_id` field to `instrument_id`
+- Renamed `symbol` field in definitions to `raw_symbol`
 - Deprecated `SType.PRODUCT_ID` to `SType.INSTRUMENT_ID`
 - Deprecated `SType.NATIVE` to `SType.RAW_SYMBOL`
 - Deprecated `SType.SMART` to `SType.PARENT` and `SType.CONTINUOUS`
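
The change of `flags` to an unsigned int is easy to motivate: bit-flag fields routinely set the high bit, and a signed byte renders such values as negative numbers. A minimal numpy illustration (the bit value is an arbitrary example, not an actual DBN flag definition):

```python
import numpy as np

# A flag byte with the high bit set (arbitrary example value)
raw = np.array([0b10000000], dtype=np.uint8)

as_signed = raw.view(np.int8)  # old dtype: the same bits read as a negative number
as_unsigned = raw              # new dtype: reads as the actual bit value

print(int(as_signed[0]), int(as_unsigned[0]))  # -128 128
```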

README.md

Lines changed: 5 additions & 5 deletions

@@ -30,12 +30,12 @@ You can find our full client API reference on the [Historical Reference](https:/
 The library is fully compatible with the latest distribution of Anaconda 3.7 and above.
 The minimum dependencies as found in the `requirements.txt` are also listed below:
 - Python (>=3.7)
-- aiohttp (>=3.7.2)
-- databento-dbn (==0.5.0)
+- aiohttp (>=3.7.2,<4.0.0)
+- databento-dbn (==0.6.1)
 - numpy (>=1.17.0)
 - pandas (>=1.1.3)
 - requests (>=2.24.0)
-- zstandard (>=0.20.0)
+- zstandard (>=0.21.0)
 
 ## Installation
 To install the latest stable version of the package from PyPI:
@@ -57,7 +57,7 @@ client = db.Historical('YOUR_API_KEY')
 data = client.timeseries.get_range(
     dataset='GLBX.MDP3',
     symbols='ES.FUT',
-    stype_in='smart',
+    stype_in='parent',
     start='2022-06-10T14:30',
     end='2022-06-10T14:40',
 )
@@ -72,7 +72,7 @@ and dispatch each data event to an event handler. You can also use
 `.to_df()` or `.to_ndarray()` to cast the data into a Pandas `DataFrame` or numpy `ndarray`:
 
 ```python
-df = data.to_df(pretty_ts=True, pretty_px=True)  # to DataFrame, with pretty formatting
+df = data.to_df()  # to DataFrame
 array = data.to_ndarray()  # to ndarray
 ```
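
As the README snippet shows, `to_df()` and `to_ndarray()` give tabular views of the decoded records. Under the hood the records map onto a numpy structured array, which pandas converts column-for-column; a self-contained sketch using a hypothetical record layout (not the real DBN schema):

```python
import numpy as np
import pandas as pd

# Hypothetical record layout, for illustration only (not the real DBN schema)
dtype = np.dtype([
    ("ts_event", np.uint64),
    ("price", np.int64),
    ("size", np.uint32),
])
records = np.array([(1654871400000000000, 4300250000000, 10)], dtype=dtype)

df = pd.DataFrame(records)  # structured ndarray -> DataFrame, one column per field
print(df.columns.tolist())  # ['ts_event', 'price', 'size']
```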

databento/common/data.py

Lines changed: 39 additions & 9 deletions

@@ -48,7 +48,7 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("order_id", np.uint64),
     ("price", np.int64),
     ("size", np.uint32),
-    ("flags", np.int8),
+    ("flags", np.uint8),
     ("channel_id", np.uint8),
     ("action", "S1"),  # 1 byte chararray
     ("side", "S1"),  # 1 byte chararray
@@ -62,7 +62,7 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("size", np.uint32),
     ("action", "S1"),  # 1 byte chararray
     ("side", "S1"),  # 1 byte chararray
-    ("flags", np.int8),
+    ("flags", np.uint8),
     ("depth", np.uint8),
     ("ts_recv", np.uint64),
     ("ts_in_delta", np.int32),
@@ -93,7 +93,7 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("price_ratio", np.int64),
     ("inst_attrib_value", np.int32),
     ("underlying_id", np.uint32),
-    ("cleared_volume", np.int32),
+    ("_reserved1", "S4"),
     ("market_depth_implied", np.int32),
     ("market_depth", np.int32),
     ("market_segment_id", np.uint32),
@@ -102,11 +102,11 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("min_lot_size_block", np.int32),
     ("min_lot_size_round_lot", np.int32),
     ("min_trade_vol", np.uint32),
-    ("open_interest_qty", np.int32),
+    ("_reserved2", "S4"),
     ("contract_multiplier", np.int32),
     ("decay_quantity", np.int32),
     ("original_contract_size", np.int32),
-    ("reserved1", "S4"),
+    ("_reserved3", "S4"),
     ("trading_reference_date", np.uint16),
     ("appl_id", np.int16),
     ("maturity_year", np.uint16),
@@ -125,9 +125,9 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("underlying", "S21"),  # 21 byte chararray
     ("strike_price_currency", "S4"),
     ("instrument_class", "S1"),
-    ("reserved2", "S2"),
+    ("_reserved4", "S2"),
     ("strike_price", np.int64),
-    ("reserved3", "S6"),
+    ("_reserved5", "S6"),
     ("match_algorithm", "S1"),  # 1 byte chararray
     ("md_security_trading_status", np.uint8),
     ("main_fraction", np.uint8),
@@ -170,6 +170,20 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     ("dummy", "S1"),
 ]
 
+STATISTICS_MSG: List[Tuple[str, Union[type, str]]] = RECORD_HEADER + [
+    ("ts_recv", np.uint64),
+    ("ts_ref", np.uint64),
+    ("price", np.int64),
+    ("quantity", np.int32),
+    ("sequence", np.uint32),
+    ("ts_in_delta", np.int32),
+    ("stat_type", np.uint16),
+    ("channel_id", np.uint16),
+    ("update_action", np.uint8),
+    ("stat_flags", np.uint8),
+    ("dummy", "S6"),
+]
+
 
 STRUCT_MAP: Dict[Schema, List[Tuple[str, Union[type, str]]]] = {
     Schema.MBO: MBO_MSG,
@@ -193,6 +207,7 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     Schema.OHLCV_1D: OHLCV_MSG,
     Schema.DEFINITION: DEFINITION_MSG,
     Schema.IMBALANCE: IMBALANCE_MSG,
+    Schema.STATISTICS: STATISTICS_MSG,
 }
 
 
@@ -208,20 +223,21 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
     "security_type",
     "unit_of_measure",
     "underlying",
+    "strike_price_currency",
+    "instrument_class",
     "match_algorithm",
     "security_update_action",
     "user_defined_instrument",
-    "strike_price_currency",
 ]
 
 DEFINITION_PRICE_COLUMNS = [
     "min_price_increment",
-    "display_factor",
     "high_limit_price",
     "low_limit_price",
     "max_price_variation",
     "trading_reference_price",
     "min_price_increment_amount",
+    "price_ratio",
     "strike_price",
 ]
 
@@ -288,6 +304,13 @@ def get_deriv_ba_fields(level: int) -> List[str]:
     "dummy",
 ]
 
+STATISTICS_DROP_COLUMNS = [
+    "ts_recv",
+    "length",
+    "rtype",
+    "dummy",
+]
+
 DEFINITION_COLUMNS = [
     x
     for x in (np.dtype(DEFINITION_MSG).names or ())
@@ -298,6 +321,12 @@ def get_deriv_ba_fields(level: int) -> List[str]:
     x for x in (np.dtype(IMBALANCE_MSG).names or ()) if x not in IMBALANCE_DROP_COLUMNS
 ]
 
+STATISTICS_COLUMNS = [
+    x
+    for x in (np.dtype(STATISTICS_MSG).names or ())
+    if x not in STATISTICS_DROP_COLUMNS
+]
+
 COLUMNS = {
     Schema.MBO: [
         "ts_event",
@@ -333,4 +362,5 @@ def get_deriv_ba_fields(level: int) -> List[str]:
     Schema.OHLCV_1D: OHLCV_HEADER_COLUMNS,
     Schema.DEFINITION: DEFINITION_COLUMNS,
     Schema.IMBALANCE: IMBALANCE_COLUMNS,
+    Schema.STATISTICS: STATISTICS_COLUMNS,
 }
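
The new `STATISTICS_MSG` follows the module's existing pattern: a plain list of `(name, type)` pairs that numpy compiles into a structured dtype, with the matching `*_COLUMNS` list derived by filtering the drop columns. A runnable sketch of that pattern; the `RECORD_HEADER` fields below are assumed for illustration, since the actual header is defined elsewhere in the module:

```python
from typing import List, Tuple, Union

import numpy as np

# Assumed DBN-style record header, for illustration only
RECORD_HEADER: List[Tuple[str, Union[type, str]]] = [
    ("length", np.uint8),
    ("rtype", np.uint8),
    ("publisher_id", np.uint16),
    ("instrument_id", np.uint32),
    ("ts_event", np.uint64),
]

STATISTICS_MSG = RECORD_HEADER + [
    ("ts_recv", np.uint64),
    ("ts_ref", np.uint64),
    ("price", np.int64),
    ("quantity", np.int32),
    ("sequence", np.uint32),
    ("ts_in_delta", np.int32),
    ("stat_type", np.uint16),
    ("channel_id", np.uint16),
    ("update_action", np.uint8),
    ("stat_flags", np.uint8),
    ("dummy", "S6"),
]

STATISTICS_DROP_COLUMNS = ["ts_recv", "length", "rtype", "dummy"]

# numpy compiles the (name, type) pairs into a structured dtype ...
dtype = np.dtype(STATISTICS_MSG)
# ... and the DataFrame columns are the dtype's field names minus the drop list
columns = [x for x in (dtype.names or ()) if x not in STATISTICS_DROP_COLUMNS]
print(columns)
```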

databento/common/dbnstore.py

Lines changed: 49 additions & 29 deletions

@@ -3,6 +3,7 @@
 import abc
 import datetime as dt
 import logging
+from collections.abc import Generator
 from io import BytesIO
 from os import PathLike
 from pathlib import Path
@@ -12,7 +13,6 @@
     Any,
     Callable,
     Dict,
-    Generator,
     List,
     Optional,
     Union,
@@ -21,20 +21,25 @@
 import numpy as np
 import pandas as pd
 import zstandard
-from databento.common.data import (
-    COLUMNS,
-    DEFINITION_CHARARRAY_COLUMNS,
-    DEFINITION_PRICE_COLUMNS,
-    DEFINITION_TYPE_MAX_MAP,
-    DERIV_SCHEMAS,
-    STRUCT_MAP,
-)
-from databento.common.enums import Compression, Schema, SType
+from databento_dbn import DBNDecoder
+from databento_dbn import ErrorMsg
+from databento_dbn import Metadata
+from databento_dbn import SymbolMappingMsg
+from databento_dbn import SystemMsg
+
+from databento.common.data import COLUMNS
+from databento.common.data import DEFINITION_CHARARRAY_COLUMNS
+from databento.common.data import DEFINITION_PRICE_COLUMNS
+from databento.common.data import DEFINITION_TYPE_MAX_MAP
+from databento.common.data import DERIV_SCHEMAS
+from databento.common.data import STRUCT_MAP
+from databento.common.enums import Compression
+from databento.common.enums import Schema
+from databento.common.enums import SType
 from databento.common.error import BentoError
 from databento.common.symbology import InstrumentIdMappingInterval
 from databento.common.validation import validate_maybe_enum
 from databento.live.data import DBNStruct
-from databento_dbn import DbnDecoder, ErrorMsg, Metadata, SymbolMappingMsg, SystemMsg
 
 
 NON_SCHEMA_RECORD_TYPES = [
@@ -264,7 +269,7 @@ class DBNStore:
     The data compression format (if any).
 dataset : str
     The dataset code.
-end : pd.Timestamp
+end : pd.Timestamp or None
     The query end for the data.
 limit : int | None
     The query limit for the data.
@@ -282,7 +287,7 @@ class DBNStore:
     The data record schema.
 start : pd.Timestamp
     The query start for the data.
-stype_in : SType
+stype_in : SType or None
     The query input symbology type for the data.
 stype_out : SType
     The query output symbology type for the data.
@@ -354,7 +359,7 @@ def __init__(self, data_source: DataSource) -> None:
 
     def __iter__(self) -> Generator[DBNStruct, None, None]:
         reader = self.reader
-        decoder = DbnDecoder()
+        decoder = DBNDecoder()
         while True:
             raw = reader.read(DBNStore.DBN_READ_SIZE)
             if raw:
@@ -363,7 +368,7 @@ def __iter__(self) -> Generator[DBNStruct, None, None]:
                 records = decoder.decode()
             except ValueError:
                 continue
-            for record, _ in records:
+            for record in records:
                 yield record
         else:
             if len(decoder.buffer()) > 0:
@@ -380,11 +385,19 @@ def _apply_pretty_ts(self, df: pd.DataFrame) -> pd.DataFrame:
     df.index = pd.to_datetime(df.index, utc=True)
     for column in df.columns:
         if column.startswith("ts_") and "delta" not in column:
-            df[column] = pd.to_datetime(df[column], utc=True)
+            df[column] = pd.to_datetime(df[column], errors="coerce", utc=True)
 
     if self.schema == Schema.DEFINITION:
-        df["expiration"] = pd.to_datetime(df["expiration"], utc=True)
-        df["activation"] = pd.to_datetime(df["activation"], utc=True)
+        df["expiration"] = pd.to_datetime(
+            df["expiration"],
+            errors="coerce",
+            utc=True,
+        )
+        df["activation"] = pd.to_datetime(
+            df["activation"],
+            errors="coerce",
+            utc=True,
+        )
 
     return df
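
The switch to `errors="coerce"` in `_apply_pretty_ts` matters when a timestamp field carries an "unset" sentinel that overflows `pd.to_datetime`'s nanosecond range; coercion turns such values into `NaT` instead of raising. A small demonstration (using the maximum `uint64` value as the sentinel, an assumption inferred from the change):

```python
import numpy as np
import pandas as pd

UNDEF_TIMESTAMP = np.iinfo(np.uint64).max  # assumed "unset" sentinel

# One valid nanosecond timestamp and one sentinel value
ts = np.array([1654871400000000000, UNDEF_TIMESTAMP], dtype=np.uint64)

# Without coercion the out-of-bounds sentinel would raise; with it we get NaT
coerced = pd.to_datetime(ts, errors="coerce", utc=True)
print(coerced)
```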

@@ -479,8 +492,7 @@ def _map_symbols(self, df: pd.DataFrame, pretty_ts: bool) -> pd.DataFrame:
     df_index = df.index if pretty_ts else pd.to_datetime(df.index, utc=True)
     dates = [ts.date() for ts in df_index]
     df["symbol"] = [
-        self._instrument_id_index[dates[i]][p]
-        for i, p in enumerate(df["instrument_id"])
+        self._instrument_id_index[dates[i]][p] for i, p in enumerate(df["instrument_id"])
     ]
 
     return df
@@ -511,20 +523,24 @@ def dataset(self) -> str:
     return str(self._metadata.dataset)
 
 @property
-def end(self) -> pd.Timestamp:
+def end(self) -> Optional[pd.Timestamp]:
     """
     Return the query end for the data.
+    If None, the end time was not known when the data was generated.
 
     Returns
     -------
-    pd.Timestamp
+    pd.Timestamp or None
 
     Notes
     -----
     The data timestamps will not occur after `end`.
 
     """
-    return pd.Timestamp(self._metadata.end, tz="UTC")
+    end = self._metadata.end
+    if end:
+        return pd.Timestamp(self._metadata.end, tz="UTC")
+    return None
 
 @property
 def limit(self) -> Optional[int]:
@@ -625,7 +641,7 @@ def schema(self) -> Optional[Schema]:
 
     """
     schema = self._metadata.schema
-    if schema is not None:
+    if schema:
         return Schema(self._metadata.schema)
     return None
 
@@ -646,16 +662,20 @@ def start(self) -> pd.Timestamp:
     return pd.Timestamp(self._metadata.start, tz="UTC")
 
 @property
-def stype_in(self) -> SType:
+def stype_in(self) -> Optional[SType]:
     """
     Return the query input symbology type for the data.
+    If None, the records may contain mixed STypes.
 
     Returns
     -------
-    SType
+    SType or None
 
     """
-    return SType(self._metadata.stype_in)
+    stype = self._metadata.stype_in
+    if stype:
+        return SType(self._metadata.stype_in)
+    return None
 
 @property
 def stype_out(self) -> SType:
@@ -774,7 +794,7 @@ def request_full_definitions(
     """
     Request full instrument definitions based on the metadata properties.
 
-    Makes a `GET /timeseries.stream` HTTP request.
+    Makes a `GET /timeseries.get_range` HTTP request.
 
     Parameters
     ----------
@@ -792,7 +812,7 @@ def request_full_definitions(
     Calling this method will incur a cost.
 
     """
-    return client.timeseries.stream(
+    return client.timeseries.get_range(
         dataset=self.dataset,
         symbols=self.symbols,
         schema=Schema.DEFINITION,
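
The `end` and `stype_in` properties now return `None` when the underlying metadata value is falsy (e.g. an open-ended live session with no fixed end). The pattern can be sketched in isolation, assuming a raw nanosecond value of 0 means "unset":

```python
from typing import Optional

import pandas as pd

def to_optional_timestamp(raw_ns: int) -> Optional[pd.Timestamp]:
    """Convert raw nanoseconds since epoch to a UTC Timestamp, or None if unset (0)."""
    if raw_ns:
        return pd.Timestamp(raw_ns, tz="UTC")
    return None

print(to_optional_timestamp(0))  # None
print(to_optional_timestamp(1654871400000000000))
```

Callers then check for `None` rather than relying on a magic sentinel timestamp, which is the same trade-off the `Optional[...]` annotations in the diff above make.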
