Commit 5fe529e (2 parents: c652976 + 3277e3c)

Release 0.4.0

See release notes

23 files changed: +190, -431 lines

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # Changelog

+## 0.4.0 - 2022-09-14
+- Upgraded `dbz-python` to `0.1.5`
+- Added `map_symbols` option for `.to_df()` (experimental)

 ## 0.3.0 - 2022-08-30
 - Initial release
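
For context on the new `map_symbols` option, a minimal sketch of how it might be used; the dataset, symbols, schema, and time range are illustrative (borrowed from the updated example scripts later in this diff), not part of this commit:

```python
import databento as db

client = db.Historical("YOUR_API_KEY")
data = client.timeseries.stream(
    dataset="GLBX.MDP3",
    symbols=["ESM2"],  # illustrative symbol, as in the updated examples
    schema="mbo",
    start="2022-06-10T14:30",
    end="2022-06-10T14:40",
)

# Experimental in 0.4.0: map_symbols adds a 'symbol' column, mapping each
# record's product_id back to its native symbol via the DBZ metadata.
df = data.to_df(pretty_ts=True, pretty_px=True, map_symbols=True)
print(df[["product_id", "symbol"]].head())
```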

README.md

Lines changed: 16 additions & 15 deletions
@@ -9,29 +9,29 @@
 The official Python client library for [Databento](https://databento.com).

 Key features include:
-- Fast, lightweight access to both live and historical data from [multiple markets]().
-- [Multiple schemas]() such as MBO, MBP, top of book, OHLCV, last sale, and more.
-- [Fully normalized](), i.e. identical message schemas for both live and historical data, across multiple asset classes.
-- Provides mappings between different symbology systems, including [smart symbology]() for futures rollovers.
+- Fast, lightweight access to both live and historical data from [multiple markets](https://docs0.databento.com/knowledge-base/new-users/venues-and-publishers?historical=python&live=python).
+- [Multiple schemas](https://docs0.databento.com/knowledge-base/new-users/list-of-supported-market-data-schemas?historical=python&live=python) such as MBO, MBP, top of book, OHLCV, last sale, and more.
+- [Fully normalized](https://docs0.databento.com/knowledge-base/new-users/normalization?historical=python&live=python), i.e. identical message schemas for both live and historical data, across multiple asset classes.
+- Provides mappings between different symbology systems, including [smart symbology](https://docs0.databento.com/reference-historical/basics/symbology?historical=python&live=python) for futures rollovers.
 - [Point-in-time]() instrument definitions, free of look-ahead bias and retroactive adjustments.
-- Reads and stores market data in an extremely efficient file format using [Databento Binary Encoding]().
-- Event-driven [market replay](), including at high-frequency order book granularity.
-- Support for [batch download]() of flat files.
-- Support for [pandas](), CSV, and JSON.
+- Reads and stores market data in an extremely efficient file format using [Databento Binary Encoding](https://docs0.databento.com/knowledge-base/new-users/dbz-format?historical=python&live=python).
+- Event-driven [market replay](https://docs0.databento.com/reference-historical/helpers/bento-replay?historical=python&live=python), including at high-frequency order book granularity.
+- Support for [batch download](https://docs0.databento.com/knowledge-base/new-users/historical-data-streaming-vs-batch-download?historical=python&live=python) of flat files.
+- Support for [pandas](https://pandas.pydata.org/docs/), CSV, and JSON.

 ## Documentation
 The best place to begin is with our [Getting started](https://docs.databento.com/getting-started?historical=python&live=python) guide.

 You can find our full client API reference on the [Historical Reference](https://docs.databento.com/reference-historical?historical=python&live=python) and
 [Live Reference](https://docs.databento.com/reference-live?historical=python&live=python) sections of our documentation. See also the
-[Examples]() section for various tutorials and code samples.
+[Examples](https://docs0.databento.com/examples?historical=python&live=python) section for various tutorials and code samples.

 ## Requirements
 The library is fully compatible with the latest distribution of Anaconda 3.7 and above.
 The minimum dependencies as found in the `requirements.txt` are also listed below:
 - Python (>=3.7)
 - aiohttp (>=3.7.2)
-- dbz-lib (>=0.1.1)
+- dbz-python (>=0.1.5)
 - numpy (>=1.17.0)
 - pandas (>=1.1.3)
 - requests (>=2.24.0)
@@ -56,10 +56,11 @@ import databento as db
 client = db.Historical('YOUR_API_KEY')
 data = client.timeseries.stream(
     dataset='GLBX.MDP3',
-    start='2020-11-02T14:30',
-    end='2020-11-02T14:40')
+    start='2022-06-10T14:30',
+    end='2022-06-10T14:40',
+)

-data.replay(callback=print) # market replay, with `print` as event handler
+data.replay(callback=print)  # market replay, with `print` as event handler
 ```

 Replace `YOUR_API_KEY` with an actual API key, then run this program.
@@ -70,7 +71,7 @@ and dispatch each data event to an event handler. You can also use

 ```python
 df = data.to_df(pretty_ts=True, pretty_px=True) # to DataFrame, with pretty formatting
-array = data.to_ndarray() # to ndarray
+array = data.to_ndarray()  # to ndarray
 ```

 Note that the API key was also passed as a parameter, which is
@@ -81,7 +82,7 @@ Instead, you can leave out this parameter to pass your API key via the `DATABENT
 import databento as db

 client = db.Historical('YOUR_API_KEY') # pass as parameter
-client = db.Historical() # pass as `DATABENTO_API_KEY` environment variable
+client = db.Historical()  # pass as `DATABENTO_API_KEY` environment variable
 ```

 ## License

databento/common/bento.py

Lines changed: 65 additions & 19 deletions
@@ -1,3 +1,4 @@
+import datetime as dt
 import io
 import os.path
 from typing import Any, BinaryIO, Callable, Dict, List, Optional, Tuple
@@ -9,6 +10,7 @@
 from databento.common.enums import Compression, Encoding, Schema, SType
 from databento.common.logging import log_debug
 from databento.common.metadata import MetadataDecoder
+from databento.common.symbology import ProductIdMappingInterval


 class Bento:
@@ -17,6 +19,7 @@ class Bento:
     def __init__(self):
         self._metadata: Dict[str, Any] = {}
         self._dtype: Optional[np.dtype] = None
+        self._product_id_index: Dict[dt.date, Dict[int, str]] = {}

         self._dataset: Optional[str] = None
         self._schema: Optional[Schema] = None
@@ -353,13 +356,13 @@ def shape(self) -> Tuple:
         return self._shape

     @property
-    def mappings(self) -> List[Dict[str, List[Dict[str, str]]]]:
+    def mappings(self) -> Dict[str, List[Dict[str, Any]]]:
         """
         Return the symbology mappings for the data.

         Returns
         -------
-        List[Dict[str, List[Dict[str, str]]]]
+        Dict[str, List[Dict[str, Any]]]

         """
         self._check_metadata()
@@ -369,7 +372,7 @@ def mappings(self) -> List[Dict[str, List[Dict[str, str]]]]:
     @property
     def symbology(self) -> Dict[str, Any]:
         """
-        Return the symbology resolution information for the query.
+        Return the symbology resolution mappings for the data.

         Returns
         -------
@@ -378,30 +381,18 @@ def symbology(self) -> Dict[str, Any]:
         """
         self._check_metadata()

-        status = 0
-        if self._metadata["partial"]:
-            status = 1
-            message = "Partially resolved"
-        elif self._metadata["not_found"]:
-            status = 2
-            message = "Not found"
-        else:
-            message = "OK"
-
-        response: Dict[str, Any] = {
-            "result": self.mappings,
+        symbology: Dict[str, Any] = {
             "symbols": self.symbols,
             "stype_in": self.stype_in.value,
             "stype_out": self.stype_out.value,
             "start_date": str(self.start.date()),
             "end_date": str(self.end.date()),
             "partial": self._metadata["partial"],
             "not_found": self._metadata["not_found"],
-            "message": message,
-            "status": status,
+            "mappings": self.mappings,
         }

-        return response
+        return symbology

     def to_ndarray(self) -> np.ndarray:
         """
@@ -415,7 +406,12 @@ def to_ndarray(self) -> np.ndarray:
         data: bytes = self.reader(decompress=True).read()
         return np.frombuffer(data, dtype=DBZ_STRUCT_MAP[self.schema])

-    def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFrame:
+    def to_df(
+        self,
+        pretty_ts: bool = False,
+        pretty_px: bool = False,
+        map_symbols: bool = False,
+    ) -> pd.DataFrame:
         """
         Return the data as a `pd.DataFrame`.

@@ -427,6 +423,10 @@ def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFram
         pretty_px : bool, default False
             If all price columns should be converted from `int` to `float` at
             the correct scale (using the fixed precision scalar 1e-9).
+        map_symbols : bool, default False
+            If symbology mappings from the metadata should be used to create
+            a 'symbol' column, mapping the product ID to its native symbol for
+            every record.

         Returns
         -------
@@ -467,6 +467,20 @@ def to_df(self, pretty_ts: bool = False, pretty_px: bool = False) -> pd.DataFram
         ):
             df[column] = df[column] * 1e-9

+        if map_symbols:
+            # Build product ID index
+            if not self._product_id_index:
+                self._product_id_index = self._build_product_id_index()
+
+            # Map product IDs to native symbols
+            if self._product_id_index:
+                df_index = df.index if pretty_ts else pd.to_datetime(df.index, utc=True)
+                dates = [ts.date() for ts in df_index]
+                df["symbol"] = [
+                    self._product_id_index[dates[i]][p]
+                    for i, p in enumerate(df["product_id"])
+                ]
+
         return df

     def replay(self, callback: Callable[[Any], None]) -> None:
@@ -643,6 +657,38 @@ def request_full_definitions(
             path=path,
         )

+    def _build_product_id_index(self) -> Dict[dt.date, Dict[int, str]]:
+        intervals: List[ProductIdMappingInterval] = []
+        for native, i in self.mappings.items():
+            for row in i:
+                symbol = row["symbol"]
+                if symbol == "":
+                    continue
+                intervals.append(
+                    ProductIdMappingInterval(
+                        start_date=row["start_date"],
+                        end_date=row["end_date"],
+                        native=native,
+                        product_id=int(row["symbol"]),
+                    )
+                )
+
+        product_id_index: Dict[dt.date, Dict[int, str]] = {}
+        for interval in intervals:
+            for ts in pd.date_range(
+                start=interval.start_date,
+                end=interval.end_date,
+                # https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
+                **{"inclusive" if pd.__version__ >= "1.4.0" else "closed": "left"},
+            ):
+                d: dt.date = ts.date()
+                date_map: Dict[int, str] = product_id_index.get(d, {})
+                if not date_map:
+                    product_id_index[d] = date_map
+                date_map[interval.product_id] = interval.native
+
+        return product_id_index
+

 class MemoryBento(Bento):
     """

databento/common/metadata.py

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,7 @@
 from typing import Any, Dict

 from databento.common.parsing import int_to_compression, int_to_schema, int_to_stype
-from dbz_lib import decode_metadata
+from dbz_python import decode_metadata


 class MetadataDecoder:
@@ -37,6 +37,8 @@ def enum_value(fn):
             "stype_in": enum_value(int_to_stype),
             "stype_out": enum_value(int_to_stype),
         }
+
         for key, conv_fn in conversion_mapping.items():
             metadata[key] = conv_fn(metadata[key])
+
         return metadata
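
A rough, hypothetical illustration of the conversion pattern in the surrounding context lines: each metadata field that arrives as a raw integer code is converted in place by its mapped function. The enum and values below are stand-ins, not the library's real converters:

```python
from enum import Enum
from typing import Any, Callable, Dict


class Compression(Enum):  # hypothetical stand-in for the real enum
    NONE = 0
    ZSTD = 1


metadata: Dict[str, Any] = {"compression": 1, "dataset": "GLBX.MDP3"}
conversion_mapping: Dict[str, Callable[[Any], Any]] = {"compression": Compression}

# Apply each field's converter in place, as in MetadataDecoder above.
for key, conv_fn in conversion_mapping.items():
    metadata[key] = conv_fn(metadata[key])

assert metadata["compression"] is Compression.ZSTD
```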

databento/common/symbology.py

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+import datetime as dt
+
+
+class ProductIdMappingInterval:
+    """
+    Represents a product ID to native symbol mapping over a start and end date
+    range interval.
+
+    Parameters
+    ----------
+    start_date : dt.date
+        The start of the mapping period.
+    end_date : dt.date
+        The end of the mapping period.
+    native : str
+        The native symbol value.
+    product_id : int
+        The product ID value.
+    """
+
+    def __init__(
+        self,
+        start_date: dt.date,
+        end_date: dt.date,
+        native: str,
+        product_id: int,
+    ):
+        self.start_date = start_date
+        self.end_date = end_date
+        self.native = native
+        self.product_id = product_id
+
+    def __repr__(self):
+        return (
+            f"{type(self).__name__}("
+            f"start_date={self.start_date}, "
+            f"end_date={self.end_date}, "
+            f"native='{self.native}', "
+            f"product_id={self.product_id})"
+        )
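
A small usage sketch of the new class; the dates, native symbol, and product ID below are made up for illustration:

```python
import datetime as dt

from databento.common.symbology import ProductIdMappingInterval

# Hypothetical values: one native symbol mapped to a product ID for four days.
interval = ProductIdMappingInterval(
    start_date=dt.date(2022, 6, 10),
    end_date=dt.date(2022, 6, 14),
    native="ESM2",
    product_id=3403,
)

print(interval)
# ProductIdMappingInterval(start_date=2022-06-10, end_date=2022-06-14, native='ESM2', product_id=3403)
```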

databento/historical/api/symbology.py

Lines changed: 0 additions & 1 deletion
@@ -39,7 +39,6 @@ def resolve(
         The dataset code (string identifier) for the request.
     symbols : List[Union[str, int]] or str, optional
         The symbols to resolve. Takes up to 2,000 symbols per request.
-        If `*` or ``None`` then will be for **all** symbols.
     stype_in : SType or str, default 'native'
         The input symbology type to resolve from.
     stype_out : SType or str, default 'product_id'

databento/version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.3.0"
+__version__ = "0.4.0"

examples/historical_batch.py

Lines changed: 4 additions & 3 deletions
@@ -11,10 +11,11 @@

 response = client.batch.submit_job(
     dataset="GLBX.MDP3",
-    symbols=["ESH1"],
+    symbols=["ESM2"],
     schema="mbo",
-    start="2020-12-27T12:00",
-    end="2020-12-29",
+    start="2022-06-10T12:00",
+    end="2022-06-10T14:00",
+    limit=1000,  # <-- limiting batch request to 1000 records only
     encoding="csv",
     compression="zstd",
     delivery="download",

examples/historical_metadata_get_billable_size.py

Lines changed: 3 additions & 3 deletions
@@ -9,10 +9,10 @@

 size: int = client.metadata.get_billable_size(
     dataset="GLBX.MDP3",
-    symbols=["ESH1"],
+    symbols=["ESM2"],
     schema="mbo",
-    start="2020-12-28T12:00",
-    end="2020-12-29",
+    start="2022-06-10T12:00",
+    end="2022-06-10T14:00",
 )

 print(size)

examples/historical_metadata_get_cost.py

Lines changed: 5 additions & 15 deletions
@@ -7,22 +7,12 @@
 key = "YOUR_API_KEY"
 client = db.Historical(key=key)

-cost1: float = client.metadata.get_cost(
+cost: float = client.metadata.get_cost(
     dataset="GLBX.MDP3",
-    symbols="*",
+    symbols="ESM2",
     schema="mbo",
-    start="2020-12-27T12:00",
-    end="2020-12-29",
+    start="2022-06-10",
+    end="2022-06-15",
 )

-print(cost1)
-
-cost2: float = client.metadata.get_cost(
-    dataset="XNAS.ITCH",
-    symbols=["MSFT"],
-    schema="trades",
-    start="2015-04-22",
-    end="2015-04-22T12:10",
-)
-
-print(cost2)
+print(cost)
