| 1 | +# DataJoint 2.0 Fetch API Specification |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +DataJoint 2.0 replaces the complex `fetch()` method with a set of explicit, composable output methods. This provides better discoverability, clearer intent, and more efficient iteration. |
| 6 | + |
| 7 | +## Design Principles |
| 8 | + |
| 9 | +1. **Explicit over implicit**: Each output format has its own method |
| 10 | +2. **Composable**: Use existing `.proj()` for column selection |
| 11 | +3. **Lazy iteration**: Single cursor streaming instead of fetch-all-keys |
| 12 | +4. **Modern formats**: First-class support for polars and Arrow |
| 13 | + |
| 14 | +--- |
| 15 | + |
| 16 | +## New API Reference |
| 17 | + |
| 18 | +### Output Methods |
| 19 | + |
| 20 | +| Method | Returns | Description | |
| 21 | +|--------|---------|-------------| |
| 22 | +| `to_dicts()` | `list[dict]` | All rows as list of dictionaries | |
| 23 | +| `to_pandas()` | `DataFrame` | pandas DataFrame with primary key as index | |
| 24 | +| `to_polars()` | `polars.DataFrame` | polars DataFrame (requires `datajoint[polars]`) | |
| 25 | +| `to_arrow()` | `pyarrow.Table` | PyArrow Table (requires `datajoint[arrow]`) | |
| 26 | +| `to_arrays()` | `np.ndarray` | numpy structured array (recarray) | |
| 27 | +| `to_arrays('a', 'b')` | `tuple[array, array]` | Tuple of arrays for specific columns | |
| 28 | +| `keys()` | `list[dict]` | Primary key values only | |
| 29 | +| `fetch1()` | `dict` | Single row as dict (raises an error unless the query yields exactly one row) |
| 30 | +| `fetch1('a', 'b')` | `tuple` | Single row attribute values | |
| 31 | + |
| 32 | +### Common Parameters |
| 33 | + |
| 34 | +All output methods accept these optional parameters: |
| 35 | + |
| 36 | +```python |
| 37 | +table.to_dicts( |
| 38 | +    order_by=None,      # str or list: column(s) to sort by, e.g. "KEY", "name DESC" |
| 39 | +    limit=None,         # int: maximum rows to return |
| 40 | +    offset=None,        # int: rows to skip |
| 41 | +    squeeze=False,      # bool: remove singleton dimensions from arrays |
| 42 | +    download_path="."   # str: path for downloading external data |
| 43 | +) |
| 44 | +``` |
| 45 | + |
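The sketch below assumes that `order_by`, `limit`, and `offset` compose like the SQL clauses `ORDER BY`, `LIMIT`, and `OFFSET`: sort first, skip `offset` rows, then return at most `limit` rows. It uses the standard-library `sqlite3` module rather than DataJoint itself, and the `experiment` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical stand-in table; not part of DataJoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [(1, "delta"), (2, "alpha"), (3, "charlie"), (4, "bravo")],
)

# Roughly what to_dicts(order_by="name DESC", limit=2, offset=1) is assumed
# to request: sort by name descending, skip one row, return the next two.
cursor = conn.execute(
    "SELECT id, name FROM experiment ORDER BY name DESC LIMIT ? OFFSET ?",
    (2, 1),
)
columns = [description[0] for description in cursor.description]
rows = [dict(zip(columns, values)) for values in cursor]
print(rows)
```

Because the sort happens before the skip, changing `order_by` changes which rows a given `limit`/`offset` pair returns, which matters when paging through results.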
| 46 | +### Iteration |
| 47 | + |
| 48 | +```python |
| 49 | +# Lazy streaming - yields one dict per row from database cursor |
| 50 | +for row in table: |
| 50 | +    process(row)  # row is a dict |
| 52 | +``` |
| 53 | + |
| 54 | +--- |
| 55 | + |
| 56 | +## Migration Guide |
| 57 | + |
| 58 | +### Basic Fetch Operations |
| 59 | + |
| 60 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 61 | +|-------------------|-------------------| |
| 62 | +| `table.fetch()` | `table.to_arrays()` or `table.to_dicts()` | |
| 63 | +| `table.fetch(format="array")` | `table.to_arrays()` | |
| 64 | +| `table.fetch(format="frame")` | `table.to_pandas()` | |
| 65 | +| `table.fetch(as_dict=True)` | `table.to_dicts()` | |
| 66 | + |
| 67 | +### Attribute Fetching |
| 68 | + |
| 69 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 70 | +|-------------------|-------------------| |
| 71 | +| `table.fetch('a')` | `table.to_arrays('a')` | |
| 72 | +| `a, b = table.fetch('a', 'b')` | `a, b = table.to_arrays('a', 'b')` | |
| 73 | +| `table.fetch('a', 'b', as_dict=True)` | `table.proj('a', 'b').to_dicts()` | |
| 74 | + |
| 75 | +### Primary Key Fetching |
| 76 | + |
| 77 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 78 | +|-------------------|-------------------| |
| 79 | +| `table.fetch('KEY')` | `table.keys()` | |
| 80 | +| `table.fetch(dj.key)` | `table.keys()` | |
| 81 | +| `keys, a = table.fetch('KEY', 'a')` | See note below | |
| 82 | + |
| 83 | +For mixed KEY + attribute fetch: |
| 84 | +```python |
| 85 | +# Old: keys, a = table.fetch('KEY', 'a') |
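| 86 | +# New: combine keys() with to_arrays(), passing the same order_by so rows stay aligned |
| 87 | +keys = table.keys(order_by='KEY') |
| 88 | +a = table.to_arrays('a', order_by='KEY') |
| 89 | +# Or use to_dicts(), which returns all columns (primary key included) in one query |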
| 90 | +``` |
| 91 | + |
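When keys and attribute values must stay paired, splitting the output of a single `to_dicts()` call avoids relying on two queries returning rows in the same order. The following is a minimal sketch on plain dictionaries; `split_keys_and_column` is a hypothetical helper, and the column names stand in for whatever the table defines.

```python
def split_keys_and_column(rows, primary_key, attribute):
    """Split row dicts into (keys, values) lists that share the same order."""
    keys = [{name: row[name] for name in primary_key} for row in rows]
    values = [row[attribute] for row in rows]
    return keys, values


# Stand-in for the output of table.to_dicts(); column names are hypothetical.
rows = [
    {"subject_id": 1, "session": 1, "a": 0.5},
    {"subject_id": 1, "session": 2, "a": 0.7},
]
keys, a = split_keys_and_column(
    rows, primary_key=["subject_id", "session"], attribute="a"
)
print(keys, a)
```

The same dictionary comprehension applied to one row covers the `fetch1('KEY')` case: keep only the primary key columns of the dict that `fetch1()` returns.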
| 92 | +### Ordering, Limiting, Offset |
| 93 | + |
| 94 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 95 | +|-------------------|-------------------| |
| 96 | +| `table.fetch(order_by='name')` | `table.to_arrays(order_by='name')` | |
| 97 | +| `table.fetch(limit=10)` | `table.to_arrays(limit=10)` | |
| 98 | +| `table.fetch(order_by='KEY', limit=10, offset=5)` | `table.to_arrays(order_by='KEY', limit=10, offset=5)` | |
| 99 | + |
| 100 | +### Single Row Fetch (fetch1) |
| 101 | + |
| 102 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 103 | +|-------------------|-------------------| |
| 104 | +| `table.fetch1()` | `table.fetch1()` (unchanged) | |
| 105 | +| `a, b = table.fetch1('a', 'b')` | `a, b = table.fetch1('a', 'b')` (unchanged) | |
| 106 | +| `table.fetch1('KEY')` | `table.fetch1()`, then extract the primary key columns |
| 107 | + |
| 108 | +### Configuration |
| 109 | + |
| 110 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 111 | +|-------------------|-------------------| |
| 112 | +| `dj.config['fetch_format'] = 'frame'` | Use `.to_pandas()` explicitly | |
| 113 | +| `with dj.config.override(fetch_format='frame'):` | Use `.to_pandas()` in the block | |
| 114 | + |
| 115 | +### Iteration |
| 116 | + |
| 117 | +| Old Pattern (1.x) | New Pattern (2.0) | |
| 118 | +|-------------------|-------------------| |
| 119 | +| `for row in table:` | `for row in table:` (same syntax, now lazy!) | |
| 120 | +| `list(table)` | `table.to_dicts()` | |
| 121 | + |
| 122 | +### Column Selection with proj() |
| 123 | + |
| 124 | +Use `.proj()` for column selection, then apply an output method: |
| 125 | + |
| 126 | +```python |
| 127 | +# Select specific columns |
| 128 | +table.proj('col1', 'col2').to_pandas() |
| 129 | +table.proj('col1', 'col2').to_dicts() |
| 130 | + |
| 131 | +# Computed columns |
| 132 | +table.proj(total='price * quantity').to_pandas() |
| 133 | +``` |
| 134 | + |
| 135 | +--- |
| 136 | + |
| 137 | +## Removed Features |
| 138 | + |
| 139 | +### Removed Methods and Parameters |
| 140 | + |
| 141 | +- `fetch()` method - use explicit output methods |
| 142 | +- `fetch('KEY')` - use `keys()` |
| 143 | +- `dj.key` class - use `keys()` method |
| 144 | +- `format=` parameter - use explicit methods |
| 145 | +- `as_dict=` parameter - use `to_dicts()` |
| 146 | +- `config['fetch_format']` setting - use explicit methods |
| 147 | + |
| 148 | +### Removed Imports |
| 149 | + |
| 150 | +```python |
| 151 | +# Old (removed) |
| 152 | +from datajoint import key |
| 153 | +result = table.fetch(key) |
| 154 | + |
| 155 | +# New |
| 156 | +result = table.keys() |
| 157 | +``` |
| 158 | + |
| 159 | +--- |
| 160 | + |
| 161 | +## Examples |
| 162 | + |
| 163 | +### Example 1: Basic Data Retrieval |
| 164 | + |
| 165 | +```python |
| 166 | +# Get all data as DataFrame |
| 167 | +df = Experiment().to_pandas() |
| 168 | + |
| 169 | +# Get all data as list of dicts |
| 170 | +rows = Experiment().to_dicts() |
| 171 | + |
| 172 | +# Get all data as numpy array |
| 173 | +arr = Experiment().to_arrays() |
| 174 | +``` |
| 175 | + |
| 176 | +### Example 2: Filtered and Sorted Query |
| 177 | + |
| 178 | +```python |
| 179 | +# Get recent experiments, sorted by date |
| 180 | +recent = (Experiment() & 'date > "2024-01-01"').to_pandas( |
| 181 | +    order_by='date DESC', |
| 182 | +    limit=100 |
| 183 | +) |
| 184 | +``` |
| 185 | + |
| 186 | +### Example 3: Specific Columns |
| 187 | + |
| 188 | +```python |
| 189 | +# Fetch specific columns as arrays |
| 190 | +names, dates = Experiment().to_arrays('name', 'date') |
| 191 | + |
| 192 | +# Or with primary key included (assumption: keys are returned first, ahead of the requested columns) |
| 193 | +keys, names, dates = Experiment().to_arrays('name', 'date', include_key=True) |
| 194 | +``` |
| 195 | + |
| 196 | +### Example 4: Primary Keys for Iteration |
| 197 | + |
| 198 | +```python |
| 199 | +# Get keys for restriction |
| 200 | +keys = Experiment().keys() |
| 201 | +for key in keys: |
| 202 | + process(Session() & key) |
| 203 | +``` |
| 204 | + |
| 205 | +### Example 5: Single Row |
| 206 | + |
| 207 | +```python |
| 208 | +# Get one row as dict |
| 209 | +row = (Experiment() & key).fetch1() |
| 210 | + |
| 211 | +# Get specific attributes |
| 212 | +name, date = (Experiment() & key).fetch1('name', 'date') |
| 213 | +``` |
| 214 | + |
| 215 | +### Example 6: Lazy Iteration |
| 216 | + |
| 217 | +```python |
| 218 | +# Stream rows efficiently (single database cursor) |
| 219 | +for row in Experiment(): |
| 220 | +    if should_process(row): |
| 221 | +        process(row) |
| 222 | +    if done: |
| 223 | +        break  # Early termination - no wasted fetches |
| 224 | +``` |
| 225 | + |
| 226 | +### Example 7: Modern DataFrame Libraries |
| 227 | + |
| 228 | +```python |
| 229 | +# Polars (fast, modern) |
| 230 | +import polars as pl |
| 231 | +df = Experiment().to_polars() |
| 232 | +result = df.filter(pl.col('value') > 100).group_by('category').agg(pl.mean('value')) |
| 233 | + |
| 234 | +# PyArrow (zero-copy interop) |
| 235 | +table = Experiment().to_arrow() |
| 236 | +# Can be converted to pandas (table.to_pandas()) or polars (pl.from_arrow(table)), zero-copy where the data types allow it |
| 237 | +``` |
| 238 | + |
| 239 | +--- |
| 240 | + |
| 241 | +## Performance Considerations |
| 242 | + |
| 243 | +### Lazy Iteration |
| 244 | + |
| 245 | +The new iteration issues a single query instead of the N+1 queries used in 1.x: |
| 246 | + |
| 247 | +```python |
| 248 | +# Old (1.x): N+1 queries |
| 249 | +# 1. fetch("KEY") gets ALL keys |
| 250 | +# 2. fetch1() for EACH key |
| 251 | + |
| 252 | +# New (2.0): Single query |
| 253 | +# Streams rows from one cursor |
| 254 | +for row in table: |
| 255 | +    ... |
| 256 | +``` |
| 257 | + |
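The difference between the two iteration strategies can be made concrete by counting queries. This is a sketch only: it uses the standard-library `sqlite3` module in place of DataJoint's database connection, and the `experiment` table, the `run` helper, and both iterator functions are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)", [(i, f"exp{i}") for i in range(5)]
)

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a query and count it, so the two strategies can be compared."""
    counter["queries"] += 1
    return conn.execute(sql, params)


def iterate_n_plus_one():
    """1.x-style pattern: fetch every key, then fetch each row by its key."""
    keys = [key for (key,) in run("SELECT id FROM experiment ORDER BY id")]
    for key in keys:
        yield run("SELECT id, name FROM experiment WHERE id = ?", (key,)).fetchone()


def iterate_streaming():
    """2.0-style pattern: one query, rows pulled from a single cursor."""
    yield from run("SELECT id, name FROM experiment ORDER BY id")


rows_old = list(iterate_n_plus_one())
queries_old = counter["queries"]

counter["queries"] = 0
rows_new = list(iterate_streaming())
queries_new = counter["queries"]

print(queries_old, queries_new)
```

Both strategies yield the same rows; for a table with N rows the first issues N+1 queries and the second issues one.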
| 258 | +### Memory Efficiency |
| 259 | + |
| 260 | +- `to_dicts()`: Returns full list in memory |
| 261 | +- `for row in table:`: Streams one row at a time |
| 262 | +- `to_arrays(limit=N)`: Fetches only N rows |
| 263 | + |
| 264 | +### Format Selection |
| 265 | + |
| 266 | +| Use Case | Recommended Method | |
| 267 | +|----------|-------------------| |
| 268 | +| Data analysis | `to_pandas()` or `to_polars()` | |
| 269 | +| JSON API responses | `to_dicts()` | |
| 270 | +| Numeric computation | `to_arrays()` | |
| 271 | +| Large datasets | `for row in table:` (streaming) | |
| 272 | +| Interop with other tools | `to_arrow()` | |
| 273 | + |
| 274 | +--- |
| 275 | + |
| 276 | +## Error Messages |
| 277 | + |
| 278 | +Calling a removed method raises an error whose message names the replacements: |
| 279 | + |
| 280 | +```python |
| 281 | +>>> table.fetch() |
| 282 | +AttributeError: fetch() has been removed in DataJoint 2.0. |
| 283 | +Use to_dicts(), to_pandas(), to_arrays(), or keys() instead. |
| 284 | +See table.fetch.__doc__ for details. |
| 285 | +``` |
| 286 | + |
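One way to produce such a message is to keep `fetch` as a stub that always raises and carries the migration notes in its docstring, so that `table.fetch.__doc__` still resolves. The sketch below shows that pattern on a hypothetical `Table` class; it is not DataJoint's actual implementation.

```python
class Table:
    """Hypothetical stand-in for a table class; only the removed-method stub is shown."""

    def fetch(self, *args, **kwargs):
        """fetch() was removed in DataJoint 2.0.

        Use to_dicts(), to_pandas(), to_arrays(), or keys() instead.
        """
        raise AttributeError(
            "fetch() has been removed in DataJoint 2.0.\n"
            "Use to_dicts(), to_pandas(), to_arrays(), or keys() instead.\n"
            "See table.fetch.__doc__ for details."
        )


try:
    Table().fetch(order_by="KEY")
except AttributeError as err:
    message = str(err)

print(message)
```

Accepting `*args` and `**kwargs` means any 1.x call signature reaches the migration message instead of failing earlier with a `TypeError`.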
| 287 | +--- |
| 288 | + |
| 289 | +## Optional Dependencies |
| 290 | + |
| 291 | +Install optional dependencies for additional output formats: |
| 292 | + |
| 293 | +```bash |
| 294 | +# For polars support (quotes keep shells such as zsh from expanding the brackets) |
| 295 | +pip install "datajoint[polars]" |
| 296 | + |
| 297 | +# For PyArrow support |
| 298 | +pip install "datajoint[arrow]" |
| 299 | + |
| 300 | +# For both |
| 301 | +pip install "datajoint[polars,arrow]" |
| 302 | +``` |
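A method that depends on an optional package can check for it up front and report the matching extra. The following is a minimal sketch of such a guard; `require_optional` is a hypothetical helper, and the module name in the example is deliberately one that is not installed.

```python
import importlib.util


def require_optional(module_name, extra):
    """Raise ImportError with an install hint if an optional module is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is required for this output format. "
            f'Install it with: pip install "datajoint[{extra}]"'
        )


# Example with a module name chosen to be absent, so the hint is always shown.
try:
    require_optional("module_that_is_not_installed", "polars")
except ImportError as err:
    hint = str(err)

print(hint)
```

Checking with `importlib.util.find_spec` avoids importing the package just to learn whether it is available.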