Commit bf750cb

henryiii authored and HDembinski committed
Reduce in Python (#259)
This replaces reduce with a new dictionary UHI syntax.

Changes:

* Adds the missing axis=0 shortcut in cpp mode
* Adds a warning to the developer Regular axis shortcut
* Removes the boost_histogram.algorithm module
* Implements a Python interface to indexing using a dict
* Adds a bh.tag.Slicer() shortcut for slicing in the dict
1 parent 3f67fb6 commit bf750cb

12 files changed: 189 additions and 62 deletions

CHANGELOG.md

Lines changed: 17 additions & 5 deletions
@@ -1,9 +1,15 @@
-### IN DEVELOPMENT
+### Version 0.6
 
 This version fills out most of the remaining features missing from the 0.5.x
-series. The API has changed considerably, providing a more consistent
-experience in Python. The classic API still works in this release, but issues a
-warning and will be removed from the next release.
+series. You can now use all the storages without the original caveats; even
+the accumulators can be accessed array-at-a-time without copy, pickled quickly,
+and set array-at-a-time, as well.
+
+The API has changed considerably, providing a more consistent experience in
+Python. Most of the classic API still works in this release, but will issue a
+warning and will be removed from the next release. Please use this release to
+transition existing 0.5.x code to the new API.
+
 
 #### User changes
 
@@ -18,7 +24,9 @@ warning and will be removed from the next release.
 * You can now sum over a range with endpoints [#185][]
 * `h.axes` now has the functions from axis as well. [#183][]
 * `bh.project` has become `bh.sum` [#185][]
-* Added `hist.copy()` [#218][]
+* `.reduce(...)` and the reducers in `bh.algorithm` have been removed in favor of dictionary based UHI slicing [#259][]
+* `bh.numpy` module interface updates, `histogram=bh.Histogram` replaces cryptic `bh=True`, and `density=True` is now supported in Numpy mode [#256][]
+* Added `hist.copy()` [#218][] and `hist.shape` [#264][]
 * Signatures are much nicer in Python 3 [#188][]
 * Reprs are better, various properties like `__module__` are now set correctly [#200][]
 
@@ -27,6 +35,7 @@ warning and will be removed from the next release.
 * `.view()` now no longer makes a copy [#194][]
 * Fixes related to string category axis fills [#233][], [#230][]
 * Axes are no longer copies, support setting metadata [#238][], [#246][]
+* Pickling accumulator storages is now comparable in performance simple storages [#258][]
 
 #### Developer changes
 
@@ -58,6 +67,9 @@ warning and will be removed from the next release.
 [#246]: https://github.com/scikit-hep/boost-histogram/pull/246
 [#250]: https://github.com/scikit-hep/boost-histogram/pull/250
 [#255]: https://github.com/scikit-hep/boost-histogram/pull/255
+[#258]: https://github.com/scikit-hep/boost-histogram/pull/258
+[#259]: https://github.com/scikit-hep/boost-histogram/pull/259
+[#264]: https://github.com/scikit-hep/boost-histogram/pull/264
 
 
 ### Version 0.5.2

README.md

Lines changed: 2 additions & 2 deletions
@@ -150,7 +150,7 @@ If you are on a Linux system that is not part of the "many" in manylinux, such a
 
 #### Conda-Forge
 
-The boost-histogram package is available on Conda-Forge, as well. All supported versions are available with the exception of Windows + Python 2.7, which cannot build due to the age of the compiler. Please use Pip if you *really* need Python 2.7 on Windows. You will also need the VS 2015 distributable, as described above.
+The boost-histogram package is available on Conda-Forge, as well. All supported versions are available with the exception of Windows + Python 2.7, which cannot built due to the age of the compiler. Please use Pip if you *really* need Python 2.7 on Windows. You will also need the VS 2015 distributable, as described above.
 
 ```
 conda install -c conda-forge boost-histogram
@@ -162,7 +162,7 @@ For a source build, for example from an "sdist" package, the only requirements a
 
 If you are using Python 2.7 on Windows, you will need to use a recent version of Visual studio and force distutils to use it, or just upgrade to Python 3.6 or newer. Check the PyBind11 documentation for [more help](https://pybind11.readthedocs.io/en/stable/faq.html#working-with-ancient-visual-studio-2009-builds-on-windows). On some Linux systems, you may need to use a newer compiler than the one your distribution ships with.
 
-Having Numpy before building is recommended (enables multithreaded builds). Boost 1.71 is not required or needed (this only depends on included header-only dependencies).This library is under active development; you can install directly from GitHub if you would like.
+Having Numpy before building is recommended (enables multithreaded builds). Boost is not required or needed (this only depends on included header-only dependencies).This library is under active development; you can install directly from GitHub if you would like.
 
 ```bash
 python -m pip install git+https://github.com/scikit-hep/boost-histogram.git@develop

boost_histogram/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -10,7 +10,6 @@
     "axis",
     "storage",
     "accumulators",
-    "algorithm",
     "utils",
     "numpy",
     "loc",
@@ -31,7 +30,7 @@
 
 
 from ._internal.hist import Histogram
-from . import axis, storage, accumulators, algorithm, utils, numpy
+from . import axis, storage, accumulators, utils, numpy
 from .tag import loc, rebin, sum, underflow, overflow
 
 from .version import __version__

boost_histogram/_internal/hist.py

Lines changed: 32 additions & 29 deletions
@@ -26,9 +26,12 @@
 
 
 def _arg_shortcut(item):
+    msg = "Developer shortcut: will be removed in a future version"
     if isinstance(item, tuple) and len(item) == 3:
+        warnings.warn(msg, FutureWarning)
         return _core.axis.regular_uoflow(item[0], item[1], item[2], None)
     elif isinstance(item, tuple) and len(item) == 4:
+        warnings.warn(msg, FutureWarning)
         return _core.axis.regular_uoflow(*item)
     elif isinstance(item, Axis):
         return item._ax
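
The hunk above adds a `FutureWarning` to the tuple shortcut before expanding it. A minimal, self-contained sketch of that pattern follows. `arg_shortcut` is a hypothetical stand-in, not the library function: it returns a padded tuple instead of building a `_core` axis, and the `stacklevel=2` argument is an addition that the diff does not use.

```python
import warnings


def arg_shortcut(item):
    """Hypothetical stand-in: warn on the tuple shortcut, then expand it."""
    msg = "Developer shortcut: will be removed in a future version"
    if isinstance(item, tuple) and len(item) in (3, 4):
        # stacklevel=2 attributes the warning to the caller (an assumption
        # added for this sketch; the diff calls warnings.warn(msg, FutureWarning)).
        warnings.warn(msg, FutureWarning, stacklevel=2)
        # Pad a 3-tuple with None so both forms carry four entries.
        return item if len(item) == 4 else item + (None,)
    return item


# Record warnings instead of printing them, so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    expanded = arg_shortcut((10, 0.0, 1.0))

print(expanded)
print(caught[0].category.__name__)
```

Using `FutureWarning` rather than `DeprecationWarning` means the message is shown to end users by default, which fits a shortcut that is about to disappear.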
@@ -123,7 +126,7 @@ def __init__(self, *axes, **kwargs):
         raise TypeError("Unsupported storage")
 
     def __array__(self):
-        return self.view()
+        return _to_view(self._hist.view(False))
 
     def __add__(self, other):
         return self.__class__(self._hist + other._hist)
@@ -220,6 +223,9 @@ def _axis(self, i):
     def _storage_type(self):
         return cast(self, self._hist._storage_type, Storage)
 
+    def _reduce(self, *args):
+        return self.__class__(self._hist.reduce(*args))
+
 
 # C++ version of histogram
 @set_family(CPP_FAMILY)
@@ -263,9 +269,6 @@ def _empty(self, flow=False):
     def _sum(self, flow=False):
         return self._hist.sum(flow)
 
-    def _reduce(self, *args):
-        return self.__class__(self._hist.reduce(*args))
-
     def _project(self, *args):
         return self.__class__(self._hist.project(*args))
 
@@ -327,18 +330,25 @@ def __repr__(self):
             ret += " ({0} with flow)".format(outer)
         return ret
 
-    def _compute_commonindex(self, index, expand_ellipsis):
+    def _compute_commonindex(self, index):
+        """
+        Takes indices and returns two iterables; one is a tuple or dict of the
+        original, Ellipsis expanded index, and the other returns index,
+        operation value pairs.
+        """
         # Shorten the computations with direct access to raw object
         hist = self._hist
+
+        # Support dict access
+        if hasattr(index, "items"):
+            return index, index.items()
+
         # Normalize -> h[i] == h[i,]
-        if not isinstance(index, tuple):
+        elif not isinstance(index, tuple):
             index = (index,)
 
         # Now a list
-        if expand_ellipsis:
-            indexes = _expand_ellipsis(index, hist.rank())
-        else:
-            indexes = list(index)
+        indexes = _expand_ellipsis(index, hist.rank())
 
         if len(indexes) != hist.rank():
             raise IndexError("Wrong number of indices for histogram")
@@ -357,7 +367,7 @@ def _compute_commonindex(self, index, expand_ellipsis):
                     raise IndexError("histogram index is out of range")
                 indexes[i] %= hist.axis(i).size
 
-        return indexes
+        return indexes, enumerate(indexes)
 
     def axis(self, i):
         """
@@ -452,21 +462,23 @@ def shape(self):
 
     def __getitem__(self, index):
 
-        indexes = self._compute_commonindex(index, expand_ellipsis=True)
+        indexes, iterator = self._compute_commonindex(index)
 
         # If this is (now) all integers, return the bin contents
-        try:
-            return self._hist.at(*indexes)
-        except RuntimeError:
-            pass
+        # But don't try *dict!
+        if not hasattr(indexes, "items"):
+            try:
+                return self._hist.at(*indexes)
+            except RuntimeError:
+                pass
 
         integrations = set()
         slices = []
         zeroes_start = []
         zeroes_stop = []
 
         # Compute needed slices and projections
-        for i, ind in enumerate(indexes):
+        for i, ind in iterator:
             if not isinstance(ind, slice):
                 raise IndexError(
                     "Invalid arguments as an index, use all integers "
@@ -503,7 +515,7 @@ def __getitem__(self, index):
 
             slices.append(_core.algorithm.slice_and_rebin(i, begin, end, merge))
 
-        reduced = self.reduce(*slices)
+        reduced = self._reduce(*slices)
         if not integrations:
             return self.__class__(reduced)
         else:
@@ -546,7 +558,7 @@ def __setitem__(self, index, value):
         is 2 larger). Bin edges must be a close match, as well. If you don't
         want this level of type safety, just use ``h[...] = h2.view()``.
         """
-        indexes = self._compute_commonindex(index, expand_ellipsis=True)
+        indexes, iterator = self._compute_commonindex(index)
 
         if isinstance(value, BaseHistogram):
             raise TypeError("Not supported yet")
@@ -566,8 +578,7 @@ def __setitem__(self, index, value):
                 "Setting a histogram with an array must have a matching number of dimensions"
             )
 
-        for n in range(len(indexes)):
-            request = indexes[n]
+        for n, request in iterator:
             has_underflow = self.axes[n].options.underflow
             has_overflow = self.axes[n].options.overflow
 
@@ -605,14 +616,6 @@ def __setitem__(self, index, value):
 
         view[tuple(indexes)] = value
 
-    def reduce(self, *args):
-        """
-        Reduce based on one or more reduce_option's. If you are operating on most
-        or all of your axis, consider slicing with [] notation.
-        """
-
-        return self.__class__(self._hist.reduce(*args))
-
     def project(self, *args):
         """
         Project to a single axis or several axes on a multidiminsional histogram.

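The central change in `_compute_commonindex` is that it now returns a pair: the normalized index and an iterable of `(axis, action)` pairs, so that tuple and dict indexes can share one loop. The sketch below illustrates that idea in plain Python. It is not the library code: `expand_ellipsis` and `compute_commonindex` are simplified, hypothetical stand-ins, `rank` is passed in explicitly, and the `loc`/negative-index handling is left out.

```python
def expand_ellipsis(index, rank):
    """Replace a single Ellipsis with enough full slices to reach `rank` entries."""
    index = list(index)
    if index.count(Ellipsis) > 1:
        raise IndexError("an index can only have a single ellipsis")
    if Ellipsis in index:
        pos = index.index(Ellipsis)
        # One entry (the Ellipsis itself) is replaced, hence the +1.
        index[pos:pos + 1] = [slice(None)] * (rank - len(index) + 1)
    return index


def compute_commonindex(index, rank):
    """Return (normalized index, iterable of (axis, action) pairs)."""
    # Dict access: only the listed axes are visited.
    if hasattr(index, "items"):
        return index, index.items()
    # Normalize h[i] to h[i,]
    if not isinstance(index, tuple):
        index = (index,)
    indexes = expand_ellipsis(index, rank)
    if len(indexes) != rank:
        raise IndexError("Wrong number of indices for histogram")
    # Positional access: every axis is visited in order.
    return indexes, enumerate(indexes)


_, tuple_pairs = compute_commonindex((slice(0, 2), Ellipsis), rank=3)
_, dict_pairs = compute_commonindex({1: slice(0, 2)}, rank=3)
print(list(tuple_pairs))
print(list(dict_pairs))
```

With a tuple, all three axes appear in the pairs; with a dict, only axis 1 does, which is what lets the later loops in `__getitem__` skip untouched axes.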
boost_histogram/algorithm.py

Lines changed: 0 additions & 10 deletions
This file was deleted.

boost_histogram/cpp/algorithm.py

Lines changed: 7 additions & 1 deletion
@@ -16,7 +16,13 @@
     "project",
 )
 
-from ..algorithm import shrink_and_rebin, slice_and_rebin, rebin, shrink, slice
+from .._core.algorithm import shrink_and_rebin, slice_and_rebin, rebin, shrink, slice
+
+shrink_and_rebin.__module__ = "boost_histogram.cpp"
+slice_and_rebin.__module__ = "boost_histogram.cpp"
+rebin.__module__ = "boost_histogram.cpp"
+shrink.__module__ = "boost_histogram.cpp"
+slice.__module__ = "boost_histogram.cpp"
 
 
 def sum(histogram, flow=False):

boost_histogram/tag.py

Lines changed: 16 additions & 1 deletion
@@ -2,11 +2,26 @@
 
 del absolute_import, division, print_function
 
-__all__ = ("Locator", "at", "loc", "overflow", "underflow", "rebin", "sum")
+__all__ = ("Slicer", "Locator", "at", "loc", "overflow", "underflow", "rebin", "sum")
 
 import numpy as _np
 
 
+class Slicer(object):
+    """
+    This is a simple class to make slicing inside dictionaries simpler.
+    This is how it should be used:
+
+        s = bh.tag.Slicer()
+
+        h[{0: s[::bh.rebin(2)]}]   # rebin axis 0 by two
+
+    """
+
+    def __getitem__(self, item):
+        return item
+
+
 class Locator(object):
     __slots__ = ("offset",)
 

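`Slicer` works because Python builds a `slice` object from the `a:b:c` syntax before calling `__getitem__`, so returning the argument unchanged gives the caller that object. The sketch below repeats the class from the diff without importing boost-histogram; `rebin_tag` is a hypothetical placeholder for a tag such as `bh.rebin(2)`.

```python
class Slicer(object):
    """Return whatever is passed to [] unchanged (same idea as in the diff)."""

    def __getitem__(self, item):
        return item


s = Slicer()

# Hypothetical placeholder for a tag object such as bh.rebin(2).
rebin_tag = object()

plain = s[0:2]            # a slice with start and stop
tagged = s[::rebin_tag]   # the step slot can hold any object
multi = s[0:2, ...]       # several entries arrive as a tuple

print(plain)
print(tagged.step is rebin_tag)
print(multi)
```

Because the step of a slice can be any object, the same mechanism carries actions like rebinning inside a dict value without writing `slice(None, None, ...)` by hand.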
docs/usage/indexing.rst

Lines changed: 36 additions & 9 deletions
@@ -3,12 +3,14 @@
 Indexing
 ========
 
-This is the design document for Unified Histogram Indexing (UHI). Much of the original plan is now implemented in boost-histogram.
-Other histogramming libraries can implement support for this as well, and the "tag" functors, like ``sum`` and ``loc`` can be
-used between libraries.
+This is the design document for Unified Histogram Indexing (UHI). Much of the
+original plan is now implemented in boost-histogram. Other histogramming
+libraries can implement support for this as well, and the "tag" functors, like
+``sum`` and ``loc`` can be used between libraries.
 
-The following examples assume you have imported ``loc``, ``sum``, ``rebin``, ``end``, ``underflow``, and ``overflow`` from boost-histogram or any other
-library that implements UHI.
+The following examples assume you have imported ``loc``, ``sum``, ``rebin``,
+``underflow``, and ``overflow`` from boost-histogram or any other library that
+implements UHI.
 
 Access:
 ^^^^^^^
@@ -31,9 +33,9 @@ Slicing:
     h2 = h[loc(v):] # Slices can be in data coordinates, too
     h2 = h[::rebin(2)] # Modification operations (rebin)
     h2 = h[a:b:rebin(2)] # Modifications can combine with slices
-    h2 = h[::sum] # Projection operations # (name may change)
-    h2 = h[a:b:sum] # Adding endpoints to projection operations
-    h2 = h[0:end:sum] # removes under or overflow from the calculation
+    h2 = h[::sum] # Projection operations # (name may change)
+    h2 = h[a:b:sum] # Adding endpoints to projection operations
+    h2 = h[0:len:sum] # removes under or overflow from the calculation
     h2 = h[a:b, ...] # Ellipsis work just like normal numpy
 
 Setting
@@ -48,6 +50,7 @@ Setting
 
     h[...] = array(...) # Setting with an array or histogram sets the contents if the sizes match
     # Overflow can optionally be included if endpoints are left out
+    # The number of dimensions for non-scalars should match (broadcasting works normally otherwise)
 
 All of this generalizes to multiple dimensions. ``loc(v)`` could return
 categorical bins, but slicing on categories would (currently) not be
@@ -59,9 +62,33 @@ will case the relevant flow bin to be excluded (not currently supported).
 
 ``loc``, ``project``, and ``rebin`` all live inside the histogramming
 package (like boost-histogram), but are completely general and can be created by a
-user using an explicit API (below). ``end``, ``underflow`` and ``overflow`` also
+user using an explicit API (below). ``underflow`` and ``overflow`` also
 follow a general API.
 
+One drawback of the syntax listed above is that it is hard to select an action
+to run on an axis or a few axes out of many. For this use case, you can pass a
+dictionary to the index, and that has the syntax ``{axis:action}``. The actions
+are slices, and follow the rules listed above. This looks like:
+
+.. code:: python
+
+    h[{0: slice(None, None, bh.rebin(2))}] # rebin axis 0 by two
+    h[{1: slice(0, bh.loc(3.5))}] # slice axis 1 from 0 to the data coordinate 3.5
+    h[{7: slice(0, 2, bh.rebin(4))}] # slice and rebin axis 7
+
+
+If you don't like manually building slices, you can use the `Slicer()` utility to recover the original slicing syntax inside the dict:
+
+.. code:: python
+
+    s = bh.tag.Slicer()
+
+    h[{0: s[::bh.rebin(2)]}] # rebin axis 0 by two
+    h[{1: s[0:bh.loc(3.5)]}] # slice axis 1 from 0 to the data coordinate 3.5
+    h[{7: s[0:2:bh.rebin(4)]}] # slice and rebin axis 7
+
+
+
 Invalid syntax:
 ^^^^^^^^^^^^^^^
 
