Commit 78459ff

Merge pull request #3 from halprin/parallel
Parallel Version
2 parents 1d795ae + 36040b6 commit 78459ff

7 files changed: 1044 additions & 20 deletions

README.md

Lines changed: 22 additions & 9 deletions
@@ -32,15 +32,17 @@ import iterator_chain
 ```
 
 ### Start the chain
-To start the chain, use the `from_iterable` function. It takes an iterable.
+To start the chain, use the `from_iterable` or `from_iterable_parallel` function. Both take an iterable.
 ```python
 an_iterable = [5, 78, 12, 26]
-iterator_chain.from_iterable(an_iterable)
+chain = iterator_chain.from_iterable(an_iterable)
+parallel_chain = iterator_chain.from_iterable_parallel(an_iterable)
 ```
 
-| Function | Arguments | Description |
+| Function | Arguments | Description |
 | --- | --- | --- |
 | `from_iterable` |`iterable` - An iterable to be used in the iterator chain | Starts the iterator chain with the supplied iterable. Chaining and terminating methods can now be called on the result. |
+| `from_iterable_parallel` |`iterable` - An iterable to be used in the iterator chain<br/>• `chunksize` - Keyword. How many elements to send to each parallel execution unit at a time. If unspecified or None, the chunk size starts at 1, so each execution unit receives one element; it then increases in powers of two (2, 4, 8, ...) until the iterator is exhausted. This value is the default chunksize for all subsequent parallel-based methods; any of those methods can override it with its own `chunksize` keyword. | Starts the iterator chain with the supplied iterable. Chaining and terminating methods can now be called on the result. Certain chaining and terminating methods will run in parallel, in separate processes, to get around Python's GIL. |
 
 
 ### Continuing the chain
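The `chunksize` description for `from_iterable_parallel` can be read as a round-based schedule: every execution unit receives a chunk of the current size, then the size doubles. `parallel_intermediate.py` is not shown in this diff, so the generator below is only a hypothetical sketch of that reading; `growing_chunks` and its `units` parameter (the number of execution units) are illustrative names, not part of `iterator_chain`.

```python
import itertools


def growing_chunks(iterable, units=1, chunksize=None):
    """Hypothetical sketch of the chunking described for `from_iterable_parallel`.

    With chunksize=None, each of `units` execution units gets a chunk of size 1,
    then 2, then 4, and so on. With an explicit chunksize, the size stays fixed.
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    if chunksize is not None and chunksize < 1:
        raise ValueError("chunksize must be at least 1")
    iterator = iter(iterable)
    size = 1 if chunksize is None else chunksize
    while True:
        # One "round": hand a chunk of the current size to every execution unit.
        for _ in range(units):
            chunk = list(itertools.islice(iterator, size))
            if not chunk:
                return  # the source iterator is exhausted
            yield chunk
        if chunksize is None:
            size *= 2  # the default schedule grows in powers of two


if __name__ == "__main__":
    print(list(growing_chunks(range(10), units=2)))
```

With two units and ten elements this yields `[[0], [1], [2, 3], [4, 5], [6, 7, 8, 9]]`; the final chunk is simply shorter when the source runs out.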
@@ -55,33 +57,44 @@ called.
 an actual value. This value will depend on all the previous chaining methods being executed first.
 
 #### Chaining methods
-| Method | Arguments | Description |
+| Method | Arguments | Description |
 | --- | --- | --- |
 | `map` |`function` - A function that takes a single argument | Will run the `function` across all the elements in the iterator. |
 | `filter` |`function` - A function that takes a single argument | Will run the `function` on every element. `function` should return a truthy or falsy value. On true, the element will stay; on false, the element will be removed. |
 | `skip` |`number` - An integer | The first `number` elements will be skipped over and effectively removed. |
 | `distinct` | | Any duplicates will be removed. |
 | `limit` |`max_size` - An integer | The iterator will stop after `max_size` elements. Any elements afterward are effectively removed. |
 | `flatten` | | Any element that is an iterable itself will have its elements iterated over first before continuing with the remaining elements. Strings (`str`) do not count as an iterable for this method. Dictionaries flatten to their item tuples. |
-| `sort` |`key` - Keyword. A function of one argument that is used to extract a comparison key from each element<br/>• `cmp` - Keyword. A Python 2.x "cmp" function that takes two arguments<br/>• `reverse` - Keyword. If set to `True`, the elements will be sorted in the reverse order. | Sorts the iterator based on the elements' values. Use `key` or `cmp` to make a custom comparison. If `key` is specified, `cmp` cannot be used. This method is expensive because it must serialize all the values into a sequence. |
-| `reverse` | | Reverses the iterator. The last time will be first, and the first item will be last. This method is expensive because it must serialize all the values into a list. |
+| `sort` |`key` - Keyword. A function of one argument that is used to extract a comparison key from each element<br/>• `cmp` - Keyword. A Python 2.x "cmp" function that takes two arguments<br/>• `reverse` - Keyword. If set to `True`, the elements will be sorted in the reverse order | Sorts the iterator based on the elements' values. Use `key` or `cmp` to make a custom comparison. If `key` is specified, `cmp` is ignored. This method is expensive because it must collect all the values into a list. |
+| `reverse` | | Reverses the iterator. The last item will be first, and the first item will be last. This method is expensive because it must collect all the values into a list. |
+
+##### Parallel Versions
+| Method | Arguments | Description |
+| --- | --- | --- |
+| `map` |`function` - A function that takes a single argument<br/>• `chunksize` - Keyword. Overrides the chunksize supplied to the original `from_iterable_parallel` | Will run the `function` across all the elements in the iterator in parallel. |
+| `filter` |`function` - A function that takes a single argument<br/>• `chunksize` - Keyword. Overrides the chunksize supplied to the original `from_iterable_parallel` | Will run the `function` on every element in parallel. `function` should return a truthy or falsy value. On true, the element will stay; on false, the element will be removed. |
 
 #### Terminating methods
-| Method | Arguments | Description |
+| Method | Arguments | Description |
 | --- | --- | --- |
 | `list` | | Serializes the iterator chain into a `list` and returns it. |
-| `count` | | Returns the number of elements in the iterator |
+| `count` | | Returns the number of elements in the iterator. |
 | `first` |`default` - Keyword. Any value. | Returns just the first item in the iterator. If the iterator is empty, the `default` is returned. |
 | `last` |`default` - Keyword. Any value. | Returns just the last item in the iterator. If the iterator is empty, the `default` is returned. |
 | `max` |`default` - Keyword. Any value. | Returns the largest valued element in the iterator. If the iterator is empty, the `default` is returned. |
 | `min` |`default` - Keyword. Any value. | Returns the smallest valued element in the iterator. If the iterator is empty, the `default` is returned. |
 | `sum` |`default` - Keyword. Any value. | Sums all the elements in the iterator together. If any of the elements are un-summable, the `default` is returned. |
-| `reduce` |`function` - A function that takes two arguments | Applies the function to two elements in the iterator cumulatively. Subsequent calls to `function` uses the previous return value from `function` as the first argument and the next element in the iterator as the second argument. The final value is returned. |
+| `reduce` |`function` - A function that takes two arguments<br/>• `initial` - Keyword. Any value. | Applies `function` cumulatively: the first call receives the first two elements, and each later call receives the previous return value as its first argument and the next element as its second. The final value is returned. If `initial` is present, it is placed before the elements in the calculation and serves as the default when the iterator is empty; `initial=None` is treated as if `initial` were omitted. |
 | `for_each` |`function` - A function that takes one argument and returns nothing | Executes `function` on every element in the iterator. There is no return value. If you want to return a list of values based on the function, use `.map(_function_).list()`. |
 | `all_match` |`function` - A function that takes one argument and returns a boolean | Returns `True` only if _all_ the elements return `True` after applying the `function` to them. Else returns `False`. |
 | `any_match` |`function` - A function that takes one argument and returns a boolean | Returns `True` if at least one element returns `True` after applying the `function` to it. If all elements result in `False`, `False` is returned. |
 | `none_match` |`function` - A function that takes one argument and returns a boolean | Returns `True` only if _all_ the elements return `False` after applying the `function` to them. Else returns `False`. |
 
+##### Parallel Versions
+| Method | Arguments | Description |
+| --- | --- | --- |
+| `for_each` |`function` - A function that takes one argument and returns nothing<br/>• `chunksize` - Keyword. Overrides the chunksize supplied to the original `from_iterable_parallel` | Executes `function` on every element in the iterator in parallel. There is no return value. If you want to return a list of values based on the function, use `.map(function).list()`. |
+
 ## Examples
 ```python
 import iterator_chain
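The `sort` row accepts a Python 2.x-style `cmp` function. Per the `_sort` helper in the `intermediate.py` diff, that function is converted with `functools.cmp_to_key` and passed to the built-in `sorted`. The sketch below reproduces that conversion with the standard library only; `by_length_then_alphabetical` is an illustrative comparison function, not part of `iterator_chain`.

```python
import functools


def by_length_then_alphabetical(a, b):
    """A Python 2.x-style cmp function: returns negative, zero or positive."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)  # three-way compare of equal-length strings


words = ["pear", "fig", "apple", "kiwi"]

# Roughly what .sort(cmp=by_length_then_alphabetical) does before wrapping the
# result in a new chain: convert cmp to a key, then call sorted().
ascending = sorted(words, key=functools.cmp_to_key(by_length_then_alphabetical))
descending = sorted(
    words, key=functools.cmp_to_key(by_length_then_alphabetical), reverse=True
)

if __name__ == "__main__":
    print(ascending)   # ['fig', 'kiwi', 'pear', 'apple']
    print(descending)  # ['apple', 'pear', 'kiwi', 'fig']
```

Because `_sort` only converts `cmp` when `key is None`, passing both keeps `key` and silently drops `cmp`.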

iterator_chain/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 __version__ = '1.0.0'
 from iterator_chain.begin import from_iterable
+from iterator_chain.begin import from_iterable_parallel

iterator_chain/begin.py

Lines changed: 21 additions & 0 deletions
@@ -1,6 +1,27 @@
+from concurrent.futures import ProcessPoolExecutor
 from iterator_chain.intermediate import _IntermediateIteratorChain
+from iterator_chain.parallel_intermediate import _IntermediateParallelIteratorChain
 
 
 def from_iterable(iterable):
+    """
+    Starts the iterator chain with the supplied iterable. Chaining and terminating methods can now be called on the result.
+
+    :param iterable: An iterable to be used in the iterator chain.
+    :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+    """
     iterator = iter(iterable)
     return _IntermediateIteratorChain(iterator)
+
+
+def from_iterable_parallel(iterable, chunksize=None):
+    """
+    Starts the iterator chain with the supplied iterable. Chaining and terminating methods can now be called on the result. Certain chaining and terminating methods will run in parallel, in separate processes, to get around Python's GIL.
+
+    :param iterable: An iterable to be used in the iterator chain.
+    :param chunksize: How many elements to send to each parallel execution unit at a time. If unspecified or None, the chunk size starts at 1, so each execution unit receives one element; it then increases in powers of two (2, 4, 8, ...) until the iterator is exhausted. This value is the default chunksize for all subsequent parallel-based methods; any of those methods can override it with its own `chunksize` keyword.
+    :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+    """
+    iterator = iter(iterable)
+    executor = ProcessPoolExecutor()
+    return _IntermediateParallelIteratorChain(iterator, executor, chunksize=chunksize)
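`from_iterable_parallel` creates a `ProcessPoolExecutor` and hands it, together with `chunksize`, to `_IntermediateParallelIteratorChain`, whose source is not part of this diff. As a hedged illustration only, a parallel `filter` could be built on the standard `Executor.map(fn, *iterables, chunksize=...)`, which yields results in input order. `parallel_filter` and `is_even` are hypothetical names. Per the `concurrent.futures` documentation, `chunksize` only has an effect for `ProcessPoolExecutor`; `ThreadPoolExecutor` ignores it.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def is_even(number):
    # Defined at module level so a ProcessPoolExecutor could pickle it.
    return number % 2 == 0


def parallel_filter(function, iterable, executor: Executor, chunksize=1):
    """Hypothetical sketch: evaluate the predicate in parallel, keep matches in order."""
    items = list(iterable)  # materialized so each item can be paired with its verdict
    # Executor.map returns results in input order, so zip lines them back up.
    verdicts = executor.map(function, items, chunksize=chunksize)
    return [item for item, keep in zip(items, verdicts) if keep]


if __name__ == "__main__":
    # ThreadPoolExecutor keeps the demo portable; ProcessPoolExecutor (as used
    # in begin.py) is a drop-in replacement for CPU-bound predicates.
    with ThreadPoolExecutor(max_workers=2) as executor:
        print(parallel_filter(is_even, range(10), executor))
```

Functions sent to a `ProcessPoolExecutor` must be picklable, so module-level functions work where lambdas would not.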

iterator_chain/intermediate.py

Lines changed: 143 additions & 11 deletions
@@ -9,18 +9,44 @@ def __init__(self, iterator):
 
     # Chain methods
     def map(self, function):
-        iterator = map(function, self._iterator)
-        return _IntermediateIteratorChain(iterator)
+        """
+        Will run the `function` across all the elements in the iterator.
 
-    def skip(self, number):
-        iterator = itertools.islice(self._iterator, number, None)
+        :param function: A function that takes a single argument.
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
+        iterator = map(function, self._iterator)
         return _IntermediateIteratorChain(iterator)
 
     def filter(self, function):
+        """
+        Will run the `function` on every element. `function` should return a truthy or falsy value. On true, the element will stay; on false, the element will be removed.
+
+        :param function: A function that takes a single argument.
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
         iterator = filter(function, self._iterator)
         return _IntermediateIteratorChain(iterator)
 
+    def skip(self, number):
+        """
+        The first `number` elements will be skipped over and effectively removed.
+
+        :param number: An integer.
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
+        iterator = self._skip(number)
+        return _IntermediateIteratorChain(iterator)
+
+    def _skip(self, number):
+        return itertools.islice(self._iterator, number, None)
+
     def distinct(self):
+        """
+        Any duplicates will be removed.
+
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
         iterator = self._distinct()
         return _IntermediateIteratorChain(iterator)
 
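`map`, `filter` and `_skip` wrap the built-in `map`, `filter` and `itertools.islice`, all of which are lazy, so those steps do no work until a terminating method pulls values through (unlike `sort` and `reverse`, which call `sorted` or `list` as soon as they are chained). The snippet below uses the same built-ins directly, without importing `iterator_chain`, and records every call to show the pulling order; `is_positive` and `double` are illustrative.

```python
import itertools

calls = []  # records every predicate/function call, in order


def is_positive(x):
    calls.append(("filter", x))
    return x > 0


def double(x):
    calls.append(("map", x))
    return x * 2


source = iter([3, -1, 4, 5])

# Same built-ins that .filter(is_positive).map(double).skip(1) stacks up
pipeline = itertools.islice(map(double, filter(is_positive, source)), 1, None)
calls_before_terminating = list(calls)  # still empty: nothing has run yet

result = list(pipeline)  # the "terminating" step drives all the work
```

Here `result` is `[8, 10]`, and `calls` interleaves filter and map calls element by element (`("filter", 3)`, `("map", 3)`, `("filter", -1)`, ...) rather than running every filter call first.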
@@ -31,9 +57,18 @@ def _distinct(self):
                 yield item
 
     def limit(self, max_size):
-        iterator = itertools.islice(self._iterator, max_size)
+        """
+        The iterator will stop after `max_size` elements. Any elements afterward are effectively removed.
+
+        :param max_size: An integer.
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
+        iterator = self._limit(max_size)
         return _IntermediateIteratorChain(iterator)
 
+    def _limit(self, max_size):
+        return itertools.islice(self._iterator, max_size)
+
     @staticmethod
     def _is_dict(something):
         return isinstance(something, dict)
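`_skip` and `_limit` are both thin `itertools.islice` calls, so `skip(n).limit(m)` behaves like slicing `[n:n + m]` without building an intermediate list. The hypothetical `page` helper below (not part of `iterator_chain`) chains the same two calls. Note that `islice` rejects negative values with `ValueError`, so a negative `number` or `max_size` raises instead of counting from the end.

```python
import itertools


def page(iterable, page_number, page_size):
    """Hypothetical helper: the same islice calls as _skip() followed by _limit()."""
    iterator = iter(iterable)
    skipped = itertools.islice(iterator, page_number * page_size, None)  # like _skip
    limited = itertools.islice(skipped, page_size)  # like _limit
    return list(limited)  # like the list() terminating method


if __name__ == "__main__":
    print(page(range(10), 1, 3))  # [3, 4, 5]
```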
@@ -54,62 +89,159 @@ def _flatten(self, iterable, force_stop=False):
                 yield item
 
     def flatten(self):
+        """
+        Any element that is an iterable itself will have its elements iterated over first before continuing with the remaining elements. Strings (`str`) do not count as an iterable for this method. Dictionaries flatten to their item tuples.
+
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
         iterator = self._flatten(self._iterator)
         return _IntermediateIteratorChain(iterator)
 
     def sort(self, key=None, cmp=None, reverse=False):
+        """
+        Sorts the iterator based on the elements' values. Use `key` or `cmp` to make a custom comparison. If `key` is specified, `cmp` is ignored. This method is expensive because it must collect all the values into a list.
+
+        :param key: Keyword. A function of one argument that is used to extract a comparison key from each element.
+        :param cmp: Keyword. A Python 2.x "cmp" function that takes two arguments.
+        :param reverse: Keyword. If set to `True`, the elements will be sorted in the reverse order.
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
+        iterator = self._sort(key=key, cmp=cmp, reverse=reverse)
+        return _IntermediateIteratorChain(iterator)
+
+    def _sort(self, key=None, cmp=None, reverse=False):
         if key is None and cmp is not None:
             key = functools.cmp_to_key(cmp)
-        iterator = iter(sorted(self._iterator, key=key, reverse=reverse))
-        return _IntermediateIteratorChain(iterator)
+        return iter(sorted(self._iterator, key=key, reverse=reverse))
 
     def reverse(self):
-        forward = list(self._iterator)
-        iterator = reversed(forward)
+        """
+        Reverses the iterator. The last item will be first, and the first item will be last. This method is expensive because it must collect all the values into a list.
+
+        :return: An intermediate object that subsequent chaining and terminating methods can be called on.
+        """
+        iterator = self._reverse()
         return _IntermediateIteratorChain(iterator)
 
+    def _reverse(self):
+        forward = list(self._iterator)
+        return reversed(forward)
+
     # Termination methods
     def list(self):
+        """
+        Serializes the iterator chain into a `list` and returns it.
+
+        :return: A list whose elements come from the iterator.
+        """
         return list(self._iterator)
 
     def count(self):
+        """
+        Returns the number of elements in the iterator.
+
+        :return: An integer.
+        """
         return sum(1 for _ in self._iterator)
 
     def first(self, default=None):
+        """
+        Returns just the first item in the iterator. If the iterator is empty, the `default` is returned.
+
+        :param default: Keyword. Any value.
+        :return: The first element.
+        """
         return next(itertools.islice(self._iterator, 1), default)
 
     def last(self, default=None):
+        """
+        Returns just the last item in the iterator. If the iterator is empty, the `default` is returned.
+
+        :param default: Keyword. Any value.
+        :return: The last element.
+        """
         try:
             end = collections.deque(self._iterator, maxlen=1).pop()
         except IndexError:
             end = default
         return end
 
     def max(self, default=None):
+        """
+        Returns the largest valued element in the iterator. If the iterator is empty, the `default` is returned.
+
+        :param default: Keyword. Any value.
+        :return: The largest element.
+        """
         return max(self._iterator, default=default)
 
     def min(self, default=None):
+        """
+        Returns the smallest valued element in the iterator. If the iterator is empty, the `default` is returned.
+
+        :param default: Keyword. Any value.
+        :return: The smallest element.
+        """
         return min(self._iterator, default=default)
 
     def sum(self, default=None):
+        """
+        Sums all the elements in the iterator together. If any of the elements are un-summable, the `default` is returned.
+
+        :param default: Keyword. Any value.
+        :return: The sum of all the elements.
+        """
         try:
             total = sum(self._iterator)
         except TypeError:
             total = default
         return total
 
-    def reduce(self, function):
-        return functools.reduce(function, self._iterator)
+    def reduce(self, function, initial=None):
+        """
+        Applies `function` cumulatively: the first call receives the first two elements, and each later call receives the previous return value as its first argument and the next element as its second. The final value is returned. If `initial` is present, it is placed before the elements in the calculation and serves as the default when the iterator is empty; `initial=None` is treated as if `initial` were omitted.
+
+        :param function: A function that takes two arguments.
+        :param initial: Keyword. Any value.
+        :return: The final reduced value.
+        """
+        if initial is None:
+            return functools.reduce(function, self._iterator)
+        else:
+            return functools.reduce(function, self._iterator, initial)
 
     def for_each(self, function):
+        """
+        Executes `function` on every element in the iterator. There is no return value. If you want to return a list of values based on the function, use `.map(function).list()`.
+
+        :param function: A function that takes one argument and returns nothing.
+        """
         for item in self._iterator:
             function(item)
 
     def all_match(self, function):
+        """
+        Returns `True` only if all the elements return `True` after applying the `function` to them. Else returns `False`.
+
+        :param function: A function that takes one argument and returns a boolean.
+        :return: True or False
+        """
         return all(map(function, self._iterator))
 
     def any_match(self, function):
+        """
+        Returns `True` if at least one element returns `True` after applying the `function` to it. If all elements result in `False`, `False` is returned.
+
+        :param function: A function that takes one argument and returns a boolean.
+        :return: True or False
+        """
         return any(map(function, self._iterator))
 
     def none_match(self, function):
+        """
+        Returns `True` only if all the elements return `False` after applying the `function` to them. Else returns `False`.
+
+        :param function: A function that takes one argument and returns a boolean.
+        :return: True or False
+        """
         return not self.any_match(function)
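The new `reduce` uses `None` as its "no initial value" marker, so a caller cannot actually start a reduction from `None`: on an empty chain `functools.reduce` raises `TypeError` instead of returning `None`, and on a non-empty chain the first two elements are combined instead. A common alternative is a private sentinel object. The sketch below is a standalone function, not a patch to `iterator_chain`; `_MISSING`, `reduce_with_initial` and `append_to` are hypothetical names.

```python
import functools
import operator

# Hypothetical sentinel: a fresh object no caller can pass by accident, unlike None.
_MISSING = object()


def reduce_with_initial(function, iterable, initial=_MISSING):
    """Standalone sketch of a reduce() that lets callers pass initial=None."""
    iterator = iter(iterable)
    if initial is _MISSING:
        # No initial value: functools.reduce raises TypeError on an empty iterable.
        return functools.reduce(function, iterator)
    return functools.reduce(function, iterator, initial)


def append_to(acc, item):
    """Builds a list, treating a None accumulator as 'start a new list'."""
    return [item] if acc is None else acc + [item]


if __name__ == "__main__":
    print(reduce_with_initial(operator.add, [1, 2, 3]))  # 6
    print(reduce_with_initial(append_to, [], None))      # None
    print(reduce_with_initial(append_to, [1, 2], None))  # [1, 2]
```

With the diff's `if initial is None` check, the last two calls would instead raise `TypeError` (empty iterable with no initial value) and call `append_to(1, 2)`, which fails on `1 + [2]`.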
