episodes/optimisation-numpy.md (13 additions & 9 deletions)
@@ -295,7 +295,7 @@ For about 500k points and 1000 polygons, the initial version of the code took ab
Luckily, Shapely is built on top of NumPy, so she was able to apply functions to an array of points instead and wrote an improved version, which took just 20 minutes:
```Python
-# Extract points and corresponding names as two separate NumPy arrays from a larger data frame
+#1) Extract points and corresponding names as two separate NumPy arrays from a larger data frame
# This will make it easier to apply vectorised functions below
-To vectorise this efficiently, the logic of the code had to be changed slightly.
+To vectorise this efficiently, the logic of the code had to be changed slightly:
-The improved code starts by extracting the `shapely.Point`s and corresponding point names as two separate NumPy arrays from a larger data frame.
-We then pass that array of points to `current_polygon.contains()`, which uses vectorisation to speed up the calculation and returns a NumPy array of booleans, describing for each `Point` in the input array whether it is contained in `current_polygon`.
-This boolean array is then [passed as an index](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing) to the `point_names_list` array. This returns a new array with the names of all points that are contained in the polygon (i.e. where the boolean array had the value `True`).
+1. The improved code starts by extracting the `shapely.Point`s and corresponding point names as two separate NumPy arrays from a larger data frame.
+2. It then passes that array of points to `current_polygon.contains()`, which uses vectorisation to speed up the calculation. It returns a NumPy array of booleans (`True` or `False`), describing for each `Point` in the input array whether it is contained in `current_polygon`.
+3. This boolean array is then [passed as an index](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing) to the `point_names_list` array. This returns a new array with the names of all points that are contained in the polygon (i.e. where the boolean array had the value `True`).
+4. Finally, the contained points are stored as a Python list. (In this particular case, later parts of the data analysis code expected a list instead of a NumPy array. Since those parts of the code were "fast enough"—remember Donald Knuth’s quote in the earlier episode?—the researcher decided not to spend more time to rewrite them.)
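Editor's illustration, not part of the commit: the four steps above can be sketched with NumPy alone. This is a minimal sketch under stated assumptions. The sample coordinates, the names, and the helper `contains_unit_square` are hypothetical. The helper is only a stand-in for `current_polygon.contains()`, so that the sketch runs without Shapely installed. It is not the researcher's actual code.

```python
import numpy as np

# Step 1: points and their names as two parallel NumPy arrays.
# (Hypothetical sample data; the original code extracts these from a data frame.)
point_coords = np.array([[0.5, 0.5], [2.0, 2.0], [0.9, 0.1], [-1.0, 0.3]])
point_names_list = np.array(["a", "b", "c", "d"])


def contains_unit_square(coords):
    """Stand-in for `current_polygon.contains()` (an assumption for this sketch).

    Returns a boolean array that is True where a point lies strictly
    inside the unit square (0, 1) x (0, 1).
    """
    x, y = coords[:, 0], coords[:, 1]
    return (x > 0) & (x < 1) & (y > 0) & (y < 1)


# Step 2: a single vectorised call yields one boolean per input point.
is_inside = contains_unit_square(point_coords)

# Step 3: using the boolean array as an index keeps only the names
# at positions where the mask is True.
contained_names = point_names_list[is_inside]

# Step 4: convert to a plain Python list for code that expects one.
contained_names_list = contained_names.tolist()

print(contained_names_list)  # ['a', 'c']
```

The mask and the names array must have the same length and order, because boolean indexing pairs them up position by position.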
+
+::::::::::::::::::::::::::::::::::::: instructor
-The following code snippet demonstrates how this works for a simplified example:
+The following code snippet demonstrates how this works for a simplified example.
+If you want to run this as a live demo, you need to `pip install shapely` first.