Description
Feature or enhancement
Proposal:
Currently, the heapq module in CPython orders elements using only their default comparison behavior (the < operator). This restricts the usability of heapq whenever users need a custom ordering for their data.
Proposal
Introduce an optional comparator parameter to the heapq functions to allow more flexible heap operations. In simple cases (e.g., custom ordering for coordinates, tuples, or complex data types), this would spare users from writing wrapper objects and from the extra cognitive overhead that comes with them.
Motivation
1 Reduce cognitive load: Users currently must wrap data in objects that define custom dunder methods (__lt__, etc.) or resort to workarounds such as storing tuples with a prepended sort key. This is unnecessary complexity when all that is needed is custom ordering logic.
2 Consistency: Priority queue implementations in other languages (e.g., Java's PriorityQueue or C++'s std::priority_queue) accept a custom comparator directly.
3 Flexibility (tentative): Instead of explicitly including extra information such as the distance in the tuple (as in the Dijkstra example), users could pass the raw data along with a comparator. The comparator would then determine which element to pop and where to place new elements in the heap, based on the desired priority.
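For comparison, a Java-style comparator can already be used with today's heapq by adapting it with functools.cmp_to_key, which wraps each item in an object whose < calls the comparator. This is a sketch of the wrapper-object workaround described above, not a proposed API; by_coordinate_sum is an illustrative name.

```python
import heapq
from functools import cmp_to_key


def by_coordinate_sum(a, b):
    """Java-style comparator: negative if a < b, zero if equal, positive if a > b."""
    return (a[0] + a[1]) - (b[0] + b[1])


# cmp_to_key turns the comparator into a wrapper class whose instances
# support <, which is the only comparison heapq relies on.
Wrapped = cmp_to_key(by_coordinate_sum)

heap = [Wrapped(p) for p in [(3, 4), (1, 2), (5, 6)]]
heapq.heapify(heap)
heapq.heappush(heap, Wrapped((2, 3)))

# Each wrapper keeps the original item in its .obj attribute.
order = [heapq.heappop(heap).obj for _ in range(len(heap))]
print(order)  # [(1, 2), (2, 3), (3, 4), (5, 6)]
```

The cost is that every item must be wrapped on push and unwrapped on pop, which is the kind of boilerplate the proposal aims to remove.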
Add an optional key or cmp argument to the following functions in the heapq module:
1 heapify
2 heappush
3 heappop
4 heappushpop
5 heapreplace
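6 merge (note: merge already accepts key and reverse arguments since Python 3.5, so only a cmp variant would be new there)
The comparator could work like the key argument of sorted(): users pass a callable that defines the custom ordering, and the heap compares the callable's results instead of the items themselves.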
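One way the key semantics could be emulated on top of the current functions is the decorate pattern: compute key(item) once when the item enters the heap, store it next to the item, and add an insertion counter to break ties so the items themselves are never compared. The helpers keyed_heapify, keyed_heappush and keyed_heappop are hypothetical names used only for this sketch; they are not part of heapq and not the proposed signatures.

```python
import heapq
from itertools import count

# Hypothetical helpers emulating a key= parameter on top of the existing
# heapq functions. Heap entries are (key(item), tie_breaker, item).
_tie_breaker = count()


def keyed_heapify(items, key):
    """Return a new heap built from items, ordered by key(item)."""
    heap = [(key(item), next(_tie_breaker), item) for item in items]
    heapq.heapify(heap)
    return heap


def keyed_heappush(heap, item, key):
    """Push item; key is evaluated once, at push time."""
    heapq.heappush(heap, (key(item), next(_tie_breaker), item))


def keyed_heappop(heap):
    """Pop and return the item with the smallest stored key."""
    return heapq.heappop(heap)[2]


def coord_sum(point):
    return point[0] + point[1]


heap = keyed_heapify([(3, 4), (1, 2), (5, 6)], key=coord_sum)
keyed_heappush(heap, (2, 3), key=coord_sum)
print(keyed_heappop(heap))  # (1, 2)
```

Because the key is stored with each entry, keyed_heappop needs no key argument in this sketch, and keyed_heapify returns a new list instead of working in place; a real key= parameter on heapq could make different choices.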
Example Usage: Coordinates
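import heapq
# Proposed API: heapq functions do not accept key= today.
def coord_sum(point):
    return point[0] + point[1]
data = [(3, 4), (1, 2), (5, 6)]
heapq.heapify(data, key=coord_sum)
heapq.heappush(data, (2, 3), key=coord_sum)
# heappop re-sifts the heap, so it needs the same key as heapify/heappush.
min_item = heapq.heappop(data, key=coord_sum)
print(min_item)  # Expected: (1, 2)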
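For reference, the coordinate example can be written with the current heapq by decorating each point with its priority, which is the tuple workaround mentioned in the motivation. coord_sum is an illustrative helper.

```python
import heapq


def coord_sum(point):
    return point[0] + point[1]


data = [(3, 4), (1, 2), (5, 6)]
# Decorate each point with its priority; tuples compare by the first field.
heap = [(coord_sum(p), p) for p in data]
heapq.heapify(heap)
heapq.heappush(heap, (coord_sum((2, 3)), (2, 3)))

# Undecorate on the way out.
_, min_item = heapq.heappop(heap)
print(min_item)  # (1, 2)
```

If two points had the same sum, the tuples would fall back to comparing the points themselves, which works here only because tuples of ints are orderable.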
Example Usage: Dijkstra’s Algorithm
Dijkstra’s algorithm for finding the shortest path in a graph often requires a priority queue with custom ordering based on the distance from the source node. The lack of a built-in comparator in heapq makes the implementation less intuitive.
import heapq

def dijkstra(graph, start):
    # Priority queue for nodes based on their current shortest distance
    heap = []
    heapq.heappush(heap, (0, start))  # (distance, node)
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    while heap:
        current_distance, current_node = heapq.heappop(heap)
        # Skip processing if a shorter path has already been found
        if current_distance > distances[current_node]:
            continue
        for neighbor, weight in graph[current_node].items():
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(heap, (distance, neighbor))
    return distances

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1}
}
distances = dijkstra(graph, 'A')
print(distances)  # Output: {'A': 0, 'B': 1, 'C': 3, 'D': 4}
In this example:
The heap uses a tuple of (distance, node) to prioritize nodes based on their distance from the source.
Adding an optional comparator would eliminate the need for tuples and make the code cleaner and more intuitive.
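A hedged sketch of how the Dijkstra loop could read if the heap accepted raw nodes plus a key. It assumes the key is evaluated once, when a node is pushed: a key that read distances lazily during later comparisons would break the heap invariant, because distances change after nodes are pushed. NodeHeap is a hypothetical helper, not part of heapq.

```python
import heapq
from itertools import count


class NodeHeap:
    """Hypothetical min-heap of raw items ordered by key(item).

    The key is evaluated once, when an item is pushed, and stored next to it,
    so later changes to whatever the key reads cannot corrupt the heap.
    """

    def __init__(self, key):
        self._key = key
        self._entries = []
        self._tie_breaker = count()  # items themselves are never compared

    def push(self, item):
        heapq.heappush(self._entries, (self._key(item), next(self._tie_breaker), item))

    def pop(self):
        """Return (key, item) for the entry with the smallest stored key."""
        key, _, item = heapq.heappop(self._entries)
        return key, item

    def __bool__(self):
        return bool(self._entries)


def dijkstra(graph, start):
    distances = {node: float('inf') for node in graph}
    distances[start] = 0
    # Nodes are pushed as-is; their priority is their distance at push time.
    heap = NodeHeap(key=lambda node: distances[node])
    heap.push(start)
    while heap:
        current_distance, current_node = heap.pop()
        # Skip stale entries superseded by a shorter path found later.
        if current_distance > distances[current_node]:
            continue
        for neighbor, weight in graph[current_node].items():
            distance = current_distance + weight
            if distance < distances[neighbor]:
                distances[neighbor] = distance  # update before pushing
                heap.push(neighbor)
    return distances


graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1},
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```

The caller no longer builds (distance, node) tuples, but the sketch still stores one decorated entry per push internally, and pop returns the stored key so stale entries can be detected.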
The discussion on Discourse became scattered, so I opened this issue to gather more focused feedback from CPython developers, especially the professors (I guess they will agree to it :)). If there is agreement on this approach, I'm happy to work on implementing it; I have already tested it by modifying the pure-Python heapq.py.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
Discussion happened here
https://discuss.python.org/t/create-new-package-similar-to-heapq-but-be-able-to-pass-custom-comparator-through-a-constructor/72136