@alexandre-daubois alexandre-daubois commented Aug 28, 2025

====== PHP RFC: Add SplMultiMap Data Structure ======

  * Version: 1.0
  * Date: 2025-09-xx
  * Author: Alexandre Daubois, [email protected]
  * Status: Under Discussion
  * Implementation: https://github.com/php/php-src/pull/XXXXX

===== Introduction =====

This RFC proposes the addition of a new data structure: <php>SplMultiMap</php>.

A multimap is a data structure that associates multiple values with the same key. Comparable structures exist in other ecosystems, such as Guava's Multimap for Java and std::multimap in C++, as well as in Dart and Scala. It fills a common need in PHP applications, where developers currently have to manage arrays of arrays by hand.

Current approach:
<PHP>
<?php
// Manual multimap using arrays
$tags = [];
$tags['post1'] = $tags['post1'] ?? [];
$tags['post1'][] = 'php';
$tags['post1'][] = 'web';

// Retrieving values
$phpTags = $tags['post1']; // ['php', 'web']

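// Removing a specific value requires manual filtering; array_filter() preserves
// the original keys, so array_values() is needed to get a list back
$tags['post1'] = array_values(array_filter($tags['post1'], fn($v) => $v !== 'web'));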
if (empty($tags['post1'])) {
    unset($tags['post1']);
}
?>
</PHP>

With SplMultiMap:
<PHP>
<?php
$tags = new SplMultiMap();
$tags->put('post1', 'php');
$tags->put('post1', 'web');

// Retrieving values
$phpTags = $tags->get('post1'); // ['php', 'web']

$tags->remove('post1', 'web'); // Only removes 'web'
?>
</PHP>

===== Motivation =====

There are many common scenarios where multiple values need to be associated with a single key. Examples include:

1. **HTTP Headers**: Multiple values per header name (e.g., Set-Cookie)
2. **Form Data**: Multiple values for the same field name (e.g., checkboxes)
3. **Database Results**: Grouping related records by a common field
4. **Caching**: Multiple cache tags per cache entry
5. **Configuration**: Multiple values per configuration key

Currently, developers must implement this pattern manually using arrays, leading to verbose, error-prone code. This can also lead to inefficient implementations and inconsistent APIs across projects.

===== Proposal =====

Add a <php>SplMultiMap</php> class to the SPL extension with the following API:

<PHP>
<?php
// Stub: method bodies are intentionally empty, only the signatures matter
class SplMultiMap implements IteratorAggregate, Countable
{
    public function __construct() {}

    public function put(string $key, mixed $value): void {}
    public function putAll(string $key, iterable $values): void {}
    public function get(string $key): array {}
    public function remove(string $key, mixed $value): bool {}
    public function removeAll(string $key): bool {}
    public function replaceAll(string $key, iterable $values): void {}

    public function containsKey(string $key): bool {}
    public function containsValue(string $key, mixed $value): bool {}
    public function keys(): array {}
    public function values(): array {}

    public function isEmpty(): bool {}
    public function clear(): void {}

    public function count(): int {}
    public function getIterator(): Iterator {}
}
?>
</PHP>

==== Method Behavior ====

**put(string $key, mixed $value): void**
- Adds a value to the collection of values associated with the key
- If the key doesn't exist, creates a new collection
- Does not check for duplicate values
- Time complexity: O(1) amortized

**putAll(string $key, iterable $values): void**
- Adds all values from an iterable to the collection of values associated with the key
- If the key doesn't exist, creates a new collection
- Does not check for duplicate values
- Preserves existing values for the key
- Time complexity: O(n) where n is the number of values in the iterable

**get(string $key): array**
- Returns an array of all values associated with the key
- Returns empty array if key doesn't exist
- Returned array is a copy
- Time complexity: O(n) where n is the number of values for the key

**remove(string $key, mixed $value): bool**
- Returns true if something was removed, false otherwise
- Uses strict comparison (===) for value matching
- Removes only one occurrence of the value if duplicates exist, respecting the principle of least astonishment
- Time complexity: O(n) where n is the number of values for the key
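The single-occurrence rule differs from the array_filter() idiom shown in the introduction, which drops every match. A minimal userland sketch of the difference follows. The removeOne() helper is hypothetical and not part of the proposed API, it assumes the input is a list, and removing the first match is an assumption, since the RFC does not say which occurrence goes.

```php
<?php
// Hypothetical helper (not part of the RFC): removes one strictly-equal
// occurrence of $value from a list and reports whether anything was removed.
function removeOne(array &$values, mixed $value): bool
{
    // array_search() with strict=true compares using ===
    $index = array_search($value, $values, true);
    if ($index === false) {
        return false;
    }
    // array_splice() removes the element and reindexes the list
    array_splice($values, $index, 1);
    return true;
}

$colors = ['red', 'blue', 'red'];

// array_filter() removes every match and keeps the original keys
$filtered = array_filter($colors, fn($v) => $v !== 'red');
var_dump($filtered === [1 => 'blue']);   // bool(true)

// removeOne() removes a single match and keeps a contiguous list
var_dump(removeOne($colors, 'red'));     // bool(true)
var_dump($colors === ['blue', 'red']);   // bool(true)
var_dump(removeOne($colors, 'green'));   // bool(false)
```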

**removeAll(string $key): bool**
- Removes all values associated with the key
- Returns true if the key existed, false otherwise
- Time complexity: O(1)

**replaceAll(string $key, iterable $values): void**
- Removes all existing values associated with the key
- Adds all values from the iterable to the key
- Creates a new collection if the key doesn't exist
- Time complexity: O(n) where n is the number of values in the iterable

**containsKey(string $key): bool**
- Returns true if the key exists (has at least one value)
- Time complexity: O(1)

**containsValue(string $key, mixed $value): bool**
- Returns true if the key exists and contains the specific value
- Uses strict comparison (===) for value matching
- Time complexity: O(n) where n is the number of values for the key

**keys(): array**
- Returns array of all keys that have at least one value
- Order is undefined (depends on internal hash table iteration)
- Time complexity: O(k) where k is the number of keys

**values(): array**
- Returns array of all values from all keys (flattened)
- Order is undefined
- Time complexity: O(n) where n is the total number of values

**count(): int**
- Returns total number of values across all keys
- Time complexity: O(1)

**isEmpty(): bool**
- Returns true if no keys exist
- Time complexity: O(1)

**clear(): void**
- Removes all keys and values
- Time complexity: O(n) where n is total number of values

===== Detailed Examples =====

<PHP>
<?php

$mm = new SplMultiMap();

// Basic operations
$mm->put('colors', 'red');
$mm->put('colors', 'blue');
$mm->put('colors', 'red'); // Duplicates allowed
$mm->put('numbers', 42);
$mm->put('numbers', 84);
$mm->put('arrays', [1, 2]);

var_dump($mm->get('colors'));  // ['red', 'blue', 'red']
var_dump($mm->get('numbers')); // [42, 84]
var_dump($mm->get('missing')); // []

// Size operations
echo $mm->count();     // 6
echo count($mm);       // 6
var_dump($mm->isEmpty()); // false

// Query operations
var_dump($mm->containsKey('colors'));        // true
var_dump($mm->containsKey('missing'));       // false
var_dump($mm->containsValue('colors', 'red')); // true
var_dump($mm->containsValue('colors', 'green')); // false

var_dump($mm->keys());   // ['colors', 'numbers', 'arrays'] (order is undefined)
var_dump($mm->values()); // ['red', 'blue', 'red', 42, 84, [1, 2]]

// Removal operations
var_dump($mm->remove('colors', 'blue')); // true, removes only 'blue'
var_dump($mm->get('colors'));            // ['red', 'red']
var_dump($mm->remove('colors', 'yellow')); // false, doesn't exist
var_dump($mm->remove('colors', 'red'));  // true, removes one 'red'
var_dump($mm->get('colors'));            // ['red']

var_dump($mm->removeAll('numbers')); // true
var_dump($mm->get('numbers'));       // []
var_dump($mm->containsKey('numbers')); // false

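// Iterator support (json_encode() is used because one value is an array)
foreach ($mm as $key => $value) {
    echo "Key: $key, Value: " . json_encode($value) . "\n";
}
// Output (iteration order is undefined):
// Key: colors, Value: "red"
// Key: arrays, Value: [1,2]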

// Bulk operations
$mm->putAll('fruits', ['apple', 'banana', 'cherry']);
var_dump($mm->get('fruits')); // ['apple', 'banana', 'cherry']

$mm->putAll('fruits', ['date', 'elderberry']); // Adds to existing values
var_dump($mm->get('fruits')); // ['apple', 'banana', 'cherry', 'date', 'elderberry']

$mm->replaceAll('fruits', ['grape', 'honeydew']); // Replaces all existing values
var_dump($mm->get('fruits')); // ['grape', 'honeydew']

$mm->clear();
var_dump($mm->isEmpty()); // true
?>
</PHP>

===== Edge Cases and Behavior =====

**Key Type Restrictions:**
- Only string keys are supported
- Numeric strings are treated as strings, not converted to integers
- Empty string "" is a valid key
- Keys containing null bytes (\0) are supported

<PHP>
<?php
$mm = new SplMultiMap();
$mm->put('123', 'value');     // Key is string "123"
$mm->put('', 'empty key');    // Empty string key is valid
$mm->put("null\0byte", 'ok'); // Null bytes in keys are allowed
?>
</PHP>
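The numeric-string rule matters because plain PHP arrays behave differently: integer-like string keys are silently cast to integers. The snippet below uses only built-in array behavior to show what the rule avoids. It does not exercise SplMultiMap itself.

```php
<?php
// Plain arrays cast integer-like string keys to int.
$plain = [];
$plain['123'] = 'value';      // stored under int 123
$plain['1.5'] = 'float-like'; // not integer-like, stays a string
$plain['']    = 'empty key';  // empty string is a valid array key too

var_dump(array_keys($plain) === [123, '1.5', '']); // bool(true)
var_dump(array_key_first($plain) === '123');       // bool(false): the key is now an int
```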

**Value Type Handling:**
- All PHP types are supported as values (scalars, arrays, objects, resources, null)
- Objects are stored by handle, so the multimap holds the same instance rather than a copy; arrays follow PHP's copy-on-write value semantics
- Comparison uses strict equality (===)

<PHP>
<?php
$mm = new SplMultiMap();
$mm->put('mixed', null);
$mm->put('mixed', false);
$mm->put('mixed', 0);
$mm->put('mixed', '');
$mm->put('mixed', []);
$mm->put('mixed', new stdClass());

// Six values are stored; under strict comparison (===) each one is distinct
var_dump($mm->count()); // 6
?>
</PHP>
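To illustrate why strict comparison matters for remove() and containsValue(), the sketch below runs loose and strict searches over the same six values held in a plain list. It relies only on the built-in array_search() and in_array() functions.

```php
<?php
// The same six values as above, held in a plain list.
$values = [null, false, 0, '', [], new stdClass()];

// Loose search: null == false, so searching for false matches null at index 0.
var_dump(array_search(false, $values));       // int(0)
// Strict search: only the real false at index 1 matches.
var_dump(array_search(false, $values, true)); // int(1)

// Under ===, objects match only when they are the same instance.
var_dump(in_array(new stdClass(), $values, true)); // bool(false)
var_dump(in_array($values[5], $values, true));     // bool(true)
```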

**Iterator Behavior:**
- Iteration order is undefined (hash table dependent)
- Each value is yielded separately with its key
- Iterator is not reentrant
- By-reference iteration throws an Error

<PHP>
<?php
$mm = new SplMultiMap();
$mm->put('key', 'value1');
$mm->put('key', 'value2');

// Valid iteration
foreach ($mm as $k => $v) {
    echo "$k => $v\n";
}

// Invalid - throws Error
try {
    foreach ($mm as $k => &$v) {
        // Error: An iterator cannot be used with foreach by reference
    }
} catch (Error $e) {
    echo $e->getMessage();
}
?>
</PHP>
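Because each value is yielded with its key, the same key can appear several times in a single iteration. The sketch below uses a hypothetical generator, eachPair(), over a plain array of lists as a stand-in for getIterator(). It shows the practical consequence when the pairs are collected with iterator_to_array().

```php
<?php
// Hypothetical stand-in for getIterator(): yields every value with its key.
function eachPair(array $buckets): Generator
{
    foreach ($buckets as $key => $values) {
        foreach ($values as $value) {
            yield $key => $value;
        }
    }
}

$buckets = ['key' => ['value1', 'value2'], 'other' => ['x']];

// Keys repeat, so preserving them keeps only the last value per key.
$byKey = iterator_to_array(eachPair($buckets), true);
var_dump($byKey === ['key' => 'value2', 'other' => 'x']); // bool(true)

// Discarding keys keeps every value.
$flat = iterator_to_array(eachPair($buckets), false);
var_dump($flat === ['value1', 'value2', 'x']);            // bool(true)
```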

**Serialization:**
- SplMultiMap supports PHP's serialization mechanism
- All keys and values are serialized/unserialized properly
- Maintains object identity for value objects
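One reading of the object-identity point is that an object stored more than once is restored as a single shared instance. PHP's built-in serialization already behaves this way for plain arrays, as the sketch below shows. Whether SplMultiMap matches this exactly is an assumption.

```php
<?php
// Built-in serialization keeps shared object references within one payload.
$tag = new stdClass();
$tag->name = 'php';

// The same instance stored under two keys, as a multimap might hold it.
$data = ['post1' => [$tag], 'post2' => [$tag]];

$copy = unserialize(serialize($data));

// Inside the restored structure there is still one shared instance...
var_dump($copy['post1'][0] === $copy['post2'][0]); // bool(true)
// ...but it is a new object, distinct from the original.
var_dump($copy['post1'][0] === $tag);              // bool(false)
```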

**Cloning:**

Cloning is shallow: the multimap itself is copied, but the values it holds are not deeply cloned, so objects stored as values are shared between the original and the clone.
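PHP's default clone already gives shallow-copy semantics for an object that holds an array of values, which is one plausible reading of the rule above. The Box class in this sketch is hypothetical and exists only to illustrate those semantics. It is not the proposed implementation.

```php
<?php
// Hypothetical container used only to illustrate shallow-clone semantics.
final class Box
{
    public array $items = [];
}

$original = new Box();
$original->items['k'][] = new stdClass();

$copy = clone $original;
$copy->items['k'][] = 'extra';        // changes only the clone's array
$copy->items['k'][0]->touched = true; // changes the object both arrays share

var_dump(count($original->items['k']));             // int(1)
var_dump(count($copy->items['k']));                 // int(2)
var_dump(isset($original->items['k'][0]->touched)); // bool(true)
```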

===== Backward Incompatible Changes =====

Declaring a class named SplMultiMap in the global namespace will result in a fatal error. A search on GitHub shows no conflicts.

===== Proposed PHP Version(s) =====

PHP 8.6 (next minor version)

===== Voting Choices =====

<doodle title="Add SplMultiMap class as outlined in this RFC?" voteType="single" closed="true">
   * Yes
   * No  
</doodle>

===== Patches and Tests =====

Implementation available at: https://github.com/php/php-src/pull/XXXXX

===== References =====

[1] Google Guava Multimap: https://guava.dev/releases/23.0/api/docs/com/google/common/collect/Multimap.html
[2] C++ std::multimap: https://en.cppreference.com/w/cpp/container/multimap
[3] Java Map Interface: https://docs.oracle.com/javase/8/docs/api/java/util/Map.html

@alexandre-daubois alexandre-daubois force-pushed the spl-multi-map branch 2 times, most recently from faad3df to a2d5930 Compare August 29, 2025 09:13