|
| 1 | +Hashing and associative data structures in Yosys |
| 2 | +------------------------------------------------ |
| 3 | + |
| 4 | +Container classes based on hashing |
| 5 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 6 | + |
| 7 | +Yosys uses ``dict<K, T>`` and ``pool<T>`` as its main container classes. |
| 8 | +``dict<K, T>`` is essentially a replacement for ``std::unordered_map<K, T>`` |
| 9 | +and ``pool<T>`` is a replacement for ``std::unordered_set<T>``. |
| 10 | +The main characteristics are: |
| 11 | + |
| 12 | +* ``dict<K, T>`` and ``pool<T>`` are about 2x faster than the std containers |
| 13 | + (though this claim hasn't been verified for over 10 years) |
| 14 | + |
| 15 | +* references to elements in a ``dict<K, T>`` or ``pool<T>`` are invalidated by |
| 16 | + insert and remove operations (similar to ``std::vector<T>`` on ``push_back()``). |
| 17 | + |
| 18 | +* some iterators are invalidated by ``erase()``. Specifically, iterators |
| 19 | + that have not yet passed the erased element are invalidated. (``erase()`` |
| 20 | + itself returns a valid iterator to the next element.) |
| 21 | + |
| 22 | +* no iterators are invalidated by ``insert()``. Elements are inserted at |
| 23 | + ``begin()``, i.e. only a new iterator that starts at ``begin()`` will see the |
| 24 | + inserted elements. |
| 25 | + |
| 26 | +* the method ``.count(key, iterator)`` is like ``.count(key)`` but only |
| 27 | + considers elements that can be reached via the iterator. |
| 28 | + |
| 29 | +* iterators can be compared. ``it1 < it2`` means that the position of ``it2`` |
| 30 | + can be reached via ``it1`` but not vice versa. |
| 31 | + |
| 32 | +* the method ``.sort()`` can be used to sort the elements in the container. |
| 33 | + The container stays sorted until elements are added or removed. |
| 34 | + |
| 35 | +* ``dict<K, T>`` and ``pool<T>`` will have the same order of iteration across |
| 36 | + all compilers, standard libraries and architectures. |
| 37 | + |
| 38 | +In addition to ``dict<K, T>`` and ``pool<T>`` there is also an ``idict<K>`` that |
| 39 | +creates a bijective map from ``K`` to the integers. For example: |
| 40 | + |
| 41 | +:: |
| 42 | + |
| 43 | + idict<string, 42> si; |
| 44 | + log("%d\n", si("hello")); // will print 42 |
| 45 | + log("%d\n", si("world")); // will print 43 |
| 46 | + log("%d\n", si.at("world")); // will print 43 |
| 47 | + log("%d\n", si.at("dummy")); // will throw exception |
| 48 | + log("%s\n", si[42].c_str()); // will print hello |
| 49 | + log("%s\n", si[43].c_str()); // will print world |
| 50 | + log("%s\n", si[44].c_str()); // will throw exception |
| 51 | + |
| 52 | +It is not possible to remove elements from an idict. |
| 53 | + |
| 54 | +Finally, ``mfp<K>`` implements a merge-find set data structure (also known as |
| 55 | +disjoint-set or union-find) over the type ``K`` ("mfp" = merge-find-promote). |
| 56 | + |
| 57 | +The hash function |
| 58 | +~~~~~~~~~~~~~~~~~ |
| 59 | + |
| 60 | +The hash function generally used in Yosys is the XOR version of DJB2: |
| 61 | + |
| 62 | +:: |
| 63 | + |
| 64 | + state = ((state << 5) + state) ^ value |
| 65 | + |
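The folding step above can be sketched in standalone C++ as follows. This is only an illustration: the names ``djb2_xor_fold`` and ``djb2_xor_hash``, the 32-bit state width and the seed 5381 (the classic DJB2 starting value) are assumptions made for this example and are not taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// One folding step of the XOR variant of DJB2: (state * 33) XOR value.
// The 32-bit state width is an assumption of this sketch.
inline uint32_t djb2_xor_fold(uint32_t state, uint32_t value) {
    return ((state << 5) + state) ^ value;
}

// Fold a whole sequence of values into the state, starting from an
// assumed seed of 5381.
inline uint32_t djb2_xor_hash(std::initializer_list<uint32_t> values) {
    uint32_t state = 5381;
    for (uint32_t v : values)
        state = djb2_xor_fold(state, v);
    return state;
}

// Small usage check for the two helpers above.
inline void djb2_xor_example() {
    // Folding 0 into the seed leaves just the multiplication: 5381 * 33.
    assert(djb2_xor_fold(5381u, 0u) == 177573u);
    // For this particular pair, swapping the order changes the result.
    assert(djb2_xor_hash({1u, 2u}) != djb2_xor_hash({2u, 1u}));
}
```

Each step multiplies the state by 33 and XORs in the next value, so the result depends on the order in which values are folded in.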
| 66 | +This is an old-school hash designed to hash ASCII characters. Yosys doesn't hash |
| 67 | +a lot of ASCII text, but it still happens to be a local optimum due to factors |
| 68 | +described later. |
| 69 | + |
| 70 | +Hash function quality is multi-faceted and highly dependent on what is being |
| 71 | +hashed. Yosys isn't concerned with cryptographic qualities; instead, the goal |
| 72 | +is to minimize the total risk of hash collisions given the data patterns within |
| 73 | +Yosys. In general, a good hash function folds values into a state accumulator |
| 74 | +with a mathematical function that is fast to compute and has some beneficial |
| 75 | +properties. One of these is the avalanche property, which demands that a small |
| 76 | +change in the input, such as flipping a bit or incrementing by one, produces a |
| 77 | +large, unpredictable change in the output. Additionally, the bit independence |
| 78 | +criterion states that any pair of output bits should change independently when |
| 79 | +any single input bit is inverted. These properties help avoid hash collisions |
| 80 | +on common data patterns: for example, the hash of a sequence should not collide |
| 81 | +with the hash of its permutation, and folding in a new element should not erase |
| 82 | +from the state the information added by hashing preceding elements. |
| 83 | + |
| 84 | +DJB2 lacks these properties. Instead, since Yosys hashes large numbers of data |
| 85 | +structures composed of incrementing integer IDs, Yosys abuses the predictability |
| 86 | +of DJB2 to get fewer hash collisions, with the regular nature of the hashes |
| 87 | +surviving through the interaction with the "modulo prime" operations in the |
| 88 | +associative data structures. For example, some of the most common objects in |
| 89 | +Yosys are interned ``IdString``\ s with incrementing indices and ``SigBit``\ s, |
| 90 | +which typically hold a bit offset into a wire (represented by its unique |
| 91 | +``IdString`` name). This is what makes DJB2 a local optimum. The ADD version of |
| 92 | +DJB2 (like above but with addition instead of XOR) is also used to this end for |
| 93 | +some types, abandoning the general pattern of folding values into a state value. |
| 94 | + |
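To make the "incrementing IDs" argument concrete, here is a small standalone experiment. It is a sketch under stated assumptions and not a measurement of hashlib: the seed 5381, the 32-bit state, the ID range 0..1023, the table size 1031 and the function names are all chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Same folding step as in the text: (state * 33) XOR value.
// The 32-bit state width is an assumption of this sketch.
inline uint32_t djb2_xor_fold(uint32_t state, uint32_t value) {
    return ((state << 5) + state) ^ value;
}

// Hash the IDs 0..count-1 from an assumed seed of 5381, reduce each hash
// modulo `prime`, and return how many distinct buckets are hit.
inline std::size_t distinct_buckets(uint32_t count, uint32_t prime) {
    std::set<uint32_t> buckets;
    for (uint32_t id = 0; id < count; id++)
        buckets.insert(djb2_xor_fold(5381u, id) % prime);
    return buckets.size();
}

inline void bucket_example() {
    // XOR-ing the IDs 0..1023 into a fixed state only permutes its low ten
    // bits, so the 1024 hashes are 1024 consecutive integers. Reduced
    // modulo the prime 1031, no two of them share a bucket.
    assert(distinct_buckets(1024, 1031) == 1024);
}
```

A hash with strong avalanche behaviour would scatter these IDs pseudo-randomly, and some of them would then be expected to share buckets in a table of this size. Here the regular structure of the input carries through to the bucket indices instead.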
| 95 | +Making a type hashable |
| 96 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 97 | + |
| 98 | +Let's first take a look at the external interface on a simplified level. |
| 99 | +Generally, to get the hash for ``T obj``, you would call the utility function |
| 100 | +``run_hash<T>(const T& obj)``, corresponding to ``hash_top_ops<T>::hash(obj)``, |
| 101 | +the default implementation of which is ``hash_ops<T>::hash_into(Hasher(), obj)``. |
| 102 | +``Hasher`` is the class actually implementing the hash function, hiding its |
| 103 | +initialized internal state, and passing it out on ``hash_t yield()`` with |
| 104 | +perhaps some finalization steps. |
| 105 | + |
| 106 | +``hash_ops<T>`` is the star of the show. By default it pulls the ``Hasher h`` |
| 107 | +through a ``Hasher T::hash_into(Hasher h)`` method. That's the method you have to |
| 108 | +implement to make a record (class or struct) type easily hashable with Yosys |
| 109 | +hashlib associative data structures. |
| 110 | + |
| 111 | +``hash_ops<T>`` is specialized for built-in types like ``int`` or ``bool`` and |
| 112 | +treats pointers the same as integers, so it doesn't dereference pointers. Since |
| 113 | +many RTLIL data structures like ``RTLIL::Wire`` carry their own unique index |
| 114 | +``Hasher::hash_t hashidx_;``, there are specializations for ``hash_ops<Wire*>`` |
| 115 | +and others in ``kernel/hashlib.h`` that actually dereference the pointers and |
| 116 | +call ``hash_into`` on the instances pointed to. |
| 117 | + |
| 118 | +``hash_ops<T>`` is also specialized for simple compound types like |
| 119 | +``std::pair<U, V>`` by calling ``hash_into`` in sequence on its members. For |
| 120 | +flexible size containers like ``std::vector<U>`` the size of the container is |
| 121 | +hashed first. Hashing for a custom record data type should be implemented the |
| 122 | +same way: unless there is a strong reason to do otherwise, call ``h.eat(m)`` on |
| 123 | +the ``Hasher h`` you have received for each member in sequence and ``return |
| 124 | +h;``. If you do have a strong reason to do otherwise, look at how |
| 125 | +``hash_top_ops<RTLIL::SigBit>`` is implemented in ``kernel/rtlil.h``. |
| 126 | + |
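The member-by-member pattern can be tried out without the Yosys headers using a stand-in hasher. In this sketch the ``Hasher`` class is a simplified mock, not the class from ``kernel/hashlib.h``. Its seed, its integer-only ``eat()``, the ``Point`` record and the ``run_hash_sketch`` helper are hypothetical names and assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the Hasher described above. It only accepts
// integers and uses an assumed seed of 5381; the real class differs.
class Hasher {
public:
    using hash_t = uint32_t;
    // Fold one value into the hidden state (XOR variant of DJB2).
    void eat(hash_t value) { state = ((state << 5) + state) ^ value; }
    // Hand out the accumulated state; this mock has no finalization step.
    hash_t yield() const { return state; }
private:
    hash_t state = 5381;
};

// Hypothetical record type with two integer members.
struct Point {
    int x;
    int y;
    // Take the hasher by value, eat each member in sequence, return it.
    Hasher hash_into(Hasher h) const {
        h.eat(static_cast<Hasher::hash_t>(x));
        h.eat(static_cast<Hasher::hash_t>(y));
        return h;
    }
};

// Hypothetical helper mirroring the default path described in the text:
// start from a fresh Hasher, pull it through hash_into, then yield.
template <typename T>
Hasher::hash_t run_hash_sketch(const T &obj) {
    return obj.hash_into(Hasher()).yield();
}

inline void point_example() {
    // Equal records hash equally; swapping the members changes the hash.
    assert(run_hash_sketch(Point{1, 2}) == run_hash_sketch(Point{1, 2}));
    assert(run_hash_sketch(Point{1, 2}) != run_hash_sketch(Point{2, 1}));
}
```

Because the hasher is passed and returned by value, a nested record can forward the same ``Hasher`` to its members, which keeps a single running state for the whole object.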
| 127 | +Porting plugins from the legacy interface |
| 128 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 129 | + |
| 130 | +Previously, the interface to implement hashing on custom types was just |
| 131 | +``unsigned int T::hash() const``. This meant hashes for members were computed |
| 132 | +independently and then combined ad hoc with the hash function, with some |
| 133 | +xorshift operations thrown in to mix bits together somewhat. A plugin can stay |
| 134 | +compatible with versions both before and after the break by implementing both |
| 135 | +interfaces based on the existence and value of ``YS_HASHING_VERSION``. |
| 136 | + |
| 137 | +.. code-block:: cpp |
| 138 | + :caption: Example hash compatibility wrapper |
| 139 | + :name: hash_plugin_compat |
| 140 | + |
| 141 | + #ifndef YS_HASHING_VERSION |
| 142 | + unsigned int T::hash() const { |
| 143 | + return mkhash(a, b); |
| 144 | + } |
| 145 | + #elif YS_HASHING_VERSION == 1 |
| 146 | + Hasher T::hash_into(Hasher h) const { |
| 147 | + h.eat(a); |
| 148 | + h.eat(b); |
| 149 | + return h; |
| 150 | + } |
| 151 | + #else |
| 152 | + #error "Unsupported hashing interface" |
| 153 | + #endif |
| 154 | + |
| 155 | +Feel free to contact Yosys maintainers with related issues. |