Jean-Louis Leroy’s “perfect typeid hash”
For his yomm2 library Jean-Louis Leroy developed an fast algorithm to find a pointer from the address of a std::type_info. Because he published it under the permissive BOOST license, we could use the algorithm for this library and refactor it for our purposes.
We will provide here a short walk through:
The goal of the hash index is, to compute as fast as possible a hash value from the std::type_info , that can be directly used as an index in an array (std::vector) containig the searched target, where all possible values for the *std::type_info * (=type_id from now on) are known.
This function shall be a multiplcation with mult and a right shift with shift:
index_t apply_formula(type_id type) const {
return (reinterpret_cast<std::size_t>(type) * mult) >> shift;
}
So the art is, to find perfect values for mult and shift in a given range of elements. Each elenent is a pair of type_id and a target, witch we wount to find as fast as possible for a given type_id. To make this a task easier, the generated target table leaves spare entries. The algorithm starts with litle spare and tries to find values for mult and shift, so that the result of apply_formula(type_id) is unique for every type_id. If this fails, the spare space is increased, and the search for mult and shift is repeated.
This table shows the initial spare_base value for some sizes of elements:
auto static inital_sparse_base(std::size_t element_count) {
std::size_t sparse_base = 1;
for (auto size = element_count * 5 / 4; size >>= 1;) ++sparse_base;
return sparse_base;
}
| size | inital_sparse_base |
|---|---|
| 10 | 4 |
| 100 | 7 |
| 1000 | 11 |
| 10000 | 14 |
| 100000 | 17 |
| 1000000 | 21 |
This table shows, how the spare_base relates to the table size. The values should be familar
static auto size_for_sparse_base(std::size_t sparse_base) {
return std::size_t(1) << sparse_base;
}
| sparse_base | table.size |
|---|---|
| 4 | 16 |
| 5 | 32 |
| 6 | 64 |
| 7 | 128 |
| 8 | 256 |
| 9 | 512 |
| 10 | 1024 |
| 11 | 2048 |
| 12 | 4096 |
| 13 | 8192 |
| 14 | 16384 |
| 15 | 32768 |
| 16 | 65536 |
| 17 | 131072 |
| 18 | 262144 |
| 19 | 524288 |
| 20 | 1048576 |
For a given sparse_base the shift is set to
hash_index.shift = 8 * sizeof(type_id) - sparse_base;
Then the algorithm trys to find via an random number generator a mult that maps each type_id to its own index.
static std::optional<hash_index> find_hash_for_sparse_base(
const auto& elements, std::size_t sparse_base) {
hash_index hash_index;
hash_index.shift = 8 * sizeof(type_id) - sparse_base;
hash_index.table.resize(size_for_sparse_base(sparse_base));
hash_index.length = 0;
std::default_random_engine random_engine(13081963);
std::uniform_int_distribution<std::uintptr_t> uniform_dist;
for (std::size_t attempts = 0; attempts < 100000; ++attempts)
if (can_hash_input_values(elements, hash_index,
(uniform_dist(random_engine) | 1)))
return hash_index;
return {};
}
The spare factor is increasd, if this fails 100000 times.
test test test