Insight on how to improve the search performance of arrays

Fact on arrays:
Array access is (very) fast if access uses an array index
Fact on dictionaries:
Entries in a dictionary are looked up by key
The problem with the ArrayMap implementation of the dictionary:
Entries of the dictionary are stored at an index that is unrelated to the key, so a lookup has to search the array for the key
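
The problem can be illustrated with a minimal sketch. It assumes an unsorted, ArrayMap-style layout with two parallel arrays; the class name LinearLookup and the sample entries are hypothetical and are not taken from the course code.

```java
// Hypothetical ArrayMap-style lookup: an entry's position is unrelated to its key,
// so the only way to find a key is to scan the array.
class LinearLookup {
    // Parallel arrays holding the entries (keys[i], values[i]); illustrative data only
    static String[] keys   = { "cat", "dog", "emu" };
    static int[]    values = { 3, 7, 1 };

    // Returns the value stored for key, or -1 if the key is absent
    static int get(String key) {
        for (int i = 0; i < keys.length; i++) {   // up to keys.length comparisons
            if (keys[i].equals(key)) {
                return values[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(get("emu"));   // has to pass "cat" and "dog" first
    }
}
```

The cost of get() grows with the number of entries, whereas a direct access such as values[2] takes the same time no matter how large the array is.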

How to improve the search operation for a dictionary stored in an array:

Find a way to relate (= map) the key k to an index h of the array:
h = hashFunction( k )
Store the entry (k, v) at index h in the array

Example: how to store a map (dictionary) using hashing

This way of storing data in an array is called hashing
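
The idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class TinyHashTable, the array length M = 11 and the simple hashFunction() used here are hypothetical stand-ins, and the sketch ignores the case where two keys map to the same index.

```java
// Minimal sketch of hashing: the key decides where the entry is stored.
class TinyHashTable {
    static final int M = 11;                    // array length (arbitrary choice)
    static String[] keys   = new String[M];
    static String[] values = new String[M];

    // Hypothetical stand-in for hashFunction(k): maps a key to an index in [0..(M-1)]
    static int hashFunction(String k) {
        return Math.floorMod(k.hashCode(), M);
    }

    // Store the entry (k, v) at index h
    static void put(String k, String v) {
        int h = hashFunction(k);
        keys[h] = k;        // overwrites whatever was stored at h (collisions ignored)
        values[h] = v;
    }

    // Look up k by going straight to index h: no searching
    static String get(String k) {
        int h = hashFunction(k);
        return k.equals(keys[h]) ? values[h] : null;
    }

    public static void main(String[] args) {
        put("apple", "red");
        System.out.println(get("apple"));
    }
}
```

Both put() and get() compute one index and touch one array element, so their cost does not depend on how many entries are stored.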

Hash functions

Hash function H( ):

A hash function is a function that:
maps a key k to a number h in the range [0..(M-1)], where M = length of the array
(i.e.: h = H(k) where h ∈ [0..(M-1)])
is consistent (= always gives the same answer for a given key)
is uniform (= function values are distributed "evenly" across [0..(M-1)])

A hash function is usually specified as the composition of 2 functions:

H(k) = H₂( H₁(k) )

where:

H₁(k) = the hash code function that returns the integer value of the key k
H₂(x) = a compression function that maps a value x uniformly to [0..(M-1)]

The hash code of a key

Fact:
All data inside a computer is stored as a binary number
The Object class in Java defines a hashCode() method that returns an integer for the object; classes such as Integer and String override it to compute that integer from the data stored in the object

Examples:

Integers (byte, short, int, long): stored as binary numbers
Floating point numbers (float, double): stored as 2 binary numbers
Characters: stored as binary numbers in Unicode

We can use the hashCode() method as our H₁(k) function
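
As an illustration, the sketch below calls hashCode() on a few standard Java types. The class name HashCodeExamples and the helper h1() are hypothetical names chosen here; this is not the course's HashCode.java demo.

```java
// Sketch: using Java's built-in hashCode() as the hash code function H1(k).
class HashCodeExamples {
    // H1(k): the integer hash code of the key
    static int h1(Object k) {
        return k.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(h1(42));      // Integer: the int value itself, prints 42
        System.out.println(h1('A'));     // Character: the character code, prints 65
        System.out.println(h1("abc"));   // String: computed from its characters, prints 96354
        System.out.println(h1(1.0));     // Double: computed from its 64-bit representation
    }
}
```

Note that a hash code can be any int value, including a negative one, so it cannot be used directly as an array index.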

DEMO: 15-hashing/03-hashcode/HashCode.java

The compression function H₂(x)

Notice from the previous discussion on the hash code H₁(k):
H₁(k) uses the data stored in the key k to compute (deterministically) a hash code value

The compression function H₂(x) has 2 purposes:

Make sure that the return value is in the range [0..(M-1)] (where M = size of array)
Scatter/randomize the input value x = H₁(k), so that the value H₂(x) is evenly/uniformly distributed over the range [0..(M-1)]

Why use uniform randomization?

The array element used to store the dictionary entry (k, v) is:
array index = H(k) = H₂( H₁( k ) )
Uniform randomization minimizes the chance that 2 different keys are hashed to the same value (= array index); this is called a collision
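
A collision is easy to produce in Java. The sketch below relies on the fact that the strings "Aa" and "BB" have the same hashCode(); the class name CollisionExample, the helper index() and the array length 13 are hypothetical choices made for this illustration.

```java
// Sketch: two different keys that end up at the same array index (a collision).
class CollisionExample {
    // Hypothetical index function for an array of length M
    static int index(String k, int M) {
        return Math.floorMod(k.hashCode(), M);
    }

    public static void main(String[] args) {
        int M = 13;
        System.out.println("Aa".hashCode() == "BB".hashCode());   // prints true
        System.out.println(index("Aa", M) == index("BB", M));     // prints true
    }
}
```

Since both keys have the same hash code, they map to the same index for every array length M, so a hash table must be prepared to handle collisions even when the compression function is good.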

Commonly used compression function

A commonly used compression function is the Multiply Add Divide (MAD) function:

H₂(x) = ( ( ax + b ) % p ) % M     where p = a prime number

The inner part, ( ax + b ) % p, randomizes the value

In my examples, I will use:
p = 109345121
a = 123
b = 456
Note:
p must be greater than M (i.e.: p > M) -- otherwise, you will not use the full capacity of the array
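
A minimal Java sketch of the MAD function with the parameters above follows. The class name MadCompression is hypothetical and this is not the course's HashValue.java demo. Two choices are assumptions made here: the arithmetic is done in long so that a*x + b cannot overflow, and Math.abs() is applied because Java's % operator can return a negative result when the hash code x is negative.

```java
// Sketch of the Multiply Add Divide (MAD) compression function H2(x).
class MadCompression {
    static final long P = 109345121L;   // prime, must be larger than M
    static final long A = 123L;
    static final long B = 456L;

    // Maps a hash code x to an index in [0..(M-1)]; M must be positive
    static int compress(int x, int M) {
        long y = Math.abs(A * x + B);    // long arithmetic: no overflow for any int x
        return (int) ((y % P) % M);
    }

    public static void main(String[] args) {
        int M = 10;
        System.out.println(compress(0, M));    // (0 + 456) % p % 10, prints 6
        System.out.println(compress(1, M));    // (123 + 456) % p % 10, prints 9
        System.out.println(compress(-1, M));   // |-123 + 456| % p % 10, prints 3
    }
}
```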

DEMO: 15-hashing/03-hashcode/HashValue.java

Summary on the hashing technique

How to improve the search operation for a dictionary stored in an array:

Compute the hash value for a given key k:

h = H₂( H₁( k ) )
H₁(k) = k.hashCode()
H₂(x) = ( Math.abs( a*x + b ) % p ) % M

Store the entry (k, v) at index h in the array
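
Putting the two steps together, the hash value of an arbitrary key could be computed as sketched below. The class name HashValueSketch and the method hashValue() are hypothetical; the constants p, a and b are the ones used in the examples, and long arithmetic is an assumption made here to avoid integer overflow.

```java
// Sketch: h = H2( H1(k) ) for any Java object used as a key.
class HashValueSketch {
    static final long P = 109345121L, A = 123L, B = 456L;

    // Returns the array index in [0..(M-1)] for key k
    static int hashValue(Object k, int M) {
        int x = k.hashCode();                            // H1: hash code of the key
        return (int) ((Math.abs(A * x + B) % P) % M);    // H2: MAD compression
    }

    public static void main(String[] args) {
        int M = 100;
        Object[] sampleKeys = { 1, "abc", 'A', 3.14 };
        for (Object k : sampleKeys) {
            System.out.println(k + " -> index " + hashValue(k, M));
        }
    }
}
```

The entry (k, v) would then be stored at array[ hashValue(k, M) ], and a later lookup of k recomputes the same index.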

Example: how to store a map (dictionary) using hashing
